How to interpret some GC logs

2015-06-01 Thread Michał Łowicki
Hi,

Normally I get logs like:

2015-06-01T09:19:50.610+0000: 4736.314: [GC 6505591K->4895804K(8178944K),
0.0494560 secs]

which is fine and understandable, but occasionally I see something like:

2015-06-01T09:19:50.661+0000: 4736.365: [GC 4901600K(8178944K), 0.0049600
secs]

How should I interpret it? Is it just missing the part before the arrow, i.e.
the memory occupied before the GC cycle?
-- 
BR,
Michał Łowicki


RE: check active queries on cluster

2015-06-01 Thread Sebastian Martinka
You could enable DEBUG logging for org.apache.cassandra.transport.Message and 
TRACE logging for org.apache.cassandra.cql3.QueryProcessor in the 
log4j-server.properties file:

log4j.logger.org.apache.cassandra.transport.Message=DEBUG
log4j.logger.org.apache.cassandra.cql3.QueryProcessor=TRACE

Afterwards you get the following output from all PreparedStatements in the 
system.log file:

DEBUG [Native-Transport-Requests:167] 2015-06-01 15:56:15,186 Message.java 
(line 302) Received: PREPARE INSERT INTO dba_test.cust_view (leid, vid, 
geoarea, ver) VALUES (?, ?, ?, ?);, v=2
TRACE [Native-Transport-Requests:167] 2015-06-01 15:56:15,187 
QueryProcessor.java (line 283) Stored prepared statement 
61956319a6d7c84c25414c96edf6e38c with 4 bind markers
DEBUG [Native-Transport-Requests:167] 2015-06-01 15:56:15,187 Tracing.java 
(line 159) request complete
DEBUG [Native-Transport-Requests:167] 2015-06-01 15:56:15,187 Message.java 
(line 309) Responding: RESULT PREPARED 61956319a6d7c84c25414c96edf6e38c 
[leid(dba_test, cust_view), 
org.apache.cassandra.db.marshal.UTF8Type][vid(dba_test, cust_view), 
org.apache.cassandra.db.marshal.UTF8Type][geoarea(dba_test, cust_view), 
org.apache.cassandra.db.marshal.UTF8Type][ver(dba_test, cust_view), 
org.apache.cassandra.db.marshal.LongType] (resultMetadata=[0 columns]), v=2
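
For reference: on Cassandra 2.1+, which replaced log4j-server.properties with
logback, the same levels can also be changed at runtime via nodetool; a minimal
sketch, assuming a default install:

nodetool setlogginglevel org.apache.cassandra.transport.Message DEBUG
nodetool setlogginglevel org.apache.cassandra.cql3.QueryProcessor TRACE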


From: Robert Coli [mailto:rc...@eventbrite.com]
Sent: Friday, 17 April 2015 19:23
To: user@cassandra.apache.org
Subject: Re: check active queries on cluster

On Thu, Apr 16, 2015 at 11:10 PM, Rahul Bhardwaj 
rahul.bhard...@indiamart.com wrote:
We want to track active queries on the cassandra cluster. Is there any tool or way
to find all active queries on cassandra?

You can get a count of them with:

 https://issues.apache.org/jira/browse/CASSANDRA-5084

=Rob



Re: Minor Compactions Not Triggered

2015-06-01 Thread Robert Coli
On Sun, May 31, 2015 at 11:37 AM, Anuj Wadehra anujw_2...@yahoo.co.in
wrote:

 2. We thought that the CQL compaction subproperty *tombstone_threshold*
 will help us after major compactions. This property will ensure that even
 if we have one huge sstable, once the tombstone threshold of 20% has been
 reached, sstables will be compacted and tombstones will be dropped after
 gc_grace_period (even if there are no similar-sized sstables, as needed by STCS).
 But in our initial testing, a single huge sstable is not getting compacted
 even if we drop all rows in it and gc_grace_period has passed.  *Why
 is tombstone_threshold behaving like that?*


https://issues.apache.org/jira/browse/CASSANDRA-6654 ?

=Rob
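
For reference, these compaction subproperties are set per table; a minimal CQL
sketch (keyspace, table, and values illustrative, not from the thread):

ALTER TABLE my_ks.my_cf WITH compaction = {
    'class': 'SizeTieredCompactionStrategy',
    'tombstone_threshold': '0.2',
    'unchecked_tombstone_compaction': 'true'
};

unchecked_tombstone_compaction (added around 2.0.9, if memory serves) lets a
single sstable be tombstone-compacted even when the usual overlap checks would
block it, which is often the issue with one huge post-major-compaction sstable.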


Re: How to interpret some GC logs

2015-06-01 Thread Jason Wee
can you tell which JVM that is?

jason

On Mon, Jun 1, 2015 at 5:46 PM, Michał Łowicki mlowi...@gmail.com wrote:

 Hi,

 Normally I get logs like:

 2015-06-01T09:19:50.610+0000: 4736.314: [GC 6505591K->4895804K(8178944K),
 0.0494560 secs]

 which is fine and understandable, but occasionally I see something like:

 2015-06-01T09:19:50.661+0000: 4736.365: [GC 4901600K(8178944K), 0.0049600
 secs]

 How should I interpret it? Is it just missing the part before the arrow, i.e.
 the memory occupied before the GC cycle?
 --
 BR,
 Michał Łowicki
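
For reference, a hedged interpretation: with the CMS collector and
-XX:+PrintGCDetails disabled, a line of the form [GC <used>K(<total>K), <secs>
secs] is typically a CMS initial-mark pause, which logs only the current heap
occupancy rather than a before->after pair. Flags like the following make the
phase explicit in the log:

-XX:+PrintGCDetails -XX:+PrintGCDateStamps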



Re: Minor Compactions Not Triggered

2015-06-01 Thread Anuj Wadehra
Thanks Robert !!!


As per the algorithm shared in CASSANDRA-6654, I understand that the 
tombstone_threshold property only comes into the picture if you have expiring 
columns, and it won't have any effect if you have manually deleted rows in a CF. 
Is my understanding correct?


In your view, what would be the expected behavior of the following steps?


I inserted x rows

I deleted x rows

Ran a major compaction to make sure that one big sstable contains all the tombstones

Waited for the gc grace period to see whether that big sstable formed after major 
compaction gets compacted on its own, without needing any other sstable
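
A sketch of those steps as commands, with illustrative keyspace/table names and
data path:

nodetool compact my_ks my_cf    # major compaction -> one big sstable
# wait at least gc_grace_seconds, then inspect the sstable:
sstablemetadata /var/lib/cassandra/data/my_ks/my_cf/*-Data.db | grep -i droppable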


Thanks

Anuj



Sent from Yahoo Mail on Android

From: Robert Coli rc...@eventbrite.com
Date: Mon, 1 Jun, 2015 at 10:56 pm
Subject: Re: Minor Compactions Not Triggered

On Sun, May 31, 2015 at 11:37 AM, Anuj Wadehra anujw_2...@yahoo.co.in wrote:

2. We thought that the CQL compaction subproperty tombstone_threshold will help 
us after major compactions. This property will ensure that even if we have one 
huge sstable, once the tombstone threshold of 20% has been reached, sstables will 
be compacted and tombstones will be dropped after gc_grace_period (even if there 
are no similar-sized sstables, as needed by STCS). But in our initial testing, a 
single huge sstable is not getting compacted even if we drop all rows in it and 
gc_grace_period has passed.  Why is tombstone_threshold behaving like that?


https://issues.apache.org/jira/browse/CASSANDRA-6654 ?


=Rob



ERROR Compaction Interrupted

2015-06-01 Thread Aiman Parvaiz
Hi everyone,
I am running C* 2.0.9 without vnodes and with RF=2. Recently, while repairing and 
rebalancing the cluster, I encountered one instance of this (just one, on one 
node):

May 30 19:31:09 cass-prod4.localdomain cassandra: 2015-05-30 19:31:09,991 ERROR 
CompactionExecutor:55472 CassandraDaemon.uncaughtException - Exception in 
thread Thread[CompactionExecutor:55472,1,main]

May 30 19:31:09 cass-prod4.localdomain 
org.apache.cassandra.db.compaction.CompactionInterruptedException: Compaction 
interrupted: Compaction@1b0b43e5-bef5-34f9-af08-405a7b58c71f(flipagram, 
home_feed_entry_index, 218409618/450008574)bytes

May 30 19:31:09 cass-prod4.localdomain at 
org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:157)
May 30 19:31:09 cass-prod4.localdomain at 
org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
May 30 19:31:09 cass-prod4.localdomain at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
May 30 19:31:09 cass-prod4.localdomain at 
org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:60)
May 30 19:31:09 cass-prod4.localdomain at 
org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59)
May 30 19:31:09 cass-prod4.localdomain at 
org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:198)
May 30 19:31:09 cass-prod4.localdomain at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
May 30 19:31:09 cass-prod4.localdomain at 
java.util.concurrent.FutureTask.run(FutureTask.java:262)
May 30 19:31:09 cass-prod4.localdomain at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
May 30 19:31:09 cass-prod4.localdomain at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
May 30 19:31:09 cass-prod4.localdomain at 
java.lang.Thread.run(Thread.java:745)

After looking through the mailing list archives, I understand that this might mean 
data corruption. I plan to take the node offline and replace it with a new one, 
but I still wanted to check whether anyone can throw some light here in case I am 
missing something.
Also, if this is a case of a corrupted SSTable, should I be concerned about the 
corruption getting replicated, and do I need to take care of it on the replicas too?

Thanks

Re: check active queries on cluster

2015-06-01 Thread Ben Bromhead
A warning on enabling debug and trace logging on the write path: you will
be writing information about every query to disk.

If you have any significant volume of requests going through the nodes,
things will get slow pretty quickly. At least with C* < 2.1 and using the
default logging config.
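
For C* 2.1+, where logback is the default, a roughly equivalent configuration in
conf/logback.xml would be (a sketch, untested here):

<logger name="org.apache.cassandra.transport.Message" level="DEBUG"/>
<logger name="org.apache.cassandra.cql3.QueryProcessor" level="TRACE"/>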

On 1 June 2015 at 07:34, Sebastian Martinka sebastian.marti...@mercateo.com
 wrote:

  You could enable DEBUG logging for
 org.apache.cassandra.transport.Message and TRACE logging for
 org.apache.cassandra.cql3.QueryProcessor in the log4j-server.properties
 file:

 log4j.logger.org.apache.cassandra.transport.Message=DEBUG
 log4j.logger.org.apache.cassandra.cql3.QueryProcessor=TRACE


 Afterwards you get the following output from all PreparedStatements in the
 system.log file:


 DEBUG [Native-Transport-Requests:167] 2015-06-01 15:56:15,186 Message.java
 (line 302) Received: PREPARE INSERT INTO dba_test.cust_view (leid, vid,
 geoarea, ver) VALUES (?, ?, ?, ?);, v=2
 TRACE [Native-Transport-Requests:167] 2015-06-01 15:56:15,187
 QueryProcessor.java (line 283) Stored prepared statement
 61956319a6d7c84c25414c96edf6e38c with 4 bind markers
 DEBUG [Native-Transport-Requests:167] 2015-06-01 15:56:15,187 Tracing.java
 (line 159) request complete
 DEBUG [Native-Transport-Requests:167] 2015-06-01 15:56:15,187 Message.java
 (line 309) Responding: RESULT PREPARED 61956319a6d7c84c25414c96edf6e38c
 [leid(dba_test, cust_view),
 org.apache.cassandra.db.marshal.UTF8Type][vid(dba_test, cust_view),
 org.apache.cassandra.db.marshal.UTF8Type][geoarea(dba_test, cust_view),
 org.apache.cassandra.db.marshal.UTF8Type][ver(dba_test, cust_view),
 org.apache.cassandra.db.marshal.LongType] (resultMetadata=[0 columns]), v=2





 *From:* Robert Coli [mailto:rc...@eventbrite.com]
 *Sent:* Friday, 17 April 2015 19:23
 *To:* user@cassandra.apache.org
 *Subject:* Re: check active queries on cluster



 On Thu, Apr 16, 2015 at 11:10 PM, Rahul Bhardwaj 
 rahul.bhard...@indiamart.com wrote:

 We want to track active queries on cassandra cluster. Is there any tool or
 way to find all active queries on cassandra ?



 You can get a count of them with :



  https://issues.apache.org/jira/browse/CASSANDRA-5084



 =Rob






-- 

Ben Bromhead

Instaclustr | www.instaclustr.com | @instaclustr
http://twitter.com/instaclustr | (650) 284 9692


RE: Spark SQL JDBC Server + DSE

2015-06-01 Thread Mohammed Guller
Brian,
We haven't open-sourced the REST server, but we're not opposed to doing it. We just 
need to carve out some time to clean up the code and separate it from all the 
other stuff that we do in that REST server. We will try to do it in the next few 
weeks. If you need it sooner, let me know.

I did consider the option of writing our own Spark SQL JDBC driver for C*, but 
it is lower on the priority list right now.

Mohammed

From: Brian O'Neill [mailto:boneil...@gmail.com] On Behalf Of Brian O'Neill
Sent: Saturday, May 30, 2015 3:12 AM
To: user@cassandra.apache.org
Subject: Re: Spark SQL JDBC Server + DSE


Any chance you open-sourced, or could open-source the REST server? ;)

In thinking about it...
It doesn't feel like it would be that hard to write a Spark SQL JDBC driver 
against Cassandra, akin to what they have for Hive:
https://spark.apache.org/docs/latest/sql-programming-guide.html#running-the-thrift-jdbcodbc-server
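
Per that linked Spark documentation, the stock Hive-backed Thrift server is
started and exercised roughly like this; a sketch, with the master URL left as a
placeholder:

./sbin/start-thriftserver.sh --master <master-url>
./bin/beeline -u jdbc:hive2://localhost:10000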

I wouldn't mind collaborating on that, if you are headed in that direction.
(and then I could write the REST server on top of that)

LMK,

-brian

---
Brian O'Neill
Chief Technology Officer
Health Market Science, a LexisNexis Company
215.588.6024 Mobile * @boneill42 http://www.twitter.com/boneill42

This information transmitted in this email message is for the intended 
recipient only and may contain confidential and/or privileged material. If you 
received this email in error and are not the intended recipient, or the person 
responsible to deliver it to the intended recipient, please contact the sender 
at the email above and delete this email and any attachments and destroy any 
copies thereof. Any review, retransmission, dissemination, copying or other use 
of, or taking any action in reliance upon, this information by persons or 
entities other than the intended recipient is strictly prohibited.


From: Mohammed Guller moham...@glassbeam.com
Reply-To: user@cassandra.apache.org
Date: Friday, May 29, 2015 at 2:15 PM
To: user@cassandra.apache.org
Subject: RE: Spark SQL JDBC Server + DSE

Brian,
I implemented a similar REST server last year and it works great. Now we have a 
requirement to support JDBC connectivity in addition to the REST API. We want 
to allow users to use tools like Tableau to connect to C* through the Spark SQL 
JDBC/Thrift server.

Mohammed

From: Brian O'Neill [mailto:boneil...@gmail.com] On Behalf Of Brian O'Neill
Sent: Thursday, May 28, 2015 6:16 PM
To: user@cassandra.apache.org
Subject: Re: Spark SQL JDBC Server + DSE

Mohammed,

This doesn't really answer your question, but I'm working on a new REST server 
that allows people to submit SQL queries over REST, which get executed via 
Spark SQL.   Based on what I started here:
http://brianoneill.blogspot.com/2015/05/spark-sql-against-cassandra-example.html

I assume you need JDBC connectivity specifically?

-brian

---
Brian O'Neill
Chief Technology Officer
Health Market Science, a LexisNexis Company
215.588.6024 Mobile * @boneill42 http://www.twitter.com/boneill42

This information transmitted in this email message is for the intended 
recipient only and may contain confidential and/or privileged material. If you 
received this email in error and are not the intended recipient, or the person 
responsible to deliver it to the intended recipient, please contact the sender 
at the email above and delete this email and any attachments and destroy any 
copies thereof. Any review, retransmission, dissemination, copying or other use 
of, or taking any action in reliance upon, this information by persons or 
entities other than the intended recipient is strictly prohibited.


From: Mohammed Guller moham...@glassbeam.com
Reply-To: user@cassandra.apache.org
Date: Thursday, May 28, 2015 at 8:26 PM
To: user@cassandra.apache.org
Subject: RE: Spark SQL JDBC Server + DSE

Anybody out there using DSE + Spark SQL JDBC server?

Mohammed

From: Mohammed Guller [mailto:moham...@glassbeam.com]
Sent: Tuesday, May 26, 2015 6:17 PM
To: user@cassandra.apache.org
Subject: Spark SQL JDBC Server + DSE

Hi -
As I understand it, the Spark SQL Thrift/JDBC server cannot be used with the open 
source C*. Only DSE supports the Spark SQL JDBC server.

We would like to find out how many organizations are using this combination. If 
you do use DSE + the Spark SQL JDBC server, it would be great if you could share 
your experience. For example, what kind of issues have you run into? How is the 
performance? What reporting tools are you using?

Thank you!

Mohammed



Re: Spark SQL JDBC Server + DSE

2015-06-01 Thread Sebastian Estevez
Have you looked at job server?

https://github.com/spark-jobserver/spark-jobserver
https://www.youtube.com/watch?v=8k9ToZ4m6os
http://planetcassandra.org/blog/post/fast-spark-queries-on-in-memory-datasets/

All the best,


Sebastián Estévez

Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com


DataStax is the fastest, most scalable distributed database technology,
delivering Apache Cassandra to the world’s most innovative enterprises.
Datastax is built to be agile, always-on, and predictably scalable to any
size. With more than 500 customers in 45 countries, DataStax is the
database technology and transactional backbone of choice for the world's
most innovative companies such as Netflix, Adobe, Intuit, and eBay.

On Mon, Jun 1, 2015 at 8:13 AM, Mohammed Guller moham...@glassbeam.com
wrote:

  Brian,

 We haven’t open sourced the REST server, but not  opposed to doing it.
 Just need to carve out some time to clean up the code and carve it out from
 all the other stuff that we do in that REST server.  Will try to do it in
 the next few weeks. If you need it sooner, let me know.



 I did consider the option of writing our own Spark SQL JDBC driver for C*,
 but it is lower on the priority list right now.



 Mohammed



 *From:* Brian O'Neill [mailto:boneil...@gmail.com] *On Behalf Of *Brian
 O'Neill
 *Sent:* Saturday, May 30, 2015 3:12 AM

 *To:* user@cassandra.apache.org
 *Subject:* Re: Spark SQL JDBC Server + DSE





 Any chance you open-sourced, or could open-source the REST server? ;)



 In thinking about it…

 It doesn’t feel like it would be that hard to write a Spark SQL JDBC
 driver against Cassandra, akin to what they have for hive:


 https://spark.apache.org/docs/latest/sql-programming-guide.html#running-the-thrift-jdbcodbc-server



 I wouldn’t mind collaborating on that, if you are headed in that direction.

 (and then I could write the REST server on top of that)



 LMK,



 -brian



 ---

 *Brian O'Neill *

 Chief Technology Officer

 Health Market Science, a LexisNexis Company

 215.588.6024 Mobile • @boneill42 http://www.twitter.com/boneill42



 This information transmitted in this email message is for the intended
 recipient only and may contain confidential and/or privileged material. If
 you received this email in error and are not the intended recipient, or the
 person responsible to deliver it to the intended recipient, please contact
 the sender at the email above and delete this email and any attachments and
 destroy any copies thereof. Any review, retransmission, dissemination,
 copying or other use of, or taking any action in reliance upon, this
 information by persons or entities other than the intended recipient is
 strictly prohibited.





 *From: *Mohammed Guller moham...@glassbeam.com
 *Reply-To: *user@cassandra.apache.org
 *Date: *Friday, May 29, 2015 at 2:15 PM
 *To: *user@cassandra.apache.org user@cassandra.apache.org
 *Subject: *RE: Spark SQL JDBC Server + DSE



 Brian,

 I implemented a similar REST server last year and it works great. Now we
 have a requirement to support JDBC connectivity in addition to the REST
 API. We want to allow users to use tools like Tableau to connect to C*
 through the Spark SQL JDBC/Thrift server.



 Mohammed



 *From:* Brian O'Neill [mailto:boneil...@gmail.com boneil...@gmail.com] *On
 Behalf Of *Brian O'Neill
 *Sent:* Thursday, May 28, 2015 6:16 PM
 *To:* user@cassandra.apache.org
 *Subject:* Re: Spark SQL JDBC Server + DSE



 Mohammed,



 This doesn’t really answer your question, but I’m working on a new REST
 server that allows people to submit SQL queries over REST, which get
 executed via Spark SQL.   Based on what I started here:


 http://brianoneill.blogspot.com/2015/05/spark-sql-against-cassandra-example.html



 I assume you need JDBC connectivity specifically?



 -brian



 ---

 *Brian O'Neill *

 Chief Technology Officer

 Health Market Science, a LexisNexis Company

 215.588.6024 Mobile • @boneill42 http://www.twitter.com/boneill42



 This information transmitted in this email message is for the intended
 recipient only and may contain confidential and/or privileged material. If
 you received this email in error and are not the intended recipient, or the
 person responsible to deliver it to the intended recipient, please contact
 the sender at the email above and delete this email and any attachments and
 destroy any copies thereof. Any review, retransmission, dissemination,
 copying or other use of, or taking any action in reliance upon, this
 information by persons or entities other than the intended recipient is
 strictly prohibited.






Regarding JIRA

2015-06-01 Thread Kiran mk
Hi ,

I am using the Apache Cassandra Community Edition for learning and
practicing. Can I raise doubts, issues, and requests for clarification as
JIRA tickets against Cassandra?

Will there be any charges for that? As far as I know, we can create a free JIRA
account.

Can anyone advise me on this?

-- 
Best Regards,
Kiran.M.K.


Re: Regarding JIRA

2015-06-01 Thread Russell Bradberry
Also, feel free to use any of the many other resources available.

The Documentation
Planet Cassandra
Stack Overflow
#cassandra on irc.freenode.net



From:  Daniel Compton
Reply-To:  user@cassandra.apache.org
Date:  Monday, June 1, 2015 at 3:37 PM
To:  user@cassandra.apache.org
Subject:  Re: Regarding JIRA

Hi Kiran

There are no charges for raising issues in the Apache Cassandra JIRA or emailing 
the list. However, as I'm sure you're aware, members of this list and of JIRA 
are mostly volunteers, so there's also no guarantee of support or response time.

--
Daniel.

On Tue, Jun 2, 2015 at 7:32 AM Kiran mk coolkiran2...@gmail.com wrote:
Hi ,

I am using the Apache Cassandra Community Edition for learning and practicing. 
Can I raise doubts, issues, and requests for clarification as JIRA tickets against 
Cassandra? 

Will there be any charges for that? As far as I know, we can create a free JIRA 
account. 

Can anyone advise me on this?


-- 
Best Regards,
Kiran.M.K.



Re: Regarding JIRA

2015-06-01 Thread Daniel Compton
Hi Kiran

There are no charges for raising issues in the Apache Cassandra JIRA or
emailing the list. However, as I'm sure you're aware, members of this list
and of JIRA are mostly volunteers, so there's also no guarantee of support
or response time.

--
Daniel.

On Tue, Jun 2, 2015 at 7:32 AM Kiran mk coolkiran2...@gmail.com wrote:

 Hi ,

 I am using the Apache Cassandra Community Edition for learning and
 practicing. Can I raise doubts, issues, and requests for clarification as JIRA
 tickets against Cassandra?

 Will there be any charges for that? As far as I know, we can create a free JIRA
 account.

 Can anyone advise me on this?


 --
 Best Regards,
 Kiran.M.K.



Re: Regarding JIRA

2015-06-01 Thread Dave Brosius
 

JIRA should be left for issues that you have some confidence are bugs in
Cassandra, or for items you want as feature requests. 

For general questions, try the Cassandra mailing list
(user@cassandra.apache.org; to subscribe, email
user-subscr...@cassandra.apache.org) 

or use IRC: #cassandra on freenode 

On 2015-06-01 15:31, Kiran mk wrote: 

 Hi , 
 
 I am using the Apache Cassandra Community Edition for learning and practicing; 
 can I raise doubts, issues, and requests for clarification as JIRA tickets 
 against Cassandra? 
 
 Will there be any charges for that? As far as I know, we can create a free JIRA 
 account. 
 
 Can anyone advise me on this?
 
 -- 
 
 Best Regards,
 Kiran.M.K.
 

Re: GC pauses affecting entire cluster.

2015-06-01 Thread graham sanderson
Yes, native_objects is the way to go… you can tell if memtables are your problem 
because you'll see promotion failures of objects sized 131074 dwords.

If your h/w is fast enough, make your young gen as big as possible - we can 
collect 8G in sub-second always, and this gives you your best chance of keeping 
transient objects (especially if you still have thrift clients) from leaking into 
the old gen. Moving to 2.1.x (and off-heap memtables) from 2.0.x, we have 
reduced our old gen down from 16 gig to 12 gig and will keep shrinking it, but 
have had no promotion failures yet, and it's been several months.

Note we are running a patched 2.1.3, but 2.1.5 has the equivalent important 
bugs fixed (that might have given you memory issues)
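
For reference, the 2.1 off-heap memtable settings under discussion look roughly
like this in cassandra.yaml (a sketch; in released 2.1 the allocation type is
spelled offheap_objects, and the off-heap space cap defaults to 1/4 of the heap
when left unset):

memtable_allocation_type: offheap_objects
# memtable_offheap_space_in_mb: 2048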

 On Jun 1, 2015, at 3:00 PM, Carl Hu m...@carlhu.com wrote:
 
 Thank you for the suggestion. After analysis of your settings, the basic 
 hypothesis here is to promote very quickly to the Old Gen because of a rapid 
 accumulation of heap usage due to memtables. We happen to be running on 2.1, 
 and I thought a more conservative approach than your (quite aggressive) GC 
 settings is to try the new memtable_allocation_type with offheap_objects and 
 see if the memtable pressure is relieved sufficiently such that the standard 
 GC settings can keep up.
 
 The experiment is in progress and I will report back with the results.
 
 On Mon, Jun 1, 2015 at 10:20 AM, Anuj Wadehra anujw_2...@yahoo.co.in wrote:
 We have a write-heavy workload and used to face promotion failures/long GC 
 pauses with Cassandra 2.0.x. I am not into the code yet, but I think that 
 memtable- and compaction-related objects have a mid-life, and a write-heavy 
 workload is not well suited to generational collection by default. So, we tuned 
 the JVM to make sure that a minimum of objects are promoted to the Old Gen, and 
 achieved great success with that:
 MAX_HEAP_SIZE=12G
 HEAP_NEWSIZE=3G
 -XX:SurvivorRatio=2
 -XX:MaxTenuringThreshold=20
 -XX:CMSInitiatingOccupancyFraction=70
 JVM_OPTS=$JVM_OPTS -XX:ConcGCThreads=20
 JVM_OPTS=$JVM_OPTS -XX:+UnlockDiagnosticVMOptions
 JVM_OPTS=$JVM_OPTS -XX:+UseGCTaskAffinity
 JVM_OPTS=$JVM_OPTS -XX:+BindGCTaskThreadsToCPUs
 JVM_OPTS=$JVM_OPTS -XX:ParGCCardsPerStrideChunk=32768
 JVM_OPTS=$JVM_OPTS -XX:+CMSScavengeBeforeRemark
 JVM_OPTS=$JVM_OPTS -XX:CMSMaxAbortablePrecleanTime=30000
 JVM_OPTS=$JVM_OPTS -XX:CMSWaitDuration=2000
 JVM_OPTS=$JVM_OPTS -XX:+CMSEdenChunksRecordAlways
 JVM_OPTS=$JVM_OPTS -XX:+CMSParallelInitialMarkEnabled
 JVM_OPTS=$JVM_OPTS -XX:-UseBiasedLocking
 We also think that the default total_memtable_space_in_mb = 1/4 heap is too much 
 for write-heavy loads. By default, the young gen is also 1/4 heap. We reduced it 
 to 1000mb in order to make sure that memtable-related objects don't stay in 
 memory for too long. Combining this with SurvivorRatio=2 and 
 MaxTenuringThreshold=20 did the job well. GC was very consistent. No Full GC 
 observed.
 
 Environment: 3 node cluster with each node having 24 cores, 64G RAM and SSDs in 
 RAID5.
 We are making around 12k writes/sec into 5 CFs (one with 4 secondary indexes) and 
 2300 reads/sec on each node of the 3 node cluster. 2 CFs have wide rows with max 
 data of around 100mb per row.
 
 Yes. A node being marked down has a cascading effect. Within seconds, all nodes 
 in our cluster are marked down. 
 
 Thanks
 Anuj Wadehra
 
 
 
 On Monday, 1 June 2015 7:12 PM, Carl Hu m...@carlhu.com wrote:
 
 
 We are running Cassandra version 2.1.5.469 on 15 nodes and are experiencing a 
 problem where the entire cluster slows down for 2.5 minutes when one node 
 experiences a 17 second stop-the-world GC. These GCs happen once every 2 
 hours. I did find a ticket that seems related to this: 
 https://issues.apache.org/jira/browse/CASSANDRA-3853, but Jonathan Ellis 
 has resolved this ticket. 
 
 We are running standard GC settings, but this ticket is not so much concerned 
 with the 17 second GC on a single node (after all, we have 14 others) as with 
 the cascading performance problem.
 
 We are running standard values of dynamic_snitch_badness_threshold (0.1) and 
 phi_convict_threshold (8). (These values are relevant for the dynamic snitch 
 routing requests away from the frozen node, or the failure detector marking 
 the node as 'down'.)
 
 We use the python client in default round-robin mode, so all clients hit the 
 coordinators at all nodes in round robin. One theory is that since the 
 coordinator on every node must hit the frozen node at some point in the 17 
 seconds, each node's request queue fills up, and the entire cluster thus 
 freezes up. That would explain a 17 second freeze but would not explain the 
 2.5 minute slowdown (10x increase in request latency @P50).
 
 I'd love your thoughts. I've provided the GC chart here.
 
 Carl
 
 
 
 





Rename User

2015-06-01 Thread Petr Malik
Hello.

I know that Cassandra does not natively provide functionality for renaming 
users, only for altering passwords.

I implemented it in my application by creating a user with the new name and the 
original password, granting it the privileges of the original user, and dropping 
the original user.
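
A minimal CQL sketch of that procedure (user names, password, and the single
grant are illustrative; LIST ALL PERMISSIONS OF old_user enumerates what needs
re-granting):

CREATE USER new_user WITH PASSWORD 'original_password';
GRANT ALL PERMISSIONS ON KEYSPACE my_ks TO new_user;
DROP USER old_user;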

The above procedure seems to work as expected, but I have been wondering if the 
reason Cassandra does not support renaming natively is that it is actively 
trying to prevent anybody from doing this.

Are there any potential (security?) issues with user renaming and/or the 
outlined procedure?


Thanks.

P.


Re: Hbase vs Cassandra

2015-06-01 Thread Otis Gospodnetic
Hi Ajay,

You won't be able to get an unbiased opinion here easily.  You'll need to try
and see how each works for your use case.  We use HBase for the SPM backend
and it has worked well for us - it's stable, handles billions and billions
of rows (I lost track of the actual number many moons ago), and is fast, if you
get your key design right.  For our particular use case, HBase turned out
to be a better choice, but we looked at Cassandra, too, back when we chose
HBase.
I'll answer your Q about monitoring:

I'd say both are equally well monitorable.  SPM http://sematext.com/spm can
monitor both HBase and Cassandra equally well.  Because Cassandra is a bit
simpler (vs. HBase having multiple processes one needs to run), it's a bit
simpler to add monitoring to Cassandra, but the difference is small.

SPM is at http://sematext.com/spm if you want to have a look.  We expose
our own HBase clusters in the live demo, so you can see what metrics HBase
exposes.  We don't run Cassandra, so we can't show its graphs, but you can
see some charts, metrics, and filters for Cassandra at
http://blog.sematext.com/2014/06/02/announcement-cassandra-performance-monitoring-in-spm/

I hope this helps.

Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/


On Fri, May 29, 2015 at 3:09 PM, Ajay ajay.ga...@gmail.com wrote:

 Hi,

 I need some info on HBase vs Cassandra as a data store (in general, plus
 specifics for time series data).

 A comparison on the following would help:
 1: features
 2: deployment and monitoring
 3: performance
 4: anything else

 Thanks
 Ajay



Re: 10000+ CF support from Cassandra

2015-06-01 Thread Jonathan Haddad
 Sorry for this naive question but how important is this tuning? Can this
have a huge impact in production?

Massive.  Here's a graph of when we did some JVM tuning at my previous
company:

http://33.media.tumblr.com/5d0efca7288dc969c1ac4fc3d36e0151/tumblr_inline_mzvj254quj1rd24f4.png

About an order of magnitude difference in performance.

Jon

On Mon, Jun 1, 2015 at 7:20 PM Arun Chaitanya chaitan64a...@gmail.com
wrote:

 Thanks Jon and Jack,

  I strongly advise against this approach.
 Jon, I think so too. But do you actually foresee any problems with this
 approach?
 I can think of a few. [I want to evaluate if we can live with these problems]

- No more CQL.
- No data types; everything needs to be a blob.
- Limited clustering keys and default clustering order.

  First off, different workloads need different tuning.
 Sorry for this naive question but how important is this tuning? Can this
 have a huge impact in production?

  You might want to consider a model where you have an application layer
 that maps logical tenant tables into partition keys within a single large
 Casandra table, or at least a relatively small number of  Cassandra tables.
 It will depend on the typical size of your tenant tables - very small ones
 would make sense within a single partition, while larger ones should have
 separate partitions for a tenant's data. The key here is that tables are
 expensive, but partitions are cheap and scale very well with Cassandra.
 We are actually trying a similar approach. But we don't want to expose this
 to the application layer. We are attempting to hide this and provide an API.

  Finally, you said 10 clusters, but did you mean 10 nodes? You might
 want to consider a model where you do indeed have multiple clusters, where
 each handles a fraction of the tenants, since there is no need for separate
 tenants to be on the same cluster.
 I meant 10 clusters. We want to split our tables across multiple clusters
 if the above approach is not possible. [But it seems to be very costly]

 Thanks,







 On Fri, May 29, 2015 at 5:49 AM, Jack Krupansky jack.krupan...@gmail.com
 wrote:

 How big is each of the tables - are they all fairly small or fairly
 large? Small as in no more than thousands of rows or large as in tens of
 millions or hundreds of millions of rows?

 Small tables are not ideal for a Cassandra cluster since the rows
 would be spread out across the nodes, even though it might make more sense
 for each small table to be on a single node.

 You might want to consider a model where you have an application layer
 that maps logical tenant tables into partition keys within a single large
 Casandra table, or at least a relatively small number of Cassandra tables.
 It will depend on the typical size of your tenant tables - very small ones
 would make sense within a single partition, while larger ones should have
 separate partitions for a tenant's data. The key here is that tables are
 expensive, but partitions are cheap and scale very well with Cassandra.

 Finally, you said 10 clusters, but did you mean 10 nodes? You might
 want to consider a model where you do indeed have multiple clusters, where
 each handles a fraction of the tenants, since there is no need for separate
 tenants to be on the same cluster.


 -- Jack Krupansky

 On Tue, May 26, 2015 at 11:32 PM, Arun Chaitanya chaitan64a...@gmail.com
  wrote:

 Good Day Everyone,

 I am very happy with the (almost) linear scalability offered by C*. We
 had a lot of problems with RDBMS.

 But, I heard that C* has a limit on the number of column families that can
 be created in a single cluster.
 The reason being each CF stores 1-2 MB on the JVM heap.

 In our use case, we have about 10000+ CFs and we want to support
 multi-tenancy.
 (i.e. 10000 * no of tenants)

 We are new to C* and, being from an RDBMS background, I would like to
 understand how to tackle this scenario with your advice.

 Our plan is to use Off-Heap memtable approach.
 http://www.datastax.com/dev/blog/off-heap-memtables-in-Cassandra-2-1

 Each node in the cluster has following configuration
 16 GB machine (8GB Cassandra JVM + 2GB System + 6GB Off-Heap)
 IMO, this should be able to support 1000 CFs with no (or very little) impact on
 performance and startup time.

 We tackle multi-tenancy using different keyspaces. (A solution I found on
 the web.)

 Using this approach we can have 10 clusters doing the job. (We actually
 are worried about the cost)

 Can you please help us evaluate this strategy? I want to hear the
 community's opinion on this.

 My major concerns being,

 1. Is the Off-Heap strategy safe, and is my assumption of 16 GB supporting 1000
 CFs right?

 2. Can we use multiple keyspaces to solve multi-tenancy? IMO, the number
 of column families increases even when we use multiple keyspaces.

 3. I understand the complexity of using multiple clusters for a single
 application. The code base will get tightly coupled with the infrastructure. Is
 this the right approach?

 Any suggestion is appreciated.
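
A sketch of the single-large-table model Jack describes, with illustrative
names; each logical tenant table becomes a set of partitions in one physical
table:

CREATE TABLE shared.tenant_data (
    tenant_id  text,
    table_name text,
    row_key    text,
    col_name   text,
    value      blob,
    PRIMARY KEY ((tenant_id, table_name, row_key), col_name)
);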

 

JSON Cassandra 2.2 - insert syntax

2015-06-01 Thread Michel Blase
Hi all,

I'm trying to test the new JSON functionalities in C* 2.2.

I'm using this example:

https://issues.apache.org/jira/browse/CASSANDRA-7970

I believe there is a typo in the CREATE TABLE statement that requires
frozen:

CREATE TABLE users (id int PRIMARY KEY, name text, addresses map<text,
frozen<address>>);

but my real problem is in the insert syntax. I've found the CQL-2.2
documentation and my best guess is this:

INSERT INTO users JSON {'id': 123,'name': 'jbellis','address': {'home':
{'street': '123 Cassandra Dr','city': 'Austin','zip_code': 78747,'phones':
[2101234567]}}};

but I get the error:

SyntaxException: ErrorMessage code=2000 [Syntax error in CQL query]
message=line 1:23 mismatched input '{'id': 123,'name':
'jbellis','address': {'home': {'street': '123 Cassandra Dr','city':
'Austin','zip_code': 78747,'phones': [2101234567]}}}' expecting ')' (INSERT
INTO users JSON [{'id': 123,'name': 'jbellis','address': {'home':
{'street': '123 Cassandra Dr','city': 'Austin','zip_code': 78747,'phones':
[2101234567]}}]};)


Any idea?


Thanks,

Michael


Re: JSON Cassandra 2.2 - insert syntax

2015-06-01 Thread Zach Kurey
Looks like you have your use of single vs. double quotes inverted.  What
you want is:

INSERT INTO users JSON  '{"id": 123,"name": "jbellis","address": {"home": {
"street": "123 Cassandra Dr","city": "Austin","zip_code": 78747,"phones": [
2101234567]}}}';

HTH

On Mon, Jun 1, 2015 at 6:03 PM, Michel Blase mblas...@gmail.com wrote:

 Hi all,

 I'm trying to test the new JSON functionalities in C* 2.2.

 I'm using this example:

 https://issues.apache.org/jira/browse/CASSANDRA-7970

 I believe there is a typo in the CREATE TABLE statement that requires
 frozen:

 CREATE TABLE users (id int PRIMARY KEY, name text, addresses map<text,
 frozen<address>>);

 but my real problem is in the insert syntax. I've found the CQL-2.2
 documentation and my best guess is this:

 INSERT INTO users JSON {'id': 123,'name': 'jbellis','address': {'home':
 {'street': '123 Cassandra Dr','city': 'Austin','zip_code': 78747,'phones':
 [2101234567]}}};

 but I get the error:

 SyntaxException: ErrorMessage code=2000 [Syntax error in CQL query]
 message=line 1:23 mismatched input '{'id': 123,'name':
 'jbellis','address': {'home': {'street': '123 Cassandra Dr','city':
 'Austin','zip_code': 78747,'phones': [2101234567]}}}' expecting ')' (INSERT
 INTO users JSON [{'id': 123,'name': 'jbellis','address': {'home':
 {'street': '123 Cassandra Dr','city': 'Austin','zip_code': 78747,'phones':
 [2101234567]}}]};)


 Any idea?


 Thanks,

 Michael





Re: JSON Cassandra 2.2 - insert syntax

2015-06-01 Thread Michel Blase
Thanks Zach,

tried that but I get the same error:

*SyntaxException: ErrorMessage code=2000 [Syntax error in CQL query]
message=line 1:24 mismatched input '{"id": 123,"name":
"jbellis","address": {"home": {"street": "123 Cassandra Dr","city":
"Austin","zip_code": 78747,"phones": [2101234567]}}}' expecting ')' (INSERT
INTO users JSON  ['{"id": 123,"name": "jbellis","address": {"home":
{"street": "123 Cassandra Dr","city": "Austin","zip_code": 78747,"phones":
[2101234567]}}]}';)*

On Mon, Jun 1, 2015 at 6:12 PM, Zach Kurey zach.ku...@datastax.com wrote:

 Looks like you have your use of single vs. double quotes inverted.  What
 you want is:

 INSERT INTO users JSON  '{"id": 123,"name": "jbellis","address": {"home":
 {"street": "123 Cassandra Dr","city": "Austin","zip_code": 78747,"phones":
 [2101234567]}}}';

 HTH

 On Mon, Jun 1, 2015 at 6:03 PM, Michel Blase mblas...@gmail.com wrote:

 Hi all,

 I'm trying to test the new JSON functionalities in C* 2.2.

 I'm using this example:

 https://issues.apache.org/jira/browse/CASSANDRA-7970

 I believe there is a typo in the CREATE TABLE statement that requires
 frozen:

 CREATE TABLE users (id int PRIMARY KEY, name text, addresses map<text,
 frozen<address>>);

 but my real problem is in the insert syntax. I've found the CQL-2.2
 documentation and my best guess is this:

 INSERT INTO users JSON {'id': 123,'name': 'jbellis','address': {'home':
 {'street': '123 Cassandra Dr','city': 'Austin','zip_code': 78747,'phones':
 [2101234567]}}};

 but I get the error:

 SyntaxException: ErrorMessage code=2000 [Syntax error in CQL query]
 message=line 1:23 mismatched input '{'id': 123,'name':
 'jbellis','address': {'home': {'street': '123 Cassandra Dr','city':
 'Austin','zip_code': 78747,'phones': [2101234567]}}}' expecting ')' (INSERT
 INTO users JSON [{'id': 123,'name': 'jbellis','address': {'home':
 {'street': '123 Cassandra Dr','city': 'Austin','zip_code': 78747,'phones':
 [2101234567]}}]};)


 Any idea?


 Thanks,

 Michael






Re: JSON Cassandra 2.2 - insert syntax

2015-06-01 Thread Zach Kurey
Hi Michel,

My only other guess is that you actually are running Cassandra 2.1, since
that's the exact error I get if I try to execute a JSON statement against a
version earlier than 2.2.



On Mon, Jun 1, 2015 at 6:13 PM, Michel Blase mblas...@gmail.com wrote:

 Thanks Zach,

 tried that but I get the same error:

 *SyntaxException: ErrorMessage code=2000 [Syntax error in CQL query]
 message=line 1:24 mismatched input '{"id": 123,"name":
 "jbellis","address": {"home": {"street": "123 Cassandra Dr","city":
 "Austin","zip_code": 78747,"phones": [2101234567]}}}' expecting ')' (INSERT
 INTO users JSON  ['{"id": 123,"name": "jbellis","address": {"home":
 {"street": "123 Cassandra Dr","city": "Austin","zip_code": 78747,"phones":
 [2101234567]}}]}';)*

 On Mon, Jun 1, 2015 at 6:12 PM, Zach Kurey zach.ku...@datastax.com
 wrote:

 Looks like you have your use of single vs. double quotes inverted.  What
 you want is:

 INSERT INTO users JSON  '{"id": 123,"name": "jbellis","address": {"home":
 {"street": "123 Cassandra Dr","city": "Austin","zip_code": 78747,"phones":
 [2101234567]}}}';

 HTH

 On Mon, Jun 1, 2015 at 6:03 PM, Michel Blase mblas...@gmail.com wrote:

 Hi all,

 I'm trying to test the new JSON functionalities in C* 2.2.

 I'm using this example:

 https://issues.apache.org/jira/browse/CASSANDRA-7970

 I believe there is a typo in the CREATE TABLE statement that requires
 frozen:

 CREATE TABLE users (id int PRIMARY KEY, name text, addresses map<text,
 frozen<address>>);

 but my real problem is in the insert syntax. I've found the CQL-2.2
 documentation and my best guess is this:

 INSERT INTO users JSON {'id': 123,'name': 'jbellis','address': {'home':
 {'street': '123 Cassandra Dr','city': 'Austin','zip_code': 78747,'phones':
 [2101234567]}}};

 but I get the error:

 SyntaxException: ErrorMessage code=2000 [Syntax error in CQL query]
 message=line 1:23 mismatched input '{'id': 123,'name':
 'jbellis','address': {'home': {'street': '123 Cassandra Dr','city':
 'Austin','zip_code': 78747,'phones': [2101234567]}}}' expecting ')' (INSERT
 INTO users JSON [{'id': 123,'name': 'jbellis','address': {'home':
 {'street': '123 Cassandra Dr','city': 'Austin','zip_code': 78747,'phones':
 [2101234567]}}]};)


 Any idea?


 Thanks,

 Michael







Re: JSON Cassandra 2.2 - insert syntax

2015-06-01 Thread Michel Blase
Zach,

this is embarrassing... you were right, I was running 2.1.

Shame on me! But now I'm getting the error:

*InvalidRequest: code=2200 [Invalid query] message=JSON values map
contains unrecognized column: address*
Any idea? This is the sequence of commands that I'm running:

CREATE KEYSPACE json WITH REPLICATION = { 'class' :'SimpleStrategy',
'replication_factor' : 1 };

USE json;

CREATE TYPE address (street text, city text, zip_code int, phones set<text>);

CREATE TABLE users (id int PRIMARY KEY, name text, addresses map<text,
frozen<address>>);

INSERT INTO users JSON  '{"id": 123,"name": "jbellis","address": {"home": {
"street": "123 Cassandra Dr","city": "Austin","zip_code": 78747,"phones": [
2101234567]}}}';


Consider that I'm running a just-downloaded C* 2.2 instance (I'm on a Mac).

Thanks and sorry for the waste of time before!
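
For reference, a likely cause: the table defines the column as addresses
(plural) while the JSON document uses the key address; keys in an INSERT ...
JSON document must match the column names. A corrected statement under that
assumption (phones quoted as strings to match set<text>) would be:

INSERT INTO users JSON '{"id": 123, "name": "jbellis", "addresses": {"home":
{"street": "123 Cassandra Dr", "city": "Austin", "zip_code": 78747, "phones":
["2101234567"]}}}';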






On Mon, Jun 1, 2015 at 7:10 PM, Zach Kurey zach.ku...@datastax.com wrote:

 Hi Michel,

 My only other guess is that you actually are running Cassandra 2.1, since
 that's the exact error I get if I try to execute a JSON statement against a
 version earlier than 2.2.



 On Mon, Jun 1, 2015 at 6:13 PM, Michel Blase mblas...@gmail.com wrote:

 Thanks Zach,

 tried that but I get the same error:

 *SyntaxException: ErrorMessage code=2000 [Syntax error in CQL query]
 message=line 1:24 mismatched input '{"id": 123,"name":
 "jbellis","address": {"home": {"street": "123 Cassandra Dr","city":
 "Austin","zip_code": 78747,"phones": [2101234567]}}}' expecting ')' (INSERT
 INTO users JSON  ['{"id": 123,"name": "jbellis","address": {"home":
 {"street": "123 Cassandra Dr","city": "Austin","zip_code": 78747,"phones":
 [2101234567]}}]}';)*

 On Mon, Jun 1, 2015 at 6:12 PM, Zach Kurey zach.ku...@datastax.com
 wrote:

 Looks like you have your use of single vs. double quotes inverted.  What
 you want is:

 INSERT INTO users JSON  '{"id": 123,"name": "jbellis","address": {"home":
 {"street": "123 Cassandra Dr","city": "Austin","zip_code": 78747,
 "phones": [2101234567]}}}';

 HTH

 On Mon, Jun 1, 2015 at 6:03 PM, Michel Blase mblas...@gmail.com wrote:

 Hi all,

 I'm trying to test the new JSON functionalities in C* 2.2.

 I'm using this example:

 https://issues.apache.org/jira/browse/CASSANDRA-7970

 I believe there is a typo in the CREATE TABLE statement that requires
 frozen:

 CREATE TABLE users (id int PRIMARY KEY, name text, addresses map<text,
 frozen<address>>);

 but my real problem is in the insert syntax. I've found the CQL-2.2
 documentation and my best guess is this:

 INSERT INTO users JSON {'id': 123,'name': 'jbellis','address':
 {'home': {'street': '123 Cassandra Dr','city': 'Austin','zip_code':
 78747,'phones': [2101234567]}}};

 but I get the error:

 SyntaxException: ErrorMessage code=2000 [Syntax error in CQL query]
 message=line 1:23 mismatched input '{'id': 123,'name':
 'jbellis','address': {'home': {'street': '123 Cassandra Dr','city':
 'Austin','zip_code': 78747,'phones': [2101234567]}}}' expecting ')' (INSERT
 INTO users JSON [{'id': 123,'name': 'jbellis','address': {'home':
 {'street': '123 Cassandra Dr','city': 'Austin','zip_code': 78747,'phones':
 [2101234567]}}]};)


 Any idea?


 Thanks,

 Michael








Re: 10000+ CF support from Cassandra

2015-06-01 Thread Arun Chaitanya
Thanks Jon and Jack,

 I strongly advise against this approach.
Jon, I think so too. But do you actually foresee any problems with this
approach?
I can think of a few. [I want to evaluate if we can live with these problems]

   - No more CQL.
   - No data types; everything needs to be a blob.
   - Limited clustering keys and default clustering order.

 First off, different workloads need different tuning.
Sorry for this naive question but how important is this tuning? Can this
have a huge impact in production?

 You might want to consider a model where you have an application layer
that maps logical tenant tables into partition keys within a single large
Casandra table, or at least a relatively small number of  Cassandra tables.
It will depend on the typical size of your tenant tables - very small ones
would make sense within a single partition, while larger ones should have
separate partitions for a tenant's data. The key here is that tables are
expensive, but partitions are cheap and scale very well with Cassandra.
We are actually trying a similar approach. But we don't want to expose this
to the application layer. We are attempting to hide this and provide an API.

 Finally, you said 10 clusters, but did you mean 10 nodes? You might
want to consider a model where you do indeed have multiple clusters, where
each handles a fraction of the tenants, since there is no need for separate
tenants to be on the same cluster.
I meant 10 clusters. We want to split our tables across multiple clusters
if the above approach is not possible. [But it seems to be very costly]

Thanks,







On Fri, May 29, 2015 at 5:49 AM, Jack Krupansky jack.krupan...@gmail.com
wrote:

 How big is each of the tables - are they all fairly small or fairly large?
 Small as in no more than thousands of rows or large as in tens of millions
 or hundreds of millions of rows?

 Small tables are not ideal for a Cassandra cluster since the rows
 would be spread out across the nodes, even though it might make more sense
 for each small table to be on a single node.

 You might want to consider a model where you have an application layer
 that maps logical tenant tables into partition keys within a single large
 Casandra table, or at least a relatively small number of Cassandra tables.
 It will depend on the typical size of your tenant tables - very small ones
 would make sense within a single partition, while larger ones should have
 separate partitions for a tenant's data. The key here is that tables are
 expensive, but partitions are cheap and scale very well with Cassandra.

 Finally, you said 10 clusters, but did you mean 10 nodes? You might want
 to consider a model where you do indeed have multiple clusters, where each
 handles a fraction of the tenants, since there is no need for separate
 tenants to be on the same cluster.


 -- Jack Krupansky

 On Tue, May 26, 2015 at 11:32 PM, Arun Chaitanya chaitan64a...@gmail.com
 wrote:

 Good Day Everyone,

 I am very happy with the (almost) linear scalability offered by C*. We
 had a lot of problems with RDBMS.

 But, I heard that C* has a limit on the number of column families that can be
 created in a single cluster.
 The reason being each CF stores 1-2 MB on the JVM heap.

 In our use case, we have about 10000+ CFs and we want to support
 multi-tenancy.
 (i.e. 10000 * no of tenants)

 We are new to C* and, being from an RDBMS background, I would like to
 understand how to tackle this scenario with your advice.

 Our plan is to use Off-Heap memtable approach.
 http://www.datastax.com/dev/blog/off-heap-memtables-in-Cassandra-2-1

 Each node in the cluster has following configuration
 16 GB machine (8GB Cassandra JVM + 2GB System + 6GB Off-Heap)
 IMO, this should be able to support 1000 CFs with no (or very little) impact on
 performance and startup time.

 We tackle multi-tenancy using different keyspaces. (A solution I found on
 the web.)

 Using this approach we can have 10 clusters doing the job. (We actually
 are worried about the cost)

 Can you please help us evaluate this strategy? I want to hear the
 community's opinion on this.

 My major concerns being,

 1. Is the Off-Heap strategy safe, and is my assumption of 16 GB supporting 1000
 CFs right?

 2. Can we use multiple keyspaces to solve multi-tenancy? IMO, the number
 of column families increases even when we use multiple keyspaces.

 3. I understand the complexity of using multiple clusters for a single
 application. The code base will get tightly coupled with the infrastructure. Is
 this the right approach?

 Any suggestion is appreciated.

 Thanks,
 Arun





Re: 10000+ CF support from Cassandra

2015-06-01 Thread graham sanderson
  I strongly advise against this approach.
 Jon, I think so too. But do you actually foresee any problems with this 
 approach?
 I can think of a few. [I want to evaluate if we can live with these problems]
Just to be clear, I'm not saying this is a great approach; I AM saying that it 
may be better than having 10000+ CFs, which was the original question (it 
really depends on the use case, which wasn't well defined)… map size limits may 
be a problem, and then there is the CQL vs thrift question, which could start a 
flame war; ideally CQL maps should give you the same flexibility as arbitrary 
thrift columns.

 On Jun 1, 2015, at 9:44 PM, Jonathan Haddad j...@jonhaddad.com wrote:
 
  Sorry for this naive question but how important is this tuning? Can this 
  have a huge impact in production?
 
 Massive.  Here's a graph of when we did some JVM tuning at my previous 
 company: 
 
 http://33.media.tumblr.com/5d0efca7288dc969c1ac4fc3d36e0151/tumblr_inline_mzvj254quj1rd24f4.png
 
 About an order of magnitude difference in performance.
 
 Jon
 
 On Mon, Jun 1, 2015 at 7:20 PM Arun Chaitanya chaitan64a...@gmail.com wrote:
 Thanks Jon and Jack,
 
  I strongly advise against this approach.
 Jon, I think so too. But do you actually foresee any problems with this 
 approach?
 I can think of a few. [I want to evaluate if we can live with these problems]
 No more CQL. 
 No data types; everything needs to be a blob.
 Limited clustering keys and default clustering order.
  First off, different workloads need different tuning.
 Sorry for this naive question but how important is this tuning? Can this have 
 a huge impact in production?
 
  You might want to consider a model where you have an application layer that 
  maps logical tenant tables into partition keys within a single large 
  Casandra table, or at least a relatively small number of  Cassandra tables. 
  It will depend on the typical size of your tenant tables - very small ones 
  would make sense within a single partition, while larger ones should have 
  separate partitions for a tenant's data. The key here is that tables are 
  expensive, but partitions are cheap and scale very well with Cassandra.
 We are actually trying a similar approach. But we don't want to expose this to 
 the application layer. We are attempting to hide this and provide an API.
 
  Finally, you said 10 clusters, but did you mean 10 nodes? You might want 
  to consider a model where you do indeed have multiple clusters, where each 
  handles a fraction of the tenants, since there is no need for separate 
  tenants to be on the same cluster.
 I meant 10 clusters. We want to split our tables across multiple clusters if 
 the above approach is not possible. [But it seems to be very costly]
 
 Thanks,
 
 
 
 
 
 
 
 On Fri, May 29, 2015 at 5:49 AM, Jack Krupansky jack.krupan...@gmail.com wrote:
 How big is each of the tables - are they all fairly small or fairly large? 
 Small as in no more than thousands of rows or large as in tens of millions or 
 hundreds of millions of rows?
 
 Small tables are not ideal for a Cassandra cluster since the rows would 
 be spread out across the nodes, even though it might make more sense for each 
 small table to be on a single node.
 
 You might want to consider a model where you have an application layer that 
 maps logical tenant tables into partition keys within a single large Casandra 
 table, or at least a relatively small number of Cassandra tables. It will 
 depend on the typical size of your tenant tables - very small ones would make 
 sense within a single partition, while larger ones should have separate 
 partitions for a tenant's data. The key here is that tables are expensive, 
 but partitions are cheap and scale very well with Cassandra.
 
 Finally, you said 10 clusters, but did you mean 10 nodes? You might want to 
 consider a model where you do indeed have multiple clusters, where each 
 handles a fraction of the tenants, since there is no need for separate 
 tenants to be on the same cluster.
 
 
 -- Jack Krupansky
 
 On Tue, May 26, 2015 at 11:32 PM, Arun Chaitanya chaitan64a...@gmail.com wrote:
 Good Day Everyone,
 
 I am very happy with the (almost) linear scalability offered by C*. We had a 
 lot of problems with RDBMS.
 
 But, I heard that C* has a limit on the number of column families that can be 
 created in a single cluster.
 The reason being each CF stores 1-2 MB on the JVM heap.
 
 In our use case, we have about 10000+ CFs and we want to support multi-tenancy.
 (i.e. 10000 * no of tenants)
 
 We are new to C* and, being from an RDBMS background, I would like to understand 
 how to tackle this scenario with your advice.
 
 Our plan is to use Off-Heap memtable approach.
 http://www.datastax.com/dev/blog/off-heap-memtables-in-Cassandra-2-1