How to interpret some GC logs
Hi,

Normally I get logs like:

    2015-06-01T09:19:50.610+0000: 4736.314: [GC 6505591K->4895804K(8178944K), 0.0494560 secs]

which is fine and understandable, but occasionally I see something like:

    2015-06-01T09:19:50.661+0000: 4736.365: [GC 4901600K(8178944K), 0.0049600 secs]

How do I interpret it? Is it just missing the part before the "->", i.e. the memory occupied before the GC cycle?

--
BR, Michał Łowicki
AW: check active queries on cluster
You could enable DEBUG logging for org.apache.cassandra.transport.Message and TRACE logging for org.apache.cassandra.cql3.QueryProcessor in the log4j-server.properties file:

    log4j.logger.org.apache.cassandra.transport.Message=DEBUG
    log4j.logger.org.apache.cassandra.cql3.QueryProcessor=TRACE

Afterwards you get the following output for all prepared statements in the system.log file:

    DEBUG [Native-Transport-Requests:167] 2015-06-01 15:56:15,186 Message.java (line 302) Received: PREPARE INSERT INTO dba_test.cust_view (leid, vid, geoarea, ver) VALUES (?, ?, ?, ?);, v=2
    TRACE [Native-Transport-Requests:167] 2015-06-01 15:56:15,187 QueryProcessor.java (line 283) Stored prepared statement 61956319a6d7c84c25414c96edf6e38c with 4 bind markers
    DEBUG [Native-Transport-Requests:167] 2015-06-01 15:56:15,187 Tracing.java (line 159) request complete
    DEBUG [Native-Transport-Requests:167] 2015-06-01 15:56:15,187 Message.java (line 309) Responding: RESULT PREPARED 61956319a6d7c84c25414c96edf6e38c [leid(dba_test, cust_view), org.apache.cassandra.db.marshal.UTF8Type][vid(dba_test, cust_view), org.apache.cassandra.db.marshal.UTF8Type][geoarea(dba_test, cust_view), org.apache.cassandra.db.marshal.UTF8Type][ver(dba_test, cust_view), org.apache.cassandra.db.marshal.LongType] (resultMetadata=[0 columns]), v=2

From: Robert Coli [mailto:rc...@eventbrite.com]
Sent: Friday, 17 April 2015 19:23
To: user@cassandra.apache.org
Subject: Re: check active queries on cluster

On Thu, Apr 16, 2015 at 11:10 PM, Rahul Bhardwaj rahul.bhard...@indiamart.com wrote:

We want to track active queries on cassandra cluster. Is there any tool or way to find all active queries on cassandra?

You can get a count of them with: https://issues.apache.org/jira/browse/CASSANDRA-5084

=Rob
Re: Minor Compactions Not Triggered
On Sun, May 31, 2015 at 11:37 AM, Anuj Wadehra anujw_2...@yahoo.co.in wrote:

2. We thought that the CQL compaction subproperty *tombstone_threshold* would help us after major compactions. This property should ensure that even if we have one huge sstable, once the tombstone threshold of 20% has been reached, the sstable will be compacted and tombstones will be dropped after gc_grace_period (even if there are no similar-sized sstables, as needed by STCS). But in our initial testing, a single huge sstable is not getting compacted even if we drop all rows in it and gc_grace_period has passed. *Why is tombstone_threshold behaving like that?*

https://issues.apache.org/jira/browse/CASSANDRA-6654 ?

=Rob
Re: How to interpret some GC logs
Can you tell what JVM that is?

jason

On Mon, Jun 1, 2015 at 5:46 PM, Michał Łowicki mlowi...@gmail.com wrote:

Hi,

Normally I get logs like:

    2015-06-01T09:19:50.610+0000: 4736.314: [GC 6505591K->4895804K(8178944K), 0.0494560 secs]

which is fine and understandable, but occasionally I see something like:

    2015-06-01T09:19:50.661+0000: 4736.365: [GC 4901600K(8178944K), 0.0049600 secs]

How do I interpret it? Is it just missing the part before the "->", i.e. the memory occupied before the GC cycle?

--
BR, Michał Łowicki
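A quick way to see the difference between the two shapes is to parse them. Below is a minimal sketch (field names are mine, and it assumes HotSpot's ParNew/CMS log format): the arrow-less form carries only current usage(capacity), which HotSpot prints for certain pauses (e.g. CMS initial-mark/remark when -XX:+PrintGCDetails is not enabled).

```python
import re

# The two HotSpot GC log shapes quoted above (field names are my own):
#   [GC 6505591K->4895804K(8178944K), 0.0494560 secs]  -- before->after(capacity)
#   [GC 4901600K(8178944K), 0.0049600 secs]            -- usage(capacity) only
FULL = re.compile(r"\[GC (\d+)K->(\d+)K\((\d+)K\), ([\d.]+) secs\]")
SHORT = re.compile(r"\[GC (\d+)K\((\d+)K\), ([\d.]+) secs\]")

def parse_gc(line):
    """Parse one GC log line into a dict, or return None if it matches neither shape."""
    m = FULL.search(line)
    if m:
        before, after, cap, secs = m.groups()
        return {"before_kb": int(before), "after_kb": int(after),
                "capacity_kb": int(cap), "pause_s": float(secs)}
    m = SHORT.search(line)
    if m:
        usage, cap, secs = m.groups()
        # No "before->after": the JVM logged only the current heap usage for
        # this pause, not the amount reclaimed.
        return {"usage_kb": int(usage), "capacity_kb": int(cap),
                "pause_s": float(secs)}
    return None
```

So for the second line in the question there is nothing "missing" to recover: that log entry simply never contained a before-collection figure.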
Re: Minor Compactions Not Triggered
Thanks Robert!

As per the algorithm shared in CASSANDRA-6654, I understand that the tombstone_threshold property only comes into the picture if you have expiring columns, and it won't have any effect if you have manually deleted rows in a cf. Is my understanding correct?

In your view, what would be the expected behavior of the following steps?

1. I inserted x rows.
2. I deleted x rows.
3. Ran a major compaction to make sure that one big sstable contains all the tombstones.
4. Waited for gc_grace_period to see whether the big sstable formed after the major compaction is compacted on its own, without finding any other sstable.

Thanks
Anuj

Sent from Yahoo Mail on Android

From: Robert Coli rc...@eventbrite.com
Date: Mon, 1 Jun, 2015 at 10:56 pm
Subject: Re: Minor Compactions Not Triggered

On Sun, May 31, 2015 at 11:37 AM, Anuj Wadehra anujw_2...@yahoo.co.in wrote:

2. We thought that the CQL compaction subproperty tombstone_threshold would help us after major compactions. This property should ensure that even if we have one huge sstable, once the tombstone threshold of 20% has been reached, the sstable will be compacted and tombstones will be dropped after gc_grace_period (even if there are no similar-sized sstables, as needed by STCS). But in our initial testing, a single huge sstable is not getting compacted even if we drop all rows in it and gc_grace_period has passed. Why is tombstone_threshold behaving like that?

https://issues.apache.org/jira/browse/CASSANDRA-6654 ?

=Rob
ERROR Compaction Interrupted
Hi everyone,

I am running C* 2.0.9 without vnodes and RF=2. Recently, while repairing and rebalancing the cluster, I encountered one instance of this (just one, on one node):

    May 30 19:31:09 cass-prod4.localdomain cassandra: 2015-05-30 19:31:09,991 ERROR CompactionExecutor:55472 CassandraDaemon.uncaughtException - Exception in thread Thread[CompactionExecutor:55472,1,main]
    May 30 19:31:09 cass-prod4.localdomain org.apache.cassandra.db.compaction.CompactionInterruptedException: Compaction interrupted: Compaction@1b0b43e5-bef5-34f9-af08-405a7b58c71f(flipagram, home_feed_entry_index, 218409618/450008574)bytes
    May 30 19:31:09 cass-prod4.localdomain at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:157)
    May 30 19:31:09 cass-prod4.localdomain at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
    May 30 19:31:09 cass-prod4.localdomain at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
    May 30 19:31:09 cass-prod4.localdomain at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:60)
    May 30 19:31:09 cass-prod4.localdomain at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59)
    May 30 19:31:09 cass-prod4.localdomain at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:198)
    May 30 19:31:09 cass-prod4.localdomain at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    May 30 19:31:09 cass-prod4.localdomain at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    May 30 19:31:09 cass-prod4.localdomain at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    May 30 19:31:09 cass-prod4.localdomain at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    May 30 19:31:09 cass-prod4.localdomain at java.lang.Thread.run(Thread.java:745)

After looking around a bit in the mailing list archives etc., I understand that this might mean data corruption. I plan to take the node offline and replace it with a new one, but I still wanted to see if anyone can throw some light here in case I am missing something.

Also, if this is a case of a corrupted sstable, should I be concerned about it getting replicated, and take care of it on the replicas too?

Thanks
Re: check active queries on cluster
A warning on enabling debug and trace logging on the write path: you will be writing information about every query to disk. If you have any significant volume of requests going through the nodes, things will get slow pretty quickly, at least with C* 2.1 and the default logging config.

On 1 June 2015 at 07:34, Sebastian Martinka sebastian.marti...@mercateo.com wrote:

You could enable DEBUG logging for org.apache.cassandra.transport.Message and TRACE logging for org.apache.cassandra.cql3.QueryProcessor in the log4j-server.properties file:

    log4j.logger.org.apache.cassandra.transport.Message=DEBUG
    log4j.logger.org.apache.cassandra.cql3.QueryProcessor=TRACE

Afterwards you get the following output for all prepared statements in the system.log file:

    DEBUG [Native-Transport-Requests:167] 2015-06-01 15:56:15,186 Message.java (line 302) Received: PREPARE INSERT INTO dba_test.cust_view (leid, vid, geoarea, ver) VALUES (?, ?, ?, ?);, v=2
    TRACE [Native-Transport-Requests:167] 2015-06-01 15:56:15,187 QueryProcessor.java (line 283) Stored prepared statement 61956319a6d7c84c25414c96edf6e38c with 4 bind markers
    DEBUG [Native-Transport-Requests:167] 2015-06-01 15:56:15,187 Tracing.java (line 159) request complete
    DEBUG [Native-Transport-Requests:167] 2015-06-01 15:56:15,187 Message.java (line 309) Responding: RESULT PREPARED 61956319a6d7c84c25414c96edf6e38c [leid(dba_test, cust_view), org.apache.cassandra.db.marshal.UTF8Type][vid(dba_test, cust_view), org.apache.cassandra.db.marshal.UTF8Type][geoarea(dba_test, cust_view), org.apache.cassandra.db.marshal.UTF8Type][ver(dba_test, cust_view), org.apache.cassandra.db.marshal.LongType] (resultMetadata=[0 columns]), v=2

From: Robert Coli [mailto:rc...@eventbrite.com]
Sent: Friday, 17 April 2015 19:23
To: user@cassandra.apache.org
Subject: Re: check active queries on cluster

On Thu, Apr 16, 2015 at 11:10 PM, Rahul Bhardwaj rahul.bhard...@indiamart.com wrote:

We want to track active queries on cassandra cluster. Is there any tool or way to find all active queries on cassandra?

You can get a count of them with: https://issues.apache.org/jira/browse/CASSANDRA-5084

=Rob

--
Ben Bromhead
Instaclustr | www.instaclustr.com | @instaclustr http://twitter.com/instaclustr | (650) 284 9692
RE: Spark SQL JDBC Server + DSE
Brian,

We haven't open sourced the REST server, but we're not opposed to doing it. We just need to carve out some time to clean up the code and separate it from all the other stuff that we do in that REST server. I will try to do it in the next few weeks; if you need it sooner, let me know.

I did consider the option of writing our own Spark SQL JDBC driver for C*, but it is lower on the priority list right now.

Mohammed

From: Brian O'Neill [mailto:boneil...@gmail.com] On Behalf Of Brian O'Neill
Sent: Saturday, May 30, 2015 3:12 AM
To: user@cassandra.apache.org
Subject: Re: Spark SQL JDBC Server + DSE

Any chance you open-sourced, or could open-source, the REST server? ;)

In thinking about it... it doesn't feel like it would be that hard to write a Spark SQL JDBC driver against Cassandra, akin to what they have for Hive:
https://spark.apache.org/docs/latest/sql-programming-guide.html#running-the-thrift-jdbcodbc-server

I wouldn't mind collaborating on that, if you are headed in that direction. (And then I could write the REST server on top of that.)

LMK,
-brian

---
Brian O'Neill
Chief Technology Officer
Health Market Science, a LexisNexis Company
215.588.6024 Mobile * @boneill42 http://www.twitter.com/boneill42

This information transmitted in this email message is for the intended recipient only and may contain confidential and/or privileged material. If you received this email in error and are not the intended recipient, or the person responsible to deliver it to the intended recipient, please contact the sender at the email above and delete this email and any attachments and destroy any copies thereof. Any review, retransmission, dissemination, copying or other use of, or taking any action in reliance upon, this information by persons or entities other than the intended recipient is strictly prohibited.
From: Mohammed Guller moham...@glassbeam.com
Reply-To: user@cassandra.apache.org
Date: Friday, May 29, 2015 at 2:15 PM
To: user@cassandra.apache.org
Subject: RE: Spark SQL JDBC Server + DSE

Brian,

I implemented a similar REST server last year and it works great. Now we have a requirement to support JDBC connectivity in addition to the REST API. We want to allow users to use tools like Tableau to connect to C* through the Spark SQL JDBC/Thrift server.

Mohammed

From: Brian O'Neill [mailto:boneil...@gmail.com] On Behalf Of Brian O'Neill
Sent: Thursday, May 28, 2015 6:16 PM
To: user@cassandra.apache.org
Subject: Re: Spark SQL JDBC Server + DSE

Mohammed,

This doesn't really answer your question, but I'm working on a new REST server that allows people to submit SQL queries over REST, which get executed via Spark SQL. Based on what I started here:
http://brianoneill.blogspot.com/2015/05/spark-sql-against-cassandra-example.html

I assume you need JDBC connectivity specifically?

-brian

---
Brian O'Neill
Chief Technology Officer
Health Market Science, a LexisNexis Company
215.588.6024 Mobile * @boneill42 http://www.twitter.com/boneill42
From: Mohammed Guller moham...@glassbeam.com
Reply-To: user@cassandra.apache.org
Date: Thursday, May 28, 2015 at 8:26 PM
To: user@cassandra.apache.org
Subject: RE: Spark SQL JDBC Server + DSE

Anybody out there using DSE + Spark SQL JDBC server?

Mohammed

From: Mohammed Guller [mailto:moham...@glassbeam.com]
Sent: Tuesday, May 26, 2015 6:17 PM
To: user@cassandra.apache.org
Subject: Spark SQL JDBC Server + DSE

Hi -

As I understand it, the Spark SQL Thrift/JDBC server cannot be used with open source C*; only DSE supports the Spark SQL JDBC server. We would like to find out how many organizations are using this combination. If you do use DSE + the Spark SQL JDBC server, it would be great if you could share your experience. For example, what kind of issues have you run into? How is the performance? What reporting tools are you using?

Thank you!

Mohammed
Re: Spark SQL JDBC Server + DSE
Have you looked at the job server?

https://github.com/spark-jobserver/spark-jobserver
https://www.youtube.com/watch?v=8k9ToZ4m6os
http://planetcassandra.org/blog/post/fast-spark-queries-on-in-memory-datasets/

All the best,

Sebastián Estévez
Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com
https://www.linkedin.com/company/datastax | https://www.facebook.com/datastax | https://twitter.com/datastax | https://plus.google.com/+Datastax/about

DataStax is the fastest, most scalable distributed database technology, delivering Apache Cassandra to the world's most innovative enterprises. DataStax is built to be agile, always-on, and predictably scalable to any size. With more than 500 customers in 45 countries, DataStax is the database technology and transactional backbone of choice for the world's most innovative companies such as Netflix, Adobe, Intuit, and eBay.

On Mon, Jun 1, 2015 at 8:13 AM, Mohammed Guller moham...@glassbeam.com wrote:

Brian,

We haven't open sourced the REST server, but we're not opposed to doing it. We just need to carve out some time to clean up the code and separate it from all the other stuff that we do in that REST server. I will try to do it in the next few weeks; if you need it sooner, let me know.

I did consider the option of writing our own Spark SQL JDBC driver for C*, but it is lower on the priority list right now.

Mohammed

From: Brian O'Neill [mailto:boneil...@gmail.com] On Behalf Of Brian O'Neill
Sent: Saturday, May 30, 2015 3:12 AM
To: user@cassandra.apache.org
Subject: Re: Spark SQL JDBC Server + DSE

Any chance you open-sourced, or could open-source, the REST server?
;) In thinking about it... it doesn't feel like it would be that hard to write a Spark SQL JDBC driver against Cassandra, akin to what they have for Hive:
https://spark.apache.org/docs/latest/sql-programming-guide.html#running-the-thrift-jdbcodbc-server

I wouldn't mind collaborating on that, if you are headed in that direction. (And then I could write the REST server on top of that.)

LMK,
-brian

---
Brian O'Neill
Chief Technology Officer
Health Market Science, a LexisNexis Company
215.588.6024 Mobile • @boneill42 http://www.twitter.com/boneill42

From: Mohammed Guller moham...@glassbeam.com
Reply-To: user@cassandra.apache.org
Date: Friday, May 29, 2015 at 2:15 PM
To: user@cassandra.apache.org
Subject: RE: Spark SQL JDBC Server + DSE

Brian,

I implemented a similar REST server last year and it works great. Now we have a requirement to support JDBC connectivity in addition to the REST API. We want to allow users to use tools like Tableau to connect to C* through the Spark SQL JDBC/Thrift server.
Mohammed

From: Brian O'Neill [mailto:boneil...@gmail.com] On Behalf Of Brian O'Neill
Sent: Thursday, May 28, 2015 6:16 PM
To: user@cassandra.apache.org
Subject: Re: Spark SQL JDBC Server + DSE

Mohammed,

This doesn't really answer your question, but I'm working on a new REST server that allows people to submit SQL queries over REST, which get executed via Spark SQL. Based on what I started here:
http://brianoneill.blogspot.com/2015/05/spark-sql-against-cassandra-example.html

I assume you need JDBC connectivity specifically?

-brian

---
Brian O'Neill
Chief Technology Officer
Health Market Science, a LexisNexis Company
215.588.6024 Mobile • @boneill42 http://www.twitter.com/boneill42

From:
Regarding JIRA
Hi,

I am using the Apache Cassandra community edition for learning and practice. Can I raise doubts, issues, and clarification requests as JIRA tickets against Cassandra? Will there be any charge for that? As far as I know, we can create a free JIRA account. Can anyone advise me on this?

--
Best Regards,
Kiran.M.K.
Re: Regarding JIRA
Also, feel free to use any of the many other resources available:

- The documentation
- Planet Cassandra
- Stack Overflow
- #cassandra on irc.freenode.net

From: Daniel Compton
Reply-To: user@cassandra.apache.org
Date: Monday, June 1, 2015 at 3:37 PM
To: user@cassandra.apache.org
Subject: Re: Regarding JIRA

Hi Kiran

There are no charges for raising issues in the Apache Cassandra JIRA or emailing the list. However, as I'm sure you're aware, members of this list and of JIRA are mostly volunteers, so there's also no guarantee of support or response time.

--
Daniel.

On Tue, Jun 2, 2015 at 7:32 AM Kiran mk coolkiran2...@gmail.com wrote:

Hi,

I am using the Apache Cassandra community edition for learning and practice. Can I raise doubts, issues, and clarification requests as JIRA tickets against Cassandra? Will there be any charge for that? As far as I know, we can create a free JIRA account. Can anyone advise me on this?

--
Best Regards,
Kiran.M.K.
Re: Regarding JIRA
Hi Kiran

There are no charges for raising issues in the Apache Cassandra JIRA or emailing the list. However, as I'm sure you're aware, members of this list and of JIRA are mostly volunteers, so there's also no guarantee of support or response time.

--
Daniel.

On Tue, Jun 2, 2015 at 7:32 AM Kiran mk coolkiran2...@gmail.com wrote:

Hi,

I am using the Apache Cassandra community edition for learning and practice. Can I raise doubts, issues, and clarification requests as JIRA tickets against Cassandra? Will there be any charge for that? As far as I know, we can create a free JIRA account. Can anyone advise me on this?

--
Best Regards,
Kiran.M.K.
Re: Regarding JIRA
JIRA should be left for issues that you have some confidence are bugs in Cassandra, or for items you want as feature requests. For general questions, try the Cassandra mailing list, user@cassandra.apache.org (to subscribe: user-subscr...@cassandra.apache.org), or use IRC: #cassandra on freenode.

On 2015-06-01 15:31, Kiran mk wrote:

Hi,

I am using the Apache Cassandra community edition for learning and practice. Can I raise doubts, issues, and clarification requests as JIRA tickets against Cassandra? Will there be any charge for that? As far as I know, we can create a free JIRA account. Can anyone advise me on this?

--
Best Regards,
Kiran.M.K.
Re: GC pauses affecting entire cluster.
Yes, native_objects is the way to go... You can tell if memtables are your problem because you'll see promotion failures of objects sized 131074 dwords. If your h/w is fast enough, make your young gen as big as possible - we can always collect 8G in under a second, and this gives you your best chance of transient objects (especially if you still have thrift clients) not leaking into the old gen.

Moving from 2.0.x to 2.1.x (and off-heap memtables), we have reduced our old gen from 16G to 12G and will keep shrinking it; we have had no promotion failures yet, and it's been several months. Note we are running a patched 2.1.3, but 2.1.5 has the equivalent important bugs fixed (ones that might have given you memory issues).

On Jun 1, 2015, at 3:00 PM, Carl Hu m...@carlhu.com wrote:

Thank you for the suggestion. After analysis of your settings, the basic hypothesis here is to promote very quickly to the old gen because of a rapid accumulation of heap usage due to memtables. We happen to be running on 2.1, and I thought a more conservative approach than your (quite aggressive) GC settings would be to try the new memtable_allocation_type with offheap_objects and see if the memtable pressure is relieved sufficiently that the standard GC settings can keep up. The experiment is in progress and I will report back with the results.

On Mon, Jun 1, 2015 at 10:20 AM, Anuj Wadehra anujw_2...@yahoo.co.in wrote:

We have a write-heavy workload and used to face promotion failures/long GC pauses with Cassandra 2.0.x. I am not into the code yet, but I think that memtable and compaction related objects have mid-life, and a write-heavy workload is not suitable for generational collection by default.
So we tuned the JVM to make sure that a minimum of objects are promoted to the old gen, and achieved great success with that:

    MAX_HEAP_SIZE=12G
    HEAP_NEWSIZE=3G
    -XX:SurvivorRatio=2
    -XX:MaxTenuringThreshold=20
    -XX:CMSInitiatingOccupancyFraction=70
    JVM_OPTS="$JVM_OPTS -XX:ConcGCThreads=20"
    JVM_OPTS="$JVM_OPTS -XX:+UnlockDiagnosticVMOptions"
    JVM_OPTS="$JVM_OPTS -XX:+UseGCTaskAffinity"
    JVM_OPTS="$JVM_OPTS -XX:+BindGCTaskThreadsToCPUs"
    JVM_OPTS="$JVM_OPTS -XX:ParGCCardsPerStrideChunk=32768"
    JVM_OPTS="$JVM_OPTS -XX:+CMSScavengeBeforeRemark"
    JVM_OPTS="$JVM_OPTS -XX:CMSMaxAbortablePrecleanTime=30000"
    JVM_OPTS="$JVM_OPTS -XX:CMSWaitDuration=2000"
    JVM_OPTS="$JVM_OPTS -XX:+CMSEdenChunksRecordAlways"
    JVM_OPTS="$JVM_OPTS -XX:+CMSParallelInitialMarkEnabled"
    JVM_OPTS="$JVM_OPTS -XX:-UseBiasedLocking"

We also think that the default total_memtable_space_in_mb of 1/4 of the heap is too much for write-heavy loads. By default, the young gen is also 1/4 of the heap. We reduced total memtable space to 1000mb in order to make sure that memtable-related objects don't stay in memory for too long. Combining this with SurvivorRatio=2 and MaxTenuringThreshold=20 did the job well. GC was very consistent; no full GC observed.

Environment: 3-node cluster, each node having 24 cores, 64G RAM, and SSDs in RAID5. We are making around 12k writes/sec into 5 CFs (one with 4 secondary indexes) and 2300 reads/sec on each node of the 3-node cluster. 2 CFs have wide rows with a max of around 100mb of data per row.

Yes, a node being marked down has a cascading effect. Within seconds, all nodes in our cluster are marked down.

Thanks
Anuj Wadehra

On Monday, 1 June 2015 7:12 PM, Carl Hu m...@carlhu.com wrote:

We are running Cassandra version 2.1.5.469 on 15 nodes and are experiencing a problem where the entire cluster slows down for 2.5 minutes when one node experiences a 17-second stop-the-world GC. These GCs happen once every 2 hours.
I did find a ticket that seems related: https://issues.apache.org/jira/browse/CASSANDRA-3853, but Jonathan Ellis has resolved that ticket. We are running standard GC settings, but this ticket is not so much concerned with the 17-second GC on a single node (after all, we have 14 others) as with the cascading performance problem.

We are running standard values of dynamic_snitch_badness_threshold (0.1) and phi_convict_threshold (8). (These values are relevant for the dynamic snitch routing requests away from the frozen node, and for the failure detector marking the node as 'down'.) We use the python client in default round-robin mode, so all clients hit the coordinators on all nodes in round robin.

One theory is that since the coordinator on every node must hit the frozen node at some point during the 17 seconds, each node's request queues fill up, and the entire cluster thus freezes up. That would explain a 17-second freeze, but would not explain the 2.5-minute slowdown (a 10x increase in request latency @ P50).

I'd love your thoughts. I've provided the GC chart here.

Carl
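The arithmetic implied by Anuj's settings earlier in the thread can be made explicit. A small sketch (numbers taken from the quoted settings; the eden/survivor split follows HotSpot's definition of SurvivorRatio as eden divided by one survivor space):

```python
heap_mb = 12 * 1024        # MAX_HEAP_SIZE=12G
young_mb = 3 * 1024        # HEAP_NEWSIZE=3G, i.e. 1/4 of the heap
survivor_ratio = 2         # -XX:SurvivorRatio=2

# Young gen = eden + 2 survivor spaces, and SurvivorRatio = eden / survivor,
# so young = survivor * (ratio + 2).
survivor_mb = young_mb // (survivor_ratio + 2)   # one survivor space
eden_mb = survivor_mb * survivor_ratio           # eden

# The default memtable budget is 1/4 of the heap (per the thread); the post
# reduces it to 1000 MB so memtable objects rarely survive long enough to be
# tenured (MaxTenuringThreshold=20).
default_memtable_mb = heap_mb // 4
tuned_memtable_mb = 1000
```

With these numbers, eden is 1536 MB and each survivor space 768 MB, so a large SurvivorRatio-2 young gen plus a high tenuring threshold keeps short-lived write-path objects out of the old gen.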
Rename User
Hello. I know that Cassandra does not natively provide functionality for renaming users, just for altering passwords. I implemented it in my application by creating a user with the new name and original password, granting it the privileges of the original user and dropping the original user. The above procedure seems to work as expected, but I have been wondering if the reason why Cassandra does not support renaming natively is because it is actively trying to prevent anybody from doing this. Are there any potential (security?) issues with user renaming and/or the outlined procedure? Thanks. P.
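The create-grant-drop procedure described above can be sketched as the sequence of CQL statements an application would issue. Everything here is illustrative: the keyspace name and the blanket ALL PERMISSIONS grant are placeholders, and a real implementation would first enumerate the original user's actual grants (e.g. via LIST ALL PERMISSIONS OF) and re-grant each one:

```python
def rename_user_statements(old_user, new_user, password, superuser=False):
    """Build the CQL for renaming a user by copy + grant + drop.
    The GRANT line is a placeholder for re-granting the old user's
    actual permissions; 'my_ks' is a hypothetical keyspace."""
    su = "SUPERUSER" if superuser else "NOSUPERUSER"
    return [
        "CREATE USER '%s' WITH PASSWORD '%s' %s;" % (new_user, password, su),
        "GRANT ALL PERMISSIONS ON KEYSPACE my_ks TO '%s';" % new_user,
        "DROP USER '%s';" % old_user,
    ]

stmts = rename_user_statements("alice", "alice_new", "s3cret")
```

Note the window between CREATE and DROP in which both users exist; if that matters for auditing, the statements should be run in one tightly controlled maintenance step.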
Re: Hbase vs Cassandra
Hi Ajay,

You won't easily get an unbiased opinion here. You'll need to try each and see how it works for your use case. We use HBase for the SPM backend and it has worked well for us - it's stable, handles billions and billions of rows (I lost track of the actual number many moons ago), and is fast if you get your key design right. For our particular use case, HBase turned out to be a better choice, but we looked at Cassandra, too, back when we chose HBase.

I'll answer your question about monitoring: I'd say both are equally monitorable. SPM (http://sematext.com/spm) can monitor both HBase and Cassandra equally well. Because Cassandra is a bit simpler (vs. HBase having multiple processes one needs to run), it's a bit simpler to add monitoring to Cassandra, but the difference is small. We expose our own HBase clusters in the live demo, so you can see what metrics HBase exposes. We don't run Cassandra, so we can't show its graphs, but you can see some charts, metrics, and filters for Cassandra at http://blog.sematext.com/2014/06/02/announcement-cassandra-performance-monitoring-in-spm/

I hope this helps.

Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr Elasticsearch Support * http://sematext.com/

On Fri, May 29, 2015 at 3:09 PM, Ajay ajay.ga...@gmail.com wrote:

Hi,

I need some info on HBase vs Cassandra as a data store (in general, plus specific to time series data). A comparison along the following lines would help:

1. Features
2. Deployment and monitoring
3. Performance
4. Anything else

Thanks
Ajay
Re: 10000+ CF support from Cassandra
Sorry for this naive question but how important is this tuning? Can this have a huge impact in production?

Massive. Here's a graph from when we did some JVM tuning at my previous company:
http://33.media.tumblr.com/5d0efca7288dc969c1ac4fc3d36e0151/tumblr_inline_mzvj254quj1rd24f4.png

About an order of magnitude difference in performance.

Jon

On Mon, Jun 1, 2015 at 7:20 PM Arun Chaitanya chaitan64a...@gmail.com wrote:

Thanks Jon and Jack,

I strongly advise against this approach.

Jon, I think so too. But do you actually foresee any problems with this approach? I can think of a few [I want to evaluate if we can live with them]:

- No more CQL.
- No data types; everything needs to be a blob.
- Limited clustering keys and default clustering order.

First off, different workloads need different tuning.

Sorry for this naive question but how important is this tuning? Can this have a huge impact in production?

You might want to consider a model where you have an application layer that maps logical tenant tables into partition keys within a single large Cassandra table, or at least a relatively small number of Cassandra tables. It will depend on the typical size of your tenant tables - very small ones would make sense within a single partition, while larger ones should have separate partitions for a tenant's data. The key here is that tables are expensive, but partitions are cheap and scale very well with Cassandra.

We are actually trying a similar approach, but we don't want to expose this to the application layer. We are attempting to hide it behind an API.

Finally, you said 10 clusters, but did you mean 10 nodes? You might want to consider a model where you do indeed have multiple clusters, where each handles a fraction of the tenants, since there is no need for separate tenants to be on the same cluster.

I meant 10 clusters. We want to split our tables across multiple clusters if the above approach is not possible.
[But it seems to be very costly]

Thanks,

On Fri, May 29, 2015 at 5:49 AM, Jack Krupansky jack.krupan...@gmail.com wrote:

How big is each of the tables - are they all fairly small, or fairly large? Small as in no more than thousands of rows, or large as in tens of millions or hundreds of millions of rows? Small tables are not ideal for a Cassandra cluster, since the rows would be spread out across the nodes, even though it might make more sense for each small table to be on a single node.

You might want to consider a model where you have an application layer that maps logical tenant tables into partition keys within a single large Cassandra table, or at least a relatively small number of Cassandra tables. It will depend on the typical size of your tenant tables - very small ones would make sense within a single partition, while larger ones should have separate partitions for a tenant's data. The key here is that tables are expensive, but partitions are cheap and scale very well with Cassandra.

Finally, you said 10 clusters, but did you mean 10 nodes? You might want to consider a model where you do indeed have multiple clusters, where each handles a fraction of the tenants, since there is no need for separate tenants to be on the same cluster.

-- Jack Krupansky

On Tue, May 26, 2015 at 11:32 PM, Arun Chaitanya chaitan64a...@gmail.com wrote:

Good Day Everyone,

I am very happy with the (almost) linear scalability offered by C*. We had a lot of problems with RDBMS.

But I heard that C* has a limit on the number of column families that can be created in a single cluster, the reason being that each CF stores 1-2 MB on the JVM heap. In our use case, we have about 10000+ CF and we want to support multi-tenancy (i.e. 10000 * no. of tenants).

We are new to C* and, being from an RDBMS background, I would like to understand, with your advice, how to tackle this scenario. Our plan is to use the off-heap memtable approach.
http://www.datastax.com/dev/blog/off-heap-memtables-in-Cassandra-2-1

Each node in the cluster has the following configuration: a 16 GB machine (8 GB Cassandra JVM + 2 GB system + 6 GB off-heap). IMO, this should be able to support 1000 CF with little (if any) impact on performance and startup time. We tackle multi-tenancy using different keyspaces (a solution I found on the web). Using this approach we can have 10 clusters doing the job. (We are actually worried about the cost.)

Can you please help us evaluate this strategy? I want to hear the community's opinion on this. My major concerns being:

1. Is the off-heap strategy safe, and is my assumption of 16 GB supporting 1000 CF right?
2. Can we use multiple keyspaces to solve multi-tenancy? IMO, the number of column families increases even when we use multiple keyspaces.
3. I understand the complexity of using multiple clusters for a single application. The code base will get tightly coupled with infrastructure. Is this the right approach?

Any suggestion is appreciated.
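For what it's worth, the application-layer mapping Jack describes above can be sketched in CQL. All names here are hypothetical, and real tenant tables with differing schemas would need a serialization layer on top; this is an illustration of the idea, not a vetted schema:

```sql
-- Hypothetical sketch of the "tenant in the partition key" model:
-- one physical table holds many logical tenant tables, so the
-- cluster-wide table count stays small while partitions scale out.
CREATE TABLE shared.tenant_data (
    tenant_id     text,   -- which tenant owns the row
    logical_table text,   -- name of the tenant's logical table
    row_key       text,   -- the tenant row's own key
    column_name   text,
    value         blob,   -- opaque payload; per-column typing is lost, as noted above
    PRIMARY KEY ((tenant_id, logical_table, row_key), column_name)
);

-- Larger tenant tables get a partition per (tenant, table, row);
-- very small tenant tables could instead share a single partition.
SELECT column_name, value FROM shared.tenant_data
WHERE tenant_id = 't42' AND logical_table = 'orders' AND row_key = '1001';
```

This mirrors the trade-offs Arun lists: everything becomes a blob, and clustering is limited to whatever the shared schema defines.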
JSON Cassandra 2.2 - insert syntax
Hi all,

I'm trying to test the new JSON functionality in C* 2.2. I'm using this example: https://issues.apache.org/jira/browse/CASSANDRA-7970

I believe there is a typo in the CREATE TABLE statement, which requires frozen:

CREATE TABLE users (id int PRIMARY KEY, name text, addresses map<text, frozen<address>>);

but my real problem is the insert syntax. I've found the CQL 2.2 documentation and my best guess is this:

INSERT INTO users JSON {'id': 123, 'name': 'jbellis', 'address': {'home': {'street': '123 Cassandra Dr', 'city': 'Austin', 'zip_code': 78747, 'phones': [2101234567]}}};

but I get the error:

SyntaxException: ErrorMessage code=2000 [Syntax error in CQL query] message=line 1:23 mismatched input '{'id': 123,'name': 'jbellis','address': {'home': {'street': '123 Cassandra Dr','city': 'Austin','zip_code': 78747,'phones': [2101234567]}}}' expecting ')' (INSERT INTO users JSON [{'id': 123,'name': 'jbellis','address': {'home': {'street': '123 Cassandra Dr','city': 'Austin','zip_code': 78747,'phones': [2101234567]}}]};)

Any idea?

Thanks,
Michael
Re: JSON Cassandra 2.2 - insert syntax
Looks like you have your use of single vs. double quotes inverted. What you want is:

INSERT INTO users JSON '{"id": 123, "name": "jbellis", "address": {"home": {"street": "123 Cassandra Dr", "city": "Austin", "zip_code": 78747, "phones": [2101234567]}}}';

HTH

On Mon, Jun 1, 2015 at 6:03 PM, Michel Blase mblas...@gmail.com wrote:
Re: JSON Cassandra 2.2 - insert syntax
Thanks Zach, tried that but I get the same error:

*SyntaxException: ErrorMessage code=2000 [Syntax error in CQL query] message=line 1:24 mismatched input '{"id": 123,"name": "jbellis","address": {"home": {"street": "123 Cassandra Dr","city": "Austin","zip_code": 78747,"phones": [2101234567]}}}' expecting ')' (INSERT INTO users JSON ['{"id": 123,"name": "jbellis","address": {"home": {"street": "123 Cassandra Dr","city": "Austin","zip_code": 78747,"phones": [2101234567]}}]}';)*

On Mon, Jun 1, 2015 at 6:12 PM, Zach Kurey zach.ku...@datastax.com wrote:
Re: JSON Cassandra 2.2 - insert syntax
Hi Michel,

My only other guess is that you actually are running Cassandra 2.1, since that's the exact error I get if I try to execute a JSON statement against a version earlier than 2.2.

On Mon, Jun 1, 2015 at 6:13 PM, Michel Blase mblas...@gmail.com wrote:
Re: JSON Cassandra 2.2 - insert syntax
Zach, this is embarrassing... you were right, I was running 2.1. Shame on me!

But now I'm getting the error:

*InvalidRequest: code=2200 [Invalid query] message=JSON values map contains unrecognized column: address*

Any idea? This is the sequence of commands that I'm running:

CREATE KEYSPACE json WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };

USE json;

CREATE TYPE address (street text, city text, zip_code int, phones set<text>);

CREATE TABLE users (id int PRIMARY KEY, name text, addresses map<text, frozen<address>>);

INSERT INTO users JSON '{"id": 123, "name": "jbellis", "address": {"home": {"street": "123 Cassandra Dr", "city": "Austin", "zip_code": 78747, "phones": [2101234567]}}}';

Consider that I'm running a just-downloaded C* 2.2 instance (I'm on a Mac).

Thanks, and sorry for the waste of time before!

On Mon, Jun 1, 2015 at 7:10 PM, Zach Kurey zach.ku...@datastax.com wrote:
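The error message itself hints at the likely cause: the CREATE TABLE names the column addresses (plural), while the JSON document uses the key address, and Cassandra matches JSON keys against column names. A guess at the fix (untested here) is simply renaming the JSON key to match the column:

```sql
-- The table declares "addresses", but the JSON payload says "address" --
-- hence "JSON values map contains unrecognized column: address".
-- Renaming the key to match the column should resolve it:
INSERT INTO users JSON '{"id": 123, "name": "jbellis",
  "addresses": {"home": {"street": "123 Cassandra Dr", "city": "Austin",
                         "zip_code": 78747, "phones": [2101234567]}}}';
```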
Re: 10000+ CF support from Cassandra
Thanks Jon and Jack,

I strongly advise against this approach.

Jon, I think so too. But do you actually foresee any problems with this approach? I can think of a few. [I want to evaluate if we can live with these problems]
- No more CQL.
- No data types; everything needs to be a blob.
- Limited clustering keys and default clustering order.

First off, different workloads need different tuning.

Sorry for this naive question but how important is this tuning? Can this have a huge impact in production?

You might want to consider a model where you have an application layer that maps logical tenant tables into partition keys within a single large Cassandra table, or at least a relatively small number of Cassandra tables. It will depend on the typical size of your tenant tables - very small ones would make sense within a single partition, while larger ones should have separate partitions for a tenant's data. The key here is that tables are expensive, but partitions are cheap and scale very well with Cassandra.

We are actually trying a similar approach, but we don't want to expose this to the application layer. We are attempting to hide it behind an API.

Finally, you said 10 clusters, but did you mean 10 nodes? You might want to consider a model where you do indeed have multiple clusters, where each handles a fraction of the tenants, since there is no need for separate tenants to be on the same cluster.

I meant 10 clusters. We want to split our tables across multiple clusters if the above approach is not possible. [But it seems to be very costly]

Thanks,

On Fri, May 29, 2015 at 5:49 AM, Jack Krupansky jack.krupan...@gmail.com wrote:

How big is each of the tables - are they all fairly small or fairly large? Small as in no more than thousands of rows, or large as in tens of millions or hundreds of millions of rows?
Small tables are not ideal for a Cassandra cluster since the rows would be spread out across the nodes, even though it might make more sense for each small table to be on a single node.

You might want to consider a model where you have an application layer that maps logical tenant tables into partition keys within a single large Cassandra table, or at least a relatively small number of Cassandra tables. It will depend on the typical size of your tenant tables - very small ones would make sense within a single partition, while larger ones should have separate partitions for a tenant's data. The key here is that tables are expensive, but partitions are cheap and scale very well with Cassandra.

Finally, you said 10 clusters, but did you mean 10 nodes? You might want to consider a model where you do indeed have multiple clusters, where each handles a fraction of the tenants, since there is no need for separate tenants to be on the same cluster.

-- Jack Krupansky

On Tue, May 26, 2015 at 11:32 PM, Arun Chaitanya chaitan64a...@gmail.com wrote:
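As a side note on the off-heap memtable approach discussed in this thread: in Cassandra 2.1 it is controlled from cassandra.yaml. A hedged sketch follows; the sizes are illustrative assumptions matching the 16 GB layout described above, not recommendations:

```yaml
# cassandra.yaml (Cassandra 2.1+) -- off-heap memtable settings.
# Valid allocation types: heap_buffers, offheap_buffers, offheap_objects.
memtable_allocation_type: offheap_objects
memtable_heap_space_in_mb: 2048      # on-heap memtable budget (assumed value)
memtable_offheap_space_in_mb: 6144   # roughly the 6 GB off-heap plan above
```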
Re: 10000+ CF support from Cassandra
I strongly advise against this approach.

Jon, I think so too. But do you actually foresee any problems with this approach? I can think of a few. [I want to evaluate if we can live with this problem]

Just to be clear, I'm not saying this is a great approach. I AM saying that it may be better than having 10000+ CFs, which was the original question (it really depends on the use case, which wasn't well defined)... the map size limit may be a problem, and then there is the CQL vs. Thrift question, which could start a flame war; ideally CQL maps should give you the same flexibility as arbitrary Thrift columns.

On Jun 1, 2015, at 9:44 PM, Jonathan Haddad j...@jonhaddad.com wrote:

Sorry for this naive question but how important is this tuning? Can this have a huge impact in production?

Massive. Here's a graph of when we did some JVM tuning at my previous company: http://33.media.tumblr.com/5d0efca7288dc969c1ac4fc3d36e0151/tumblr_inline_mzvj254quj1rd24f4.png

About an order of magnitude difference in performance.

Jon

On Mon, Jun 1, 2015 at 7:20 PM Arun Chaitanya chaitan64a...@gmail.com wrote: