Re: Rhombus - A time-series object store for Cassandra
Thanks for the pointer Aaron.

Regards,
Ananth

On 15-Jul-2013, at 8:30 AM, aaron morton <aa...@thelastpickle.com> wrote:

> For those following along at home, another project in this space was recently announced: https://github.com/deanhiller/databus
>
> Cheers
> Aaron Morton
> Cassandra Consultant
> New Zealand
> @aaronmorton
> http://www.thelastpickle.com
>
> On 13/07/2013, at 4:01 PM, Ananth Gundabattula <agundabatt...@threatmetrix.com> wrote:
>
>> Hello Rob,
>>
>> Thanks for the pointer. I have a couple of queries:
>>
>> 1. How does this project compare to the KairosDB project on GitHub? For one, I see that Rhombus supports multi-column queries, which is cool, whereas KairosDB/OpenTSDB do not seem to have such a feature, although we can use tags to achieve something similar.
>> 2. Are any roll-ups performed automatically by Rhombus?
>> 3. Can we control the TTL of the data being inserted?
>>
>> I am looking at some of the time-series projects for production use, preferably running on top of Cassandra, and was wondering whether Rhombus can be seen as a pure time-series-optimized schema or as something more than that.
>>
>> Regards,
>> Ananth
>>
>> On 7/12/13 7:15 AM, Rob Righter <rob.righ...@pardot.com> wrote:
>>
>>> Hello,
>>>
>>> Just wanted to share a project that we have been working on. It's a time-series object store for Cassandra. We tried to generalize the common use cases for storing time-series data in Cassandra and automatically handle the denormalization, indexing, and wide-row sharding. It currently exists as a Java library. We have it deployed as a web service in a Dropwizard app server with a REST-style interface. The plan is to eventually release that Dropwizard app too.
>>>
>>> The project and explanation are available on GitHub at: https://github.com/Pardot/Rhombus
>>>
>>> I would love to hear feedback.
>>>
>>> Many Thanks,
>>> Rob
Re: Rhombus - A time-series object store for Cassandra
Hello Rob,

Thanks for the pointer. I have a couple of queries:

1. How does this project compare to the KairosDB project on GitHub? For one, I see that Rhombus supports multi-column queries, which is cool, whereas KairosDB/OpenTSDB do not seem to have such a feature, although we can use tags to achieve something similar.
2. Are any roll-ups performed automatically by Rhombus?
3. Can we control the TTL of the data being inserted?

I am looking at some of the time-series projects for production use, preferably running on top of Cassandra, and was wondering whether Rhombus can be seen as a pure time-series-optimized schema or as something more than that.

Regards,
Ananth

On 7/12/13 7:15 AM, Rob Righter <rob.righ...@pardot.com> wrote:

> Hello,
>
> Just wanted to share a project that we have been working on. It's a time-series object store for Cassandra. We tried to generalize the common use cases for storing time-series data in Cassandra and automatically handle the denormalization, indexing, and wide-row sharding. It currently exists as a Java library. We have it deployed as a web service in a Dropwizard app server with a REST-style interface. The plan is to eventually release that Dropwizard app too.
>
> The project and explanation are available on GitHub at: https://github.com/Pardot/Rhombus
>
> I would love to hear feedback.
>
> Many Thanks,
> Rob
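On the TTL question: outside Rhombus, plain CQL supports a per-insert TTL; whether Rhombus exposes this is exactly what is being asked above. A sketch with made-up keyspace, table, and column names:

```shell
# Hypothetical schema: metrics.events(sensor_id, event_time, value).
# USING TTL expires the inserted row after the given number of seconds.
cqlsh -e "INSERT INTO metrics.events (sensor_id, event_time, value)
          VALUES ('sensor-1', '2013-07-15 08:30:00', 42.0)
          USING TTL 86400;"   # expire after 24 hours
```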
Re: Migrating data from 2 node cluster to a 3 node cluster
Hello everybody,

The thread below makes me wonder: does RF matter when using sstableloader? My assumption was that sstableloader will take care of RF when the streaming is done, but I just wanted to cross-check.

We are currently moving data from an RF=1 cluster to an RF=3 cluster using the sstableloader tool. We will of course be running repair on the destination nodes, but if my understanding above is wrong, how is the following issue resolved by a repair? Since we are using sstableloader, which streams the relevant parts of each table to the destination cluster, a destination node will only get one copy (because the origin RF=1). If that is the case, how will a repair resolve things when a data chunk from an empty replica is used as the chosen replica to perform repair (as it is possible that the two nearest neighbours are empty in the first place)?

Regards,
Ananth

From: aaron morton <aa...@thelastpickle.com>
Reply-To: user@cassandra.apache.org
Date: Tuesday, July 9, 2013 3:24 PM
To: user@cassandra.apache.org
Subject: Re: Migrating data from 2 node cluster to a 3 node cluster

Without vnodes the initial_token is stored in the yaml file, as well as in the system LocationInfo CF. With vnodes the only place the tokens are stored is in the system keyspace. So moving a node without its system keyspace will cause it to generate new tokens, which will mean data is moved around.

Cheers
Aaron Morton
Freelance Cassandra Consultant
New Zealand
@aaronmorton
http://www.thelastpickle.com

On 9/07/2013, at 11:23 AM, sankalp kohli <kohlisank...@gmail.com> wrote:

> "Leaving the system keyspaces behind is OK if you are not using vnodes."
>
> Why is it different for vnodes?
On Mon, Jul 8, 2013 at 3:37 PM, aaron morton <aa...@thelastpickle.com> wrote:

> "This might work for user created keyspaces but might not work for the system keyspace"
>
> Leaving the system keyspaces behind is OK if you are not using vnodes.
>
> Cheers
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
> @aaronmorton
> http://www.thelastpickle.com
>
> On 9/07/2013, at 10:03 AM, sankalp kohli <kohlisank...@gmail.com> wrote:
>
>> "If RF=N or RF>N, you can just copy all SSTables to all nodes, watching out for name collisions."
>>
>> This might work for user created keyspaces but might not work for the system keyspace.
>>
>> On Mon, Jul 8, 2013 at 2:07 PM, Robert Coli <rc...@eventbrite.com> wrote:
>>
>>> On Fri, Jul 5, 2013 at 7:54 PM, srmore <comom...@gmail.com> wrote:
>>>
>>>> RF of the old and new cluster is the same, RF=3. Keyspaces and schema info are also the same.
>>>
>>> You have a cluster where RF=3 and N=2? Does it.. work? What are the tokens of the old and new nodes? tokens for old cluster (2-node)
>>>
>>> If RF=N or RF>N, you can just copy all SSTables to all nodes, watching out for name collisions.
>>>
>>> =Rob
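For reference, a sketch of the sstableloader invocation under discussion (addresses and data path are hypothetical). My understanding, to be verified, is that sstableloader consults the destination cluster's schema, so each range is streamed to every replica the target keyspace's RF demands, regardless of the origin cluster's RF:

```shell
# Hypothetical addresses and path. -d takes initial contact points in the
# destination cluster; the keyspace/table is inferred from the directory path.
sstableloader -d 10.0.0.1,10.0.0.2,10.0.0.3 \
    /var/lib/cassandra/data/my_keyspace/my_table
```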
Re: Errors while upgrading from 1.1.10 version to 1.2.4 version
Thanks for the pointer Fabien.

From: Fabien Rousseau <fab...@yakaz.com>
Reply-To: user@cassandra.apache.org
Date: Friday, June 28, 2013 6:35 PM
To: user@cassandra.apache.org
Subject: Re: Errors while upgrading from 1.1.10 version to 1.2.4 version

Hello,

Have a look at: https://issues.apache.org/jira/browse/CASSANDRA-5476

2013/6/28 Ananth Gundabattula <agundabatt...@threatmetrix.com>

> Hello Everybody,
>
> We were performing an upgrade of our cluster from version 1.1.10 to 1.2.4. We tested the upgrade process in a QA environment and found no issues. However, on the production node we faced loads of errors and had to abort the upgrade process. I was wondering how we ran into such a situation. The main difference between the QA and production environments is the replication factor: in QA, RF=1, and in production, RF=3.
>
> Example stack traces as seen on the other nodes: http://pastebin.com/fSnMAd8q
>
> The other observation is that the node being upgraded is a seed node in 1.1.10. We aborted right after the first node gave the above issues.
>
> Does this mean that application downtime will be required if we go for a rolling upgrade on a live cluster from 1.1.10 to 1.2.4?
>
> Regards,
> Ananth

--
Fabien Rousseau
www.yakaz.com
Errors while upgrading from 1.1.10 version to 1.2.4 version
Hello Everybody,

We were performing an upgrade of our cluster from version 1.1.10 to 1.2.4. We tested the upgrade process in a QA environment and found no issues. However, on the production node we faced loads of errors and had to abort the upgrade process. I was wondering how we ran into such a situation. The main difference between the QA and production environments is the replication factor: in QA, RF=1, and in production, RF=3.

Example stack traces as seen on the other nodes: http://pastebin.com/fSnMAd8q

The other observation is that the node being upgraded is a seed node in 1.1.10. We aborted right after the first node gave the above issues.

Does this mean that application downtime will be required if we go for a rolling upgrade on a live cluster from 1.1.10 to 1.2.4?

Regards,
Ananth
Re: Upgrade from 1.1.10 to 1.2.4
Hello Rob,

I ran into the stack trace when the situation was: num_tokens unset (by this I mean not specifying anything) and initial_token set to some value. I was initially under the impression that specifying num_tokens would override the initial_token value, and hence left num_tokens blank. I was able to get past that exception only when num_tokens was specified with a value of 1.

Regards,
Ananth

On 6/25/13 3:27 AM, Robert Coli <rc...@eventbrite.com> wrote:

> On Sun, Jun 23, 2013 at 2:31 AM, Ananth Gundabattula <agundabatt...@threatmetrix.com> wrote:
>
>> Looks like the cause of the error was not specifying num_tokens in the cassandra.yaml file. I was under the impression that setting a value of num_tokens would override the initial_token value. Looks like we need to set num_tokens to 1 to get around this error. Not specifying anything causes the above error.
>
> My understanding is that the 1.2.x behavior here is:
>
> 1) initial_token set, num_tokens set = Cassandra picks the num_tokens value, ignores initial_token
> 2) initial_token unset, num_tokens unset = Cassandra (until 2.0) picks a single token via range bisection
> 3) initial_token unset, num_tokens set = Cassandra uses num_tokens number of vnodes
>
> Are you saying this is not the behavior you saw?
>
> =Rob
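Rob's three cases correspond to cassandra.yaml fragments along these lines (a sketch for the 1.2.x yaml, not a definitive reference):

```yaml
# Case 3 from above: vnodes — leave initial_token unset and set num_tokens.
num_tokens: 256
# initial_token:

# Ananth's workaround when upgrading an existing non-vnode node:
# pin a single token explicitly.
# num_tokens: 1
# initial_token: <the node's existing token>
```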
Re: Upgrade from 1.1.10 to 1.2.4
Looks like the cause of the error was not specifying num_tokens in the cassandra.yaml file. I was under the impression that setting a value of num_tokens would override the initial_token value. Looks like we need to set num_tokens to 1 to get around this error. Not specifying anything causes the above error.

Regards,
Ananth

From: Ananth Gundabattula <agundabatt...@threatmetrix.com>
Reply-To: user@cassandra.apache.org
Date: Sunday, June 23, 2013 1:25 PM
To: user@cassandra.apache.org
Subject: Upgrade from 1.1.10 to 1.2.4

Hello everybody,

I am trying to perform a rolling upgrade from 1.1.10 to 1.2.4 (with two patches applied to 1.2.4, as they might affect us in production: https://issues.apache.org/jira/browse/CASSANDRA-5554 and https://issues.apache.org/jira/browse/CASSANDRA-5418). I was wondering if anyone has been able to perform a successful rolling upgrade from 1.1.10 to 1.2.4.

I tried both a rolling upgrade while the other nodes were on 1.1.10, and bringing up just the new-version node while all other nodes in the cluster were shut down. The 1.1.10 nodes see the 1.2.4 node come up, but the 1.2.4 node crashes a few seconds after startup. I see the following exception in the logs when the 1.2.4 node starts up.
......
INFO 03:03:26,399 Log replay complete, 13 replayed mutations
INFO 03:03:26,631 Cassandra version: 1.2.4-SNAPSHOT
INFO 03:03:26,631 Thrift API version: 19.35.0
INFO 03:03:26,632 CQL supported versions: 2.0.0,3.0.1 (default: 3.0.1)
INFO 03:03:26,660 Starting up server gossip
INFO 03:03:26,671 Enqueuing flush of Memtable-local@1284117703(253/253 serialized/live bytes, 9 ops)
INFO 03:03:26,672 Writing Memtable-local@1284117703(253/253 serialized/live bytes, 9 ops)
INFO 03:03:26,676 Completed flushing /data/cassandra/data/system/local/system-local-ib-4-Data.db (250 bytes) for commitlog position ReplayPosition(segmentId=1371956606055, position=50387)
INFO 03:03:26,684 Compacting [SSTableReader(path='/data/cassandra/data/system/local/system-local-ib-3-Data.db'), SSTableReader(path='/data/cassandra/data/system/local/system-local-ib-2-Data.db'), SSTableReader(path='/data/cassandra/data/system/local/system-local-ib-4-Data.db'), SSTableReader(path='/data/cassandra/data/system/local/system-local-ib-1-Data.db')]
INFO 03:03:26,706 Compacted 4 sstables to [/data/cassandra/data/system/local/system-local-ib-5,]. 852 bytes to 457 (~53% of original) in 19ms = 0.022938MB/s. 4 total rows, 1 unique. Row merge counts were {1:0, 2:0, 3:0, 4:1, }
INFO 03:03:26,769 Starting Messaging Service on port 7000
ERROR 03:03:26,842 Exception encountered during startup
java.lang.NullPointerException
	at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:716)
	at org.apache.cassandra.service.StorageService.initServer(StorageService.java:542)
	at org.apache.cassandra.service.StorageService.initServer(StorageService.java:439)
	at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:323)
	at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:411)
	at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:454)
java.lang.NullPointerException
	at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:716)
	at org.apache.cassandra.service.StorageService.initServer(StorageService.java:542)
	at org.apache.cassandra.service.StorageService.initServer(StorageService.java:439)
	at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:323)
	at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:411)
	at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:454)
Exception encountered during startup: null
ERROR 03:03:26,848 Exception in thread Thread[StorageServiceShutdownHook,5,main]
java.lang.NullPointerException
	at org.apache.cassandra.service.StorageService.stopRPCServer(StorageService.java:321)
	at org.apache.cassandra.service.StorageService$1.runMayThrow(StorageService.java:507)
	at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
	at java.lang.Thread.run(Thread.java:722)

Regards,
Ananth
Merge two clusters into one - renaming an existing cluster
Hello Everybody,

I am trying to merge two clusters into a single cluster (the rationale being easier administration, apart from better load balancing, etc.). The plan is to rename one cluster (QAPERF1) to the same name as the second cluster (QAPERF2), then alter cassandra-topology.properties to make them appear as different DCs, and finally alter the replication settings and rebuild the nodes, of course after changing the seeds. It has been verified that the schema is the same across the two clusters. This is a test on Apache Cassandra 1.2.4.

In the process of renaming the existing cluster, I have followed the instructions here: http://wiki.apache.org/cassandra/FAQ#clustername_mismatch

I get the following when restarting the first node after the cluster name change (the other nodes are yet to be restarted). It looks like the cluster name change has not taken effect in spite of completing the flush as mentioned in the wiki.

ERROR [main] 2013-06-24 04:44:35,812 CassandraDaemon.java (line 222) Fatal exception during initialization
org.apache.cassandra.exceptions.ConfigurationException: Saved cluster name QAPERF1 != configured name QAPERF2
	at org.apache.cassandra.db.SystemTable.checkHealth(SystemTable.java:447)
	at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:218)
	at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:411)
	at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:454)

In the process of reverting, I changed the configuration file back to the old cluster name, and now I get this exception:

ERROR [main] 2013-06-24 04:48:34,746 CassandraDaemon.java (line 428) Exception encountered during startup
java.util.NoSuchElementException
	at java.util.ArrayList$Itr.next(ArrayList.java:794)
	at org.apache.cassandra.db.SystemTable.upgradeSystemData(SystemTable.java:164)
	at org.apache.cassandra.db.SystemTable.finishStartup(SystemTable.java:98)
	at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:317)
	at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:411)
	at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:454)

Can the experts please advise what the best way is to rename a cluster on version 1.2.4?

Thanks for your time.

Regards,
Ananth
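For anyone retrying this: the mismatch arises because the name saved in the system keyspace disagrees with cassandra.yaml. One commonly cited workaround for 1.2-era nodes is sketched below; this is an assumption, not a verified procedure, so test it on a scratch node first:

```shell
# Hedged sketch (verify before running on real data). The saved cluster name
# lives in the system keyspace, so update it there, flush it to disk, then
# restart with the matching name in cassandra.yaml.
cqlsh -e "UPDATE system.local SET cluster_name = 'QAPERF2' WHERE key = 'local';"
nodetool -h localhost flush system
# then set cluster_name: QAPERF2 in cassandra.yaml and restart the node
```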
Upgrade from 1.1.10 to 1.2.4
Hello everybody,

I am trying to perform a rolling upgrade from 1.1.10 to 1.2.4 (with two patches applied to 1.2.4, as they might affect us in production: https://issues.apache.org/jira/browse/CASSANDRA-5554 and https://issues.apache.org/jira/browse/CASSANDRA-5418). I was wondering if anyone has been able to perform a successful rolling upgrade from 1.1.10 to 1.2.4.

I tried both a rolling upgrade while the other nodes were on 1.1.10, and bringing up just the new-version node while all other nodes in the cluster were shut down. The 1.1.10 nodes see the 1.2.4 node come up, but the 1.2.4 node crashes a few seconds after startup. I see the following exception in the logs when the 1.2.4 node starts up.

......
INFO 03:03:26,399 Log replay complete, 13 replayed mutations
INFO 03:03:26,631 Cassandra version: 1.2.4-SNAPSHOT
INFO 03:03:26,631 Thrift API version: 19.35.0
INFO 03:03:26,632 CQL supported versions: 2.0.0,3.0.1 (default: 3.0.1)
INFO 03:03:26,660 Starting up server gossip
INFO 03:03:26,671 Enqueuing flush of Memtable-local@1284117703(253/253 serialized/live bytes, 9 ops)
INFO 03:03:26,672 Writing Memtable-local@1284117703(253/253 serialized/live bytes, 9 ops)
INFO 03:03:26,676 Completed flushing /data/cassandra/data/system/local/system-local-ib-4-Data.db (250 bytes) for commitlog position ReplayPosition(segmentId=1371956606055, position=50387)
INFO 03:03:26,684 Compacting [SSTableReader(path='/data/cassandra/data/system/local/system-local-ib-3-Data.db'), SSTableReader(path='/data/cassandra/data/system/local/system-local-ib-2-Data.db'), SSTableReader(path='/data/cassandra/data/system/local/system-local-ib-4-Data.db'), SSTableReader(path='/data/cassandra/data/system/local/system-local-ib-1-Data.db')]
INFO 03:03:26,706 Compacted 4 sstables to [/data/cassandra/data/system/local/system-local-ib-5,]. 852 bytes to 457 (~53% of original) in 19ms = 0.022938MB/s. 4 total rows, 1 unique. Row merge counts were {1:0, 2:0, 3:0, 4:1, }
INFO 03:03:26,769 Starting Messaging Service on port 7000
ERROR 03:03:26,842 Exception encountered during startup
java.lang.NullPointerException
	at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:716)
	at org.apache.cassandra.service.StorageService.initServer(StorageService.java:542)
	at org.apache.cassandra.service.StorageService.initServer(StorageService.java:439)
	at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:323)
	at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:411)
	at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:454)
java.lang.NullPointerException
	at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:716)
	at org.apache.cassandra.service.StorageService.initServer(StorageService.java:542)
	at org.apache.cassandra.service.StorageService.initServer(StorageService.java:439)
	at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:323)
	at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:411)
	at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:454)
Exception encountered during startup: null
ERROR 03:03:26,848 Exception in thread Thread[StorageServiceShutdownHook,5,main]
java.lang.NullPointerException
	at org.apache.cassandra.service.StorageService.stopRPCServer(StorageService.java:321)
	at org.apache.cassandra.service.StorageService$1.runMayThrow(StorageService.java:507)
	at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
	at java.lang.Thread.run(Thread.java:722)

Regards,
Ananth
What is the effect of reducing the thrift message sizes on GC
We are currently running 1.1.10 and planning to migrate to a higher version, 1.2.4. The question pertains to tweaking all the knobs to reduce GC-related issues (we have been fighting a lot of really bad GC issues on 1.1.10, and met with little success all the way on 1.1.10).

Taking into consideration that GC tuning is a black art, I was wondering whether we can have some good effect on GC by tweaking the following settings:

- thrift_framed_transport_size_in_mb
- thrift_max_message_length_in_mb

Our system has very short columns (both in number of columns and in data sizes) but millions/billions of rows in each column family. The typical number of columns in each column family is 4. The typical lookup involves specifying the row key and fetching one column most of the time. The writes are similar, except for one keyspace where the number of columns is 50, but with very small data sizes per column.

Assuming we can tweak the above config values to lower values in this context, I was wondering whether GC would be invoked less often if the thrift settings reflected our data model's reads and writes. For example: what is the impact on GC of reducing the above config values to, say, 1 MB rather than 15 or 16?

Thanks a lot for your inputs and thoughts.

Regards,
Ananth
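For concreteness, the two settings in question live in cassandra.yaml; the fragment below uses the 1.1-era names from the message, with values in the neighbourhood of the commonly shipped defaults (treat the exact numbers as an assumption):

```yaml
# cassandra.yaml (1.1-era setting names). The question above is whether
# shrinking these toward ~1 MB reduces GC pressure for small rows/columns.
thrift_framed_transport_size_in_mb: 15
thrift_max_message_length_in_mb: 16
```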
Re: What is the effect of reducing the thrift message sizes on GC
Thanks Aaron for the insight.

One quick question:

> The buffers are not pre-allocated, but once they are allocated they are not returned. So it's only an issue if you have lots of clients connecting and reading a lot of data.

So, to understand you correctly: the buffer is allocated per client connection, remains for the life of the JVM, and is reused for each request? If that is the case, then I presume there is not much to gain by playing around with this config with respect to optimizing for GCs.

> reduce bloom filters, index intervals ...

Well, we have tried all the configs as advised below (and others, like key cache sizes etc.), and hit a dead end, and that is the reason for the 1.2.4 move.

Thanks for all your thoughts and advice on this.

Regards,
Ananth

On 6/18/13 5:56 PM, aaron morton <aa...@thelastpickle.com> wrote:

>> thrift_framed_transport_size_in_mb
>> thrift_max_message_length_in_mb
>
> These control the max size of a buffer allocated by thrift when processing requests/responses. The buffers are not pre-allocated, but once they are allocated they are not returned. So it's only an issue if you have lots of clients connecting and reading a lot of data.
>
>> Our system has very short columns (both in number of columns and data sizes) but millions/billions of rows in each column family.
>
> If you have over 500 million rows per node you may be running into issues with the bloom filters and index samples. This typically looks like heap usage not reducing after CMS compaction has completed.
>
> Ensure that bloom_filter_fp_chance on the CFs is set to 0.01 for size-tiered compaction and 0.1 for levelled compaction. If you need to change it, run nodetool upgradesstables. Then consider increasing the index_interval in the yaml file; see the comments there.
>
> Note that v1.2 moves the bloom filters off heap, so if you upgrade to 1.2 it will probably resolve your issues.
>
> Cheers
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
> @aaronmorton
> http://www.thelastpickle.com
>
> On 18/06/2013, at 7:30 PM, Ananth Gundabattula <agundabatt...@threatmetrix.com> wrote:
>
>> We are currently running 1.1.10 and planning to migrate to a higher version, 1.2.4. The question pertains to tweaking all the knobs to reduce GC-related issues (we have been fighting a lot of really bad GC issues on 1.1.10, and met with little success all the way on 1.1.10).
>>
>> Taking into consideration that GC tuning is a black art, I was wondering whether we can have some good effect on GC by tweaking the following settings:
>>
>> - thrift_framed_transport_size_in_mb
>> - thrift_max_message_length_in_mb
>>
>> Our system has very short columns (both in number of columns and in data sizes) but millions/billions of rows in each column family. The typical number of columns in each column family is 4. The typical lookup involves specifying the row key and fetching one column most of the time. The writes are similar, except for one keyspace where the number of columns is 50, but with very small data sizes per column.
>>
>> Assuming we can tweak the above config values to lower values in this context, I was wondering whether GC would be invoked less often if the thrift settings reflected our data model's reads and writes. For example: what is the impact on GC of reducing the above config values to, say, 1 MB rather than 15 or 16?
>>
>> Thanks a lot for your inputs and thoughts.
>>
>> Regards,
>> Ananth
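Aaron's bloom filter advice, sketched as commands (keyspace/table names are hypothetical; the property values and the upgradesstables step follow his reply):

```shell
# 0.01 for size-tiered compaction, 0.1 for levelled compaction (per above).
cqlsh -e "ALTER TABLE my_keyspace.my_table WITH bloom_filter_fp_chance = 0.01;"

# Rewrite existing sstables so the new bloom filter setting takes effect:
nodetool -h localhost upgradesstables my_keyspace my_table
```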
Re: Query regarding SSTable timestamps and counts
Thanks a lot Aaron and Edward. The mail thread clarifies some things for me.

For letting others on this thread know: running an upgradesstables did decrease our bloom filter false positive ratios a lot. (upgradesstables was run not to move from one Cassandra version to a higher one, but because of all the node movement we had done to upgrade our cluster in a staggered way, with aborted attempts in between; I understand that upgradesstables was not necessarily required for the high bloom filter false positive rates we were seeing.)

Regards,
Ananth

On Wed, Nov 21, 2012 at 9:45 AM, Edward Capriolo <edlinuxg...@gmail.com> wrote:

> On Tue, Nov 20, 2012 at 5:23 PM, aaron morton <aa...@thelastpickle.com> wrote:
>
>> "My understanding of the compaction process was that since data files keep continuously merging we should not have data files with very old last modified timestamps"
>>
>> It is perfectly OK to have very old SSTables.
>>
>> "But performing an upgradesstables did decrease the number of data files and removed all the data files with the old timestamps."
>>
>> upgradesstables re-writes every sstable to have the same contents in the newest format.
>>
>> Cheers
>> Aaron Morton
>> Freelance Cassandra Developer
>> New Zealand
>> @aaronmorton
>> http://www.thelastpickle.com
Re: Query regarding SSTable timestamps and counts
Hello Aaron, Thanks a lot for the reply. Looks like the documentation is confusing. Here is the link I am referring to: http://www.datastax.com/docs/1.1/operations/tuning#tuning-compaction It does not disable compaction. As per the above url, After running a major compaction, automatic minor compactions are no longer triggered, frequently requiring you to manually run major compactions on a routine basis. ( Just before the heading Tuning Column Family compression in the above link) With respect to the replies below : it creates one big file, which will not be compacted until there are (by default) 3 other very big files. This is for the minor compaction and major compaction should theoretically result in one large file irrespective of the number of data files initially? This is not something you have to worry about. Unless you are seeing 1,000's of files using the default compaction. Well my worry has been because of the large amount of node movements we have done in the ring. We started off with 6 nodes and increased the capacity to 12 with disproportionate increases every time which resulted in a lot of clean of data folders except system, run repair and then a cleanup with an aborted attempt in between. There were some data.db files older by more than 2 weeks and were not modified since then. My understanding of the compaction process was that since data files keep continuously merging we should not have data files with very old last modified timestamps (assuming there is a good amount of writes to the table continuously) I did not have a for sure way of telling if everything is alright with the compaction looking at the last modified timestamps of all the data.db files. What are the compaction issues you are having ? Your replies confirm that the timestamps should not be an issue to worry about. So I guess I should not be calling them as issues any more. 
But performing an upgradesstables did decrease the number of data files and removed all the data files with the old timestamps.

Regards, Ananth

On Mon, Nov 19, 2012 at 6:54 AM, aaron morton aa...@thelastpickle.com wrote:

As per datastax documentation, a manual compaction forces the admin to start compaction manually and disables the automated compaction (at least for major compactions but not minor compactions).

It does not disable compaction. It creates one big file, which will not be compacted until there are (by default) 3 other very big files.

1. Does a nodetool stop compaction also force the admin to manually run major compaction (i.e. disable automated major compactions)?

No. Stop just stops the current compaction. Nothing is disabled.

2. Can a node restart reset the automated major compaction if a node gets into a manual-mode compaction for whatever reason?

Major compaction is not automatic. It is the manual nodetool compact command. Automatic (minor) compaction is controlled by min_compaction_threshold and max_compaction_threshold (for the default compaction strategy).

3. What is the ideal number of SSTables for a table in a keyspace (I mean, are there any indicators as to whether my compaction is alright or not?)

This is not something you have to worry about, unless you are seeing 1,000's of files using the default compaction.

For example, I have seen SSTables on the disk more than 10 days old wherein there were other SSTables belonging to the same table but much younger than the older SSTables.

No problems.

4. Does an upgradesstables fix any compaction issues?

What are the compaction issues you are having?

Cheers - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com

On 18/11/2012, at 1:18 AM, Ananth Gundabattula agundabatt...@gmail.com wrote:

We have a cluster running cassandra 1.1.4. On this cluster:

1.
We had to move the nodes around a bit when we were adding new nodes (there was quite a good amount of node movement). 2. We had to stop compactions on some days to save disk space on some of the nodes when they were running very low on disk space (via nodetool stop COMPACTION).

As per datastax documentation, a manual compaction forces the admin to start compaction manually and disables the automated compaction (at least for major compactions but not minor compactions).

Here are the questions I have regarding compaction:

1. Does a nodetool stop compaction also force the admin to manually run major compaction (i.e. disable automated major compactions)?
2. Can a node restart reset the automated major compaction if a node gets into a manual-mode compaction for whatever reason?
3. What is the ideal number of SSTables for a table in a keyspace (I mean, are there any indicators as to whether my compaction is alright or not?). For example, I have seen SSTables on the disk more than 10 days old wherein there were other SSTables belonging to the same table but much younger.
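Aaron's point that minor compaction is driven by min_compaction_threshold and max_compaction_threshold can be illustrated with a toy model of size-tiered compaction. This is a deliberately simplified sketch (the bucketing rule here is invented for illustration), not Cassandra's actual SizeTieredCompactionStrategy algorithm:

```python
# Toy model of size-tiered minor compaction: SSTables of similar size are
# grouped into buckets, and a bucket becomes eligible for a minor compaction
# once it holds at least min_threshold files (and at most max_threshold files
# are compacted at once). Simplified illustration only, not Cassandra's code.

def buckets_to_compact(sstable_sizes_mb, min_threshold=4, max_threshold=32):
    """Group SSTables into rough size buckets and return the buckets
    that are eligible for a minor compaction."""
    buckets = {}
    for size in sorted(sstable_sizes_mb):
        # Bucket by order of magnitude of the file size (a crude stand-in
        # for "similar size"; the real strategy uses averaged size ranges).
        key = len(str(int(size)))
        buckets.setdefault(key, []).append(size)
    return [b[:max_threshold] for b in buckets.values() if len(b) >= min_threshold]

# Three similar-sized files: below the default threshold of 4, nothing happens.
print(buckets_to_compact([10, 12, 11]))          # -> []
# A fourth similar-sized file makes the bucket eligible for compaction.
print(buckets_to_compact([10, 12, 11, 13]))      # -> [[10, 11, 12, 13]]
```

This is also why Aaron's "one big file" remark matters: after a major compaction, the single huge SSTable sits alone in its size bucket and will not be minor-compacted again until several comparably large files accumulate.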
Re: read request distribution
Hi all, On an unrelated observation of the readings below, it looks like all three nodes own 100% of the data. This confuses me a bit. We have a 12-node cluster with RF=3, but the effective ownership is shown as 8.33%. So here is my question: how is the ownership calculated? Is the replication factor considered in the ownership calculation? If yes, then 8.33% ownership in our cluster seems wrong to me; if not, 100% ownership for a 3-node cluster seems wrong to me. Am I missing something in the calculation?

Regards, Ananth

On Fri, Nov 9, 2012 at 4:37 PM, Wei Zhu wz1...@yahoo.com wrote:

Hi All, I am doing a benchmark on Cassandra. I have a three-node cluster with RF=3. I generated 6M rows with sequence numbers from 1 to 6M, so the rows should be evenly distributed among the three nodes, disregarding the replicas. I am running a read-only benchmark: I generate read requests for randomly generated keys from 1 to 6M. Oddly, nodetool cfstats reports that one node has only half the requests of another, and the third node sits in the middle, so the ratio is roughly 2:3:4. The node with the most read requests actually has the smallest latency, and the one with the least read requests reports the largest latency. The difference is pretty big; the fastest is almost double the slowest. All three nodes have exactly the same hardware, and the data size on each node is the same since the RF is three and all of them have the complete data. I am using Hector as the client, and the random read requests number in the millions. I can't think of a reasonable explanation. Can someone please shed some light? Thanks. -Wei
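The arithmetic behind both ownership figures in this thread can be sketched quickly. The likely resolution (hedged; exact nodetool output varies by version) is that 8.33% is the primary-token share of the ring with replication ignored, while 100% is the effective share once RF is counted:

```python
# Ownership arithmetic for a ring with evenly spaced tokens.

def primary_ownership(num_nodes):
    """Fraction of the token ring a node owns as primary replica,
    assuming evenly spaced tokens and ignoring replication."""
    return 1.0 / num_nodes

def effective_ownership(num_nodes, rf):
    """Fraction of the data a node actually holds once each row is
    replicated to rf nodes."""
    return min(rf, num_nodes) / num_nodes

# Wei's benchmark cluster: 3 nodes, RF=3 -> every node holds all the data.
print(effective_ownership(3, 3))    # -> 1.0 (100%)

# Ananth's cluster: 12 nodes, RF=3.
print(primary_ownership(12))        # -> 0.0833... (8.33%, replication ignored)
print(effective_ownership(12, 3))   # -> 0.25 (25%, replication counted)
```

So both readings are internally consistent: 8.33% is 1/12 of the ring without RF, and a 12-node RF=3 cluster's effective ownership per node would be 25%, not 8.33%.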
Re: configure KeyCache to use Non-Heap memory ?
Hello Aaron, Thanks a lot for the response. Raised a request: https://issues.apache.org/jira/browse/CASSANDRA-4619

Here is the nodetool dump (from one of the two nodes in the cluster):

Token: 0
Gossip active: true
Thrift active: true
Load: 147.64 GB
Generation No: 1346635362
Uptime (seconds): 182707
Heap Memory (MB): 4884.33 / 8032.00
Data Center: datacenter1
Rack: rack1
Exceptions: 0
Key Cache: size 777651120 (bytes), capacity 777651120 (bytes), 44354999 hits, 98275175 requests, 0.451 recent hit rate, 14400 save period in seconds
Row Cache: size 0 (bytes), capacity 0 (bytes), 0 hits, 0 requests, NaN recent hit rate, 0 save period in seconds

Number of rows in the 2-node cluster is 74+ million.

Regards, Ananth

From: aaron morton aa...@thelastpickle.com
Reply-To: user@cassandra.apache.org
Date: Wednesday, September 5, 2012 11:33 AM
To: user@cassandra.apache.org
Subject: Re: configure KeyCache to use Non-Heap memory ?

Is there any way I can configure KeyCache to use Non-Heap memory ?

No. You could add a feature request here: https://issues.apache.org/jira/browse/CASSANDRA

Could you post some stats on the current key cache size and hit rate (from nodetool info)? It would be interesting to know how many keys it contains vs. the number of rows on the box, and the hit rate.

Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com

On 4/09/2012, at 3:01 PM, Ananth Gundabattula agundabatt...@threatmetrix.com wrote:

Is there any way I can configure KeyCache to use Non-Heap memory ? We have large-memory nodes (~96GB memory per node) and are effectively using only 8 GB configured for heap (to avoid GC issues because of a large heap). We have a constraint with respect to:

1.
The row cache models don't reflect our data query patterns, so we can only optimize the key cache. 2. We are time-constrained in changing our schema to be more NoSQL-specific.

Regards, Ananth
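Aaron asked how many keys the cache holds versus rows on the box; the quoted nodetool numbers support a back-of-the-envelope answer. The per-entry size below is an assumption for illustration (real key cache entry size depends on key length and Cassandra version), but the hit rate is just hits divided by requests:

```python
# Figures taken from the nodetool info output quoted above.
cache_size_bytes = 777_651_120
hits = 44_354_999
requests = 98_275_175

# The "recent hit rate" nodetool reports is hits / requests.
hit_rate = hits / requests
print(f"hit rate: {hit_rate:.3f}")   # -> hit rate: 0.451

# With the cache at capacity, a rough estimate of cached key count
# follows from an ASSUMED average entry size (not a measured value).
ASSUMED_ENTRY_BYTES = 100
approx_entries = cache_size_bytes // ASSUMED_ENTRY_BYTES
print(f"~{approx_entries:,} entries at {ASSUMED_ENTRY_BYTES}B each")
```

Under that assumption the cache would hold on the order of 7-8 million keys against 74+ million rows, i.e. roughly a tenth of the keys, which makes the ~45% hit rate plausible if reads are skewed toward hot keys.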
configure KeyCache to use Non-Heap memory ?
Is there any way I can configure KeyCache to use Non-Heap memory ? We have large-memory nodes (~96GB memory per node) and are effectively using only 8 GB configured for heap (to avoid GC issues because of a large heap). We have a constraint with respect to:

1. The row cache models don't reflect our data query patterns, so we can only optimize the key cache.
2. We are time-constrained in changing our schema to be more NoSQL-specific.

Regards, Ananth