Re: tuning concurrent_reads param
On Wed, Nov 5, 2014 at 11:00 PM, Jimmy Lin y2klyf+w...@gmail.com wrote: Sorry, I have a late follow-up question. In the cassandra.yaml file, the concurrent_reads section has the following comment: what does it mean by "the operations to enqueue low enough in the stack that the OS and drives can reorder them"? How does it help make the system healthy? The operating system, disk controllers, and disks themselves can merge and reorder requests to optimize performance. Here's a relevant page with some details if you're interested in more: http://www.makelinux.net/books/lkd2/ch13lev1sec5 What really happens if we increase it to too high a value? (Maybe it affects other read or write operations as it eats up all disk IO resources?) Yes -Bryan
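For reference, the stock cassandra.yaml sizes this setting from the number of data disks rather than from CPU count; the values below are only an illustrative starting point for a node with two spinning data disks, not a tuned recommendation:

    # cassandra.yaml -- example values only; adjust and measure
    concurrent_reads: 32     # roughly 16 x number_of_drives for spinning disks
    concurrent_writes: 32    # writes are mostly CPU bound, roughly 8 x number_of_cores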
Re: new data not flushed to sstables
On Mon, Nov 3, 2014 at 7:44 AM, Sebastian Martinka sebastian.marti...@mercateo.com wrote: System and Keyspace Information: 4 Nodes CREATE KEYSPACE restore_test WITH replication = { 'class': 'SimpleStrategy', 'replication_factor': '3'}; I assumed that a flush writes all data to the sstables and that we can use it for backup and restore. Did I forget something or is my understanding wrong? I think you forgot that with N=4 and RF=3, each node will contain approximately 75% of the data. From a quick eyeball check of the json dump you provided, it looks like partition-key values are contained on 3 nodes and absent from 1, which is exactly as expected. -Bryan
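If you want to check which 3 of the 4 nodes should hold a given partition before comparing the per-node dumps, nodetool can print the replicas for a key (the table and key names below are placeholders):

    nodetool flush restore_test
    nodetool getendpoints restore_test my_table some_key   # prints the 3 replica addresses for that key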
Re: OldGen saturation
On Tue, Oct 28, 2014 at 9:02 AM, Adria Arcarons adria.arcar...@greenpowermonitor.com wrote: Hi, Hi We have about 50,000 CFs of varying size. The writing test consists of a continuous flow of inserts. The inserts are done inside BATCH statements in groups of 1,000 to a single CF at a time to make them faster. The problem I'm experiencing is that, eventually, when the script has been running for almost 40 mins, the heap gets saturated. OldGen gets full and then there is intensive GC activity trying to free OldGen objects, but it can only free very little space in each pass. Then GC saturates the CPU. Here are the graphs obtained with VisualVM that show this behavior: My total heap size is 1GB and the NewGen region is 256MB. The C* node has 4GB RAM. Intel Xeon CPU E5520 @ Without looking at your VM graphs, I'm going to go out on a limb here and say that your host is woefully underpowered to host fifty thousand column families and batch writes of one thousand statements. A 1 GB Java heap size is sometimes acceptable for a unit test or playing around with, but you can't actually expect it to be adequate for a load test, can you? Every CF consumes some permanent heap space for its metadata. Too many CFs are a bad thing. You probably have ten times more CFs than would be recommended as an upper limit. -Bryan
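If you do want to give the JVM more room for a test like this, the heap is normally set in conf/cassandra-env.sh; the values below are just an illustration for a 4 GB box, not a recommendation:

    # conf/cassandra-env.sh -- example values only
    MAX_HEAP_SIZE="2G"
    HEAP_NEWSIZE="400M"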
Re: Repair taking long time
With a 4.5 TB table and just 4 nodes, repair will likely take forever for any version. -Bryan On Fri, Sep 26, 2014 at 10:40 AM, Jonathan Haddad j...@jonhaddad.com wrote: Are you using Cassandra 2.0 vnodes? If so, repair takes forever. This problem is addressed in 2.1. On Fri, Sep 26, 2014 at 9:52 AM, Gene Robichaux gene.robich...@match.com wrote: I am fairly new to Cassandra. We have a 9 node cluster, 5 in one DC and 4 in another. Running a repair on a large column family seems to be moving much slower than I expect. Looking at nodetool compaction stats it indicates the Validation phase is running that the total bytes is 4.5T (4505336278756). This is a very large CF. The process has been running for 2.5 hours and has processed 71G (71950433062). That rate is about 28.4 GB per hour. At this rate it will take 158 hours, just shy of 1 week. Is this reasonable? This is my first large repair and I am wondering if this is normal for a CF of this size. Seems like a long time to me. Is it possible to tune this process to speed it up? Is there something in my configuration that could be causing this slow performance? I am running HDDs, not SSDs in a JBOD configuration. Gene Robichaux Manager, Database Operations Match.com 8300 Douglas Avenue I Suite 800 I Dallas, TX 75225 Phone: 214-576-3273 -- Jon Haddad http://www.rustyrazorblade.com twitter: rustyrazorblade
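A common way to keep individual repair runs smaller is to repair one node's primary range and one column family at a time; a rough sketch (keyspace and table names are placeholders):

    # run on each node in turn; -pr repairs only the ranges this node is primary for
    nodetool repair -pr my_keyspace my_big_cf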
Re: Query first 1 columns for each partitioning keys in CQL?
I think there are several issues in your schema and queries. First, the schema can't efficiently return the single newest post for every author. It can efficiently return the newest N posts for a particular author. On Fri, May 16, 2014 at 11:53 PM, 後藤 泰陽 matope@gmail.com wrote: But I consider LIMIT to be a keyword that limits the number of results from the WHOLE result set retrieved by the SELECT statement. This is happening due to the incorrect use of the minTimeuuid() function. All of your created_at values are equal, so you're essentially getting 2 (order not defined) values that have the lowest created_at value. The minTimeuuid() function is meant to be used in the WHERE clause of a SELECT statement, often with maxTimeuuid(), to do BETWEEN-style queries on timeuuid values. The result with SELECT .. LIMIT is below. Unfortunately, this is not what I wanted. I wanted the latest posts of each author. (Now I suspect CQL3 can't represent it.) cqlsh:blog_test create table posts( ... author ascii, ... created_at timeuuid, ... entry text, ... primary key(author,created_at) ... ) WITH CLUSTERING ORDER BY (created_at DESC); cqlsh:blog_test cqlsh:blog_test insert into posts(author,created_at,entry) values ('john',minTimeuuid('2013-02-02 10:00+'),'This is an old entry by john'); cqlsh:blog_test insert into posts(author,created_at,entry) values ('john',minTimeuuid('2013-03-03 10:00+'),'This is a new entry by john'); cqlsh:blog_test insert into posts(author,created_at,entry) values ('mike',minTimeuuid('2013-02-02 10:00+'),'This is an old entry by mike'); cqlsh:blog_test insert into posts(author,created_at,entry) values ('mike',minTimeuuid('2013-03-03 10:00+'),'This is a new entry by mike'); cqlsh:blog_test select * from posts limit 2; author | created_at | entry +--+-- mike | 1c4d9000-83e9-11e2-8080-808080808080 | This is a new entry by mike mike | 4e52d000-6d1f-11e2-8080-808080808080 | This is an old entry by mike To get the most recent posts by a particular author, you'll need statements more like this: cqlsh:test insert into posts(author,created_at,entry) values ('john',now(),'This is an old entry by john'); cqlsh:test insert into posts(author,created_at,entry) values ('john',now(),'This is a new entry by john'); cqlsh:test insert into posts(author,created_at,entry) values ('mike',now(),'This is an old entry by mike'); cqlsh:test insert into posts(author,created_at,entry) values ('mike',now(),'This is a new entry by mike'); and then you can get posts by 'john' ordered from newest to oldest as: cqlsh:test select author, created_at, dateOf(created_at), entry from posts where author = 'john' limit 2 ; author | created_at | dateOf(created_at) | entry +--+--+-- john | 7cb1ac30-df85-11e3-bb46-4d2d68f17aa6 | 2014-05-19 11:43:36-0700 | This is a new entry by john john | 74bb6750-df85-11e3-bb46-4d2d68f17aa6 | 2014-05-19 11:43:23-0700 | This is an old entry by john -Bryan
Re: Best partition type for Cassandra with JBOD
For XFS, using noatime and nodiratime isn't really useful either. http://xfs.org/index.php/XFS_FAQ#Q:_Is_using_noatime_or.2Fand_nodiratime_at_mount_time_giving_any_performance_benefits_in_xfs_.28or_not_using_them_performance_decrease.29.3F On Sat, May 17, 2014 at 7:52 AM, James Campbell ja...@breachintelligence.com wrote: Thanks for the thoughts! On May 16, 2014 4:23 PM, Ariel Weisberg ar...@weisberg.ws wrote: Hi, Recommending nobarrier (mount option barrier=0) when you don't know if a non-volatile cache is in play is probably not the way to go. A non-volatile cache will typically ignore write barriers if a given block device is configured to cache writes anyway. I am also skeptical you will see a boost in performance. Applications that want to defer and batch writes won't emit write barriers frequently, and when they do it's because the data has to be there. Filesystems depend on write barriers, although it is surprisingly hard to get a reordering that is really bad because of the way journals are managed. Cassandra uses log structured storage and supports asynchronous periodic group commit, so it doesn't need to emit write barriers frequently. Setting read ahead to zero on an SSD is necessary to get the maximum number of random reads, but it will also disable prefetching for sequential reads. You need a lot less prefetching with an SSD due to the much faster response time, but it's still many microseconds. Someone with more Cassandra specific knowledge can probably give better advice as to when a non-zero read ahead makes sense with Cassandra. This may be workload specific as well. Regards, Ariel On Fri, May 16, 2014, at 01:55 PM, Kevin Burton wrote: That and nobarrier… and probably noop for the scheduler if using SSD and setting readahead to zero... On Fri, May 16, 2014 at 10:29 AM, James Campbell ja...@breachintelligence.com wrote: Hi all — What partition type is best/most commonly used for a multi-disk JBOD setup running Cassandra on CentOS 64bit? The datastax production server guidelines recommend XFS for data partitions, saying, "Because Cassandra can use almost half your disk space for a single file, use XFS when using large disks, particularly if using a 32-bit kernel. XFS file size limits are 16TB max on a 32-bit kernel, and essentially unlimited on 64-bit." However, the same document also notes that "Maximum recommended capacity for Cassandra 1.2 and later is 3 to 5TB per node," which makes me think 16TB file sizes would be irrelevant (especially when not using RAID to create a single large volume). What has been the experience of this group? I also noted that the guidelines don't mention setting noatime and nodiratime flags in the fstab for data volumes, but I wonder if that's a common practice. James -- Founder/CEO Spinn3r.com Location: San Francisco, CA Skype: burtonator blog: http://burtonator.wordpress.com … or check out my Google+ profile: https://plus.google.com/102718274791889610666/posts http://spinn3r.com War is peace. Freedom is slavery. Ignorance is strength. Corporations are people. -- Bryan Talbot Architect / Platform team lead, Aeria Games and Entertainment Silicon Valley | Berlin | Tokyo | Sao Paulo
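For what it's worth, a plain XFS mount per JBOD disk is usually sufficient; if you do experiment with the other suggestions in this thread, it might look roughly like the sketch below (device names, mount points and values are examples only, and nobarrier should only be considered with a battery- or flash-backed write cache):

    # /etc/fstab -- one XFS data disk of a JBOD set (example)
    /dev/sdb1  /var/lib/cassandra/data1  xfs  defaults,noatime  0 0

    # optional SSD experiments mentioned above (per device, not persistent across reboots)
    echo noop > /sys/block/sdb/queue/scheduler
    blockdev --setra 0 /dev/sdb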
Re: Index with same Name but different keyspace
On Mon, May 19, 2014 at 6:39 AM, mahesh rajamani rajamani.mah...@gmail.com wrote: Sorry, I just realized the table names in the 2 schemas are slightly different, but I am still not sure why I should not use the same index name across different keyspaces. Below are the instructions to reproduce. Create 2 keyspaces using cassandra-cli: [default@unknown] create keyspace keyspace1 with placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy' and strategy_options={replication_factor:1}; [default@unknown] create keyspace keyspace2 with placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy' and strategy_options={replication_factor:1}; Create the tables and indexes using cqlsh as below: cqlsh use keyspace1; cqlsh:keyspace1 CREATE TABLE table1 (version text, flag boolean, primary key (version)); cqlsh:keyspace1 create index sversionindex on table1(flag); cqlsh:keyspace1 use keyspace2; cqlsh:keyspace2 CREATE TABLE table2 (version text, flag boolean, primary key (version)); cqlsh:keyspace2 create index sversionindex on table2(flag); *Bad Request: Duplicate index name sversionindex* Since the index name is optional in the create index statement, you could just omit it and let the system give it a unique name for you. -Bryan
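That is, something along these lines, so each table gets a system-generated index name:

    cqlsh:keyspace1 CREATE INDEX ON table1 (flag);
    cqlsh:keyspace2 CREATE INDEX ON table2 (flag);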
Failed to mkdirs $HOME/.cassandra
How should nodetool command be run as the user nobody? The nodetool command fails with an exception if it cannot create a .cassandra directory in the current user's home directory. I'd like to schedule some nodetool commands to run with least privilege as cron jobs. I'd like to run them as the nobody user -- which typically has / as the home directory -- since that's what the user is typically used for (minimum privileges). None of the methods described in this JIRA actually seem to work (with 2.0.7 anyway) https://issues.apache.org/jira/browse/CASSANDRA-6475 Testing as a normal user with no write permissions to the home directory (to simulate the nobody user) [vagrant@local-dev ~]$ nodetool version ReleaseVersion: 2.0.7 [vagrant@local-dev ~]$ rm -rf .cassandra/ [vagrant@local-dev ~]$ chmod a-w . [vagrant@local-dev ~]$ nodetool flush my_ks my_cf Exception in thread main FSWriteError in /home/vagrant/.cassandra at org.apache.cassandra.io.util.FileUtils.createDirectory(FileUtils.java:305) at org.apache.cassandra.utils.FBUtilities.getToolsOutputDirectory(FBUtilities.java:690) at org.apache.cassandra.tools.NodeCmd.printHistory(NodeCmd.java:1504) at org.apache.cassandra.tools.NodeCmd.main(NodeCmd.java:1204) Caused by: java.io.IOException: Failed to mkdirs /home/vagrant/.cassandra ... 4 more [vagrant@local-dev ~]$ HOME=/tmp nodetool flush my_ks my_cf Exception in thread main FSWriteError in /home/vagrant/.cassandra at org.apache.cassandra.io.util.FileUtils.createDirectory(FileUtils.java:305) at org.apache.cassandra.utils.FBUtilities.getToolsOutputDirectory(FBUtilities.java:690) at org.apache.cassandra.tools.NodeCmd.printHistory(NodeCmd.java:1504) at org.apache.cassandra.tools.NodeCmd.main(NodeCmd.java:1204) Caused by: java.io.IOException: Failed to mkdirs /home/vagrant/.cassandra ... 4 more [vagrant@local-dev ~]$ env HOME=/tmp nodetool flush my_ks my_cf Exception in thread main FSWriteError in /home/vagrant/.cassandra at org.apache.cassandra.io.util.FileUtils.createDirectory(FileUtils.java:305) at org.apache.cassandra.utils.FBUtilities.getToolsOutputDirectory(FBUtilities.java:690) at org.apache.cassandra.tools.NodeCmd.printHistory(NodeCmd.java:1504) at org.apache.cassandra.tools.NodeCmd.main(NodeCmd.java:1204) Caused by: java.io.IOException: Failed to mkdirs /home/vagrant/.cassandra ... 4 more [vagrant@local-dev ~]$ env user.home=/tmp nodetool flush my_ks my_cf Exception in thread main FSWriteError in /home/vagrant/.cassandra at org.apache.cassandra.io.util.FileUtils.createDirectory(FileUtils.java:305) at org.apache.cassandra.utils.FBUtilities.getToolsOutputDirectory(FBUtilities.java:690) at org.apache.cassandra.tools.NodeCmd.printHistory(NodeCmd.java:1504) at org.apache.cassandra.tools.NodeCmd.main(NodeCmd.java:1204) Caused by: java.io.IOException: Failed to mkdirs /home/vagrant/.cassandra ... 4 more [vagrant@local-dev ~]$ nodetool -Duser.home=/tmp flush my_ks my_cf Unrecognized option: -Duser.home=/tmp usage: java org.apache.cassandra.tools.NodeCmd --host arg command ...
Re: Cassandra 2.0.7 always failes due to 'too may open files' error
Running # cat /proc/$(cat /var/run/cassandra.pid)/limits as root or your cassandra user will tell you what limits it's actually running with. On Sun, May 4, 2014 at 10:12 PM, Yatong Zhang bluefl...@gmail.com wrote: I am running 'repair' when the error occurred. And just a few days before I changed the compaction strategy to 'leveled'. don know if this helps On Mon, May 5, 2014 at 1:10 PM, Yatong Zhang bluefl...@gmail.com wrote: Cassandra is running as root [root@storage5 ~]# ps aux | grep java root 1893 42.0 24.0 7630664 3904000 ? Sl 10:43 60:01 java -ea -javaagent:/mydb/cassandra/bin/../lib/jamm-0.2.5.jar -XX:+CMSClassUnloadingEnabled -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms3959M -Xmx3959M -Xmn400M -XX:+HeapDumpOnOutOfMemoryError -Xss256k -XX:StringTableSize=103 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseTLAB -XX:+UseCondCardMark -Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote.port=7199 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dlog4j.configuration=log4j-server.properties -Dlog4j.defaultInitOverride=true -Dcassandra-pidfile=/var/run/cassandra.pid -cp /mydb/cassandra/bin/../conf:/mydb/cassandra/bin/../build/classes/main:/mydb/cassandra/bin/../build/classes/thrift:/mydb/cassandra/bin/../lib/antlr-3.2.jar:/mydb/cassandra/bin/../lib/apache-cassandra-2.0.7.jar:/mydb/cassandra/bin/../lib/apache-cassandra-clientutil-2.0.7.jar:/mydb/cassandra/bin/../lib/apache-cassandra-thrift-2.0.7.jar:/mydb/cassandra/bin/../lib/commons-cli-1.1.jar:/mydb/cassandra/bin/../lib/commons-codec-1.2.jar:/mydb/cassandra/bin/../lib/commons-lang3-3.1.jar:/mydb/cassandra/bin/../lib/compress-lzf-0.8.4.jar:/mydb/cassandra/bin/../lib/concurrentlinkedhashmap-lru-1.3.jar:/mydb/cassandra/bin/../lib/disruptor-3.0.1.jar:/mydb/cassandra/bin/../lib/guava-15.0.jar:/mydb/cassandra/bin/../lib/high-scale-lib-1.1.2.jar:/mydb/cassandra/bin/../lib/jackson-core-asl-1.9.2.jar:/mydb/cassandra/bin/../lib/jackson-mapper-asl-1.9.2.jar:/mydb/cassandra/bin/../lib/jamm-0.2.5.jar:/mydb/cassandra/bin/../lib/jbcrypt-0.3m.jar:/mydb/cassandra/bin/../lib/jline-1.0.jar:/mydb/cassandra/bin/../lib/json-simple-1.1.jar:/mydb/cassandra/bin/../lib/libthrift-0.9.1.jar:/mydb/cassandra/bin/../lib/log4j-1.2.16.jar:/mydb/cassandra/bin/../lib/lz4-1.2.0.jar:/mydb/cassandra/bin/../lib/metrics-core-2.2.0.jar:/mydb/cassandra/bin/../lib/netty-3.6.6.Final.jar:/mydb/cassandra/bin/../lib/reporter-config-2.1.0.jar:/mydb/cassandra/bin/../lib/servlet-api-2.5-20081211.jar:/mydb/cassandra/bin/../lib/slf4j-api-1.7.2.jar:/mydb/cassandra/bin/../lib/slf4j-log4j12-1.7.2.jar:/mydb/cassandra/bin/../lib/snakeyaml-1.11.jar:/mydb/cassandra/bin/../lib/snappy-java-1.0.5.jar:/mydb/cassandra/bin/../lib/snaptree-0.1.jar:/mydb/cassandra/bin/../lib/super-csv-2.1.0.jar:/mydb/cassandra/bin/../lib/thrift-server-0.3.3.jar org.apache.cassandra.service.CassandraDaemon On Mon, May 5, 2014 at 1:02 PM, Philip Persad philip.per...@gmail.comwrote: Have you tried running ulimit -a as the Cassandra user instead of as root? It is possible that your configured a high file limit for root but not for the user running the Cassandra process. 
On Sun, May 4, 2014 at 6:07 PM, Yatong Zhang bluefl...@gmail.comwrote: [root@storage5 ~]# lsof -n | grep java | wc -l 5103 [root@storage5 ~]# lsof | wc -l 6567 It's mentioned in previous mail:) On Mon, May 5, 2014 at 9:03 AM, nash nas...@gmail.com wrote: The lsof command or /proc can tell you how many open files it has. How many is it? --nash
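If the limits reported by /proc turn out to be low, the usual fix is a limits.d (or limits.conf) entry for the user that runs Cassandra, followed by a restart of the process; the numbers below are only an example:

    # /etc/security/limits.d/cassandra.conf -- example values
    root       -  nofile  100000
    cassandra  -  nofile  100000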
Re: using cssandra cql with php
I think the options for using CQL from PHP pretty much don't exist. Those that do are very old, haven't been updated in months, and don't support newer CQL features. Also I don't think any of them use the binary protocol but use thrift instead. From what I can tell, you'll be stuck using old CQL features from unmaintained client drivers -- probably better to not be using CQL and PHP together since mixing them seems pretty bad right now. -Bryan On Sun, Jan 12, 2014 at 11:27 PM, Jason Wee peich...@gmail.com wrote: Hi, operating system should not be a matter right? You just need the cassandra client downloaded and use it to access cassandra node. PHP? http://wiki.apache.org/cassandra/ClientOptions perhaps you can package cassandra pdo driver into rpm? Jason On Mon, Jan 13, 2014 at 3:02 PM, Tim Dunphy bluethu...@gmail.com wrote: Hey all, I'd like to be able to make calls to the cassandra database using PHP. I've taken a look around but I've only found solutions out there for Ubuntu and other distros. But my environment is CentOS. Are there any packages out there I can install that would allow me to use CQL in my PHP code? Thanks Tim -- GPG me!! gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B
Re: Heap is not released and streaming hangs at 0%
bloom_filter_fp_chance = 0.7 is probably way too large to be effective and you'll probably have issues compacting deleted rows and get poor read performance with a value that high. I'd guess that anything larger than 0.1 might as well be 1.0. -Bryan On Fri, Jun 21, 2013 at 5:58 AM, srmore comom...@gmail.com wrote: On Fri, Jun 21, 2013 at 2:53 AM, aaron morton aa...@thelastpickle.comwrote: nodetool -h localhost flush didn't do much good. Do you have 100's of millions of rows ? If so see recent discussions about reducing the bloom_filter_fp_chance and index_sampling. Yes, I have 100's of millions of rows. If this is an old schema you may be using the very old setting of 0.000744 which creates a lot of bloom filters. bloom_filter_fp_chance value that was changed from default to 0.1, looked at the filters and they are about 2.5G on disk and I have around 8G of heap. I will try increasing the value to 0.7 and report my results. It also appears to be a case of hard GC failure (as Rob mentioned) as the heap is never released, even after 24+ hours of idle time, the JVM needs to be restarted to reclaim the heap. Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 20/06/2013, at 6:36 AM, Wei Zhu wz1...@yahoo.com wrote: If you want, you can try to force the GC through Jconsole. Memory-Perform GC. It theoretically triggers a full GC and when it will happen depends on the JVM -Wei -- *From: *Robert Coli rc...@eventbrite.com *To: *user@cassandra.apache.org *Sent: *Tuesday, June 18, 2013 10:43:13 AM *Subject: *Re: Heap is not released and streaming hangs at 0% On Tue, Jun 18, 2013 at 10:33 AM, srmore comom...@gmail.com wrote: But then shouldn't JVM C G it eventually ? I can still see Cassandra alive and kicking but looks like the heap is locked up even after the traffic is long stopped. No, when GC system fails this hard it is often a permanent failure which requires a restart of the JVM. nodetool -h localhost flush didn't do much good. This adds support to the idea that your heap is too full, and not full of memtables. You could try nodetool -h localhost invalidatekeycache, but that probably will not free enough memory to help you. =Rob
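If you do lower bloom_filter_fp_chance again, keep in mind that the new value only applies to sstables written after the change; a hedged example of making the change in cassandra-cli and rewriting the existing sstables so new filters get built (keyspace and CF names are placeholders):

    [default@my_ks] update column family my_cf with bloom_filter_fp_chance = 0.1;
    # then rewrite existing sstables so filters are rebuilt with the new setting:
    nodetool scrub my_ks my_cf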
Re: Compaction not running
Manual compaction for LCS doesn't really do much. It certainly doesn't compact all those little files into bigger files. What makes you think that compactions are not occurring? -Bryan On Tue, Jun 18, 2013 at 3:59 PM, Franc Carter franc.car...@sirca.org.auwrote: On Sat, Jun 15, 2013 at 11:49 AM, Franc Carter franc.car...@sirca.org.auwrote: On Sat, Jun 15, 2013 at 8:48 AM, Robert Coli rc...@eventbrite.comwrote: On Wed, Jun 12, 2013 at 3:26 PM, Franc Carter franc.car...@sirca.org.au wrote: We are running a test system with Leveled compaction on Cassandra-1.2.4. While doing an initial load of the data one of the nodes ran out of file descriptors and since then it hasn't been automatically compacting. You have (at least) two options : 1) increase file descriptors available to Cassandra with ulimit, if possible 2) increase the size of your sstables with levelled compaction, such that you have fewer of them Oops, I wasn't clear enough. I have increased the number of file descriptors and no longer have a file descriptor issue. However the node still doesn't compact automatically. If I run a 'nodetool compact' it will do a small amount of compaction and then stop. The Column Family is using LCS Any ideas on this - compaction is still not automatically running for one of my nodes thanks cheers =Rob -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 8355 2514 Level 4, 55 Harrington St, The Rocks NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215 -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 8355 2514 Level 4, 55 Harrington St, The Rocks NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215
Re: [Cassandra] Conflict resolution in Cassandra
For generic questions like this, google is your friend: http://lmgtfy.com/?q=cassandra+conflict+resolution -Bryan On Thu, Jun 6, 2013 at 11:23 AM, Emalayan Vairavanathan svemala...@yahoo.com wrote: Hi All, Can someone tell me about the conflict resolution mechanisms provided by Cassandra? More specifically does Cassandra provides a way to define application specific conflict resolution mechanisms (per row basis / column basis)? or Does it automatically manage the conflicts based on some synchronization algorithms ? Thank you Emalayan
Re: Multiple JBOD data directory
If you're using cassandra 1.2 then you have a choice specified in the yaml # policy for data disk failures: # stop: shut down gossip and Thrift, leaving the node effectively dead, but # can still be inspected via JMX. # best_effort: stop using the failed disk and respond to requests based on # remaining available sstables. This means you WILL see obsolete # data at CL.ONE! # ignore: ignore fatal errors a -Bryan On Wed, Jun 5, 2013 at 6:11 AM, Christopher Wirt chris.w...@struq.com wrote: I would hope so. Just trying to get some confirmation from someone with production experience. Thanks for your reply From: Shahab Yunus [mailto:shahab.yu...@gmail.com] Sent: 05 June 2013 13:31 To: user@cassandra.apache.org Subject: Re: Multiple JBOD data directory Though I am a newbie, I just had a thought regarding your question 'How will it handle requests for data which is unavailable?': wouldn't the data be served in that case from other nodes where it has been replicated? Regards, Shahab On Wed, Jun 5, 2013 at 5:32 AM, Christopher Wirt chris.w...@struq.com wrote: Hello, We're thinking about using multiple data directories each with its own disk and are currently testing this against a RAID0 config. I've seen that there is failure handling with multiple JBOD, e.g. we have two data directories mounted to separate drives /disk1 /disk2 and one of the drives fails. Will Cassandra continue to work? How will it handle requests for data which is unavailable? If I want to add an additional drive, what is the best way to go about redistributing the data? Thanks, Chris
Re: Multiple JBOD data directory
... sorry, message got cut off # policy for data disk failures: # stop: shut down gossip and Thrift, leaving the node effectively dead, but # can still be inspected via JMX. # best_effort: stop using the failed disk and respond to requests based on # remaining available sstables. This means you WILL see obsolete # data at CL.ONE! # ignore: ignore fatal errors and let requests fail, as in pre-1.2 Cassandra disk_failure_policy: stop On Wed, Jun 5, 2013 at 2:59 PM, Bryan Talbot btal...@aeriagames.com wrote: If you're using cassandra 1.2 then you have a choice specified in the yaml # policy for data disk failures: # stop: shut down gossip and Thrift, leaving the node effectively dead, but # can still be inspected via JMX. # best_effort: stop using the failed disk and respond to requests based on # remaining available sstables. This means you WILL see obsolete # data at CL.ONE! # ignore: ignore fatal errors a -Bryan On Wed, Jun 5, 2013 at 6:11 AM, Christopher Wirt chris.w...@struq.com wrote: I would hope so. Just trying to get some confirmation from someone with production experience. Thanks for your reply From: Shahab Yunus [mailto:shahab.yu...@gmail.com] Sent: 05 June 2013 13:31 To: user@cassandra.apache.org Subject: Re: Multiple JBOD data directory Though I am a newbie, I just had a thought regarding your question 'How will it handle requests for data which is unavailable?': wouldn't the data be served in that case from other nodes where it has been replicated? Regards, Shahab On Wed, Jun 5, 2013 at 5:32 AM, Christopher Wirt chris.w...@struq.com wrote: Hello, We're thinking about using multiple data directories each with its own disk and are currently testing this against a RAID0 config. I've seen that there is failure handling with multiple JBOD, e.g. we have two data directories mounted to separate drives /disk1 /disk2 and one of the drives fails. Will Cassandra continue to work? How will it handle requests for data which is unavailable? If I want to add an additional drive, what is the best way to go about redistributing the data? Thanks, Chris
Re: Cassandra performance decreases drastically with increase in data size.
One or more of these might be effective depending on your particular usage: - remove data (rows especially) - add nodes - add ram (has limitations) - reduce bloom filter space used by increasing fp chance - reduce row and key cache sizes - increase index sample ratio - reduce compaction concurrency and throughput - upgrade to cassandra 1.2 which does some of these things for you -Bryan On Thu, May 30, 2013 at 2:31 PM, srmore comom...@gmail.com wrote: You are right, it looks like I am doing a lot of GC. Is there any short-term solution for this other than bumping up the heap? Because even if I increase the heap I will run into the same issue; only the time before I hit OOM will be lengthened. It will be a while before we go to the latest and greatest Cassandra. Thanks! On Thu, May 30, 2013 at 12:05 AM, Jonathan Ellis jbel...@gmail.com wrote: Sounds like you're spending all your time in GC, which you can verify by checking what GCInspector and StatusLogger say in the log. Fix is to increase your heap size or upgrade to 1.2: http://www.datastax.com/dev/blog/performance-improvements-in-cassandra-1-2 On Wed, May 29, 2013 at 11:32 PM, srmore comom...@gmail.com wrote: Hello, I am observing that my performance is drastically decreasing when my data size grows. I have a 3 node cluster with 64 GB of ram and my data size is around 400GB on all the nodes. I also see that when I re-start Cassandra the performance goes back to normal and then again starts decreasing after some time. Some hunting landed me on this page http://wiki.apache.org/cassandra/LargeDataSetConsiderations which talks about large data sets and explains that it might be because I am going through multiple layers of OS cache, but does not tell me how to tune it. So, my question is, are there any optimizations that I can do to handle these large datasets? And why does my performance go back to normal when I restart Cassandra? Thanks! -- Jonathan Ellis Project Chair, Apache Cassandra co-founder, http://www.datastax.com @spyced
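Several of the knobs above live in cassandra.yaml and only need a restart to take effect; hedged, illustrative values (not recommendations) for a memory-constrained 1.1-era node:

    # cassandra.yaml -- illustrative values only
    key_cache_size_in_mb: 0
    row_cache_size_in_mb: 0
    index_interval: 512                  # default 128; larger uses less RAM but slows reads
    concurrent_compactors: 1
    compaction_throughput_mb_per_sec: 8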
Re: data clean up problem
I think what you're asking for (efficient removal of TTL'd write-once data) is already in the works but not until 2.0 it seems. https://issues.apache.org/jira/browse/CASSANDRA-5228 -Bryan On Tue, May 28, 2013 at 1:26 PM, Hiller, Dean dean.hil...@nrel.gov wrote: Oh and yes, astyanax uses client side response latency and cassandra does the same as a client of the other nodes. Dean On 5/28/13 2:23 PM, Hiller, Dean dean.hil...@nrel.gov wrote: Actually, we did a huge investigation into this on astyanax and cassandra. Astyanax, if I remember, worked if configured correctly but cassandra did not, so we patched cassandra... for some reason cassandra, once on the co-ordinator who had one copy of the data, would wait for both other nodes to respond even though we were at CL=QUORUM on RF=3... we put in a patch for that which my teammate is still supposed to submit. Cassandra should only wait for one node... at least I think that is how I remember it... We have it in our backlog to get the patch into cassandra. Previously one slow node would slow down our website but this no longer happens to us, such that when compaction kicks off on a single node, our cluster keeps going strong. Dean On 5/28/13 2:12 PM, Dwight Smith dwight.sm...@genesyslab.com wrote: How do you determine the slow node, client side response latency? -Original Message- From: Hiller, Dean [mailto:dean.hil...@nrel.gov] Sent: Tuesday, May 28, 2013 1:10 PM To: user@cassandra.apache.org Subject: Re: data clean up problem How much disk is used on each node? We run the suggested 300G per node, as above that compactions can have trouble keeping up. Ps. We run compactions during peak hours just fine because our client reroutes to the 2 of 3 nodes not running compactions based on seeing the slow node, so performance stays fast. The easy route is of course to double your cluster and halve the data size per node so compaction can keep up. Dean From: cem cayiro...@gmail.com Reply-To: user@cassandra.apache.org Date: Tuesday, May 28, 2013 1:45 PM To: user@cassandra.apache.org Subject: Re: data clean up problem Thanks for the answer. Sorry for the misunderstanding. I tried to say I don't send delete requests from the client, so it is safe to set gc_grace to 0. TTL is used for data clean up. I am not running a manual compaction. I tried that once but it took a lot of time to finish and I will not have this amount of off-peak time in production to run this. I even set the compaction throughput to unlimited and it didn't help that much. Disk size just keeps on growing but I know that there is enough space to store 1 day of data. What do you think about time range partitioning? Creating a new column family for each partition and dropping it when you know that all records are expired. I have 5 nodes. Cem. On Tue, May 28, 2013 at 9:37 PM, Hiller, Dean dean.hil...@nrel.gov wrote: Also, how many nodes are you running?
From: cem cayiro...@gmail.com Reply-To: user@cassandra.apache.org Date: Tuesday, May 28, 2013 1:17 PM To: user@cassandra.apache.org Subject: Re: data clean up problem Thanks for the answer but it is already set to 0 since I don't do any deletes. Cem On Tue, May 28, 2013 at 9:03 PM, Edward Capriolo edlinuxg...@gmail.com wrote: You need to change the gc_grace time of the column family. It defaults to 10 days. By default the tombstones will not go away for 10 days. On Tue, May 28, 2013 at 2:46 PM, cem cayiro...@gmail.com wrote: Hi Experts, We have a general problem about cleaning up data from the disk. I need to free the disk space after the retention period and the customer wants to dimension the disk space based on that. After running multiple performance tests with a TTL of 1 day we saw that compaction couldn't keep up with the request rate. Disks were getting full after 3 days. There were also a lot of sstables older than 1 day after 3 days. Things that we tried: -Change the compaction strategy to
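On the time-range partitioning idea raised above: the usual pattern is one column family per time bucket, dropped wholesale once every row in it has passed the retention period, which frees the disk space without waiting for tombstones to compact away (modulo the auto_snapshot setting). A rough CQL sketch with invented names:

    CREATE TABLE events_2013_05_28 (id timeuuid PRIMARY KEY, payload blob);
    -- writers target the bucket for the current day; once a bucket is older
    -- than the retention period it is simply dropped:
    DROP TABLE events_2013_05_27;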
Re: In a multiple data center setup, do all the data centers have complete data irrespective of RF?
Option #3 since it depends on the placement strategy and not the partitioner. -Bryan On Mon, May 20, 2013 at 6:24 AM, Pinak Pani nishant.has.a.quest...@gmail.com wrote: I just wanted to verify the fact that if I happen to setup a multi data-center Cassandra setup, will each data center have the complete data-set with it? Say, I have two data-center each with two nodes, and a partitioner that ranges from 0 to 100. Initial token assigned this way DC1:N1 = 00 DC2:N1 = 25 DC1:N2 = 50 DC2:N2 = 75 where DCX is data center X, NX is node X. *Which one the following options is true?* *Option #1: *DC1 and DC2, each will hold complete dataset with keys bucketed as follows DC1:N1 = (50, 00] = 50 keys DC1:N2 = (00, 50] = 50 keys Complete data set mirrored at DC1 DC2:N1 = (75, 25] = 50 keys DC2:N2 = (25, 75] = 50 keys Complete data set mirrored at DC2 *Option #2: *DC1 and DC2, each will hold 50% of the data with keys bucketed as follows (much the same way in a single C setup) DC1:N1 = (75, 00] = 25 keys DC2:N1 = (00, 25] = 25 keys DC1:N2 = (25, 50] = 25 keys DC2:N2 = (50, 75] = 25 keys data is divided into the two data centers. Thanks, PP
Re: In a multiple data center setup, do all the data centers have complete data irrespective of RF?
On Mon, May 20, 2013 at 10:01 AM, Pinak Pani nishant.has.a.quest...@gmail.com wrote: Assume NetworkTopologyStrategy. So, I wanted to know whether a data-center will contain all the keys? This is the case: CREATE KEYSPACE appKS WITH placement_strategy = 'NetworkTopologyStrategy' AND strategy_options={DC1:3, DC2:3}; Does DC1 and DC2 each contain complete database corpus? That is, if DC1 blows, will I get all the data from DC2? Assume RF = 1. Your config sample isn't RF=1 it's RF=3. That's what the DC1:3 and DC2:3 mean -- set RF=3 for DC1 and RF=3 for DC2 for all rows of all CFs in this keyspace. Sorry, for the very elementary question. This is the post that made me ask this question: http://www.onsip.com/blog/2011/07/15/intro-to-cassandra-and-networktopologystrategy It says, NTS creates an iterator for EACH datacenter and places writes discretely for each. The result is that NTS basically breaks each datacenter into it's own logical ring when it places writes. A lot of things change in fast moving projects in 2 years, so you'll have to take anything written 2 years ago with a grain of salt and figure out if it's still true with whatever version you're using. That seems to mean that each data-center behaves as an independent ring with initial_token. So, If I have 2 data centers and NTS, I am basically mirroring the database. Right? Depending on how you've configured your placement strategy, but if you're using DC1:3 and DC2:3 like you have above, then yes, you'd expect to have 3 copies of every row in both data centers for that keyspace. -Bryan
Re: update does not apply to any replica if consistency = ALL and one replica is down
I think you're conflating may with must. That article says that updates may still be applied to some replicas when there is a failure and I believe that still is the case. However, if the coordinator knows that the CL can't be met before even attempting the write, I don't think it will attempt the write. -Bryan On Fri, May 17, 2013 at 1:48 AM, Sergey Naumov sknau...@gmail.com wrote: As described here ( http://maxgrinev.com/2010/07/12/update-idempotency-why-it-is-important-in-cassandra-applications-2/), if consistency level couldn't be met, updates are applied anyway on functional replicas, and they could be propagated later to other replicas using repair mechanisms or by issuing the same request later, as update operations are idempotent in Cassandra. But... on my configuration (Cassandra 1.2.4, python CQL 1.0.4, DC1 - 3 nodes, DC2 - 3 nodes, DC3 - 1 node, RF={DC1:3, DC2:2, DC3:1}, Random Partitioner, GossipingPropertyFileSnitch, one node in DC1 is deliberately down - and, as RF for DC1 is 3, this down node is a replica node for 100% of records), when I try to insert one record with consistency level of ALL, this insert does not appear on any replica (-s30 - is a serial of UUID1: 001e--1000--x (30 is 1e in hex), -n1 mean that we will insert/update a single record with first id from this series - 001e--1000--): *write with consistency ALL:* cassandra@host11:~/Cassandra$ ./insert.sh -s30 -n1 -cALL Traceback (most recent call last): File ./aux/fastinsert.py, line 54, in insert curs.execute(cmd, consistency_level=p.conlvl) OperationalError: Unable to complete request: one or more nodes were unavailable. Last record UUID is 001e--1000--* * about 10 seconds passed... * read with consistency ONE:* cassandra@host11:~/Cassandra$ ./select.sh -s30 -n1 -cONE Total records read: *0* Last record UUID is 001e--1000-- *read with consistency QUORUM:* cassandra@host11:~/Cassandra$ ./select.sh -s30 -n1 -cQUORUM Total records read: *0* Last record UUID is 001e--1000-- *write with consistency QUORUM:* cassandra@host11:~/Cassandra$ ./insert.sh -s30 -n1 -cQUORUM Last record UUID is 001e--1000-- *read with consistency QUORUM:* cassandra@host11:~/Cassandra$ ./select.sh -s30 -n1 -cQUORUM Total records read: *1* Last record UUID is 001e--1000-- Is it a new feature of Cassandra that it does not perform a write to any replica if consistency couldn't be satisfied? If so, then is it true for all cases, for example when returned error is OperationalError: Request did not complete within rpc_timeout? Thanks in advance, Sergey Naumov.
Re: SSTable size versus read performance
512 sectors for read-ahead. Are your new fancy SSD drives using large sectors? If your read-ahead is really reading 512 x 4KB per random IO, then that 2 MB per read seems like a lot of extra overhead. -Bryan On Thu, May 16, 2013 at 12:35 PM, Keith Wright kwri...@nanigans.com wrote: We actually have it set to 512. I have tried decreasing my SSTable size to 5 MB and changing the chunk size to 8 kb From: Igor i...@4friends.od.ua Reply-To: user@cassandra.apache.org user@cassandra.apache.org Date: Thursday, May 16, 2013 1:55 PM To: user@cassandra.apache.org user@cassandra.apache.org Subject: Re: SSTable size versus read performance My 5 cents: I'd check blockdev --getra for data drives - too high values for readahead (default to 256 for debian) can hurt read performance.
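For reference, the read-ahead value is reported and set per block device with blockdev; the device name and values below are just examples:

    blockdev --getra /dev/sdb      # current read-ahead, in sectors
    blockdev --setra 16 /dev/sdb   # a much smaller value is often tried for SSD-heavy random-read workloads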
Re: index_interval
So will cassandra provide a way to limit its off-heap usage to avoid unexpected OOM kills? I'd much rather have performance degrade when 100% of the index samples no longer fit in memory rather than have the process be killed with no way to stabilize it without adding hardware or removing data. -Bryan On Fri, May 10, 2013 at 7:44 PM, Edward Capriolo edlinuxg...@gmail.com wrote: If you use up your off-heap memory, linux has an OOM killer that will kill a random task. On Fri, May 10, 2013 at 11:34 AM, Bryan Talbot btal...@aeriagames.com wrote: If off-heap memory (for index samples, bloom filters, row caches, key caches, etc) is exhausted, will cassandra experience a memory allocation error and quit? If so, are there plans to make the off-heap usage more dynamic to allow less-used pages to be replaced with hot data and the paged-out / cold data read back in again on demand?
Re: index_interval
If off-heap memory (for index samples, bloom filters, row caches, key caches, etc) is exhausted, will cassandra experience a memory allocation error and quit? If so, are there plans to make the off-heap usage more dynamic to allow less-used pages to be replaced with hot data and the paged-out / cold data read back in again on demand? -Bryan On Wed, May 8, 2013 at 4:24 PM, Jonathan Ellis jbel...@gmail.com wrote: index_interval won't be going away, but you won't need to change it as often in 2.0: https://issues.apache.org/jira/browse/CASSANDRA-5521 On Mon, May 6, 2013 at 12:27 PM, Hiller, Dean dean.hil...@nrel.gov wrote: I heard a rumor that index_interval is going away? What is the replacement for this? (We have been having to play with this setting a lot lately, as too big and it gets slow, yet too small and cassandra uses way too much RAM… we are still trying to find the right balance with this setting.) Thanks, Dean -- Jonathan Ellis Project Chair, Apache Cassandra co-founder, http://www.datastax.com @spyced
Re: Cassandra running High Load with no one using the cluster
On Sat, May 4, 2013 at 9:22 PM, Aiman Parvaiz ai...@grapheffect.com wrote: When starting this cluster we set JVM_OPTS=$JVM_OPTS -Xss1000k Why did you increase the stack-size to 5.5 times greater than recommended? Since each threads now uses 1000KB minimum just for the stack, a large number of threads will use a large amount of memory. -Bryan
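For comparison, the stock cassandra-env.sh of that era ships with a much smaller per-thread stack, something like the line below (the exact default varies by version; later releases raised it to 256k for newer JVMs):

    # conf/cassandra-env.sh -- stock-ish value, not 1000k
    JVM_OPTS="$JVM_OPTS -Xss180k"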
Re: Adding nodes in 1.2 with vnodes requires huge disks
I believe that nodetool rebuild is used to add a new datacenter, not just a new host to an existing cluster. Is that what you ran to add the node? -Bryan On Fri, Apr 26, 2013 at 1:27 PM, John Watson j...@disqus.com wrote: Small relief we're not the only ones that had this issue. We're going to try running a shuffle before adding a new node again... maybe that will help - John On Fri, Apr 26, 2013 at 5:07 AM, Francisco Nogueira Calmon Sobral fsob...@igcorp.com.br wrote: I am using the same version and observed something similar. I've added a new node, but the instructions from Datastax did not work for me. Then I ran nodetool rebuild on the new node. After this command finished, it contained two times the load of the other nodes. Even when I ran nodetool cleanup on the older nodes, the situation was the same. The problem only seemed to disappear when nodetool repair was applied to all nodes. Regards, Francisco Sobral. On Apr 25, 2013, at 4:57 PM, John Watson j...@disqus.com wrote: After finally upgrading to 1.2.3 from 1.1.9, enabling vnodes, and running upgradesstables, I figured it would be safe to start adding nodes to the cluster. Guess not? It seems when new nodes join, they are streamed *all* sstables in the cluster. https://dl.dropbox.com/s/bampemkvlfck2dt/Screen%20Shot%202013-04-25%20at%2012.35.24%20PM.png The gray line machine ran out of disk space and for some reason cascaded into errors in the cluster about 'no host id' when trying to store hints for it (even though it hadn't joined yet). The purple line machine, I just stopped the joining process because the main cluster was dropping mutation messages at this point on a few nodes (and it still had dozens of sstables to stream). I followed this: http://www.datastax.com/docs/1.2/operations/add_replace_nodes Is there something missing in that documentation? Thanks, John
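To be explicit about the two cases (the DC name below is a placeholder): a node joining an existing data center should simply bootstrap, while nodetool rebuild streams data from another DC and is meant for populating a brand-new data center.

    # new node in an existing DC: auto_bootstrap defaults to true, just start it
    sudo service cassandra start

    # new node in a brand-new DC: start it with auto_bootstrap: false, then
    nodetool rebuild ExistingDC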
Re: Cassandra services down frequently [Version 1.1.4]
On Thu, Apr 4, 2013 at 1:27 AM, adeel.ak...@panasiangroup.com wrote: After some time (1 hour / 2 hours) cassandra shuts down services on one or two nodes with the following errors; Wonder what the workload and schema are like ... We can see from below that you've tweaked and disabled many of the memory safety valve and other memory related settings. Those could be causing issues too. hinted_handoff_throttle_delay_in_ms: 0 flush_largest_memtables_at: 1.0 reduce_cache_sizes_at: 1.0 reduce_cache_capacity_to: 0.6 rpc_keepalive: true rpc_server_type: sync rpc_min_threads: 16 rpc_max_threads: 2147483647 in_memory_compaction_limit_in_mb: 256 compaction_throughput_mb_per_sec: 16 rpc_timeout_in_ms: 15000 dynamic_snitch_badness_threshold: 0.0
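For comparison, the 1.1-era defaults for the memory safety valves that were disabled above are, if memory serves, roughly:

    flush_largest_memtables_at: 0.75
    reduce_cache_sizes_at: 0.85
    reduce_cache_capacity_to: 0.6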
Re: Timeseries data
In the worst case, that is possible, but compaction strategies try to minimize the number of SSTables that a row appears in, so a row being in ALL SSTables is not likely for most cases. -Bryan On Wed, Mar 27, 2013 at 12:17 PM, Kanwar Sangha kan...@mavenir.com wrote: Hi – I have a query on reads with Cassandra. We are planning to have a dynamic column family and each column would be based on a timeseries. Inserting data — key = 'xxx', {column_name = TimeUUID(now), :column_value = 'value' }, {column_name = TimeUUID(now), :column_value = 'value' }, .. Now this key might be spread across multiple SSTables over a period of days. When we do a READ query to fetch, say, a slice of data from this row based on time X-Y, would it need to get data from ALL sstables? Thanks, Kanwar
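For what it's worth, the same model expressed as a CQL3 table makes the layout explicit: one partition per key with a time-ordered clustering column, so a time-bounded slice only touches the sstables holding fragments of that one partition (names below are placeholders):

    CREATE TABLE timeseries (
        key text,
        ts timeuuid,
        value text,
        PRIMARY KEY (key, ts)
    ) WITH CLUSTERING ORDER BY (ts DESC);

    SELECT * FROM timeseries WHERE key = 'xxx' AND ts > minTimeuuid('2013-03-01') LIMIT 1000;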
Re: old data / tombstones are not deleted after ttl
Those older files won't be included in a compaction until there are min_compaction_threshold (4) files of that size. When you get another SS table -Data.db file that is about 12-18GB then you'll have 4 and they will be compacted together into one new file. At that time, if there are any rows with only tombstones that are all older than gc_grace the row will be removed (assuming the row exists exclusively in the 4 input SS tables). Columns with data that is more than TTL seconds old will be written with a tombstone. If the row does have column values in SS tables that are not being compacted, the row will not be removed. -Bryan On Sun, Mar 3, 2013 at 11:07 PM, Matthias Zeilinger matthias.zeilin...@bwinparty.com wrote: Hi, ** ** I´m running Cassandra 1.1.5 and have following issue. ** ** I´m using a 10 days TTL on my CF. I can see a lot of tombstones in there, but they aren´t deleted after compaction. ** ** I have tried a nodetool –cleanup and also a restart of Cassandra, but nothing happened. ** ** total 61G drwxr-xr-x 2 cassandra dba 20K Mar 4 06:35 . drwxr-xr-x 10 cassandra dba 4.0K Dec 10 13:05 .. -rw-r--r-- 1 cassandra dba 15M Dec 15 22:04 whatever-he-1398-CompressionInfo.db -rw-r--r-- 1 cassandra dba 19G Dec 15 22:04 whatever-he-1398-Data.db -rw-r--r-- 1 cassandra dba 15M Dec 15 22:04 whatever-he-1398-Filter.db** ** -rw-r--r-- 1 cassandra dba 357M Dec 15 22:04 whatever-he-1398-Index.db*** * -rw-r--r-- 1 cassandra dba 4.3K Dec 15 22:04 whatever-he-1398-Statistics.db -rw-r--r-- 1 cassandra dba 9.5M Feb 6 15:45 whatever-he-5464-CompressionInfo.db -rw-r--r-- 1 cassandra dba 12G Feb 6 15:45 whatever-he-5464-Data.db -rw-r--r-- 1 cassandra dba 48M Feb 6 15:45 whatever-he-5464-Filter.db** ** -rw-r--r-- 1 cassandra dba 736M Feb 6 15:45 whatever-he-5464-Index.db*** * -rw-r--r-- 1 cassandra dba 4.3K Feb 6 15:45 whatever-he-5464-Statistics.db -rw-r--r-- 1 cassandra dba 9.7M Feb 21 19:13 whatever-he-6829-CompressionInfo.db -rw-r--r-- 1 cassandra dba 12G Feb 21 19:13 whatever-he-6829-Data.db -rw-r--r-- 1 cassandra dba 47M Feb 21 19:13 whatever-he-6829-Filter.db** ** -rw-r--r-- 1 cassandra dba 792M Feb 21 19:13 whatever-he-6829-Index.db*** * -rw-r--r-- 1 cassandra dba 4.3K Feb 21 19:13 whatever-he-6829-Statistics.db -rw-r--r-- 1 cassandra dba 3.7M Mar 1 10:46 whatever-he-7578-CompressionInfo.db -rw-r--r-- 1 cassandra dba 4.3G Mar 1 10:46 whatever-he-7578-Data.db -rw-r--r-- 1 cassandra dba 12M Mar 1 10:46 whatever-he-7578-Filter.db** ** -rw-r--r-- 1 cassandra dba 274M Mar 1 10:46 whatever-he-7578-Index.db*** * -rw-r--r-- 1 cassandra dba 4.3K Mar 1 10:46 whatever-he-7578-Statistics.db -rw-r--r-- 1 cassandra dba 3.6M Mar 1 11:21 whatever-he-7582-CompressionInfo.db -rw-r--r-- 1 cassandra dba 4.3G Mar 1 11:21 whatever-he-7582-Data.db -rw-r--r-- 1 cassandra dba 9.7M Mar 1 11:21 whatever-he-7582-Filter.db** ** -rw-r--r-- 1 cassandra dba 236M Mar 1 11:21 whatever-he-7582-Index.db*** * -rw-r--r-- 1 cassandra dba 4.3K Mar 1 11:21 whatever-he-7582-Statistics.db -rw-r--r-- 1 cassandra dba 3.7M Mar 3 12:13 whatever-he-7869-CompressionInfo.db -rw-r--r-- 1 cassandra dba 4.3G Mar 3 12:13 whatever-he-7869-Data.db -rw-r--r-- 1 cassandra dba 9.8M Mar 3 12:13 whatever-he-7869-Filter.db** ** -rw-r--r-- 1 cassandra dba 239M Mar 3 12:13 whatever-he-7869-Index.db*** * -rw-r--r-- 1 cassandra dba 4.3K Mar 3 12:13 whatever-he-7869-Statistics.db -rw-r--r-- 1 cassandra dba 924K Mar 3 18:02 whatever-he-7953-CompressionInfo.db -rw-r--r-- 1 cassandra dba 1.1G Mar 3 18:02 whatever-he-7953-Data.db -rw-r--r-- 1 cassandra dba 2.1M Mar 
3 18:02 whatever-he-7953-Filter.db** ** -rw-r--r-- 1 cassandra dba 51M Mar 3 18:02 whatever-he-7953-Index.db*** * -rw-r--r-- 1 cassandra dba 4.3K Mar 3 18:02 whatever-he-7953-Statistics.db -rw-r--r-- 1 cassandra dba 231K Mar 3 20:06 whatever-he-7974-CompressionInfo.db -rw-r--r-- 1 cassandra dba 268M Mar 3 20:06 whatever-he-7974-Data.db -rw-r--r-- 1 cassandra dba 483K Mar 3 20:06 whatever-he-7974-Filter.db** ** -rw-r--r-- 1 cassandra dba 12M Mar 3 20:06 whatever-he-7974-Index.db*** * -rw-r--r-- 1 cassandra dba 4.3K Mar 3 20:06 whatever-he-7974-Statistics.db -rw-r--r-- 1 cassandra dba 116K Mar 4 06:28 whatever-he-8002-CompressionInfo.db -rw-r--r-- 1 cassandra dba 146M Mar 4 06:28 whatever-he-8002-Data.db -rw-r--r-- 1 cassandra dba 646K Mar 4 06:28 whatever-he-8002-Filter.db** ** -rw-r--r-- 1 cassandra dba 16M Mar 4 06:28 whatever-he-8002-Index.db*** * -rw-r--r-- 1 cassandra dba 4.3K Mar 4 06:28 whatever-he-8002-Statistics.db -rw-r--r-- 1 cassandra dba 58K Mar 4 06:28 whatever-he-8003-CompressionInfo.db -rw-r--r-- 1 cassandra
Re: Reading old data problem
On Thu, Feb 28, 2013 at 5:08 PM, Víctor Hugo Oliveira Molinar vhmoli...@gmail.com wrote: Ok guys, let me try to ask it in a different way: will repair totally ensure data synchronism among nodes? If there are no writes happening on the cluster then yes. Otherwise, the answer is it depends since all the normal things that lead to inconsistencies can still happen. Extra question: once I write at CL=ALL, will C* ensure that I can read from ANY node without an inconsistency? The reverse case, writing at CL=ONE but reading at CL=ALL, will that also ensure it? You can get consistent behavior if CL.read + CL.write > RF. So since you have just 2 nodes and RF=2, you'd need to have at least CL.read=2 and CL.write=1 or CL.read=1 and CL.write=2. -Bryan On Wed, Feb 27, 2013 at 11:24 PM, Víctor Hugo Oliveira Molinar vhmoli...@gmail.com wrote: Hello, I need some help to manage my live cluster! I'm currently running a cluster with 2 nodes, RF:2, CL:1. Since I'm limited by hardware upgrade issues, I'm not able to increase my ConsistencyLevel for now. Anyway, I ran a full repair on each node of the cluster followed by a flush, although I'm still reading old data when performing queries. Well, it's known that I might read old data during normal operations, but shouldn't it be in sync after the full anti-entropy repair? What am I missing? Thanks in advance!
Re: heap usage
Aren't bloom filters kept off heap in 1.2? https://issues.apache.org/jira/browse/CASSANDRA-4865 Disabling bloom filters also disables tombstone removal as well, so don't disable them if you delete anything. https://issues.apache.org/jira/browse/CASSANDRA-5182 I believe that the index samples (by default every 128th entry) are still kept in memory, so your JVM memory will scale with the number of rows stored. Additional memory is used for every keyspace and CF too, so if you have thousands of CFs that could be an issue. -Bryan On Fri, Feb 15, 2013 at 8:16 AM, Edward Capriolo edlinuxg...@gmail.com wrote: It is not going to be true for long that LCS does not require bloom filters. https://issues.apache.org/jira/browse/CASSANDRA-5029 Apparently, without bloom filters there are issues. On Fri, Feb 15, 2013 at 7:29 AM, Blake Manders bl...@crosspixel.net wrote: You probably want to look at your bloom filters. Be forewarned though, they're difficult to change; changes to bloom filter settings only apply to new SSTables, so they might not be noticeable until a few compactions have taken place. If that is your issue, and your usage model fits it, a good alternative to the slow propagation of higher miss rates is to switch to LCS (which doesn't use bloom filters), which won't require you to make the jump to 1.2. On Fri, Feb 15, 2013 at 4:06 AM, Reik Schatz reik.sch...@gmail.com wrote: Hi, recently we are hitting some OOM: Java heap space, so I was investigating how the heap is used in Cassandra 1.2+. We use the calculated 4G heap. Our cluster is 6 nodes, around 750 GB data and a replication factor of 3. Row cache is disabled. All key cache and memtable settings are left at default. Is the primary key index kept in heap memory? We have a bunch of keyspaces and column families. Thanks, Rik -- Blake Manders | CTO Cross Pixel, Inc. | 494 8th Ave, Penthouse | NYC 10001 Website: crosspixel.net Twitter: twitter.com/CrossPix
Re: Deletion consistency
With a RF and CL of one, there is no replication so there can be no issue with distributed deletes. Writes (and reads) can only go to the one host that has the data and will be refused if that node is down. I'd guess that your app isn't deleting records when you think that it is, or that the delete is failing but not being detected as failed. -Bryan On Fri, Feb 15, 2013 at 10:21 AM, Mike mthero...@yahoo.com wrote: If you increase the number of nodes to 3, with an RF of 3, then you should be able to read/delete utilizing a quorum consistency level, which I believe will help here. Also, make sure the time of your servers are in sync, utilizing NTP, as drifting time between you client and server could cause updates to be mistakenly dropped for being old. Also, make sure you are running with a gc_grace period that is high enough. The default is 10 days. Hope this helps, -Mike On 2/15/2013 1:13 PM, Víctor Hugo Oliveira Molinar wrote: hello everyone! I have a column family filled with event objects which need to be processed by query threads. Once each thread query for those objects(spread among columns bellow a row), it performs a delete operation for each object in cassandra. It's done in order to ensure that these events wont be processed again. Some tests has showed me that it works, but sometimes i'm not getting those events deleted. I checked it through cassandra-cli,etc. So, reading it (http://wiki.apache.org/**cassandra/DistributedDeleteshttp://wiki.apache.org/cassandra/DistributedDeletes) I came to a conclusion that I may be reading old data. My cluster is currently configured as: 2 nodes, RF1, CL 1. In that case, what should I do? - Increase the consistency level for the write operations( in that case, the deletions ). In order to ensure that those deletions are stored in all nodes. or - Increase the consistency level for the read operations. In order to ensure that I'm reading only those yet processed events(deleted). ? - Thanks in advance
Re: Cluster not accepting insert while one node is down
Generally data isn't written to whatever node the client connects to. In your case, a row is written to one of the nodes based on the hash of the row key. If that one replica node is down, it won't matter which coordinator node you attempt a CL.ONE write through: the write will fail. If you want the write to succeed, you could do any one of: write with CL.ANY, increase RF to 2+, write using a row key that hashes to an UP node. -Bryan On Thu, Feb 14, 2013 at 2:06 AM, Alain RODRIGUEZ arodr...@gmail.com wrote: I will let committers or anyone that has knowledge of Cassandra internals answer this. From what I understand, you should be able to insert data on any up node with your configuration... Alain 2013/2/14 Traian Fratean traian.frat...@gmail.com You're right regarding data availability on that node. And my config, being the default one, is not suited for a cluster. What I don't get is that node 67 was down and I was trying to insert into node 66, as can be seen from the stacktrace. Long story short: when node 67 was down I could not insert into any machine in the cluster. Not what I was expecting. Thank you for the reply! Traian. 2013/2/14 Alain RODRIGUEZ arodr...@gmail.com Hi Traian, There is your problem. You are using RF=1, meaning that each node is responsible for its range, and nothing more. So when a node goes down, do the math, you just can't read 1/5 of your data. This is very good for performance since each node owns its own part of the data and any write or read needs to reach only one node, but it reintroduces single points of failure, and avoiding those is a main point of using C*. So you have poor availability and poor consistency. A usual configuration with 5 nodes would be RF=3 and both CL (R/W) = QUORUM. This will replicate your data to 2 nodes + the natural endpoint (a total of 3/5 nodes owning any data) and any read or write would need to reach at least 2 nodes before being considered successful, ensuring strong consistency. This configuration allows you to shut down a node (crash or configuration update/rolling restart) without degrading the service (at least allowing you to reach any data) but at the cost of more data on each node. Alain 2013/2/14 Traian Fratean traian.frat...@gmail.com I am using defaults for both RF and CL. As the keyspace was created using cassandra-cli the default RF should be 1 as I get it from below: [default@TestSpace] describe; Keyspace: TestSpace: Replication Strategy: org.apache.cassandra.locator.NetworkTopologyStrategy Durable Writes: true Options: [datacenter1:1] As for the CL, it is the Astyanax default, which is 1 for both reads and writes. Traian. 2013/2/13 Alain RODRIGUEZ arodr...@gmail.com We probably need more info like the RF of your cluster and the CL of your reads and writes. Could you also tell us if you use vnodes or not? I heard that Astyanax was not running very smoothly on 1.2.0, but a bit better on 1.2.1. Yet, Netflix didn't release a version of Astyanax for C* 1.2. Alain 2013/2/13 Traian Fratean traian.frat...@gmail.com Hi, I have a cluster of 5 nodes running Cassandra 1.2.0. I have a Java client with Astyanax 1.56.21. When a node (10.60.15.67 - *different* from the one in the stacktrace below) went down I got TokenRangeOfflineException and no other data gets inserted into *any other* node from the cluster. Am I having a configuration issue or is this supposed to happen? 
com.netflix.astyanax.connectionpool.impl.CountingConnectionPoolMonitor.trackError(CountingConnectionPoolMonitor.java:81) - com.netflix.astyanax.connectionpool.exceptions.TokenRangeOfflineException: TokenRangeOfflineException: [host=10.60.15.66(10.60.15.66):9160, latency=2057(2057), attempts=1]UnavailableException() com.netflix.astyanax.connectionpool.exceptions.TokenRangeOfflineException: TokenRangeOfflineException: [host=10.60.15.66(10.60.15.66):9160, latency=2057(2057), attempts=1]UnavailableException() at com.netflix.astyanax.thrift.ThriftConverter.ToConnectionPoolException(ThriftConverter.java:165) at com.netflix.astyanax.thrift.AbstractOperationImpl.execute(AbstractOperationImpl.java:60) at com.netflix.astyanax.thrift.AbstractOperationImpl.execute(AbstractOperationImpl.java:27) at com.netflix.astyanax.thrift.ThriftSyncConnectionFactoryImpl$1.execute(ThriftSyncConnectionFactoryImpl.java:140) at com.netflix.astyanax.connectionpool.impl.AbstractExecuteWithFailoverImpl.tryOperation(AbstractExecuteWithFailoverImpl.java:69) at com.netflix.astyanax.connectionpool.impl.AbstractHostPartitionConnectionPool.executeWithFailover(AbstractHostPartitionConnectionPool.java:255) Thank you, Traian.
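A quick way to see why one down node blocks some writes with RF=1 is to ask Cassandra which replica owns a given key; a sketch using nodetool getendpoints (available in 1.1/1.2), with 'TestSpace', 'MyCF' and the key standing in for Traian's actual names, and the sample output being illustrative only.
$ nodetool -h 10.60.15.66 getendpoints TestSpace MyCF some_row_key
10.60.15.67
# with RF=1 every key maps to exactly one endpoint; if that endpoint is down,
# a CL.ONE write fails no matter which coordinator the client happens to connect to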
Re: Upgrade from 0.6.x to 1.2.x
Wow, that's pretty ambitious, expecting an upgrade which skips 4 major versions (0.7, 0.8, 1.0, 1.1) to work. I think you're going to have to follow the upgrade path for each of those intermediate steps and not upgrade in one big jump. -Bryan On Thu, Feb 7, 2013 at 3:41 AM, Sergey Leschenko sergle...@gmail.com wrote: Hi all, I'm trying to update our old version 0.6.5 to the current 1.2.1. All nodes have been drained and stopped. Proper cassandra.yaml created, schema file prepared. Trying to start version 1.2.1 on one node (full output attached to email): ... ERROR 11:12:44,530 Exception encountered during startup java.lang.NullPointerException at org.apache.cassandra.db.SystemTable.upgradeSystemData(SystemTable.java:161) at org.apache.cassandra.db.SystemTable.finishStartup(SystemTable.java:107) at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:276) at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:370) at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:413) java.lang.NullPointerException at org.apache.cassandra.db.SystemTable.upgradeSystemData(SystemTable.java:161) at org.apache.cassandra.db.SystemTable.finishStartup(SystemTable.java:107) at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:276) at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:370) at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:413) Exception encountered during startup: null On the next attempts the daemon started, but still with AssertionErrors. Question 1 - is it possible to start the new version from the first attempt? Then I loaded the schema via cassandra-cli and ran nodetool scrub - which caused a large number of warnings in the log: OutputHandler.java (line 52) Index file contained a different key or row size; using key from data file storage-conf.xml from 0.6.5 has the column family defined as <ColumnFamily Name="Invoices" CompareWith="BytesType"/> for 1.2.1 I used create column family Invoices with column_type = 'Standard' and comparator = 'BytesType'; Question 2 - how to get rid of these warnings? Are they connected to the column family definition? Thanks -- Sergey
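An outline of the stepwise path being suggested; the exact intermediate releases, and whether each hop needs scrub or upgradesstables, should be checked against NEWS.txt for each version, so this is a sketch rather than a recipe.
# hop through the latest release of each major line: 0.6.5 -> 0.7.x -> 0.8.x -> 1.0.x -> 1.1.x -> 1.2.1
# at the 0.6 -> 0.7 hop the config also moves from storage-conf.xml to cassandra.yaml
$ nodetool -h localhost drain
# stop, install the next version, start, then per that version's NEWS.txt:
$ nodetool -h localhost scrub        # or: nodetool upgradesstables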
Re: too many warnings of Heap is full
My guess is that those one or two nodes with the gc pressure also have more rows in your big CF. More rows could be due to imbalanced distribution if your'e not using a random partitioner or from those nodes not yet removing deleted rows which other nodes may have done. JVM heap space is used for a few things which scale with key count including: - bloom filter (for C* 1.2) - index samples Other space is used but can be more easily controlled by tuning for - memtable - compaction - key cache - row cache So, if those nodes have more rows (check using nodetool ring or nodetool cfstats) than the others you can try to: - reduce the number of rows by adding nodes, run manual / tune compactions to remove rows with expired tombstones, etc. - increase bloom filter fp chance - increase jvm heap size (don't go too big) - disable key or row cache - increase index sample interval Not all of those things are generally good especially to the extreme so don't go setting a 20 GB jvm heap without understanding the consequences for example. -Bryan On Wed, Jan 30, 2013 at 3:47 AM, Guillermo Barbero guillermo.barb...@spotbros.com wrote: Hi, I'm viewing a weird behaviour in my cassandra cluster. Most of the warning messages are due to Heap is % full. According to this link ( http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cassndra-1-0-6-GC-query-tt7323457.html ) there are two ways to reduce pressure: 1. Decrease the cache sizes 2. Increase the index interval size Most of the flushes are in two column families (users and messages), I guess that's because the most mutations are there. I still have not applied those changes to the production environment. Do you recommend any other meassure? Should I set specific tunning for these two CFs? Should I check another metric? Additionally, the distribution of warning messages is not uniform along the cluster. Why could cassandra be doing this? What should I do to find out how to fix this? 
cassandra runs on a 6 node cluster of m1.xlarge machines (Amazon EC2) the java version is the following: java version 1.6.0_37 Java(TM) SE Runtime Environment (build 1.6.0_37-b06) Java HotSpot(TM) 64-Bit Server VM (build 20.12-b01, mixed mode) The cassandra system.log is resumed here (numer of messages, cassandra node, class that reports the message, first word of the message) 2013-01-26 5 cassNode0: GCInspector.java Heap 5 cassNode0: StorageService.java Flushing 232 cassNode2: GCInspector.java Heap 232 cassNode2: StorageService.java Flushing 104 cassNode3: GCInspector.java Heap 104 cassNode3: StorageService.java Flushing 3 cassNode4: GCInspector.java Heap 3 cassNode4: StorageService.java Flushing 3 cassNode5: GCInspector.java Heap 3 cassNode5: StorageService.java Flushing 2013-01-27 2 cassNode0: GCInspector.java Heap 2 cassNode0: StorageService.java Flushing 3 cassNode1: GCInspector.java Heap 3 cassNode1: StorageService.java Flushing 189 cassNode2: GCInspector.java Heap 189 cassNode2: StorageService.java Flushing 104 cassNode3: GCInspector.java Heap 104 cassNode3: StorageService.java Flushing 1 cassNode4: GCInspector.java Heap 1 cassNode4: StorageService.java Flushing 1 cassNode5: GCInspector.java Heap 1 cassNode5: StorageService.java Flushing 2013-01-28 2 cassNode0: GCInspector.java Heap 2 cassNode0: StorageService.java Flushing 1 cassNode1: GCInspector.java Heap 1 cassNode1: StorageService.java Flushing 1 cassNode2: AutoSavingCache.java Reducing 343 cassNode2: GCInspector.java Heap 342 cassNode2: StorageService.java Flushing 181 cassNode3: GCInspector.java Heap 181 cassNode3: StorageService.java Flushing 4 cassNode4: GCInspector.java Heap 4 cassNode4: StorageService.java Flushing 3 cassNode5: GCInspector.java Heap 3 cassNode5: StorageService.java Flushing 2013-01-29 2 cassNode0: GCInspector.java Heap 2 cassNode0: StorageService.java Flushing 3 cassNode1: GCInspector.java Heap 3 cassNode1: StorageService.java Flushing 156 cassNode2: GCInspector.java Heap 156 cassNode2: StorageService.java Flushing 71 cassNode3: GCInspector.java Heap 71 cassNode3: StorageService.java Flushing 2 cassNode4: GCInspector.java Heap 2 cassNode4: StorageService.java Flushing 2 cassNode5: GCInspector.java Heap 1 cassNode5: Memtable.java setting 2 cassNode5: StorageService.java Flushing -- Guillermo Barbero - Backend Team Spotbros Technologies
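The two measures from that link map to a couple of lines in cassandra.yaml; a sketch below, noting that key_cache_size_in_mb only exists from 1.1 on and the values are illustrative, so check the option names in your own yaml for your version.
# cassandra.yaml
key_cache_size_in_mb: 50      # smaller key cache (1.1+)
row_cache_size_in_mb: 0       # row cache disabled
index_interval: 512           # default 128; larger = fewer index samples, less heap, slightly slower reads
Restart each node after the change and watch the GCInspector lines in system.log to see whether the heap pressure actually drops.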
Re: LCS not removing rows with all TTL expired columns
It turns out that having gc_grace=0 isn't required to produce the problem. My colleague did a lot of digging into the compaction code and we think he's found the issue. It's detailed in https://issues.apache.org/jira/browse/CASSANDRA-5182 Basically tombstones for a row will not be removed from an SSTable during compaction if the row appears in other SSTables; however, the compaction code checks the bloom filters to make this determination. Since this data is rarely read we had the bloom_filter_fp_ratio set to 1.0 which makes rows seem to appear in every SSTable as far as compaction is concerned. This caused our data to essentially never be removed when using either STSC or LCS and will probably affect anyone else running 1.1 with high bloom filter fp ratios. Setting our fp ratio to 0.1, running upgradesstables and running the application as it was before seems to have stabilized the load as desired at the expense of additional jvm memory. -Bryan On Thu, Jan 17, 2013 at 6:50 PM, Bryan Talbot btal...@aeriagames.comwrote: Bleh, I rushed out the email before some meetings and I messed something up. Working on reproducing now with better notes this time. -Bryan On Thu, Jan 17, 2013 at 4:45 PM, Derek Williams de...@fyrie.net wrote: When you ran this test, is that the exact schema you used? I'm not seeing where you are setting gc_grace to 0 (although I could just be blind, it happens). On Thu, Jan 17, 2013 at 5:01 PM, Bryan Talbot btal...@aeriagames.comwrote: I'm able to reproduce this behavior on my laptop using 1.1.5, 1.1.7, 1.1.8, a trivial schema, and a simple script that just inserts rows. If the TTL is small enough so that all LCS data fits in generation 0 then the rows seem to be removed with TTL expires as desired. However, if the insertion rate is high enough or the TTL long enough then the data keep accumulating for far longer than expected. Using 120 second TTL and a single threaded php insertion script my MBP with SSD retained almost all of the data. 120 seconds should accumulate 5-10 MB of data. I would expect that TTL rows to be removed eventually and for the cassandra load to level off at some reasonable value near 10 MB. After running for 2 hours and with a cassandra load of ~550 MB I stopped the test. The schema is create keyspace test with placement_strategy = 'SimpleStrategy' and strategy_options = {replication_factor : 1} and durable_writes = true; use test; create column family test with column_type = 'Standard' and comparator = 'UTF8Type' and default_validation_class = 'UTF8Type' and key_validation_class = 'TimeUUIDType' and compaction_strategy = 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy' and caching = 'NONE' and bloom_filter_fp_chance = 1.0 and column_metadata = [ {column_name : 'a', validation_class : LongType}]; and the insert script is ?php require_once('phpcassa/1.0.a.5/autoload.php'); use phpcassa\Connection\ConnectionPool; use phpcassa\ColumnFamily; use phpcassa\SystemManager; use phpcassa\UUID; // Connect to test keyspace and column family $sys = new SystemManager('127.0.0.1'); // Start a connection pool, create our ColumnFamily instance $pool = new ConnectionPool('test', array('127.0.0.1')); $testCf = new ColumnFamily($pool, 'test'); // Insert records while( 1 ) { $testCf-insert(UUID::uuid1(), array(a = 1), null, 120); } // Close our connections $pool-close(); $sys-close(); ? 
-Bryan On Thu, Jan 17, 2013 at 10:11 AM, Bryan Talbot btal...@aeriagames.comwrote: We are using LCS and the particular row I've referenced has been involved in several compactions after all columns have TTL expired. The most recent one was again this morning and the row is still there -- TTL expired for several days now with gc_grace=0 and several compactions later ... $ ./bin/nodetool -h localhost getsstables metrics request_summary 459fb460-5ace-11e2-9b92-11d67b6163b4 /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-448955-Data.db $ ls -alF /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-448955-Data.db -rw-rw-r-- 1 sandra sandra 5246509 Jan 17 06:54 /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-448955-Data.db $ ./bin/sstable2json /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-448955-Data.db -k $(echo -n 459fb460-5ace-11e2-9b92-11d67b6163b4 | hexdump -e '36/1 %x') { 34353966623436302d356163652d313165322d396239322d313164363762363136336234: [[app_name,50f21d3d,1357785277207001,d], [client_ip,50f21d3d,1357785277207001,d], [client_req_id,50f21d3d,1357785277207001,d], [mysql_call_cnt,50f21d3d,1357785277207001,d], [mysql_duration_us,50f21d3d,1357785277207001,d], [mysql_failure_call_cnt,50f21d3d,1357785277207001,d], [mysql_success_call_cnt,50f21d3d,1357785277207001,d], [req_duration_us,50f21d3d
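For anyone hitting the same thing, the fix described above comes down to two steps per affected CF; the keyspace and CF names below are the ones from this thread, and upgradesstables rewrites every sstable so expect a burst of compaction I/O.
$ cassandra-cli -h localhost
[default@unknown] use metrics;
[default@metrics] update column family request_summary with bloom_filter_fp_chance = 0.1;
$ nodetool -h localhost upgradesstables metrics request_summary
# rewriting the sstables gives them real bloom filters so compaction can tell which rows truly exist elsewhere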
Re: LCS not removing rows with all TTL expired columns
On 17/01/2013, at 2:55 PM, Bryan Talbot btal...@aeriagames.com wrote: According to the timestamps (see original post) the SSTable was written (thus compacted) 3 days after all columns for that row had expired and 6 days after the row was created; yet all columns are still showing up in the SSTable. Note that the column shows no rows when a get for that key is run so that's working correctly, but the data is lugged around far longer than it should be -- maybe forever. -Bryan On Wed, Jan 16, 2013 at 5:44 PM, Andrey Ilinykh ailin...@gmail.com wrote: To get a column removed you have to meet two requirements 1. column should be expired 2. after that CF gets compacted I guess your expired columns are propagated to a high tier CF, which gets compacted rarely. So, you have to wait until the high tier CF gets compacted. Andrey On Wed, Jan 16, 2013 at 11:39 AM, Bryan Talbot btal...@aeriagames.com wrote: On cassandra 1.1.5 with a write heavy workload, we're having problems getting rows to be compacted away (removed) even though all columns have expired TTL. We've tried size tiered and now leveled and are seeing the same symptom: the data stays around essentially forever. Currently we write all columns with a TTL of 72 hours (259200 seconds) and expect to add 10 GB of data to this CF per day per node. Each node currently has 73 GB for the affected CF and shows no indications that old rows will be removed on their own. Why aren't rows being removed? Below is some data from a sample row which should have been removed several days ago but is still around even though it has been involved in numerous compactions since being expired. $ ./bin/nodetool -h localhost getsstables metrics request_summary 459fb460-5ace-11e2-9b92-11d67b6163b4 /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db $ ls -alF /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db -rw-rw-r-- 1 sandra sandra 5252320 Jan 16 08:42 /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db $ ./bin/sstable2json /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db -k $(echo -n 459fb460-5ace-11e2-9b92-11d67b6163b4 | hexdump -e '36/1 %x') { 34353966623436302d356163652d313165322d396239322d313164363762363136336234: [[app_name,50f21d3d,1357785277207001,d], [client_ip,50f21d3d,1357785277207001,d], [client_req_id,50f21d3d,1357785277207001,d], [mysql_call_cnt,50f21d3d,1357785277207001,d], [mysql_duration_us,50f21d3d,1357785277207001,d], [mysql_failure_call_cnt,50f21d3d,1357785277207001,d], [mysql_success_call_cnt,50f21d3d,1357785277207001,d], [req_duration_us,50f21d3d,1357785277207001,d], [req_finish_time_us,50f21d3d,1357785277207001,d], [req_method,50f21d3d,1357785277207001,d], [req_service,50f21d3d,1357785277207001,d], [req_start_time_us,50f21d3d,1357785277207001,d], [success,50f21d3d,1357785277207001,d]] } Decoding the column timestamps shows that the columns were written at Thu, 10 Jan 2013 02:34:37 GMT and that their TTL expired at Sun, 13 Jan 2013 02:34:37 GMT. The date of the SSTable shows that it was generated on Jan 16 which is 3 days after all columns have TTL-ed out. The schema shows that gc_grace is set to 0 since this data is write-once, read-seldom and is never updated or deleted. 
create column family request_summary with column_type = 'Standard' and comparator = 'UTF8Type' and default_validation_class = 'UTF8Type' and key_validation_class = 'UTF8Type' and read_repair_chance = 0.1 and dclocal_read_repair_chance = 0.0 and gc_grace = 0 and min_compaction_threshold = 4 and max_compaction_threshold = 32 and replicate_on_write = true and compaction_strategy = 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy' and caching = 'NONE' and bloom_filter_fp_chance = 1.0 and compression_options = {'chunk_length_kb' : '64', 'sstable_compression' : 'org.apache.cassandra.io.compress.SnappyCompressor'}; Thanks in advance for help in understanding why rows such as this are not removed! -Bryan
Re: LCS not removing rows with all TTL expired columns
I'm able to reproduce this behavior on my laptop using 1.1.5, 1.1.7, 1.1.8, a trivial schema, and a simple script that just inserts rows. If the TTL is small enough so that all LCS data fits in generation 0 then the rows seem to be removed as TTLs expire, as desired. However, if the insertion rate is high enough or the TTL long enough then the data keeps accumulating for far longer than expected. Using a 120 second TTL and a single threaded php insertion script my MBP with SSD retained almost all of the data. 120 seconds should accumulate 5-10 MB of data. I would expect TTL rows to be removed eventually and the cassandra load to level off at some reasonable value near 10 MB. After running for 2 hours and with a cassandra load of ~550 MB I stopped the test. The schema is create keyspace test with placement_strategy = 'SimpleStrategy' and strategy_options = {replication_factor : 1} and durable_writes = true; use test; create column family test with column_type = 'Standard' and comparator = 'UTF8Type' and default_validation_class = 'UTF8Type' and key_validation_class = 'TimeUUIDType' and compaction_strategy = 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy' and caching = 'NONE' and bloom_filter_fp_chance = 1.0 and column_metadata = [ {column_name : 'a', validation_class : LongType}]; and the insert script is <?php require_once('phpcassa/1.0.a.5/autoload.php'); use phpcassa\Connection\ConnectionPool; use phpcassa\ColumnFamily; use phpcassa\SystemManager; use phpcassa\UUID; // Connect to test keyspace and column family $sys = new SystemManager('127.0.0.1'); // Start a connection pool, create our ColumnFamily instance $pool = new ConnectionPool('test', array('127.0.0.1')); $testCf = new ColumnFamily($pool, 'test'); // Insert records while( 1 ) { $testCf->insert(UUID::uuid1(), array('a' => 1), null, 120); } // Close our connections $pool->close(); $sys->close(); ?> -Bryan On Thu, Jan 17, 2013 at 10:11 AM, Bryan Talbot btal...@aeriagames.com wrote: We are using LCS and the particular row I've referenced has been involved in several compactions after all columns have TTL expired. The most recent one was again this morning and the row is still there -- TTL expired for several days now with gc_grace=0 and several compactions later ... 
$ ./bin/nodetool -h localhost getsstables metrics request_summary 459fb460-5ace-11e2-9b92-11d67b6163b4 /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-448955-Data.db $ ls -alF /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-448955-Data.db -rw-rw-r-- 1 sandra sandra 5246509 Jan 17 06:54 /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-448955-Data.db $ ./bin/sstable2json /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-448955-Data.db -k $(echo -n 459fb460-5ace-11e2-9b92-11d67b6163b4 | hexdump -e '36/1 %x') { 34353966623436302d356163652d313165322d396239322d313164363762363136336234: [[app_name,50f21d3d,1357785277207001,d], [client_ip,50f21d3d,1357785277207001,d], [client_req_id,50f21d3d,1357785277207001,d], [mysql_call_cnt,50f21d3d,1357785277207001,d], [mysql_duration_us,50f21d3d,1357785277207001,d], [mysql_failure_call_cnt,50f21d3d,1357785277207001,d], [mysql_success_call_cnt,50f21d3d,1357785277207001,d], [req_duration_us,50f21d3d,1357785277207001,d], [req_finish_time_us,50f21d3d,1357785277207001,d], [req_method,50f21d3d,1357785277207001,d], [req_service,50f21d3d,1357785277207001,d], [req_start_time_us,50f21d3d,1357785277207001,d], [success,50f21d3d,1357785277207001,d]] } My experience with TTL columns so far has been pretty similar to Viktor's in that the only way to keep them row count under control is to force major compactions. In real world use, STCS and LCS both leave TTL expired rows around forever as far as I can tell. When testing with minimal data, removal of TTL expired rows seem to work as expected but in this case there seems to be some divergence from real life work and test samples. -Bryan On Thu, Jan 17, 2013 at 1:47 AM, Viktor Jevdokimov viktor.jevdoki...@adform.com wrote: @Bryan, ** ** To keep data size as low as possible with TTL columns we still use STCS and nightly major compactions. ** ** Experience with LCS was not successful in our case, data size keeps too high along with amount of compactions. ** ** IMO, before 1.2, LCS was good for CFs without TTL or high delete rate. I have not tested 1.2 LCS behavior, we’re still on 1.0.x ** ** ** ** Best regards / Pagarbiai *Viktor Jevdokimov* Senior Developer Email: viktor.jevdoki...@adform.com Phone: +370 5 212 3063, Fax +370 5 261 0453 J. Jasinskio 16C, LT-01112 Vilnius, Lithuania Follow us on Twitter: @adforminsiderhttp://twitter.com/#!/adforminsider Take a ride with Adform's Rich Media Suitehttp://vimeo.com/adform
Re: LCS not removing rows with all TTL expired columns
Bleh, I rushed out the email before some meetings and I messed something up. Working on reproducing now with better notes this time. -Bryan On Thu, Jan 17, 2013 at 4:45 PM, Derek Williams de...@fyrie.net wrote: When you ran this test, is that the exact schema you used? I'm not seeing where you are setting gc_grace to 0 (although I could just be blind, it happens). On Thu, Jan 17, 2013 at 5:01 PM, Bryan Talbot btal...@aeriagames.comwrote: I'm able to reproduce this behavior on my laptop using 1.1.5, 1.1.7, 1.1.8, a trivial schema, and a simple script that just inserts rows. If the TTL is small enough so that all LCS data fits in generation 0 then the rows seem to be removed with TTL expires as desired. However, if the insertion rate is high enough or the TTL long enough then the data keep accumulating for far longer than expected. Using 120 second TTL and a single threaded php insertion script my MBP with SSD retained almost all of the data. 120 seconds should accumulate 5-10 MB of data. I would expect that TTL rows to be removed eventually and for the cassandra load to level off at some reasonable value near 10 MB. After running for 2 hours and with a cassandra load of ~550 MB I stopped the test. The schema is create keyspace test with placement_strategy = 'SimpleStrategy' and strategy_options = {replication_factor : 1} and durable_writes = true; use test; create column family test with column_type = 'Standard' and comparator = 'UTF8Type' and default_validation_class = 'UTF8Type' and key_validation_class = 'TimeUUIDType' and compaction_strategy = 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy' and caching = 'NONE' and bloom_filter_fp_chance = 1.0 and column_metadata = [ {column_name : 'a', validation_class : LongType}]; and the insert script is ?php require_once('phpcassa/1.0.a.5/autoload.php'); use phpcassa\Connection\ConnectionPool; use phpcassa\ColumnFamily; use phpcassa\SystemManager; use phpcassa\UUID; // Connect to test keyspace and column family $sys = new SystemManager('127.0.0.1'); // Start a connection pool, create our ColumnFamily instance $pool = new ConnectionPool('test', array('127.0.0.1')); $testCf = new ColumnFamily($pool, 'test'); // Insert records while( 1 ) { $testCf-insert(UUID::uuid1(), array(a = 1), null, 120); } // Close our connections $pool-close(); $sys-close(); ? -Bryan On Thu, Jan 17, 2013 at 10:11 AM, Bryan Talbot btal...@aeriagames.comwrote: We are using LCS and the particular row I've referenced has been involved in several compactions after all columns have TTL expired. The most recent one was again this morning and the row is still there -- TTL expired for several days now with gc_grace=0 and several compactions later ... 
$ ./bin/nodetool -h localhost getsstables metrics request_summary 459fb460-5ace-11e2-9b92-11d67b6163b4 /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-448955-Data.db $ ls -alF /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-448955-Data.db -rw-rw-r-- 1 sandra sandra 5246509 Jan 17 06:54 /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-448955-Data.db $ ./bin/sstable2json /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-448955-Data.db -k $(echo -n 459fb460-5ace-11e2-9b92-11d67b6163b4 | hexdump -e '36/1 %x') { 34353966623436302d356163652d313165322d396239322d313164363762363136336234: [[app_name,50f21d3d,1357785277207001,d], [client_ip,50f21d3d,1357785277207001,d], [client_req_id,50f21d3d,1357785277207001,d], [mysql_call_cnt,50f21d3d,1357785277207001,d], [mysql_duration_us,50f21d3d,1357785277207001,d], [mysql_failure_call_cnt,50f21d3d,1357785277207001,d], [mysql_success_call_cnt,50f21d3d,1357785277207001,d], [req_duration_us,50f21d3d,1357785277207001,d], [req_finish_time_us,50f21d3d,1357785277207001,d], [req_method,50f21d3d,1357785277207001,d], [req_service,50f21d3d,1357785277207001,d], [req_start_time_us,50f21d3d,1357785277207001,d], [success,50f21d3d,1357785277207001,d]] } My experience with TTL columns so far has been pretty similar to Viktor's in that the only way to keep them row count under control is to force major compactions. In real world use, STCS and LCS both leave TTL expired rows around forever as far as I can tell. When testing with minimal data, removal of TTL expired rows seem to work as expected but in this case there seems to be some divergence from real life work and test samples. -Bryan On Thu, Jan 17, 2013 at 1:47 AM, Viktor Jevdokimov viktor.jevdoki...@adform.com wrote: @Bryan, ** ** To keep data size as low as possible with TTL columns we still use STCS and nightly major compactions. ** ** Experience with LCS was not successful in our case, data size keeps too high along with amount of compactions
LCS not removing rows with all TTL expired columns
On cassandra 1.1.5 with a write heavy workload, we're having problems getting rows to be compacted away (removed) even though all columns have expired TTL. We've tried size tiered and now leveled and are seeing the same symptom: the data stays around essentially forever. Currently we write all columns with a TTL of 72 hours (259200 seconds) and expect to add 10 GB of data to this CF per day per node. Each node currently has 73 GB for the affected CF and shows no indications that old rows will be removed on their own. Why aren't rows being removed? Below is some data from a sample row which should have been removed several days ago but is still around even though it has been involved in numerous compactions since being expired. $ ./bin/nodetool -h localhost getsstables metrics request_summary 459fb460-5ace-11e2-9b92-11d67b6163b4 /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db $ ls -alF /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db -rw-rw-r-- 1 sandra sandra 5252320 Jan 16 08:42 /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db $ ./bin/sstable2json /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db -k $(echo -n 459fb460-5ace-11e2-9b92-11d67b6163b4 | hexdump -e '36/1 %x') { 34353966623436302d356163652d313165322d396239322d313164363762363136336234: [[app_name,50f21d3d,1357785277207001,d], [client_ip,50f21d3d,1357785277207001,d], [client_req_id,50f21d3d,1357785277207001,d], [mysql_call_cnt,50f21d3d,1357785277207001,d], [mysql_duration_us,50f21d3d,1357785277207001,d], [mysql_failure_call_cnt,50f21d3d,1357785277207001,d], [mysql_success_call_cnt,50f21d3d,1357785277207001,d], [req_duration_us,50f21d3d,1357785277207001,d], [req_finish_time_us,50f21d3d,1357785277207001,d], [req_method,50f21d3d,1357785277207001,d], [req_service,50f21d3d,1357785277207001,d], [req_start_time_us,50f21d3d,1357785277207001,d], [success,50f21d3d,1357785277207001,d]] } Decoding the column timestamps to shows that the columns were written at Thu, 10 Jan 2013 02:34:37 GMT and that their TTL expired at Sun, 13 Jan 2013 02:34:37 GMT. The date of the SSTable shows that it was generated on Jan 16 which is 3 days after all columns have TTL-ed out. The schema shows that gc_grace is set to 0 since this data is write-once, read-seldom and is never updated or deleted. create column family request_summary with column_type = 'Standard' and comparator = 'UTF8Type' and default_validation_class = 'UTF8Type' and key_validation_class = 'UTF8Type' and read_repair_chance = 0.1 and dclocal_read_repair_chance = 0.0 and gc_grace = 0 and min_compaction_threshold = 4 and max_compaction_threshold = 32 and replicate_on_write = true and compaction_strategy = 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy' and caching = 'NONE' and bloom_filter_fp_chance = 1.0 and compression_options = {'chunk_length_kb' : '64', 'sstable_compression' : 'org.apache.cassandra.io.compress.SnappyCompressor'}; Thanks in advance for help in understanding why rows such as this are not removed! -Bryan
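For reference, 'decoding the column timestamps' is just dropping the last 6 digits (microseconds) from the 16-digit timestamp and feeding the rest to date; a quick sketch with GNU date and the values from the dump above.
$ date -u -d @1357785277
Thu Jan 10 02:34:37 UTC 2013
$ date -u -d @$((1357785277 + 259200))    # plus the 72 hour TTL
Sun Jan 13 02:34:37 UTC 2013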
Re: LCS not removing rows with all TTL expired columns
According to the timestamps (see original post) the SSTable was written (thus compacted compacted) 3 days after all columns for that row had expired and 6 days after the row was created; yet all columns are still showing up in the SSTable. Note that the column shows now rows when a get for that key is run so that's working correctly, but the data is lugged around far longer than it should be -- maybe forever. -Bryan On Wed, Jan 16, 2013 at 5:44 PM, Andrey Ilinykh ailin...@gmail.com wrote: To get column removed you have to meet two requirements 1. column should be expired 2. after that CF gets compacted I guess your expired columns are propagated to high tier CF, which gets compacted rarely. So, you have to wait when high tier CF gets compacted. Andrey On Wed, Jan 16, 2013 at 11:39 AM, Bryan Talbot btal...@aeriagames.comwrote: On cassandra 1.1.5 with a write heavy workload, we're having problems getting rows to be compacted away (removed) even though all columns have expired TTL. We've tried size tiered and now leveled and are seeing the same symptom: the data stays around essentially forever. Currently we write all columns with a TTL of 72 hours (259200 seconds) and expect to add 10 GB of data to this CF per day per node. Each node currently has 73 GB for the affected CF and shows no indications that old rows will be removed on their own. Why aren't rows being removed? Below is some data from a sample row which should have been removed several days ago but is still around even though it has been involved in numerous compactions since being expired. $ ./bin/nodetool -h localhost getsstables metrics request_summary 459fb460-5ace-11e2-9b92-11d67b6163b4 /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db $ ls -alF /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db -rw-rw-r-- 1 sandra sandra 5252320 Jan 16 08:42 /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db $ ./bin/sstable2json /virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db -k $(echo -n 459fb460-5ace-11e2-9b92-11d67b6163b4 | hexdump -e '36/1 %x') { 34353966623436302d356163652d313165322d396239322d313164363762363136336234: [[app_name,50f21d3d,1357785277207001,d], [client_ip,50f21d3d,1357785277207001,d], [client_req_id,50f21d3d,1357785277207001,d], [mysql_call_cnt,50f21d3d,1357785277207001,d], [mysql_duration_us,50f21d3d,1357785277207001,d], [mysql_failure_call_cnt,50f21d3d,1357785277207001,d], [mysql_success_call_cnt,50f21d3d,1357785277207001,d], [req_duration_us,50f21d3d,1357785277207001,d], [req_finish_time_us,50f21d3d,1357785277207001,d], [req_method,50f21d3d,1357785277207001,d], [req_service,50f21d3d,1357785277207001,d], [req_start_time_us,50f21d3d,1357785277207001,d], [success,50f21d3d,1357785277207001,d]] } Decoding the column timestamps to shows that the columns were written at Thu, 10 Jan 2013 02:34:37 GMT and that their TTL expired at Sun, 13 Jan 2013 02:34:37 GMT. The date of the SSTable shows that it was generated on Jan 16 which is 3 days after all columns have TTL-ed out. The schema shows that gc_grace is set to 0 since this data is write-once, read-seldom and is never updated or deleted. 
create column family request_summary with column_type = 'Standard' and comparator = 'UTF8Type' and default_validation_class = 'UTF8Type' and key_validation_class = 'UTF8Type' and read_repair_chance = 0.1 and dclocal_read_repair_chance = 0.0 and gc_grace = 0 and min_compaction_threshold = 4 and max_compaction_threshold = 32 and replicate_on_write = true and compaction_strategy = 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy' and caching = 'NONE' and bloom_filter_fp_chance = 1.0 and compression_options = {'chunk_length_kb' : '64', 'sstable_compression' : 'org.apache.cassandra.io.compress.SnappyCompressor'}; Thanks in advance for help in understanding why rows such as this are not removed! -Bryan
Re: State of Cassandra and Java 7
Brian, did any of your issues with java 7 result in corrupting data in cassandra? We just ran into an issue after upgrading a test cluster from Cassandra 1.1.5 and Oracle JDK 1.6.0_29-b11 to Cassandra 1.1.7 and 7u10. What we saw is values in columns with validation Class=org.apache.cassandra.db.marshal.LongType that were proper integers becoming corrupted so that they become stored as strings. I don't have a reproducible test case yet but will work on making one over the holiday if I can. For example, a column with a long type that was originally written and stored properly (say with value 1200) was somehow changed during cassandra operations (compaction seems the only possibility) to be the value '1200' with quotes. The data was written using the phpcassa library and that application and library haven't been changed. This has only happened on our test cluster which was upgraded and hasn't happened on our live cluster which was not upgraded. Many of our column families were affected and all affected columns are Long (or bigint for cql3). Errors when reading using CQL3 command client look like this: Failed to decode value '1356441225' (for column 'expires') as bigint: unpack requires a string argument of length 8 and when reading with cassandra-cli the error is [default@cf] get token['fbc1e9f7cc2c0c2fa186138ed28e5f691613409c0bcff648c651ab1f79f9600b']; = (column=client_id, value=8ec4c29de726ad4db3f89a44cb07909c04f90932d, timestamp=1355836425784329, ttl=648000) A long is exactly 8 bytes: 10 -Bryan On Mon, Dec 17, 2012 at 7:33 AM, Brian Tarbox tar...@cabotresearch.comwrote: I was using jre-7u9-linux-x64 which was the latest at the time. I'll confess that I did not file any bugs...at the time the advice from both the Cassandra and Zookeeper lists was to stay away from Java 7 (and my boss had had enough of my reporting that *the problem was Java 7* for me to spend a lot more time getting the details). Brian On Sun, Dec 16, 2012 at 4:54 AM, Sylvain Lebresne sylv...@datastax.comwrote: On Sat, Dec 15, 2012 at 7:12 PM, Michael Kjellman mkjell...@barracuda.com wrote: What issues have you ran into? Actually curious because we push 1.1.5-7 really hard and have no issues whatsoever. A related question is which which version of java 7 did you try? The first releases of java 7 were apparently famous for having many issues but it seems the more recent updates are much more stable. -- Sylvain On Dec 15, 2012, at 7:51 AM, Brian Tarbox tar...@cabotresearch.com wrote: We've reverted all machines back to Java 6 after running into numerous Java 7 issues...some running Cassandra, some running Zookeeper, others just general problems. I don't recall any other major language release being such a mess. On Fri, Dec 14, 2012 at 5:07 PM, Bill de hÓra b...@dehora.net wrote: At least that would be one way of defining officially supported. Not quite, because, Datastax is not Apache Cassandra. the only issue related to Java 7 that I know of is CASSANDRA-4958, but that's osx specific (I wouldn't advise using osx in production anyway) and it's not directly related to Cassandra anyway so you can easily use the beta version of snappy-java as a workaround if you want to. So that non blocking issue aside, and as far as we know, Cassandra supports Java 7. Is it rock-solid in production? Well, only repeated use in production can tell, and that's not really in the hand of the project. Exactly right. 
If enough people use Cassandra on Java7 and enough people file bugs about Java 7 and enough people work on bugs for Java 7 then Cassandra will eventually work well enough on Java7. Bill On 14 Dec 2012, at 19:43, Drew Kutcharian d...@venarc.com wrote: In addition, the DataStax official documentation states: Versions earlier than 1.6.0_19 should not be used. Java 7 is not recommended. http://www.datastax.com/docs/1.1/install/install_rpm On Dec 14, 2012, at 9:42 AM, Aaron Turner synfina...@gmail.com wrote: Does Datastax (or any other company) support Cassandra under Java 7? Or will they tell you to downgrade when you have some problem, because they don't support C* running on 7? At least that would be one way of defining officially supported. On Fri, Dec 14, 2012 at 2:22 AM, Sylvain Lebresne sylv...@datastax.com wrote: What kind of official statement do you want? As far as I can be considered an official voice of the project, my statement is: various people run in production with Java 7 and it seems to work. Or to answer the initial question, the only issue related to Java 7 that I know of is CASSANDRA-4958, but that's osx specific (I wouldn't advise using osx in production anyway) and it's not directly related to Cassandra anyway so you can easily use the beta version of snappy-java as a workaround if you want to. So that non blocking issue aside, and as far as we know, Cassandra supports Java 7. Is it rock-solid
Re: CQL timestamps and timezones
With 1.1.5, the TS is displayed with the local timezone and seems correct. cqlsh:bat create table test (id uuid primary key, ts timestamp ); cqlsh:bat insert into test (id,ts) values ( '89d09c88-40ac-11e2-a1e2-6067201fae78', '2012-12-07T10:00:00-'); cqlsh:bat select * from test; id | ts --+-- 89d09c88-40ac-11e2-a1e2-6067201fae78 | 2012-12-07 02:00:00-0800 cqlsh:bat -Bryan On Fri, Dec 7, 2012 at 1:14 PM, B. Todd Burruss bto...@gmail.com wrote: trying to figure out if i'm doing something wrong or a bug. i am creating a simple schema, inserting a timestamp using ISO8601 format, but when retrieving the timestamp, the timezone is displayed incorrectly. i'm inserting using GMT, the result is shown with +, but the time is for my local timezone (-0800) tried with 1.1.6 (DSE 2.2.1), and 1.2.0-rc1-SNAPSHOT here's the trace: bin/cqlsh Connected to Test Cluster at localhost:9160. [cqlsh 2.3.0 | Cassandra 1.2.0-rc1-SNAPSHOT | CQL spec 3.0.0 | Thrift protocol 19.35.0] Use HELP for help. cqlsh CREATE KEYSPACE btoddb WITH replication = {'class':'SimpleStrategy', 'replication_factor':1}; cqlsh cqlsh USE btoddb; cqlsh:btoddb CREATE TABLE test ( ... id uuid PRIMARY KEY, ... ts TIMESTAMP ... ); cqlsh:btoddb cqlsh:btoddb INSERT INTO test ... (id, ts) ... values ( ... '89d09c88-40ac-11e2-a1e2-6067201fae78', ... '2012-12-07T10:00:00-' ... ); cqlsh:btoddb cqlsh:btoddb SELECT * FROM test; id | ts --+-- 89d09c88-40ac-11e2-a1e2-6067201fae78 | 2012-12-07 02:00:00+ cqlsh:btoddb
Re: need some help with row cache
The row cache itself is global and the size is set with row_cache_size_in_mb. It must be enabled per CF using the proper settings. CQL3 isn't complete yet in C* 1.1 so if the cache settings aren't shown there, then you'll probably need to use cassandra-cli. -Bryan On Tue, Nov 27, 2012 at 10:41 PM, Wz1975 wz1...@yahoo.com wrote: Use cassandracli. Thanks. -Wei Sent from my Samsung smartphone on ATT Original message Subject: Re: need some help with row cache From: Yiming Sun yiming@gmail.com To: user@cassandra.apache.org CC: Also, what command can I used to see the caching setting? DESC TABLE cf doesn't list caching at all. Thanks. -- Y. On Wed, Nov 28, 2012 at 12:15 AM, Yiming Sun yiming@gmail.com wrote: Hi Bryan, Thank you very much for this information. So in other words, the settings such as row_cache_size_in_mb in YAML alone are not enough, and I must also specify the caching attribute on a per column family basis? -- Y. On Tue, Nov 27, 2012 at 11:57 PM, Bryan Talbot btal...@aeriagames.com wrote: On Tue, Nov 27, 2012 at 8:16 PM, Yiming Sun yiming@gmail.com wrote: Hello, but it is not clear to me where this setting belongs to, because even in the v1.1.6 conf/cassandra.yaml, there is no such property, and apparently adding this property to the yaml causes a fatal configuration error upon server startup, It's a per column family setting that can be applied using the CLI or CQL. With CQL3 it would be ALTER TABLE cf WITH caching = 'rows_only'; to enable the row cache but no key cache for that CF. -Bryan
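Concretely, for 1.1 that means one global capacity knob in cassandra.yaml plus the per-CF flag; 'MyKeyspace'/'MyCF' below are placeholders and the 200 MB figure is arbitrary.
# cassandra.yaml (global capacity shared by all CFs on the node)
row_cache_size_in_mb: 200
$ cassandra-cli -h localhost
[default@unknown] use MyKeyspace;
[default@MyKeyspace] update column family MyCF with caching = 'rows_only';
[default@MyKeyspace] describe MyCF;
# the CLI's describe shows the caching setting that CQL3's DESC TABLE omits in 1.1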
Re: outOfMemory error
Well, asking for 500MB of data at once for a server with such modest specs is asking for troubles. Here are my suggestions. Disable the 1 GB row cache Consider allocating that memory for the java heap Xms2500m Xmx2500m Don't fetch all the columns at once -- page through them a slice at a time Increase the memtable to more than 64 MB if you want to write data to this cluster -Bryan On Wed, Nov 28, 2012 at 5:06 AM, Damien Lejeune d.leje...@pepite.be wrote: Hi all, I'm currently experiencing a outOfMemory problem with Cassandra-1.1.6 on Windows XP-Pro (32-bit). The server crashes when I try to query it with a relatively small amount of data (around 100 rows with 5 columns each: to be precise, on my configuration, querying 75 or more rows makes the server to crash). I tried with different library (Hector, JDBC, Thrift) and with the Cassandra stress tool. All lead to the same outOfMemory problem. My dataset is composed, for each row, of: 1 column in DateType, 4 columns in DoubleType. I ran a query to fetch the entire dataset (around 330MB for the raw data + around 200MB for the metadata) and got the log at the end of this message. I also checked the heap-dump with Mat which displays these top values: Class Name Objects Shallow Heap java.nio.HeapByteBuffer 16,253,559 780,170,832 bytes[] 16,254,013 330,207,640 -- Data ? java.util.TreeMap$Entry8,126,711 260,054,752 org.apache.cassandra.db.Column 8,116,589 194,798,136 -- Metadata ? I tried to change the configuration in Cassandra for the values: - row_cache_size_in_mb: tried different value between [0,1000] MB - flush_largest_memtables_at: set to 0.1, but tried with 0.75 - reduce_cache_sizes_at: tried 0.85, 0.6, 0.2 and 0.1 - reduce_cache_capacity_to: tried 0.6 and 0.15 - memtable_total_space_in_mb: 64 MB, but also tried to disable it (- 1/3 of the heap) - Xms1G - Xmx1500M with no real observable improvements regarding my problem. My Cassandra server and client both run on the same machine. Here are the characteristics of my system configuration: - Cassandra-1.1.6 - java version 1.6.0_20 Java(TM) SE Runtime Environment (build 1.6.0_20-b02) Java HotSpot(TM) Client VM (build 16.3-b01, mixed mode, sharing) - Windows XP-Pro 32 bits with service pack 3 - CPU double-core, 32 bits @2.26GHz - 3.48 of RAM I'm aware that my system configuration is not an optimized environment to make Cassandra to run efficiently, but I wonder if you guys know a workaround (or any idea on how) to fix this problem. Part of the answer is probably that I do not have enough RAM to run the process, but I also wonder if it is a 'normal' behaviour for Cassandra to handle this particular test case that way. Cheers, Damien Cassandra's LOG --- Starting Cassandra Server INFO 09:10:27,171 Logging initialized INFO 09:10:27,171 JVM vendor/version: Java HotSpot(TM) Client VM/1.6.0_18 INFO 09:10:27,171 Heap size: 1072103424/1569521664 INFO 09:10:27,171 Classpath:
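A sketch of the first two suggestions in cassandra-env.sh / cassandra.yaml terms; the 2500M heap is the value suggested above, the new-gen size is a guess, and on a 3.5 GB, 32-bit machine there is not much headroom however it is sliced.
# conf/cassandra-env.sh -- set the heap explicitly instead of letting it autosize
MAX_HEAP_SIZE="2500M"
HEAP_NEWSIZE="400M"
# conf/cassandra.yaml -- give that memory to the heap rather than the row cache
row_cache_size_in_mb: 0
Paging through the columns a slice at a time is a client-side change (Hector, Thrift and the JDBC/CQL driver each expose some form of column slice with a start column and a count), so no server setting helps with that part.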
Re: need some help with row cache
On Tue, Nov 27, 2012 at 8:16 PM, Yiming Sun yiming@gmail.com wrote: Hello, but it is not clear to me where this setting belongs to, because even in the v1.1.6 conf/cassandra.yaml, there is no such property, and apparently adding this property to the yaml causes a fatal configuration error upon server startup, It's a per column family setting that can be applied using the CLI or CQL. With CQL3 it would be ALTER TABLE cf WITH caching = 'rows_only'; to enable the row cache but no key cache for that CF. -Bryan
Re: Admin for cassandra?
The https://github.com/sebgiroux/Cassandra-Cluster-Admin app does some of what you're asking. It allows basic browsing and some admin functionality. If you want to run actual CQL queries though, you currently need to use another app for that (like cqlsh). -Bryan On Thu, Nov 15, 2012 at 11:30 PM, Timmy Turner timm.t...@gmail.com wrote: I think an eclipse plugin would be the wrong way to go here. Most people probably just want to browse through the columnfamilies and see whether their queries work out or not. This functionality is imho best implemented as some form of a light-weight editor, not a full blown IDE. I do have something of this kind scheduled as small part of a larger project (seeing as how there is currently no properly working tool that provides this functionality), but concrete results are probably still a few months out.. 2012/11/16 Edward Capriolo edlinuxg...@gmail.com We should build an eclipse plugin named Eclipsandra or something. On Thu, Nov 15, 2012 at 9:45 PM, Wz1975 wz1...@yahoo.com wrote: Cqlsh is probably the closest you will get. Or pay big bucks to hire someone to develop one for you:) Thanks. -Wei Sent from my Samsung smartphone on ATT Original message Subject: Admin for cassandra? From: Kevin Burton rkevinbur...@charter.net To: user@cassandra.apache.org CC: Is there an IDE for a Cassandra database? Similar to the SQL Server Management Studio for SQL server. I mainly want to execute queries and see the results. Preferably that runs under a Windows OS. Thank you.
Re: How to upgrade a ring (0.8.9 nodes) to 1.1.5 with the minimal downtime?
Do a rolling upgrade of the ring to 1.0.12 first and then upgrade to 1.1.x. After each rolling upgrade, you should probably do the recommend nodetool upgradesstables, etc. The datastax documentation about upgrading might be helpful for you: http://www.datastax.com/docs/1.1/install/upgrading -Bryan On Mon, Nov 5, 2012 at 10:55 AM, Yan Wu y...@prospricing.com wrote: Hello, I have a Cassandra ring with 4 nodes in 0.8.9 and like to upgrade all nodes to 1.1.5. It would be great that the upgrade has no downtime or minimal downtime of the ring. After I brought down one of the nodes and upgraded it to 1.1.5, when I tried to bring it up, the new 1.1.5 node looks good but the rest of three 0.8.9 nodes started throwing exceptions: --- Fatal exception in thread Thread[GossipStage:2,5,main] java.lang.UnsupportedOperationException: Not a time-based UUID at org.apache.cassandra.service.MigrationManager.rectify(MigrationManager.java:92) at org.apache.cassandra.service.MigrationManager.onAlive(MigrationManager.java:75) at org.apache.cassandra.gms.Gossiper.markAlive(Gossiper.java:707) at org.apache.cassandra.gms.Gossiper.handleMajorStateChange(Gossiper.java:750) at org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:809) at org.apache.cassandra.gms.GossipDigestAckVerbHandler.doVerb(GossipDigestAckVerbHandler.java:68) at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) Then later ERROR 12:03:20,925 Fatal exception in thread Thread[HintedHandoff:1,1,main] java.lang.RuntimeException: java.lang.RuntimeException: Could not reach schema agreement with /xx.xx.xx.xx in 6ms at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Caused by: java.lang.RuntimeException: Could not reach schema agreement with /xx.xx.xx.xx in 6ms at org.apache.cassandra.db.HintedHandOffManager.waitForSchemaAgreement(HintedHandOffManager.java:293) at org.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpoint(HintedHandOffManager.java:304) at org.apache.cassandra.db.HintedHandOffManager.access$100(HintedHandOffManager.java:89) at org.apache.cassandra.db.HintedHandOffManager$2.runMayThrow(HintedHandOffManager.java:397) at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30) ... 3 more Any suggestions? Thanks in advance. Yan
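A per-node sketch of that two-pass path; service names and packaging details will vary by install method, and the DataStax page above is the authority on the ordering details.
# pass 1: 0.8.9 -> 1.0.12 on one node at a time; pass 2: 1.0.12 -> 1.1.5 the same way
$ nodetool -h localhost drain
# stop cassandra, install the next version, merge your config changes, start it again
$ nodetool -h localhost upgradesstables
# check that nodetool ring shows everyone Up/Normal before moving on to the next node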
Re: repair, compaction, and tombstone rows
As the OP of this thread, it is a big itch for my use case. Repair ends up streaming tens of gigabytes of data which has expired TTL and has been compacted away on some nodes but not yet on others. The wasted work is not nice plus it drives up the memory usage (for bloom filters, indexes, etc) of all nodes since there are many more rows to track than planned. Disabling the periodic repair lowered the per-node load by 100GB which was all dead data in my case. -Bryan On Mon, Nov 5, 2012 at 5:12 PM, horschi hors...@gmail.com wrote: That's true, we could just create an already gcable tombstone. It's a bit of an abuse of the localDeletionTime but why not. Honestly a good part of the reason we haven't done anything yet is because we never really had anything for which tombstones of expired columns where a big pain point. Again, feel free to open a ticket (but what we should do is retrieve the ttl from the localExpirationTime when creating the tombstone, not using the creation time (partly because that creation time is a user provided timestamp so we can't use it, and because we must still keep tombstones if the ttl gcGrace)). Created CASSANDRA-4917. I changed the example implementation to use (localExpirationTime-timeToLive) for the tombstone. I agree this is not the biggest itch to scratch. But it might save a few seeks here and there :-) Did you also have a look at DeletedColumn? It uses the updateDigest implementation from its parent class, which applies also the value to the digest. Unfortunetaly the value is the localDeletionTime, which is being generated on each node individually, right? (at RowMutation.delete) The resolution of the time is low, so there is a good chance the timestamps will match on all nodes, but that should be nothing to rely on. cheers, Christian
Re: repair, compaction, and tombstone rows
It seems like CASSANDRA-3442 might be an effective fix for this issue assuming that I'm reading it correctly. It sounds like the intent is to automatically compact SSTables when a certain percent of the columns are gcable by being deleted or with expired tombstones. Is my understanding correct? Would such tables be compacted individually (1-1) or are several eligible tables selected and compacted using the STCS compaction threshold bounds? -Bryan On Thu, Nov 1, 2012 at 9:43 AM, Rob Coli rc...@palominodb.com wrote: On Thu, Nov 1, 2012 at 1:43 AM, Sylvain Lebresne sylv...@datastax.com wrote: on all your columns), you may want to force a compaction (using the JMX call forceUserDefinedCompaction()) of that sstable. The goal being to get read of a maximum of outdated tombstones before running the repair (you could also alternatively run a major compaction prior to the repair, but major compactions have a lot of nasty effect so I wouldn't recommend that a priori). If sstablesplit (reverse compaction) existed, major compaction would be a simple solution to this case. You'd major compact and then split your One Giant SSTable With No Tombstones into a number of smaller ones. :) https://issues.apache.org/jira/browse/CASSANDRA-4766 =Rob -- =Robert Coli AIMGTALK - rc...@palominodb.com YAHOO - rcoli.palominob SKYPE - rcoli_palominodb
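For the forceUserDefinedCompaction() route Sylvain mentioned, there is no nodetool wrapper in 1.1, so it goes through JMX; the sketch below uses jmxterm, and the MBean name and argument layout are as I recall them for 1.1, so verify them against your own JMX tree before relying on this.
$ java -jar jmxterm-1.0-alpha-4-uber.jar -l localhost:7199
$> bean org.apache.cassandra.db:type=CompactionManager
$> run forceUserDefinedCompaction metrics metrics-request_summary-he-448955-Data.db
# arguments: the keyspace and a comma-separated list of sstable data file names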
Re: Cassandra upgrade issues...
Note that 1.0.7 came out before 1.1 and I know there were some compatibility issues that were fixed in later 1.0.x releases which could affect your upgrade. I think it would be best to first upgrade to the latest 1.0.x release, and then upgrade to 1.1.x from there. -Bryan On Thu, Nov 1, 2012 at 1:27 AM, Brian Fleming bigbrianflem...@gmail.comwrote: Hi Sylvain, Simple as that!!! Using the 1.1.5 nodetool version works as expected. My mistake. Many thanks, Brian On Thu, Nov 1, 2012 at 8:24 AM, Sylvain Lebresne sylv...@datastax.comwrote: The first thing I would check is if nodetool is using the right jar. I sounds a lot like if the server has been correctly updated but nodetool haven't and still use the old classes. Check the nodetool executable, it's a shell script, and try echoing the CLASSPATH in there and check it correctly point to what it should. -- Sylvain On Thu, Nov 1, 2012 at 9:10 AM, Brian Fleming bigbrianflem...@gmail.com wrote: Hi, I was testing upgrading from Cassandra v.1.0.7 to v.1.1.5 yesterday on a single node dev cluster with ~6.5GB of data it went smoothly in that no errors were thrown, the data was migrated to the new directory structure, I can still read/write data as expected, etc. However nodetool commands are behaving strangely – full details below. I couldn’t find anything relevant online relating to these exceptions – any help/pointers would be greatly appreciated. Thanks Regards, Brian ‘nodetool cleanup’ runs successfully ‘nodetool info’ produces : Token: 82358484304664259547357526550084691083 Gossip active: true Load : 7.69 GB Generation No: 1351697611 Uptime (seconds) : 58387 Heap Memory (MB) : 936.91 / 1928.00 Exception in thread main java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.cassandra.dht.Token at org.apache.cassandra.tools.NodeProbe.getEndpoint(NodeProbe.java:546) at org.apache.cassandra.tools.NodeProbe.getDataCenter(NodeProbe.java:559) at org.apache.cassandra.tools.NodeCmd.printInfo(NodeCmd.java:313) at org.apache.cassandra.tools.NodeCmd.main(NodeCmd.java:651) ‘nodetool repair’ produces : Exception in thread main java.lang.reflect.UndeclaredThrowableException at $Proxy0.forceTableRepair(Unknown Source) at org.apache.cassandra.tools.NodeProbe.forceTableRepair(NodeProbe.java:203) at org.apache.cassandra.tools.NodeCmd.optionalKSandCFs(NodeCmd.java:880) at org.apache.cassandra.tools.NodeCmd.main(NodeCmd.java:719) Caused by: javax.management.ReflectionException: Signature mismatch for operation forceTableRepair: (java.lang.String, [Ljava.lang.String;) should be (java.lang.String, boolean, [Ljava.lang.String;) at com.sun.jmx.mbeanserver.PerInterface.noSuchMethod(PerInterface.java:152) at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:117) at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:262) at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:836) at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:761) at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1427) at javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:72) at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1265) at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1360) at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:788) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:303) at sun.rmi.transport.Transport$1.run(Transport.java:159) at java.security.AccessController.doPrivileged(Native Method) at sun.rmi.transport.Transport.serviceCall(Transport.java:155) at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535) at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790) at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at
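A quick way to follow Sylvain's suggestion above and confirm which jars nodetool is actually loading (a minimal sketch; the install path is an assumption for a typical package layout, so adjust it to your environment):

# confirm which nodetool script is on the PATH
which nodetool
# nodetool is a plain shell script; see how it builds its CLASSPATH
grep -n CLASSPATH "$(which nodetool)"
# verify the cassandra jar on that path matches the server version you upgraded to
ls /usr/share/cassandra/apache-cassandra-*.jar   # path is an assumption; adjust to your install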
repair, compaction, and tombstone rows
I've been experiencing a behavior that is undesirable and it seems like a bug that causes a high amount of wasted work. I have a CF where all columns have a TTL, are generally all inserted in a very short period of time (less than a second) and are never over-written or explicitly deleted. Eventually one node will run a compaction and remove rows containing only tombstones greater than gc_grace_seconds old which is expected. The problem comes up when a repair is run. During the repair the other nodes that haven't run a compaction and still have the tombstoned rows fix the inconsistency and stream the rows (which contain only a tombstone which is more than gc_grace_seconds old) back to the node which had compacted that row away. This ends up occurring over and over and uses a lot of time, storage, and bandwidth to keep repairing rows that are intentionally missing. I think the issue stems from the behavior of compaction of TTL rows and repair. The compaction of TTL rows is a node-local event which will eventually cause tombstoned rows to disappear from the one node doing the compaction and then get repaired from replicas later. I guess this could happen for rows which are explicitly deleted as well. Is this a feature or a bug? How can I avoid repair of rows that were correctly removed via compaction from one node but not from replicas just because compactions run independently on each node? Every repair ends up streaming tens of gigabytes of missing rows to and from replicas. Cassandra 1.1.5 with size tiered compaction strategy and RF=3 -Bryan
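One mitigation for a column family like this (TTL-only, never explicitly deleted) is to shrink gc_grace_seconds so every replica can purge the expired tombstones quickly, which narrows the window in which repair can stream already-purged rows back. A minimal sketch using the 1.1-era cassandra-cli; the keyspace and column family names are hypothetical, and dropping gc_grace is only reasonable when rows are never explicitly deleted and you can accept losing tombstone protection for nodes that were down:

cassandra-cli -h localhost <<'EOF'
use my_keyspace;
update column family events with gc_grace = 0;
EOF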
Re: constant CMS GC using CPU time
On Thu, Oct 25, 2012 at 4:15 AM, aaron morton aa...@thelastpickle.com wrote:

This sounds very much like my heap is so consumed by (mostly) bloom filters that I am in steady state GC thrash.

Yes, I think that was at least part of the issue.

The rough numbers I've used to estimate working set are:
* bloom filter size for 400M rows at 0.00074 fp without java fudge (they are just a big array) 714 MB
* memtable size 1024 MB
* index sampling:
 * 24 bytes + key (16 bytes for UUID) = 32 bytes
 * 400M / 128 default sampling = 3,125,000
 * 3,125,000 * 32 = 95 MB
* java fudge X5 or X10 = 475MB to 950MB
* ignoring row cache and key cache
So the high side number is 2,213 to 2,688 MB. High because the fudge is a delicious sticky guess and the memtable space would rarely be full. On a 5120 MB heap, with 800MB new you have roughly 4300 MB tenured (some goes to perm) and 75% of that is 3,225 MB. Not terrible but it depends on the working set and how quickly stuff gets tenured which depends on the workload.

These values seem reasonable and in line with what I was seeing. There are other CF and apps sharing this cluster but this one was the largest.

You can confirm these guesses somewhat manually by enabling all the GC logging in cassandra-env.sh. Restart the node and let it operate normally, probably best to keep repair off.

I was using jstat to monitor gc activity and some snippets from that are in my original email in this thread. The key behavior was that full gc was running pretty often and never able to reclaim much (if any) space.

There are a few things you could try:
* increase the JVM heap by say 1GB and see how it goes
* increase bloom filter false positive, try 0.1 first (see http://www.datastax.com/docs/1.1/configuration/storage_configuration#bloom-filter-fp-chance )
* increase index_interval sampling in yaml.
* decreasing compaction_throughput and in_memory_compaction_limit can lessen the additional memory pressure compaction adds.
* disable caches or ensure off heap caches are used.

I've done several of these already in addition to changing the app to reduce the number of rows retained. How does compaction_throughput relate to memory usage? I assumed that was more for IO tuning. I noticed that lowering concurrent_compactors to 4 (from the default of 8) lowered the memory used during compactions. in_memory_compaction_limit_in_mb seems to only be used for wide rows and this CF didn't have any wider than in_memory_compaction_limit_in_mb. My multithreaded_compaction is still false.

Watching the gc logs and the cassandra log is a great way to get a feel for what works in your situation. Also take note of any scheduled processing your app does which may impact things, and look for poorly performing queries. Finally this book is a good reference on Java GC http://amzn.com/0137142528

For my understanding what was the average row size for the 400 million keys?

The compacted row mean size for the CF is 8815 (as reported by cfstats) but that comes out to be much larger than the real load per node I was seeing. Each node had about 200GB of data for the CF with 4 nodes in the cluster and RF=3. At the time, the TTL for all columns was 3 days and gc_grace_seconds was 5 days. Since then I've reduced the TTL to 1 hour and set gc_grace_seconds to 0 so the number of rows and data dropped to a level it can handle. -Bryan
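Aaron's bloom filter and index-sampling estimates above can be reproduced with a couple of lines of shell arithmetic (a sketch of the same back-of-the-envelope math using the standard bits-per-key formula for a bloom filter; it is not a measurement of what the JVM actually allocates):

# ~15 bits/key at fp=0.00074 -> about 715 MB for 400M keys, matching the ~714 MB figure above
awk 'BEGIN { n = 400e6; p = 0.00074;
             bits = -log(p) / (log(2)^2);
             printf "bloom filter: %.1f bits/key, %.0f MB\n", bits, n * bits / 8 / 1048576 }'

# index sampling: (24 bytes + 16 byte UUID key) per sample, one sample per 128 keys
awk 'BEGIN { n = 400e6; printf "index samples: %d, %.0f MB\n", n/128, (n/128) * 32 / 1048576 }'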
Re: constant CMS GC using CPU time
On Wed, Oct 24, 2012 at 2:38 PM, Rob Coli rc...@palominodb.com wrote: On Mon, Oct 22, 2012 at 8:38 AM, Bryan Talbot btal...@aeriagames.com wrote: The nodes with the most data used the most memory. All nodes are affected eventually not just one. The GC was on-going even when the nodes were not compacting or running a heavy application load -- even when the main app was paused constant the GC continued. This sounds very much like my heap is so consumed by (mostly) bloom filters that I am in steady state GC thrash. Yes, I think that was at least part of the issue. Do you have heap graphs which show a healthy sawtooth GC cycle which then more or less flatlines? I didn't save any graphs but that is what they would look like. I was using jstat to monitor gc activity. -Bryan
Re: constant CMS GC using CPU time
These GC settings are the default (recommended?) settings from cassandra-env. I added the UseCompressedOops. -Bryan

On Mon, Oct 22, 2012 at 6:15 PM, Will @ SOHO w...@voodoolunchbox.com wrote: On 10/22/2012 09:05 PM, aaron morton wrote:
# GC tuning options
JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"
JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=1"
JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75"
JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"
JVM_OPTS="$JVM_OPTS -XX:+UseCompressedOops"

You are too far behind the reference JVMs. Parallel GC is the preferred and highest performing form in the current Security Baseline version of the JVMs. -- Bryan Talbot Architect / Platform team lead, Aeria Games and Entertainment Silicon Valley | Berlin | Tokyo | Sao Paulo
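To get the GC visibility Aaron mentions earlier in this thread, the standard HotSpot GC logging flags can be added alongside the options above in cassandra-env.sh (a sketch; the log path is an assumption, and these are generic JVM options rather than anything Cassandra-specific):

# GC logging -- writes collection details to a file for later analysis
JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDateStamps"
JVM_OPTS="$JVM_OPTS -XX:+PrintTenuringDistribution"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCApplicationStoppedTime"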
Re: constant CMS GC using CPU time
On Mon, Oct 22, 2012 at 6:05 PM, aaron morton aa...@thelastpickle.com wrote:

The GC was on-going even when the nodes were not compacting or running a heavy application load -- even when the main app was paused the constant GC continued.

If you restart a node is the onset of GC activity correlated to some event?

Yes and no. When the nodes were generally under the .75 occupancy threshold a weekly repair -pr job would cause them to go over the threshold and then stay there even after the repair had completed and there were no ongoing compactions. It acts as though at least some substantial amount of memory used during repair was never dereferenced once the repair was complete. Once one CF in particular grew larger, the constant GC would start up pretty soon (less than 90 minutes) after a node restart even without a repair. As a test we dropped the largest CF and the memory usage immediately dropped to acceptable levels and the constant GC stopped. So it's definitely related to data load. memtable size is 1 GB, row cache is disabled and key cache is small (default).

How many keys did the CF have per node? I dismissed the memory used to hold bloom filters and index sampling. That memory is not considered part of the memtable size, and will end up in the tenured heap. It is generally only a problem with very large key counts per node.

I've changed the app to retain less data for that CF but I think that it was about 400M rows per node. Row keys are a TimeUUID. All of the rows are write-once, never updated, and rarely read. There are no secondary indexes for this particular CF. They were 2+ GB (as reported by nodetool cfstats anyway). It looks like the default bloom_filter_fp_chance defaults to 0.0

The default should be 0.000744. If the chance is zero or null this code should run when a new SSTable is written:
// paranoia -- we've had bugs in the thrift - avro - CfDef dance before, let's not let that break things
logger.error("Bloom filter FP chance of zero isn't supposed to happen");
Were the CFs migrated from an old version?

Yes, the CFs were created in 1.0.9, then migrated to 1.0.11 and finally to 1.1.5 with an upgradesstables run at each upgrade along the way. I could not find a way to view the current bloom_filter_fp_chance settings when they are at a default value. JMX reports the actual fp rate and if a specific rate is set for a CF that shows up in describe table but I couldn't find out how to tell what the default was. I didn't inspect the source.

Is there any way to predict how much memory the bloom filters will consume if the size of the row keys, number of rows, and fp chance are known?

See o.a.c.utils.BloomFilter.getFilter() in the code. This http://hur.st/bloomfilter appears to give similar results.

Ahh, very helpful. This indicates that 714MB would be used for the bloom filter for that one CF. JMX / cfstats reports Bloom Filter Space Used but the MBean method name (getBloomFilterDiskSpaceUsed) indicates this is the on-disk space. If on-disk and in-memory space used is similar then summing up all the Bloom Filter Space Used says they're currently consuming 1-2 GB of the heap which is substantial. If a CF is rarely read is it safe to set bloom_filter_fp_chance to 1.0? It just means more trips to SSTable indexes for a read, correct? Trade RAM for time (disk I/O). -Bryan
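To put a number on the "summing up all the Bloom Filter Space Used" estimate above, a one-liner like this can total the per-CF values that nodetool cfstats prints (a sketch; it assumes the 1.1-era cfstats output where the byte count is the last field on that line):

nodetool -h localhost cfstats | grep 'Bloom Filter Space Used' \
  | awk '{ total += $NF } END { printf "total bloom filter space: %.2f GB\n", total / (1024*1024*1024) }'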
Re: constant CMS GC using CPU time
The memory usage was correlated with the size of the data set. The nodes were a bit unbalanced which is normal due to variations in compactions. The nodes with the most data used the most memory. All nodes are affected eventually not just one. The GC was on-going even when the nodes were not compacting or running a heavy application load -- even when the main app was paused constant the GC continued. As a test we dropped the largest CF and the memory usage immediately dropped to acceptable levels and the constant GC stopped. So it's definitely related to data load. memtable size is 1 GB, row cache is disabled and key cache is small (default). I believe one culprit turned out to be the bloom filters. They were 2+ GB (as reported by nodetool cfstats anyway). It looks like the default bloom_filter_fp_chance defaults to 0.0 even though guides recommend 0.10 as the minimum value. Raising that to 0.20 for some write-mostly CF reduced memory used by 1GB or so. Is there any way to predict how much memory the bloom filters will consume if the size of the row keys, number or rows is known, and fp chance is known? -Bryan On Mon, Oct 22, 2012 at 12:25 AM, aaron morton aa...@thelastpickle.comwrote: If you are using the default settings I would try to correlate the GC activity with some application activity before tweaking. If this is happening on one machine out of 4 ensure that client load is distributed evenly. See if the raise in GC activity us related to Compaction, repair or an increase in throughput. OpsCentre or some other monitoring can help with the last one. Your mention of TTL makes me think compaction may be doing a bit of work churning through rows. Some things I've done in the past before looking at heap settings: * reduce compaction_throughput to reduce the memory churn * reduce in_memory_compaction_limit * if needed reduce concurrent_compactors Currently it seems like the memory used scales with the amount of bytes stored and not with how busy the server actually is. That's not such a good thing. The memtable_total_space_in_mb in yaml tells C* how much memory to devote to the memtables. That with the global row cache setting says how much memory will be used with regard to storing data and it will not increase inline with the static data load. Now days GC issues are typically due to more dynamic forces, like compaction, repair and throughput. Hope that helps. - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 20/10/2012, at 6:59 AM, Bryan Talbot btal...@aeriagames.com wrote: ok, let me try asking the question a different way ... How does cassandra use memory and how can I plan how much is needed? I have a 1 GB memtable and 5 GB total heap and that's still not enough even though the number of concurrent connections and garbage generation rate is fairly low. If I were using mysql or oracle, I could compute how much memory could be used by N concurrent connections, how much is allocated for caching, temp spaces, etc. How can I do this for cassandra? Currently it seems like the memory used scales with the amount of bytes stored and not with how busy the server actually is. That's not such a good thing. -Bryan On Thu, Oct 18, 2012 at 11:06 AM, Bryan Talbot btal...@aeriagames.comwrote: In a 4 node cluster running Cassandra 1.1.5 with sun jvm 1.6.0_29-b11 (64-bit), the nodes are often getting stuck in state where CMS collections of the old space are constantly running. 
The JVM configuration is using the standard settings in cassandra-env -- relevant settings are included below. The max heap is currently set to 5 GB with 800MB for new size. I don't believe that the cluster is overly busy and seems to be performing well enough other than this issue. When nodes get into this state they never seem to leave it (by freeing up old space memory) without restarting cassandra. They typically enter this state while running nodetool repair -pr but once they start doing this, restarting them only fixes it for a couple of hours. Compactions are completing and are generally not queued up. All CF are using STCS. The busiest CF consumes about 100GB of space on disk, is write heavy, and all columns have a TTL of 3 days. Overall, there are 41 CF including those used for system keyspace and secondary indexes. The number of SSTables per node currently varies from 185-212. Other than frequent log warnings about GCInspector - Heap is 0.xxx full... and StorageService - Flushing CFS(...) to relieve memory pressure there are no other log entries to indicate there is a problem. Does the memory needed vary depending on the amount of data stored? If so, how can I predict how much jvm space is needed? I don't want to make the heap too large as that's bad too. Maybe there's a memory leak related to compaction that doesn't allow meta-data to be purged? -Bryan 12 GB of RAM
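For reference, these are the cassandra.yaml knobs that come up in this thread (total memtable space, compaction throughput, in-memory compaction limit, and concurrent compactors). The values below are only illustrative of "turn these down", not recommendations:

# cassandra.yaml (1.1.x option names; values shown are illustrative only)
memtable_total_space_in_mb: 1024
compaction_throughput_mb_per_sec: 8
in_memory_compaction_limit_in_mb: 32
concurrent_compactors: 4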
Re: constant CMS GC using CPU time
ok, let me try asking the question a different way ... How does cassandra use memory and how can I plan how much is needed? I have a 1 GB memtable and 5 GB total heap and that's still not enough even though the number of concurrent connections and garbage generation rate is fairly low. If I were using mysql or oracle, I could compute how much memory could be used by N concurrent connections, how much is allocated for caching, temp spaces, etc. How can I do this for cassandra? Currently it seems like the memory used scales with the amount of bytes stored and not with how busy the server actually is. That's not such a good thing. -Bryan On Thu, Oct 18, 2012 at 11:06 AM, Bryan Talbot btal...@aeriagames.comwrote: In a 4 node cluster running Cassandra 1.1.5 with sun jvm 1.6.0_29-b11 (64-bit), the nodes are often getting stuck in state where CMS collections of the old space are constantly running. The JVM configuration is using the standard settings in cassandra-env -- relevant settings are included below. The max heap is currently set to 5 GB with 800MB for new size. I don't believe that the cluster is overly busy and seems to be performing well enough other than this issue. When nodes get into this state they never seem to leave it (by freeing up old space memory) without restarting cassandra. They typically enter this state while running nodetool repair -pr but once they start doing this, restarting them only fixes it for a couple of hours. Compactions are completing and are generally not queued up. All CF are using STCS. The busiest CF consumes about 100GB of space on disk, is write heavy, and all columns have a TTL of 3 days. Overall, there are 41 CF including those used for system keyspace and secondary indexes. The number of SSTables per node currently varies from 185-212. Other than frequent log warnings about GCInspector - Heap is 0.xxx full... and StorageService - Flushing CFS(...) to relieve memory pressure there are no other log entries to indicate there is a problem. Does the memory needed vary depending on the amount of data stored? If so, how can I predict how much jvm space is needed? I don't want to make the heap too large as that's bad too. Maybe there's a memory leak related to compaction that doesn't allow meta-data to be purged? -Bryan 12 GB of RAM in host with ~6 GB used by java and ~6 GB for OS and buffer cache. $ free -m total used free sharedbuffers cached Mem: 12001 11870131 0 4 5778 -/+ buffers/cache: 6087 5914 Swap:0 0 0 jvm settings in cassandra-env MAX_HEAP_SIZE=5G HEAP_NEWSIZE=800M # GC tuning options JVM_OPTS=$JVM_OPTS -XX:+UseParNewGC JVM_OPTS=$JVM_OPTS -XX:+UseConcMarkSweepGC JVM_OPTS=$JVM_OPTS -XX:+CMSParallelRemarkEnabled JVM_OPTS=$JVM_OPTS -XX:SurvivorRatio=8 JVM_OPTS=$JVM_OPTS -XX:MaxTenuringThreshold=1 JVM_OPTS=$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75 JVM_OPTS=$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly JVM_OPTS=$JVM_OPTS -XX:+UseCompressedOops jstat shows about 12 full collections per minute with old heap usage constantly over 75% so CMS is always over the CMSInitiatingOccupancyFraction threshold. 
$ jstat -gcutil -t 22917 5000 4
Timestamp      S0     S1     E      O      P      YGC     YGCT    FGC      FGCT       GCT
 132063.0   34.70   0.00  26.03  82.29  59.88  21580  506.887  17523  3078.941  3585.829
 132068.0   34.70   0.00  50.02  81.23  59.88  21580  506.887  17524  3079.220  3586.107
 132073.1    0.00  24.92  46.87  81.41  59.88  21581  506.932  17525  3079.583  3586.515
 132078.1    0.00  24.92  64.71  81.40  59.88  21581  506.932  17527  3079.853  3586.785

Other hosts not currently experiencing the high CPU load have a heap less than .75 full.

$ jstat -gcutil -t 6063 5000 4
Timestamp      S0     S1     E      O      P      YGC      YGCT    FGC      FGCT       GCT
 520731.6    0.00  12.70  36.37  71.33  59.26  46453  1688.809  14785  2130.779  3819.588
 520736.5    0.00  12.70  53.25  71.33  59.26  46453  1688.809  14785  2130.779  3819.588
 520741.5    0.00  12.70  68.92  71.33  59.26  46453  1688.809  14785  2130.779  3819.588
 520746.5    0.00  12.70  83.11  71.33  59.26  46453  1688.809  14785  2130.779  3819.588
constant CMS GC using CPU time
In a 4 node cluster running Cassandra 1.1.5 with sun jvm 1.6.0_29-b11 (64-bit), the nodes are often getting stuck in state where CMS collections of the old space are constantly running.

The JVM configuration is using the standard settings in cassandra-env -- relevant settings are included below. The max heap is currently set to 5 GB with 800MB for new size. I don't believe that the cluster is overly busy and seems to be performing well enough other than this issue.

When nodes get into this state they never seem to leave it (by freeing up old space memory) without restarting cassandra. They typically enter this state while running nodetool repair -pr but once they start doing this, restarting them only fixes it for a couple of hours.

Compactions are completing and are generally not queued up. All CF are using STCS. The busiest CF consumes about 100GB of space on disk, is write heavy, and all columns have a TTL of 3 days. Overall, there are 41 CF including those used for system keyspace and secondary indexes. The number of SSTables per node currently varies from 185-212.

Other than frequent log warnings about GCInspector - Heap is 0.xxx full... and StorageService - Flushing CFS(...) to relieve memory pressure there are no other log entries to indicate there is a problem.

Does the memory needed vary depending on the amount of data stored? If so, how can I predict how much jvm space is needed? I don't want to make the heap too large as that's bad too. Maybe there's a memory leak related to compaction that doesn't allow meta-data to be purged? -Bryan

12 GB of RAM in host with ~6 GB used by java and ~6 GB for OS and buffer cache.

$ free -m
             total       used       free     shared    buffers     cached
Mem:         12001      11870        131          0          4       5778
-/+ buffers/cache:       6087       5914
Swap:            0          0          0

jvm settings in cassandra-env
MAX_HEAP_SIZE=5G
HEAP_NEWSIZE=800M

# GC tuning options
JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"
JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=1"
JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75"
JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"
JVM_OPTS="$JVM_OPTS -XX:+UseCompressedOops"

jstat shows about 12 full collections per minute with old heap usage constantly over 75% so CMS is always over the CMSInitiatingOccupancyFraction threshold.

$ jstat -gcutil -t 22917 5000 4
Timestamp      S0     S1     E      O      P      YGC     YGCT    FGC      FGCT       GCT
 132063.0   34.70   0.00  26.03  82.29  59.88  21580  506.887  17523  3078.941  3585.829
 132068.0   34.70   0.00  50.02  81.23  59.88  21580  506.887  17524  3079.220  3586.107
 132073.1    0.00  24.92  46.87  81.41  59.88  21581  506.932  17525  3079.583  3586.515
 132078.1    0.00  24.92  64.71  81.40  59.88  21581  506.932  17527  3079.853  3586.785

Other hosts not currently experiencing the high CPU load have a heap less than .75 full.

$ jstat -gcutil -t 6063 5000 4
Timestamp      S0     S1     E      O      P      YGC      YGCT    FGC      FGCT       GCT
 520731.6    0.00  12.70  36.37  71.33  59.26  46453  1688.809  14785  2130.779  3819.588
 520736.5    0.00  12.70  53.25  71.33  59.26  46453  1688.809  14785  2130.779  3819.588
 520741.5    0.00  12.70  68.92  71.33  59.26  46453  1688.809  14785  2130.779  3819.588
 520746.5    0.00  12.70  83.11  71.33  59.26  46453  1688.809  14785  2130.779  3819.588
Re: hadoop consistency level
I believe that reading with CL.ONE will still cause read repair to be run (in the background) 'read_repair_chance' of the time. -Bryan On Thu, Oct 18, 2012 at 1:52 PM, Andrey Ilinykh ailin...@gmail.com wrote: On Thu, Oct 18, 2012 at 1:34 PM, Michael Kjellman mkjell...@barracuda.com wrote: Not sure I understand your question (if there is one..) You are more than welcome to do CL ONE and assuming you have hadoop nodes in the right places on your ring things could work out very nicely. If you need to guarantee that you have all the data in your job then you'll need to use QUORUM. If you don't specify a CL in your job config it will default to ONE (at least that's what my read of the ConfigHelper source for 1.1.6 shows) I have two questions. 1. I can benefit from data locality (and Hadoop) only with CL ONE. Is it correct? 2. With CL QUORUM cassandra reads data from all replicas. In this case Hadoop doesn't give me any benefits. Application running outside the cluster has the same performance. Is it correct? Thank you, Andrey
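If background read repair triggered by those CL.ONE Hadoop reads is a concern, the per-CF read_repair_chance can be turned down. A minimal sketch using the 1.1-era cassandra-cli; the keyspace and column family names are hypothetical and 0.1 is just an example value:

cassandra-cli -h localhost <<'EOF'
use my_keyspace;
update column family events with read_repair_chance = 0.1;
EOF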
Re: MBean cassandra.db.CompactionManager TotalBytesCompacted counts backwards
I'm attempting to plot how busy the node is doing compactions but there seems to only be a few metrics reported that might be suitable: CompletedTasks, PendingTasks, TotalBytesCompacted, TotalCompactionsCompleted. It's not clear to me what the difference between CompletedTasks and TotalCompactionsCompleted is but I am plotting TotalCompactionsCompleted / sec as one metric; however, this rate is nearly always less than 1 and doesn't capture how much resources are used doing the compaction. A compaction of 4 smallest SSTables counts the same as a compaction of 4 largest SSTables but the cost is hugely different. Thus, I'm also plotting TotalBytesCompacted / sec. Since the TotalBytesCompacted value sometimes moves backwards I'm not confident that it's reporting what it is meant to report. The code and comments indicate that it should only be incremented by the final size of the newly created SSTable or by the bytes-compacted-so-far for a larger compaction, so I don't see why it should be reasonable for it to sometimes decrease. How should the impact of compaction be measured if not by bytes compacted? -Bryan On Sun, Oct 7, 2012 at 7:39 AM, Edward Capriolo edlinuxg...@gmail.comwrote: I have not looked at this JMX object in a while, however the compaction manager can support multiple threads. Also it moves from 0-filesize each time it has to compact a set of files. That is more useful for showing current progress rather then lifetime history. On Fri, Oct 5, 2012 at 7:27 PM, Bryan Talbot btal...@aeriagames.com wrote: I've recently added compaction rate (in bytes / second) to my monitors for cassandra and am seeing some odd values. I wasn't expecting the values for TotalBytesCompacted to sometimes decrease from one reading to the next. It seems that the value should be monotonically increasing while a server is running -- obviously it would start again at 0 when the server is restarted or if the counter rolls over (unlikely for a 64 bit long). Below are two samples taken 60 seconds apart: the value decreased by 2,954,369,012 between the two readings. reported_metric=[timestamp:1349476449, status:200, request:[mbean:org.apache.cassandra.db:type=CompactionManager, attribute:TotalBytesCompacted, type:read], value:7548675470069] previous_metric=[timestamp:1349476389, status:200, request:[mbean:org.apache.cassandra.db:type=CompactionManager, attribute:TotalBytesCompacted, type:read], value:7551629839081] I briefly looked at the code for CompactionManager and a few related classes and don't see anyplace that is performing subtraction explicitly; however, there are many additions of signed long values that are not validated and could conceivably contain a negative value thus causing the totalBytesCompacted to decrease. It's interesting to note that the all of the differences I've seen so far are more than the overflow value of a signed 32 bit value. The OS (CentOS 5.7) and sun java vm (1.6.0_29) are both 64 bit. JNA is enabled. Is this expected and normal? If so, what is the correct interpretation of this metric? I'm seeing the negatives values a few times per hour when reading it once every 60 seconds. -Bryan -- Bryan Talbot Architect / Platform team lead, Aeria Games and Entertainment Silicon Valley | Berlin | Tokyo | Sao Paulo
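Using the two readings from this thread, the rate computation (and the guard needed because TotalBytesCompacted can move backwards) looks roughly like this; the sample values are the ones quoted above and the 60-second interval matches the polling described:

# two TotalBytesCompacted readings taken 60 seconds apart (values from the thread above)
prev=7551629839081
curr=7548675470069
delta=$((curr - prev))
if [ "$delta" -ge 0 ]; then
  echo "compaction rate: $((delta / 60)) bytes/sec"
else
  # counter moved backwards (restart or per-task reset); skip the sample instead of plotting a negative rate
  echo "skipping sample: TotalBytesCompacted decreased by $((-delta)) bytes"
fi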
Re: what's the most 1.1 stable version?
We've been using 1.1.5 for a few weeks now and it's been stable for our uses. Also, make sure you upgrade to a more recent version of the 1.0 branch before going to 1.1. Version 1.0.7 was released before 1.1 and there are upgrade-path fixes applied to 1.0 after that. Our upgrade path was 1.0.9 -> 1.0.11 -> 1.1.5 which worked well. -Bryan

On Fri, Oct 5, 2012 at 8:01 AM, Andrey Ilinykh ailin...@gmail.com wrote: In 1.1.5 a file descriptor leak was fixed. In my case it was critical. Nodes went down every several days. But not everyone had this problem. Thank you, Andrey

On Fri, Oct 5, 2012 at 7:42 AM, Alexandru Sicoe adsi...@gmail.com wrote: Hello, We are planning to upgrade from version 1.0.7 to the 1.1 branch. Which is the stable version that people are using? I see the latest release is 1.1.5 but maybe it's not fully wise to use this. Is 1.1.4 the one to use? Cheers, Alex

-- Bryan Talbot Architect / Platform team lead, Aeria Games and Entertainment Silicon Valley | Berlin | Tokyo | Sao Paulo
MBean cassandra.db.CompactionManager TotalBytesCompacted counts backwards
I've recently added compaction rate (in bytes / second) to my monitors for cassandra and am seeing some odd values. I wasn't expecting the values for TotalBytesCompacted to sometimes decrease from one reading to the next. It seems that the value should be monotonically increasing while a server is running -- obviously it would start again at 0 when the server is restarted or if the counter rolls over (unlikely for a 64 bit long). Below are two samples taken 60 seconds apart: the value decreased by 2,954,369,012 between the two readings. reported_metric=[timestamp:1349476449, status:200, request:[mbean:org.apache.cassandra.db:type=CompactionManager, attribute:TotalBytesCompacted, type:read], value:7548675470069] previous_metric=[timestamp:1349476389, status:200, request:[mbean:org.apache.cassandra.db:type=CompactionManager, attribute:TotalBytesCompacted, type:read], value:7551629839081] I briefly looked at the code for CompactionManager and a few related classes and don't see anyplace that is performing subtraction explicitly; however, there are many additions of signed long values that are not validated and could conceivably contain a negative value thus causing the totalBytesCompacted to decrease. It's interesting to note that the all of the differences I've seen so far are more than the overflow value of a signed 32 bit value. The OS (CentOS 5.7) and sun java vm (1.6.0_29) are both 64 bit. JNA is enabled. Is this expected and normal? If so, what is the correct interpretation of this metric? I'm seeing the negatives values a few times per hour when reading it once every 60 seconds. -Bryan
is Not a time-based UUID serious?
I'm testing upgrading a multi-node cluster from 1.0.9 to 1.1.5 and ran into the error message described here: https://issues.apache.org/jira/browse/CASSANDRA-4195 What I can't tell is if this is a serious issue or if it can be safely ignored. If it is a serious issue, shouldn't the migration guides for 1.1.x require that upgrades cannot be rolling or that all nodes must be running 1.0.11 or greater first? 2012-09-11 17:12:46,299 [GossipStage:1] ERROR org.apache.cassandra.service.AbstractCassandraDaemon - Fatal exception in thread Thread[GossipStage:1,5,main] java.lang.UnsupportedOperationException: Not a time-based UUID at java.util.UUID.timestamp(UUID.java:308) at org.apache.cassandra.service.MigrationManager.updateHighestKnown(MigrationManager.java:121) at org.apache.cassandra.service.MigrationManager.rectify(MigrationManager.java:99) at org.apache.cassandra.service.MigrationManager.onAlive(MigrationManager.java:83) at org.apache.cassandra.gms.Gossiper.markAlive(Gossiper.java:806) at org.apache.cassandra.gms.Gossiper.handleMajorStateChange(Gossiper.java:849) at org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:908) at org.apache.cassandra.gms.GossipDigestAckVerbHandler.doVerb(GossipDigestAckVerbHandler.java:68) at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) -Bryan
Re: is Not a time-based UUID serious?
To answer my own question: yes, the error is fatal. This also means that, to be successful, it seems upgrades from 1.0.x to 1.1.x MUST start from 1.0.11 or greater. My test upgrade from 1.0.9 to 1.1.5 left the cluster in a state where it wasn't able to come to schema agreement and blocked schema changes. -Bryan

On Wed, Sep 12, 2012 at 2:42 PM, Bryan Talbot btal...@aeriagames.com wrote: I'm testing upgrading a multi-node cluster from 1.0.9 to 1.1.5 and ran into the error message described here: https://issues.apache.org/jira/browse/CASSANDRA-4195 What I can't tell is if this is a serious issue or if it can be safely ignored. If it is a serious issue, shouldn't the migration guides for 1.1.x require that upgrades cannot be rolling or that all nodes must be running 1.0.11 or greater first?

2012-09-11 17:12:46,299 [GossipStage:1] ERROR org.apache.cassandra.service.AbstractCassandraDaemon - Fatal exception in thread Thread[GossipStage:1,5,main] java.lang.UnsupportedOperationException: Not a time-based UUID at java.util.UUID.timestamp(UUID.java:308) at org.apache.cassandra.service.MigrationManager.updateHighestKnown(MigrationManager.java:121) at org.apache.cassandra.service.MigrationManager.rectify(MigrationManager.java:99) at org.apache.cassandra.service.MigrationManager.onAlive(MigrationManager.java:83) at org.apache.cassandra.gms.Gossiper.markAlive(Gossiper.java:806) at org.apache.cassandra.gms.Gossiper.handleMajorStateChange(Gossiper.java:849) at org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:908) at org.apache.cassandra.gms.GossipDigestAckVerbHandler.doVerb(GossipDigestAckVerbHandler.java:68) at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) -Bryan
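A quick way to check whether a cluster has actually reached schema agreement after an upgrade like this is to look at the schema versions each endpoint reports, for example from cassandra-cli (a sketch; healthy output lists a single schema version covering all nodes):

cassandra-cli -h localhost <<'EOF'
describe cluster;
EOF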