Hector - Cassandra versions compatibility
Hi! I posted this question on the hector-users list but no one answered, so I am trying here as well. I have a production cluster running Cassandra 1.0.8 and a test cluster with Cassandra 1.1.6. In my Java app I do not use Maven, but rather have my lib directory with the jar files I use.

When I ran my client code with cassandra-all-1.0.8.jar, cassandra-clientutil-1.0.8.jar, cassandra-thrift-1.0.9.jar, hector-core-1.0-5.jar, *it worked fine with both Cassandra 1.0.8 and 1.1.6.*

When I changed only hector to hector-core-1.1-2.jar, *it also worked fine with both Cassandra 1.0.8 and 1.1.6.*

When I switched to cassandra-all-1.1.5.jar, cassandra-clientutil-1.1.5.jar, cassandra-thrift-1.1.5.jar, hector-core-1.1-2.jar, *it didn't work, WITH EITHER Cassandra version.* I got the exception below. Can anyone help or have an idea?

Thanks,
Tamar

java.lang.IncompatibleClassChangeError: org/apache/cassandra/thrift/Cassandra$Client
    at me.prettyprint.cassandra.connection.client.HThriftClient.getCassandra(HThriftClient.java:88)
    at me.prettyprint.cassandra.connection.client.HThriftClient.getCassandra(HThriftClient.java:97)
    at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:251)
    at me.prettyprint.cassandra.service.KeyspaceServiceImpl.operateWithFailover(KeyspaceServiceImpl.java:132)
    at me.prettyprint.cassandra.service.KeyspaceServiceImpl.getColumn(KeyspaceServiceImpl.java:858)
    at me.prettyprint.cassandra.model.thrift.ThriftColumnQuery$1.doInKeyspace(ThriftColumnQuery.java:57)
    at me.prettyprint.cassandra.model.thrift.ThriftColumnQuery$1.doInKeyspace(ThriftColumnQuery.java:52)
    at me.prettyprint.cassandra.model.KeyspaceOperationCallback.doInKeyspaceAndMeasure(KeyspaceOperationCallback.java:20)
    at me.prettyprint.cassandra.model.ExecutingKeyspace.doExecute(ExecutingKeyspace.java:101)
    at me.prettyprint.cassandra.model.thrift.ThriftColumnQuery.execute(ThriftColumnQuery.java:51)

Tamar Fraenkel
Senior Software Engineer, TOK Media
ta...@tok-media.com
Tel: +972 2 6409736
Mob: +972 54 8356490
Fax: +972 2 5612956
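For reference, a minimal Hector single-column read along the lines of the code path that produces this trace might look like the sketch below. The cluster name, host, keyspace, column family, key and column names are placeholders, not taken from the original post; only the Hector API calls themselves are standard.

    import me.prettyprint.cassandra.serializers.StringSerializer;
    import me.prettyprint.hector.api.Cluster;
    import me.prettyprint.hector.api.Keyspace;
    import me.prettyprint.hector.api.beans.HColumn;
    import me.prettyprint.hector.api.factory.HFactory;
    import me.prettyprint.hector.api.query.ColumnQuery;
    import me.prettyprint.hector.api.query.QueryResult;

    public class HectorReadExample {
        public static void main(String[] args) {
            // Placeholder cluster/keyspace/CF names; adjust for your environment.
            Cluster cluster = HFactory.getOrCreateCluster("TestCluster", "localhost:9160");
            Keyspace keyspace = HFactory.createKeyspace("MyKeyspace", cluster);

            // Single-column read; this is the call path that surfaces the
            // IncompatibleClassChangeError when the thrift and hector jars disagree.
            ColumnQuery<String, String, String> query =
                HFactory.createColumnQuery(keyspace, StringSerializer.get(),
                                           StringSerializer.get(), StringSerializer.get());
            QueryResult<HColumn<String, String>> result =
                query.setColumnFamily("MyCF").setKey("some-key").setName("some-column").execute();
            System.out.println(result.get());
        }
    }

The error itself comes from mixing a cassandra-thrift jar that the hector-core build was not compiled against, so the fix is usually to keep the thrift/clientutil jars at the versions the Hector release ships with.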
Re: Datatype Conversion in CQL-Client?
If you are talking about the CQL client that comes with Cassandra (cqlsh), it is actually written in Python: https://github.com/apache/cassandra/blob/trunk/bin/cqlsh

For information on datatypes (and conversion) take a look at the CQL definition: http://www.datastax.com/docs/1.0/references/cql/index (look at the CQL Data Types section).

If that's not the client you are referencing, let us know which one you mean: http://brianoneill.blogspot.com/2012/08/cassandra-apis-laundry-list.html

-brian

On Nov 17, 2012, at 9:54 PM, Timmy Turner wrote:

Thanks for the links, however I'm interested in the functionality that the official Cassandra client/API (which is in Java) offers.

2012/11/17 aaron morton aa...@thelastpickle.com

"Does the official/built-in Cassandra CQL client (in 1.2)" What language? Check the Java http://code.google.com/a/apache-extras.org/p/cassandra-jdbc/ and Python http://code.google.com/a/apache-extras.org/p/cassandra-dbapi2/ drivers.

Cheers
- Aaron Morton
Freelance Cassandra Developer
New Zealand
@aaronmorton
http://www.thelastpickle.com

On 16/11/2012, at 11:21 AM, Timmy Turner timm.t...@gmail.com wrote:

Does the official/built-in Cassandra CQL client (in 1.2) offer any built-in option to get direct values/objects when reading a field, instead of just a byte array?

--
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile: 215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/
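The cassandra-jdbc driver Aaron links above returns typed values through the standard JDBC ResultSet accessors rather than raw byte arrays. A rough sketch; the driver class name and URL format are what the cassandra-jdbc project documents as far as I recall (verify against the project page), and the keyspace/table/column names are placeholders.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class CqlJdbcExample {
        public static void main(String[] args) throws Exception {
            // Driver class and URL format per the cassandra-jdbc project docs (assumption).
            Class.forName("org.apache.cassandra.cql.jdbc.CassandraDriver");
            Connection conn = DriverManager.getConnection("jdbc:cassandra://localhost:9160/MyKeyspace");
            Statement stmt = conn.createStatement();
            ResultSet rs = stmt.executeQuery("SELECT key, value FROM my_table");
            while (rs.next()) {
                // Values come back as Java types, not raw byte arrays.
                String key = rs.getString("key");
                long value = rs.getLong("value");
                System.out.println(key + " -> " + value);
            }
            conn.close();
        }
    }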
Re: huge commitlog
I am wondering whether the huge commitlog size is the expected behavior or not?

Nope. Did you notice the large log size during or after the inserts? If after, did the size settle? Are you using commit log archiving? (in commitlog_archiving.properties)

and around 700 mini column families (around 10M in data_file_directories)

Can you describe how you created the 700 CFs?

and how can we reduce the size of the commitlog?

As a workaround, nodetool flush should checkpoint the log.

Cheers
- Aaron Morton
Freelance Cassandra Developer
New Zealand
@aaronmorton
http://www.thelastpickle.com

On 17/11/2012, at 2:30 PM, Chuan-Heng Hsiao hsiao.chuanh...@gmail.com wrote:

hi Cassandra Developers,

I am experiencing huge commitlog size (200+G) after inserting a huge amount of data. It is a 4-node cluster with RF=3, and currently each node has 200+G of commit log (so there is around 1T of commit log in total). The setting of commitlog_total_space_in_mb is the default. I am using 1.1.6. I did not do nodetool cleanup and nodetool flush yet, but I did nodetool repair -pr for each column family.

There is 1 huge column family (around 68G in data_file_directories), 18 mid-size column families (around 1G in data_file_directories), and around 700 mini column families (around 10M in data_file_directories).

I am wondering whether the huge commitlog size is the expected behavior or not, and how can we reduce the size of the commitlog?

Sincerely,
Hsiao
Re: unable to read saved rowcache from disk
But what is the upper bound? And rules of thumb?

If you are using the off-heap cache the upper bound is memory. If you are using the on-heap cache it's the JVM heap.

Cheers
- Aaron Morton
Freelance Cassandra Developer
New Zealand
@aaronmorton
http://www.thelastpickle.com

On 17/11/2012, at 2:35 PM, Manu Zhang owenzhang1...@gmail.com wrote:

"Did that take into account the token, the row key, and the row payload, and the java memory overhead?" How could I watch the heap usage then, since jconsole is not able to connect to Cassandra at that time?

"Try deleting the saved cache and restarting." There is no problem for me to do so. But what is the upper bound? And rules of thumb?

On Sat, Nov 17, 2012 at 9:15 AM, aaron morton aa...@thelastpickle.com wrote:

"Just curious why do you think the row key will take 300 bytes?" That's what I thought it said earlier in the email thread.

"If the row key is Long type, doesn't it take 8 bytes?" Yes, 8 bytes on disk.

"In his case, the rowCache was 500M with 1.6M rows, so the row data is 300B. Did I miss something?" Did that take into account the token, the row key, and the row payload, and the java memory overhead?

Cheers
- Aaron Morton
Freelance Cassandra Developer
New Zealand
@aaronmorton
http://www.thelastpickle.com

On 16/11/2012, at 9:35 AM, Wei Zhu wz1...@yahoo.com wrote:

Just curious, why do you think the row key will take 300 bytes? If the row key is Long type, doesn't it take 8 bytes? In his case, the rowCache was 500M with 1.6M rows, so the row data is 300B. Did I miss something?

Thanks.
-Wei

From: aaron morton aa...@thelastpickle.com
To: user@cassandra.apache.org
Sent: Thursday, November 15, 2012 12:15 PM
Subject: Re: unable to read saved rowcache from disk

For a row cache of 1,650,000:
16 byte token
300 byte row key?
and row data?
multiply by a java fudge factor of 5 or 10.

Try deleting the saved cache and restarting.

Cheers
- Aaron Morton
Freelance Cassandra Developer
New Zealand
@aaronmorton
http://www.thelastpickle.com

On 15/11/2012, at 8:20 PM, Wz1975 wz1...@yahoo.com wrote:

Before shutdown, you saw the row cache had 500M, 1.6M rows, each row averaging 300B, so 700k rows should be a little over 200M, unless it is reading more, maybe tombstones? Or the rows on disk have grown for some reason, but the row cache was not updated? Could be something else eats up the memory. You may profile memory and see who consumes it.

Thanks.
-Wei

Original message
Subject: Re: unable to read saved rowcache from disk
From: Manu Zhang owenzhang1...@gmail.com
To: user@cassandra.apache.org

3G, other JVM parameters are unchanged.

On Thu, Nov 15, 2012 at 2:40 PM, Wz1975 wz1...@yahoo.com wrote:

How big is your heap? Did you change the JVM parameters?

Thanks.
-Wei

Original message
Subject: Re: unable to read saved rowcache from disk
From: Manu Zhang owenzhang1...@gmail.com
To: user@cassandra.apache.org

add a counter and print it out myself

On Thu, Nov 15, 2012 at 1:51 PM, Wz1975 wz1...@yahoo.com wrote:

Curious, where did you see this?

Thanks.
-Wei

Original message
Subject: Re: unable to read saved rowcache from disk
From: Manu Zhang owenzhang1...@gmail.com
To: user@cassandra.apache.org

OOM at deserializing the 747321th row

On Thu, Nov 15, 2012 at 9:08 AM, Manu Zhang owenzhang1...@gmail.com wrote:

oh, as for the number of rows, it's 165. How long would you expect it to be read back?

On Thu, Nov 15, 2012 at 3:57 AM, Wei Zhu wz1...@yahoo.com wrote:

Good information Edward.
For my case, we have a good amount of RAM (76G) and the heap is 8G, so I set the row cache to be 800M as recommended. Our columns are kind of big, so the hit ratio for the row cache is around 20%; according to DataStax, we might as well just turn off the row cache altogether. Anyway, on restart it took about 2 minutes to load the row cache:

INFO [main] 2012-11-14 11:43:29,810 AutoSavingCache.java (line 108) reading saved cache /var/lib/cassandra/saved_caches/XXX-f2-RowCache
INFO [main] 2012-11-14 11:45:12,612 ColumnFamilyStore.java (line 451) completed loading (102801 ms; 21125 keys) row cache for XXX.f2

Just for comparison, our key is a Long, and the disk usage for the saved row cache is 253K (only the keys are stored when the row cache is saved to disk, so 253KB / 8 bytes = 31625 keys). It's about right... So for 15MB, there could be a lot of narrow rows (if the key is a Long, it could be more than 1M rows).

Thanks.
-Wei

From: Edward Capriolo edlinuxg...@gmail.com
To: user@cassandra.apache.org
Sent: Tuesday, November 13, 2012 11:13 PM
Subject: Re: unable to read saved
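To make the rule of thumb from this thread concrete, a back-of-the-envelope estimate of the in-memory footprint of the saved row cache might look like the sketch below. The 1,650,000 entries, ~300-byte average row size and 5-10x JVM overhead factor are the estimates quoted in the thread, not measured values.

    public class RowCacheEstimate {
        public static void main(String[] args) {
            long rows = 1650000L;          // entries reported in the saved cache
            long tokenBytes = 16;          // token
            long keyBytes = 8;             // a Long row key on disk
            long avgRowDataBytes = 300;    // rough average row payload from the thread
            long fudgeFactor = 5;          // JVM object overhead, 5-10x per Aaron

            long estimatedBytes = rows * (tokenBytes + keyBytes + avgRowDataBytes) * fudgeFactor;
            System.out.printf("~%.1f GB of heap to hold the cache%n",
                    estimatedBytes / (1024.0 * 1024 * 1024));
        }
    }

With the lower fudge factor this already comes to roughly 2.5 GB, which is consistent with an OOM during cache load on the 3G heap mentioned earlier in the thread.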
Re: Cassandra nodes failing with OOM
1. How many GCInspector warnings per hour are considered 'normal'?

None. A couple during compaction or repair is not the end of the world. But if you have enough to think about per hour, it's too many.

2. What should be the next thing to check?

Try to determine if the GC activity correlates to application workload, compaction or repair. Try to determine what the working set of the server is. Watch the GC activity (via GC logs or JMX) and see what the size of the tenured heap is after a CMS, or try to calculate it: http://www.mail-archive.com/user@cassandra.apache.org/msg25762.html

Look at your data model and query patterns for places where very large queries are being made, or rows that are very long lived with a lot of deletes (probably not as much of an issue with LDB).

3. What are the possible failure reasons and how to prevent those?

As above. As a workaround, sometimes drastically slowing down compaction can help. For LDB try reducing in_memory_compaction_limit_in_mb and compaction_throughput_mb_per_sec.

Hope that helps.

- Aaron Morton
Freelance Cassandra Developer
New Zealand
@aaronmorton
http://www.thelastpickle.com

On 17/11/2012, at 7:07 PM, Ivan Sobolev sobol...@gmail.com wrote:

Dear Community,

advice from you needed. We have a cluster, 1/6 of whose nodes died for various reasons (3 had OOM messages). Nodes died in groups of 3, 1, 2. No adjacent nodes died, though we use SimpleSnitch.

Version: 1.1.6
Hardware: 12Gb RAM / 8 cores (virtual)
Data: 40Gb/node
Nodes: 36 nodes
Keyspaces: 2 (RF=3, R=W=2) + 1 (OpsCenter)
CFs: 36, 2 indexes
Partitioner: Random
Compaction: Leveled (we don't want 2x space for housekeeping)
Caching: Keys only

All is pretty much standard apart from one CF receiving writes in 64K chunks and having sstable_size_in_mb=100. No JNA installed - this is to be fixed soon.

Checking sysstat/sar I can see 80-90% CPU idle, no anomalies in IO, and the only change is network activity spiking. All the nodes before dying had the following in their logs:

INFO [ScheduledTasks:1] 2012-11-15 21:35:05,512 StatusLogger.java (line 72) MemtablePostFlusher 1 4 0
INFO [ScheduledTasks:1] 2012-11-15 21:35:13,540 StatusLogger.java (line 72) FlushWriter 1 3 0
INFO [ScheduledTasks:1] 2012-11-15 21:36:32,162 StatusLogger.java (line 72) HintedHandoff 1 6 0
INFO [ScheduledTasks:1] 2012-11-15 21:36:32,162 StatusLogger.java (line 77) CompactionManager 5 9

GCInspector warnings were there too; they went from ~0.8 to 3Gb heap in 5-10 mins.

So, could you please give me a hint on:
1. How many GCInspector warnings per hour are considered 'normal'?
2. What should be the next thing to check?
3. What are the possible failure reasons and how to prevent those?

Thank you very much in advance,
Ivan
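One way to follow the advice above about watching the tenured heap size after a CMS cycle is to read the old-gen pool's collection usage over JMX. A rough sketch using the standard java.lang.management API; the pool name check assumes the CMS collector is in use (pool "CMS Old Gen"), and the snippet reads the local JVM, so you would adapt it to a remote JMX connection (or just use jstat/GC logs) against a Cassandra node.

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryPoolMXBean;
    import java.lang.management.MemoryUsage;

    public class TenuredHeapAfterCms {
        public static void main(String[] args) {
            for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
                // With CMS the tenured pool is usually named "CMS Old Gen".
                if (pool.getName().contains("Old Gen")) {
                    // getCollectionUsage() reports usage as of the last collection of
                    // this pool, i.e. roughly the live set after the last CMS cycle.
                    MemoryUsage afterGc = pool.getCollectionUsage();
                    if (afterGc != null) {
                        System.out.printf("%s after last GC: %d MB used of %d MB max%n",
                                pool.getName(),
                                afterGc.getUsed() / (1024 * 1024),
                                afterGc.getMax() / (1024 * 1024));
                    }
                }
            }
        }
    }

If the tenured heap stays large right after CMS cycles, the working set genuinely does not fit, which is when the data-model and compaction suggestions above come into play.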
Re: Query regarding SSTable timestamps and counts
As per the DataStax documentation, a manual compaction forces the admin to start compaction manually and disables automated compaction (at least for major compactions but not minor compactions).

It does not disable compaction. It creates one big file, which will not be compacted until there are (by default) 3 other very big files.

1. Does a nodetool stop compaction also force the admin to manually run major compaction (i.e. disable automated major compactions)?

No. Stop just stops the current compaction. Nothing is disabled.

2. Can a node restart reset the automated major compaction if a node gets into a manual-mode compaction for whatever reason?

Major compaction is not automatic. It is the manual nodetool compact command. Automatic (minor) compaction is controlled by min_compaction_threshold and max_compaction_threshold (for the default compaction strategy).

3. What is the ideal number of SSTables for a table in a keyspace (I mean, are there any indicators as to whether my compaction is alright or not?)

This is not something you have to worry about, unless you are seeing 1,000's of files using the default compaction.

For example, I have seen SSTables on the disk more than 10 days old wherein there were other SSTables belonging to the same table but much younger than the older SSTables.

No problems.

4. Does an upgradesstables fix any compaction issues?

What are the compaction issues you are having?

Cheers
- Aaron Morton
Freelance Cassandra Developer
New Zealand
@aaronmorton
http://www.thelastpickle.com

On 18/11/2012, at 1:18 AM, Ananth Gundabattula agundabatt...@gmail.com wrote:

We have a cluster running Cassandra 1.1.4. On this cluster:

1. We had to move the nodes around a bit when we were adding new nodes (there was quite a good amount of node movement).
2. We had to stop compactions during some of the days to save some disk space on some of the nodes when they were running very, very low on disk space (via nodetool stop COMPACTION).

As per the DataStax documentation, a manual compaction forces the admin to start compaction manually and disables automated compaction (at least for major compactions but not minor compactions).

Here are the questions I have regarding compaction:

1. Does a nodetool stop compaction also force the admin to manually run major compaction (i.e. disable automated major compactions)?
2. Can a node restart reset the automated major compaction if a node gets into a manual-mode compaction for whatever reason?
3. What is the ideal number of SSTables for a table in a keyspace (I mean, are there any indicators as to whether my compaction is alright or not?). For example, I have seen SSTables on the disk more than 10 days old wherein there were other SSTables belonging to the same table but much younger than the older SSTables (the node movement and repair and cleanup happened between the older SSTables and the new SSTables being touched/modified).
4. Does an upgradesstables fix any compaction issues?

Regards,
Ananth
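If you do want a number to watch for question 3, the per-CF live SSTable count is exposed over JMX. A rough sketch of polling it remotely; the JMX port, keyspace and column family names are placeholders, and the MBean name pattern is the one 1.1-era Cassandra uses as far as I recall (check it in jconsole first).

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class SSTableCountCheck {
        public static void main(String[] args) throws Exception {
            // Default Cassandra JMX port is 7199; host/keyspace/CF below are placeholders.
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://localhost:7199/jmxrmi");
            JMXConnector connector = JMXConnectorFactory.connect(url);
            MBeanServerConnection mbs = connector.getMBeanServerConnection();

            ObjectName cfMBean = new ObjectName(
                    "org.apache.cassandra.db:type=ColumnFamilies,keyspace=MyKeyspace,columnfamily=MyCF");
            Integer liveSSTables = (Integer) mbs.getAttribute(cfMBean, "LiveSSTableCount");
            System.out.println("Live SSTables: " + liveSSTables);

            connector.close();
        }
    }

A steadily growing count under the default (size-tiered) strategy is the sign that compaction is falling behind; old file timestamps alone are not.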
Re: Upgrade 1.1.2 - 1.1.6
time (UTC)    0   1   2   3   4    5    6    7    8    9   10   11   12   13
Good value   88  44  26  35  26   86  187  251  455  389  473  367  453  373
C* counter  149  82  45  68  38  146  329  414  746  566  473  377  453  373

"I finished my Cassandra 1.1.6 upgrades at 9:30 UTC."

This looks like the counters were more out of sync before the upgrade than after? Do you know if your client is retrying counter operations? (I saw some dropped messages in the S1 log.)

S1 shows a lot of commit log replay going on. Reading your timeline below, this sounds like the auto-restart catching you out.

Cheers
- Aaron Morton
Freelance Cassandra Developer
New Zealand
@aaronmorton
http://www.thelastpickle.com

On 16/11/2012, at 10:22 AM, Alain RODRIGUEZ arodr...@gmail.com wrote:

Here is an example of the increase for some counter (counting events per hour):

time (UTC)    0   1   2   3   4    5    6    7    8    9   10   11   12   13
Good value   88  44  26  35  26   86  187  251  455  389  473  367  453  373
C* counter  149  82  45  68  38  146  329  414  746  566  473  377  453  373

I finished my Cassandra 1.1.6 upgrades at 9:30 UTC. I found wrong values since the day before at 20:00 UTC (counters from hours before are good).

Here are the logs from the output:
Server 1: http://pastebin.com/WyCm6Ef5 (This one is from the same server as the first bash history in my first mail)
Server 2: http://pastebin.com/gBe2KL2b (This one is from the same server as the second bash history in my first mail)

Alain

2012/11/15 aaron morton aa...@thelastpickle.com

Can you provide an example of the increase? Can you provide the log from startup?

Cheers
- Aaron Morton
Freelance Cassandra Developer
New Zealand
@aaronmorton
http://www.thelastpickle.com

On 16/11/2012, at 3:21 AM, Alain RODRIGUEZ arodr...@gmail.com wrote:

We had an issue with counters over-counting even using the nodetool drain command before upgrading... Here is my bash history:

 69  cp /etc/cassandra/cassandra.yaml /etc/cassandra/cassandra.yaml.bak
 70  cp /etc/cassandra/cassandra-env.sh /etc/cassandra/cassandra-env.sh.bak
 71  sudo apt-get install cassandra
 72  nodetool disablethrift
 73  nodetool drain
 74  service cassandra stop
 75  cat /etc/cassandra/cassandra-env.sh /etc/cassandra/cassandra-env.sh.bak
 76  vim /etc/cassandra/cassandra-env.sh
 77  cat /etc/cassandra/cassandra.yaml /etc/cassandra/cassandra.yaml.bak
 78  vim /etc/cassandra/cassandra.yaml
 79  service cassandra start

So I think I followed these steps: http://www.datastax.com/docs/1.1/install/upgrading#upgrade-steps

I merged my conf files with an external tool, so consider I merged my conf files in steps 76 and 78. I saw that sudo apt-get install cassandra stopped the server and restarted it automatically. So it updated without draining and restarted before I had the time to reconfigure the conf files. Is this normal? Is there a way to avoid it?

So for the second node I decided to try to stop C* before the upgrade.
125  cp /etc/cassandra/cassandra.yaml /etc/cassandra/cassandra.yaml.bak
126  cp /etc/cassandra/cassandra-env.sh /etc/cassandra/cassandra-env.sh.bak
127  nodetool disablegossip
128  nodetool disablethrift
129  nodetool drain
130  service cassandra stop
131  sudo apt-get install cassandra    // 131: This restarted cassandra
132  nodetool disablethrift
133  nodetool disablegossip
134  nodetool drain
135  service cassandra stop
136  cat /etc/cassandra/cassandra-env.sh /etc/cassandra/cassandra-env.sh.bak
137  cim /etc/cassandra/cassandra-env.sh
138  vim /etc/cassandra/cassandra-env.sh
139  cat /etc/cassandra/cassandra.yaml /etc/cassandra/cassandra.yaml.bak
140  vim /etc/cassandra/cassandra.yaml
141  service cassandra start

After both of these updates I saw my current counters increase without any reason. Did I do anything wrong?

Alain
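On the retry question Aaron raises above: counter increments are not idempotent, so a client-side retry after a timeout can double-count even when both the original write and the retry eventually apply. A hedged sketch of the problematic pattern, assuming a Hector-based client (Alain's client is not stated in the thread); the column family, column and key names are placeholders.

    import me.prettyprint.cassandra.serializers.StringSerializer;
    import me.prettyprint.hector.api.Keyspace;
    import me.prettyprint.hector.api.exceptions.HTimedOutException;
    import me.prettyprint.hector.api.factory.HFactory;
    import me.prettyprint.hector.api.mutation.Mutator;

    public class CounterRetryExample {
        // Retrying a counter add after a timeout is unsafe: the first attempt may
        // have been applied server-side even though the client saw a timeout,
        // so the retry counts the same event twice.
        static void incrementWithRetry(Keyspace keyspace, String key) {
            Mutator<String> mutator = HFactory.createMutator(keyspace, StringSerializer.get());
            try {
                mutator.incrementCounter(key, "EventsPerHour", "count", 1L);
            } catch (HTimedOutException e) {
                // Dangerous for counters: this is how over-counts like the ones
                // in the table above can happen.
                mutator.incrementCounter(key, "EventsPerHour", "count", 1L);
            }
        }
    }

Commit log replay after an unclean restart can have the same over-counting effect, which is why the auto-restart during the package upgrade matters here.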
Re: unable to read saved rowcache from disk
If you are using the off-heap cache the upper bound is memory. If you are using the on-heap cache it's the JVM heap.

But as I said earlier, I could not watch the usage of the JVM heap while reading the saved caches.
Re: huge commitlog
Hi Aaron,

Thank you very much for the reply.

The 700 CFs were created in the beginning (before any insertion). I did not do anything with commitlog_archiving.properties, so I guess I was not using commit log archiving.

What I did was a lot of insertions (and some deletions) using another 4 machines with 32 processes in total. (There are 4 nodes in my setting, so there are 8 machines in total.) I did see huge logs in /var/log/cassandra after such a huge amount of insertions. Right now I can't distinguish whether a single insertion also causes huge logs.

nodetool flush hung (maybe because of the 200G+ commitlog). Because these machines are not in production (guaranteed no more insertion/deletion), I ended up restarting Cassandra one node at a time, and the commitlog shrank back to 4G. I am doing a repair on each node now.

I'll try to re-import and keep logs when the commitlog increases insanely again.

Sincerely,
Hsiao

On Mon, Nov 19, 2012 at 3:19 AM, aaron morton aa...@thelastpickle.com wrote: [...]
Re: huge commitlog
What consistency level are you writing with? If you were writing with ANY, try writing with a higher consistency level.

-Tupshin

On Nov 18, 2012 9:05 PM, Chuan-Heng Hsiao hsiao.chuanh...@gmail.com wrote: [...]
Re: huge commitlog
I have RF=3. Read/write consistency has already been set to TWO.

It did seem that the data were not consistent yet. (There are some CFs that I expected to be empty after the operations, but I still got some data, and the amount of data was decreasing after retrying to get all data from those CFs.)

Sincerely,
Hsiao

On Mon, Nov 19, 2012 at 11:14 AM, Tupshin Harper tups...@tupshin.com wrote:

What consistency level are you writing with? If you were writing with ANY, try writing with a higher consistency level.

-Tupshin

On Nov 18, 2012 9:05 PM, Chuan-Heng Hsiao hsiao.chuanh...@gmail.com wrote: [...]
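For completeness, this is roughly how a Hector-based client pins read and write consistency per keyspace. It is only an illustration of "writing with a higher consistency level"; the original poster's client library is not stated, and the cluster and keyspace names are placeholders.

    import me.prettyprint.cassandra.model.ConfigurableConsistencyLevel;
    import me.prettyprint.hector.api.Cluster;
    import me.prettyprint.hector.api.HConsistencyLevel;
    import me.prettyprint.hector.api.Keyspace;
    import me.prettyprint.hector.api.factory.HFactory;

    public class ConsistencyExample {
        public static void main(String[] args) {
            Cluster cluster = HFactory.getOrCreateCluster("TestCluster", "localhost:9160");

            // QUORUM (2 of 3 with RF=3) for both reads and writes, which is
            // effectively what TWO gives the poster here.
            ConfigurableConsistencyLevel ccl = new ConfigurableConsistencyLevel();
            ccl.setDefaultReadConsistencyLevel(HConsistencyLevel.QUORUM);
            ccl.setDefaultWriteConsistencyLevel(HConsistencyLevel.QUORUM);

            Keyspace keyspace = HFactory.createKeyspace("MyKeyspace", cluster, ccl);
            // Reads and writes made through this Keyspace now use QUORUM.
        }
    }

With R=W=QUORUM (or TWO at RF=3), reads overlap writes, so lingering data in "emptied" CFs usually points at in-flight deletions, hinted handoff, or a needed repair rather than the consistency level itself.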
Re: Query regarding SSTable timestamps and counts
Hello Aaron,

Thanks a lot for the reply. Looks like the documentation is confusing. Here is the link I am referring to: http://www.datastax.com/docs/1.1/operations/tuning#tuning-compaction

"It does not disable compaction."
As per the above URL: "After running a major compaction, automatic minor compactions are no longer triggered, frequently requiring you to manually run major compactions on a routine basis." (Just before the heading "Tuning Column Family compression" in the above link.)

With respect to the replies below:

"it creates one big file, which will not be compacted until there are (by default) 3 other very big files."
This is for minor compaction, and a major compaction should theoretically result in one large file irrespective of the number of data files initially?

"This is not something you have to worry about. Unless you are seeing 1,000's of files using the default compaction."
Well, my worry has been because of the large amount of node movement we have done in the ring. We started off with 6 nodes and increased the capacity to 12, with disproportionate increases every time, which resulted in a lot of cleaning of data folders (except system), running repair, and then a cleanup, with an aborted attempt in between. There were some data.db files older than 2 weeks that had not been modified since then. My understanding of the compaction process was that since data files keep continuously merging, we should not have data files with very old last-modified timestamps (assuming there is a good amount of writes to the table continuously). I did not have a sure way of telling whether everything is alright with the compaction by looking at the last-modified timestamps of all the data.db files.

"What are the compaction issues you are having?"
Your replies confirm that the timestamps should not be an issue to worry about, so I guess I should not be calling them issues any more. But performing an upgradesstables did decrease the number of data files and removed all the data files with the old timestamps.

Regards,
Ananth

On Mon, Nov 19, 2012 at 6:54 AM, aaron morton aa...@thelastpickle.com wrote: [...]
Re: get_range_slice gets no rowcache support?
Yes, https://issues.apache.org/jira/browse/CASSANDRA-1302. Thanks.

On Wed, Nov 14, 2012 at 2:04 AM, Tyler Hobbs ty...@datastax.com wrote:

As far as I know, the row cache has never been populated by get_range_slices(), only normal gets/multigets. The behavior is this way because get_range_slices() is almost exclusively used to page over an entire column family, which generally would not fit into the cache and would simply (a) ruin your cache if used for gets, (b) generate a lot of extra garbage, and (c) result in nothing but cache misses. With that said, I'm sure there are still a few use cases where using the cache would be beneficial, so I'm sure there's a ticket out there somewhere that presents a few options for supporting this.

On Thu, Nov 8, 2012 at 8:39 PM, Manu Zhang owenzhang1...@gmail.com wrote:

I did overlook something. get_range_slice will invoke cfs.getRawCachedRow instead of cfs.getThroughCache. Hence, no row will be cached if it's not present in the row cache. Well, this puzzles me further as to how a range of rows is expected to get stored into the row cache in the first place. Would someone please clarify it for me? Thanks in advance.

On Thu, Nov 8, 2012 at 3:23 PM, Manu Zhang owenzhang1...@gmail.com wrote:

I've asked this question before. And after reading the source code, I find that get_range_slice doesn't query the row cache before reading from the Memtable and SSTables. I just want to make sure whether I've overlooked something. If my observation is correct, what's the consideration here?

--
Tyler Hobbs
DataStax
http://datastax.com/
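A toy illustration of the distinction Manu describes, paraphrased rather than taken from Cassandra source: a point read goes "through" the cache and populates it on a miss, while a range read only checks whether a row already happens to be cached and never populates it. The class and method names below are hypothetical analogues of the two methods named in the thread.

    import java.util.HashMap;
    import java.util.Map;
    import java.util.function.Function;

    public class RowCacheIllustration {
        private final Map<String, String> rowCache = new HashMap<>();

        // Analogue of getThroughCache: populate the cache on a miss.
        String readThroughCache(String key, Function<String, String> readFromDisk) {
            return rowCache.computeIfAbsent(key, readFromDisk);
        }

        // Analogue of getRawCachedRow: return the cached row if present, never populate.
        String readRawCached(String key) {
            return rowCache.get(key);
        }
    }

So rows only enter the row cache via point reads (or a saved-cache reload); a range scan benefits from the cache only for rows that point reads have already warmed, which is the behavior CASSANDRA-1302 discusses.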
Re: Datatype Conversion in CQL-Client?
I think Timmy might be referring to the upcoming native CQL Java driver that might be coming with 1.2. It was mentioned here: http://www.datastax.com/wp-content/uploads/2012/08/7_Datastax_Upcoming_Changes_in_Drivers.pdf

I would also be interested in testing that, but I can't find it in the repositories. Any hints?

Regards,
Tommi L.

From: Brian O'Neill [mailto:boneil...@gmail.com] On Behalf Of Brian O'Neill
Sent: 18 November 2012 17:47
To: user@cassandra.apache.org
Subject: Re: Datatype Conversion in CQL-Client?

If you are talking about the CQL client that comes with Cassandra (cqlsh), it is actually written in Python: https://github.com/apache/cassandra/blob/trunk/bin/cqlsh [...]