Re: Sorting & pagination in apache cassandra 2.1
In the example you gave, the primary key user_name is the row key. Since the default partitioner is random, you are getting rows in random order. Since each row has no clustering column, there is no further grouping of data. Or in simple terms, each row has one record and is being returned ordered by column name. To see some meaningful ordering, there should be some clustering column defined. You can create additional column families to maintain ordering, or use external solutions like Elasticsearch. On Jan 12, 2016 10:07 PM, "anuja jain" wrote: > I understand the meaning of SSTable but what's the reason behind sorting > the table on the basis of int columns first.. > Is there any data type preference in Cassandra? > Also, what is the alternative to creating materialised views if my > Cassandra version is prior to 3.0 (specifically 2.1) and which is already > in production? > > > On Wed, Jan 13, 2016 at 12:17 AM, Robert Coli > wrote: > >> On Mon, Jan 11, 2016 at 11:30 PM, anuja jain >> wrote: >> >>> 1 more question, what does it mean by "cassandra inherently sorts data"? >>> >> >> SSTable = Sorted Strings Table. >> >> It doesn't contain "Strings" anymore, really, but that's a hint.. :) >> >> =Rob >> > >
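[Editorial note] To illustrate the point about clustering columns made at the top of this thread, here is a minimal, hypothetical CQL sketch (the table and column names are invented, not from the original discussion). With a clustering column defined, rows within a partition come back sorted by that column, which is the "meaningful ordering" referred to above:

    CREATE TABLE users_by_city (
        city text,        -- partition key: partitions are placed by hash, in no useful order
        user_name text,   -- clustering column: sorted within each partition
        email text,
        PRIMARY KEY (city, user_name)
    ) WITH CLUSTERING ORDER BY (user_name ASC);

    -- Within one partition, results come back ordered by user_name:
    SELECT user_name, email FROM users_by_city WHERE city = 'Pune';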
Re: Cassandra table as Queue
I would suggest looking at the Comcast Message Bus schema definition. https://github.com/Comcast/cmb -Naren On Thu, Aug 20, 2015 at 10:30 AM, algermissen1971 algermissen1...@icloud.com wrote: Hi Rado, On 20 Aug 2015, at 15:05, Radoslav Smilyanov radoslav.smilya...@novarto.com wrote: Hello, I need to have a table that is acting as a queue for specific data. The data that has to be stored in this table are some unique ids that have to be predefined, and whenever one is requested an id has to be obtained from the queue and a new one has to be added. This queue table will have a fixed size of 50 000 entries. I see that it is not recommended at all to use a Cassandra table for a queue, but I need to find a design for my data that will not cause performance issues caused by tombstones. I am using Cassandra 2.1.6 with the Java driver and I am afraid that at some point in time I will start experiencing performance issues caused by many tombstones. The current design of my table with one column is not good enough for querying the data, since now I am using: 1. select * from table limit 1, which returns me the first id in the table 2. delete from table where id = id_from step 1 Did someone try to implement a queue with a Cassandra table that is working productively now without any performance issues? I will appreciate some hints on how I can achieve good performance in Cassandra for a queue table. I came up with a design last year that I am using without problems, with a java-driver-based implementation in production for several months. Two caveats: - Our environment is not high-volume or high-frequency. Message counts per minute come in dozens, at most. So the design is not tested in heavy scenarios. We merely needed something based on the existing tech stack. - The Ruby version has a logical bug, mentioned in the README. https://github.com/algermissen/cassandra-ruby-sharded-workers Given what I know by now about the tombstone problem, I'd rather not use a TTL on the messages but remove outdated time shards completely after e.g. a week. But since reads never really go to an outdated shard, the tombstones do not slow down the reads. Hope that helps. Jan Thanks, Rado -- Narendra Sharma Software Engineer *http://www.aeris.com* *http://narendrasharma.blogspot.com/*
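[Editorial note] For reference, a minimal CQL sketch of the time-sharded design described in this thread (the table name, column names, and shard granularity are assumptions for illustration, not the actual schema from the linked repository). Reads only ever hit the current shard, so tombstones in old shards never sit in the read path, and an outdated shard can be dropped with a single partition-level delete:

    CREATE TABLE queue (
        shard text,       -- time bucket, e.g. a day like '2015-08-20' (assumed granularity)
        id timeuuid,      -- work-item id, time-ordered within the shard
        payload blob,
        PRIMARY KEY (shard, id)
    );

    -- Consumers only ever read the current shard:
    SELECT id, payload FROM queue WHERE shard = '2015-08-20' LIMIT 1;
    -- Outdated shards are removed wholesale, instead of row-by-row deletes:
    DELETE FROM queue WHERE shard = '2015-08-13';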
Re: Cassandra leap second
We also experienced the same, i.e. high CPU on a Cassandra 1.1.4 node running in AWS. Restarting the VM worked. On Wed, Jul 1, 2015 at 4:58 AM, Jason Wee peich...@gmail.com wrote: same here too, on branch 1.1, and have not seen any high cpu usage. On Wed, Jul 1, 2015 at 2:52 PM, John Wong gokoproj...@gmail.com wrote: Which version are you running and what's your kernel version? We are still running on the 1.2 branch but we have not seen any high cpu usage yet... On Tue, Jun 30, 2015 at 11:10 PM, snair123 . nair...@outlook.com wrote: reboot of the machine worked -- From: nair...@outlook.com To: user@cassandra.apache.org Subject: Cassandra leap second Date: Wed, 1 Jul 2015 02:54:53 + Is it ok to run this https://blog.mozilla.org/it/2012/06/30/mysql-and-the-leap-second-high-cpu-and-the-fix/ Seeing high cpu consumption for the cassandra process -- Sent from Jeff Dean's printf() mobile console -- Narendra Sharma Software Engineer *http://www.aeris.com* *http://narendrasharma.blogspot.com/*
Re: Data model suggestions
I think one table, say record, should be good. The primary key is the record id. This will ensure good distribution. Just update the active attribute to true or false. For range queries on active vs archive records, maintain 2 indexes or try a secondary index. On Apr 23, 2015 1:32 PM, Ali Akhtar ali.rac...@gmail.com wrote: Good point about the range selects. I think they can be made to work with limits, though. Or, since the active records will never usually be 500k, the ids may just be cached in memory. Most of the time, during reads, the queries will just consist of select * where primaryKey = someValue. One row at a time. The question is just whether to keep all records in one table (including archived records which won't be queried 99% of the time), or to keep active records in their own table, and delete them when they're no longer active. Will that produce tombstone issues? On Fri, Apr 24, 2015 at 12:56 AM, Manoj Khangaonkar khangaon...@gmail.com wrote: Hi, If your external API returns active records, that means I am guessing you need to do a select * on the active table to figure out which records in the table are no longer active. You might be aware that range selects based on the partition key will time out in Cassandra. They can however be made to work using the column cluster key. To comment more, we would need to see your proposed Cassandra tables and the queries that you might need to run. regards On Thu, Apr 23, 2015 at 9:45 AM, Ali Akhtar ali.rac...@gmail.com wrote: That's returned by the external API we're querying. We query them for active records; if a previously active record isn't included in the results, that means it's time to archive that record. On Thu, Apr 23, 2015 at 9:20 PM, Manoj Khangaonkar khangaon...@gmail.com wrote: Hi, How do you determine if the record is no longer active? Is it a periodic process that goes through every record and checks when the last update happened? regards On Thu, Apr 23, 2015 at 8:09 AM, Ali Akhtar ali.rac...@gmail.com wrote: Hey all, We are working on moving a mysql based application to Cassandra. The workflow in mysql is this: We have two tables: active and archive. Every hour, we pull in data from an external API. The records which are active are kept in the 'active' table. Once a record is no longer active, it's deleted from 'active' and re-inserted into 'archive'. The purpose for that is that most of the time, queries are only done against the active records rather than archived. Therefore keeping the active table small may help with faster queries, if it only has to search 200k records vs 3 million or more. Is it advisable to keep the same data model in Cassandra? I'm concerned about tombstone issues when records are deleted from active. Thanks. -- http://khangaonkar.blogspot.com/ -- http://khangaonkar.blogspot.com/
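[Editorial note] A minimal CQL sketch of the single-table design suggested at the top of this thread (table and column names are hypothetical): one record table keyed by id for distribution, plus a small, manually maintained index table holding only the active ids, so the archive never has to be scanned:

    CREATE TABLE record (
        id text PRIMARY KEY,   -- record id as partition key: good distribution
        active boolean,
        data text
    );

    -- Index of active records only; stays small (hundreds of thousands of
    -- ids at most), so it can be read without touching archived rows.
    CREATE TABLE active_record (
        bucket int,            -- fixed bucket(s) to spread the index if it grows
        id text,
        PRIMARY KEY (bucket, id)
    );

    -- When a record is archived:
    -- UPDATE record SET active = false WHERE id = ?;
    -- DELETE FROM active_record WHERE bucket = ? AND id = ?;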
Re: question about secondary index or not
I am sure there will be other attributes associated with an employee. Reading and throwing away records on the client is not good. Better to maintain another column family that holds references to only male employees. This will make your pagination logic simple on the client side, without wasting resources on the server or client side. My experience with secondary indexes was also not good. My own index CF gave 100% better performance than a secondary index for the same use case and result. On Thu, Jan 30, 2014 at 6:41 AM, Edward Capriolo edlinuxg...@gmail.com wrote: There is a subtle difference between "works well" and "efficient design". Say you add this index; that is a huge cost on disk just because CQL may not allow the where clause you want. Shameless plug, but this is why I worked on Intravert... server-side paging may be the right answer here. I plan on opening that work all up again and finding a way to get it merged into Cassandra. On Wednesday, January 29, 2014, Mullen, Robert robert.mul...@pearson.com wrote: Thanks for that info Ondrej. I've never tested out secondary indexes as I've avoided them because of all the uncertainty around them, and your statement just adds to the uncertainty. Everything I had read said that secondary indexes were supposed to work well for columns with low cardinality, but I guess that's not always the case. peace, Rob On Wed, Jan 29, 2014 at 2:21 AM, Ondřej Černoš cern...@gmail.com wrote: Hi, we had a similar use case. Just do the filtering client-side; the #2 example performs horribly. Secondary indexes on something dividing the set into two roughly equal-size subsets just don't work. Give it a try on localhost with just a couple of records (150,000), you will see. regards, ondrej On Wed, Jan 29, 2014 at 5:17 AM, Jimmy Lin y2klyf+w...@gmail.com wrote: in my #2 example: select * from people where company_id='xxx' and gender='male' I already specify the first part of the primary key (row key) in my where clause, so how does the secondary indexed column gender='male' help determine which row to return? It is more like filtering a list of columns from a row (which is exactly what I can do in my #1 example). But then if I don't create the index first, the CQL statement will run into a syntax error. On Tue, Jan 28, 2014 at 11:37 AM, Mullen, Robert robert.mul...@pearson.com wrote: I would do #2. Take a look at this blog which talks about secondary indexes, cardinality, and what it means for cassandra. Secondary indexes in cassandra are a different beast, so often old rules of thumb about indexes don't apply. http://www.wentnet.com/blog/?p=77 On Tue, Jan 28, 2014 at 10:41 AM, Edward Capriolo edlinuxg...@gmail.com wrote: Generally indexes on binary fields true/false male/female are not terribly effective.
On Tue, Jan 28, 2014 at 12:40 PM, Jimmy Lin y2klyf+w...@gmail.com wrote: I have a simple column family like the following: create table people( company_id text, employee_id text, gender text, primary key(company_id, employee_id) ); if I want to find out all the male employees given a company id, I can do 1/ select * from people where company_id='xxx' and loop through the result efficiently to pick the employees whose gender column value equals 'male' 2/ add a secondary index: create index gender_index on people(gender) select * from people where company_id='xxx' and gender='male' I thought #2 seems more appropriate, but I also thought the secondary index only helps locate the primary row key. With the select clause in #2, is it more efficient than #1, where the application is responsible for looping through the result and filtering the right content? (It totally makes sense if I only need to find out all the male employees (and not within a company) by using select * from people where gender='male') thanks -- Sorry this was sent from mobile. Will do less grammar and spell check than usual. -- Narendra Sharma Software Engineer *http://www.aeris.com* *http://narendrasharma.blogspot.com/*
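[Editorial note] A sketch of the "own index CF" approach recommended at the top of this thread, expressed in CQL (the table name is hypothetical): moving gender into the partition key of a separate, manually maintained table gives the same lookup without a secondary index, and pagination becomes a plain range read over one partition:

    -- One partition per (company, gender); employees ordered by id within it.
    CREATE TABLE employees_by_gender (
        company_id text,
        gender text,
        employee_id text,
        PRIMARY KEY ((company_id, gender), employee_id)
    );

    SELECT employee_id FROM employees_by_gender
    WHERE company_id = 'xxx' AND gender = 'male';

    -- The application writes to both tables whenever an employee is added or updated.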
Re: Cassandra ring not behaving like a ring
Any pointers? I am planning to do a rolling restart of the cluster nodes to see if it will help. On Jan 15, 2014 2:59 PM, Narendra Sharma narendra.sha...@gmail.com wrote: RF=3. On Jan 15, 2014 1:18 PM, Andrey Ilinykh ailin...@gmail.com wrote: what is the RF? What does nodetool ring show? On Wed, Jan 15, 2014 at 1:03 PM, Narendra Sharma narendra.sha...@gmail.com wrote: Sorry for the odd subject but something is wrong with our cassandra ring. We have a 9 node ring as below. N1 - UP/NORMAL N2 - UP/NORMAL N3 - UP/NORMAL N4 - UP/NORMAL N5 - UP/NORMAL N6 - UP/NORMAL N7 - UP/NORMAL N8 - UP/NORMAL N9 - UP/NORMAL Using random partitioner and simple snitch. Cassandra 1.1.6 in AWS. I added a new node with a token that is exactly in the middle of N6 and N7. So the ring displayed as follows: N1 - UP/NORMAL N2 - UP/NORMAL N3 - UP/NORMAL N4 - UP/NORMAL N5 - UP/NORMAL N6 - UP/NORMAL N6.5 - UP/JOINING N7 - UP/NORMAL N8 - UP/NORMAL N9 - UP/NORMAL I noticed that N6.5 is streaming from N1, N2, N6 and N7. I expect it to stream from (worst case) N5, N6, N7, N8. What could potentially cause the node to get confused about the ring? -- Narendra Sharma Software Engineer *http://www.aeris.com* *http://narendrasharma.blogspot.com/*
Re: Cassandra ring not behaving like a ring
Here is the nodetool ring output.

Address DC Rack Status State Load Effective-Ownership Token
148873535527910577765226390751398592512
10.3.1.179 datacenter1 rack1 Up Normal 752.53 GB 37.50% 0
10.3.1.29 datacenter1 rack1 Up Normal 704.36 GB 37.50% 21267647932558653966460912964485513215
10.3.1.206 datacenter1 rack1 Up Normal 561.68 GB 31.25% 31901471898837980949691369446728269825
10.3.1.175 datacenter1 rack1 Up Normal 1.33 TB 25.00% 42535295865117307932921825928971026431
10.3.1.239 datacenter1 rack1 Up Normal 784.91 GB 18.75% 53169119831396634916152282411213783039
10.3.1.24 datacenter1 rack1 Up Normal 1.06 TB 18.75% 63802943797675961899382738893456539648
*I tried to add a new node with token 7443676776395522613195375699296255*
10.3.1.177 datacenter1 rack1 Up Normal 1.01 TB 25.00% 85070591730234615865843651857942052863
10.3.1.135 datacenter1 rack1 Up Normal 702.56 GB 31.25% 106338239662793269832304564822427566080
10.3.1.178 datacenter1 rack1 Up Normal 783.75 GB 37.50% 127605887595351923798765477786913079295
10.3.1.30 datacenter1 rack1 Up Normal 630.09 GB 37.50% 148873535527910577765226390751398592512

After looking at the nodes it was streaming from, I stopped the node. On Thu, Jan 16, 2014 at 12:49 PM, Jonathan Haddad j...@jonhaddad.com wrote: Please include the output of nodetool ring, otherwise no one can help you. On Thu, Jan 16, 2014 at 12:45 PM, Narendra Sharma narendra.sha...@gmail.com wrote: Any pointers? I am planning to do a rolling restart of the cluster nodes to see if it will help. On Jan 15, 2014 2:59 PM, Narendra Sharma narendra.sha...@gmail.com wrote: RF=3. On Jan 15, 2014 1:18 PM, Andrey Ilinykh ailin...@gmail.com wrote: what is the RF? What does nodetool ring show? On Wed, Jan 15, 2014 at 1:03 PM, Narendra Sharma narendra.sha...@gmail.com wrote: Sorry for the odd subject but something is wrong with our cassandra ring. We have a 9 node ring as below. N1 - UP/NORMAL N2 - UP/NORMAL N3 - UP/NORMAL N4 - UP/NORMAL N5 - UP/NORMAL N6 - UP/NORMAL N7 - UP/NORMAL N8 - UP/NORMAL N9 - UP/NORMAL Using random partitioner and simple snitch. Cassandra 1.1.6 in AWS. I added a new node with a token that is exactly in the middle of N6 and N7. So the ring displayed as follows: N1 - UP/NORMAL N2 - UP/NORMAL N3 - UP/NORMAL N4 - UP/NORMAL N5 - UP/NORMAL N6 - UP/NORMAL N6.5 - UP/JOINING N7 - UP/NORMAL N8 - UP/NORMAL N9 - UP/NORMAL I noticed that N6.5 is streaming from N1, N2, N6 and N7. I expect it to stream from (worst case) N5, N6, N7, N8. What could potentially cause the node to get confused about the ring? -- Narendra Sharma Software Engineer *http://www.aeris.com* *http://narendrasharma.blogspot.com/* -- Jon Haddad http://www.rustyrazorblade.com skype: rustyrazorblade -- Narendra Sharma Software Engineer *http://www.aeris.com* *http://narendrasharma.blogspot.com/*
Cassandra ring not behaving like a ring
Sorry for the odd subject but something is wrong with our cassandra ring. We have a 9 node ring as below. N1 - UP/NORMAL N2 - UP/NORMAL N3 - UP/NORMAL N4 - UP/NORMAL N5 - UP/NORMAL N6 - UP/NORMAL N7 - UP/NORMAL N8 - UP/NORMAL N9 - UP/NORMAL Using random partitioner and simple snitch. Cassandra 1.1.6 in AWS. I added a new node with a token that is exactly in the middle of N6 and N7. So the ring displayed as follows: N1 - UP/NORMAL N2 - UP/NORMAL N3 - UP/NORMAL N4 - UP/NORMAL N5 - UP/NORMAL N6 - UP/NORMAL N6.5 - UP/JOINING N7 - UP/NORMAL N8 - UP/NORMAL N9 - UP/NORMAL I noticed that N6.5 is streaming from N1, N2, N6 and N7. I expect it to stream from (worst case) N5, N6, N7, N8. What could potentially cause the node to get confused about the ring? -- Narendra Sharma Software Engineer *http://www.aeris.com* *http://narendrasharma.blogspot.com/*
Re: Cassandra ring not behaving like a ring
RF=3. On Jan 15, 2014 1:18 PM, Andrey Ilinykh ailin...@gmail.com wrote: what is the RF? What does nodetool ring show? On Wed, Jan 15, 2014 at 1:03 PM, Narendra Sharma narendra.sha...@gmail.com wrote: Sorry for the odd subject but something is wrong with our cassandra ring. We have a 9 node ring as below. N1 - UP/NORMAL N2 - UP/NORMAL N3 - UP/NORMAL N4 - UP/NORMAL N5 - UP/NORMAL N6 - UP/NORMAL N7 - UP/NORMAL N8 - UP/NORMAL N9 - UP/NORMAL Using random partitioner and simple snitch. Cassandra 1.1.6 in AWS. I added a new node with a token that is exactly in the middle of N6 and N7. So the ring displayed as follows: N1 - UP/NORMAL N2 - UP/NORMAL N3 - UP/NORMAL N4 - UP/NORMAL N5 - UP/NORMAL N6 - UP/NORMAL N6.5 - UP/JOINING N7 - UP/NORMAL N8 - UP/NORMAL N9 - UP/NORMAL I noticed that N6.5 is streaming from N1, N2, N6 and N7. I expect it to stream from (worst case) N5, N6, N7, N8. What could potentially cause the node to get confused about the ring? -- Narendra Sharma Software Engineer *http://www.aeris.com* *http://narendrasharma.blogspot.com/*
Cassandra 1.1.6 crash without any exception or error in log
8-node cluster running in AWS. Any pointers on where I should start looking? No kill -9 in history.
Re: Cassandra 1.1.6 crash without any exception or error in log
The root cause turned out to be a high heap. The Linux OOM Killer (http://linux-mm.org/OOM_Killer) killed the process. It took some time to figure out, but it was very interesting. We knew the high heap was a problem, but had no clue because the actual heap usage was well within limits when the process disappeared. syslog helped figure this out. About the Linux OOM Killer: "It is the job of the linux 'oom killer' to *sacrifice* one or more processes in order to free up memory for the system when all else fails" On Thu, Jan 2, 2014 at 10:38 AM, Robert Coli rc...@eventbrite.com wrote: On Thu, Jan 2, 2014 at 8:13 AM, Narendra Sharma narendra.sha...@gmail.com wrote: 8-node cluster running in AWS. Any pointers on where I should start looking? No kill -9 in history. You should start looking at instructions as to how to upgrade to at least the top of the 1.1 line... :D =Rob -- Narendra Sharma Software Engineer *http://www.aeris.com* *http://narendrasharma.blogspot.com/*
Re: Cassandra 1.1.6 crash without any exception or error in log
In this case the Java/Cassandra process never ran out of memory. Rather, it had 20% of the heap free. It is the OS that ran out of memory. This is a side effect of running with a large heap. I was aware of Java's inefficiency with respect to large heaps, but had to keep it due to a large bloom filter. Note we are still on 1.1.x. On Thu, Jan 2, 2014 at 10:03 PM, Nitin Sharma nitin.sha...@bloomreach.com wrote: I would recommend always running cassandra with -XX:+HeapDumpOnOutOfMemoryError. This dumps out a *.hprof file if the process dies due to OOM. You can later analyze the hprof files using Eclipse Memory Analyzer (Eclipse MAT http://www.eclipse.org/mat) to figure out root causes and potential leaks. Hope this helps -- Nitin On Thu, Jan 2, 2014 at 9:00 PM, Narendra Sharma narendra.sha...@gmail.com wrote: The root cause turned out to be a high heap. The Linux OOM Killer (http://linux-mm.org/OOM_Killer) killed the process. It took some time to figure out, but it was very interesting. We knew the high heap was a problem, but had no clue because the actual heap usage was well within limits when the process disappeared. syslog helped figure this out. About the Linux OOM Killer: "It is the job of the linux 'oom killer' to *sacrifice* one or more processes in order to free up memory for the system when all else fails" On Thu, Jan 2, 2014 at 10:38 AM, Robert Coli rc...@eventbrite.com wrote: On Thu, Jan 2, 2014 at 8:13 AM, Narendra Sharma narendra.sha...@gmail.com wrote: 8-node cluster running in AWS. Any pointers on where I should start looking? No kill -9 in history. You should start looking at instructions as to how to upgrade to at least the top of the 1.1 line... :D =Rob -- Narendra Sharma Software Engineer *http://www.aeris.com* *http://narendrasharma.blogspot.com/* -- -- Nitin -- Narendra Sharma Software Engineer *http://www.aeris.com* *http://narendrasharma.blogspot.com/*
Re: Cassandra 1.1.6 - Disk usage and Load displayed in ring doesn't match
Thanks Aaron. No tmp files, and not even a single exception in the system.log. If the file was last modified on 20-Nov then there must be an entry for it in the log (either completed streaming or compacted). On Tue, Dec 17, 2013 at 7:23 PM, Aaron Morton aa...@thelastpickle.com wrote: -tmp- files will sit in the data dir; if there was an error creating them during compaction or flushing to disk, they will sit around until a restart. Check the logs for errors to see if compaction was failing on something. Cheers - Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache Cassandra Consulting http://www.thelastpickle.com On 17/12/2013, at 12:28 pm, Narendra Sharma narendra.sha...@gmail.com wrote: No snapshots. I restarted the node and now the Load in ring is in sync with the disk usage. Not sure what caused it to go out of sync. However, the live SSTable count doesn't match exactly with the number of data files on disk. I am going through the Cassandra code to understand what could be the reason for the mismatch in the sstable count and also why there is no reference to some of the data files in system.log. On Mon, Dec 16, 2013 at 2:45 PM, Arindam Barua aba...@247-inc.com wrote: Do you have any snapshots on the nodes where you are seeing this issue? Snapshots will link to sstables, which will cause them not to be deleted. -Arindam *From:* Narendra Sharma [mailto:narendra.sha...@gmail.com] *Sent:* Sunday, December 15, 2013 1:15 PM *To:* user@cassandra.apache.org *Subject:* Cassandra 1.1.6 - Disk usage and Load displayed in ring doesn't match We have an 8 node cluster. Replication factor is 3. For some of the nodes the disk usage (du -ksh .) in the data directory for a CF doesn't match the Load reported by the nodetool ring command. When we expanded the cluster from 4 nodes to 8 nodes (4 weeks back), everything was okay. Over the last 2-3 weeks the disk usage has gone up. We increased the RF from 2 to 3 two weeks ago. I am not sure if increasing the RF is causing this issue. For one of the nodes that I analyzed:
1. nodetool ring reported load as 575.38 GB
2. nodetool cfstats for the CF reported: SSTable count: 28 Space used (live): 572671381955 Space used (total): 572671381955
3. 'ls -1 *Data* | wc -l' in the data folder for the CF returned 46
4. 'du -ksh .' in the data folder for the CF returned 720G
The above numbers indicate that there are some sstables that are obsolete and are still occupying space on disk. What could be wrong? Will restarting the node help? The cassandra process has been running for the last 45 days with no downtime. However, because the disk usage is high, we are not able to run a full compaction. Also, I can't find a reference to each of the sstables on disk in the system.log file. For example, I have one data file on disk as (ls -lth): 86G Nov 20 06:14 I have a system.log file with the first line: INFO [main] 2013-11-18 09:41:56,120 AbstractCassandraDaemon.java (line 101) Logging initialized The 86G file must be a result of some compaction. I see no reference to the data file in system.log between 11/18 and 11/25. What could be the reason for that? The only reference is dated 11/29, when the file was being streamed to another node (a new node). How can I identify the obsolete files and remove them? I am thinking about the following. Let me know if it makes sense.
1. Restart the node and check the state.
2. Move the oldest data files to another location (another mount point).
3. Restart the node again.
4. Run repair on the node so that it can get the missing data from its peers.
I compared the numbers against a healthy node for the same CF:
1. nodetool ring reported load as 662.95 GB
2. nodetool cfstats for the CF reported: SSTable count: 16 Space used (live): 670524321067 Space used (total): 670524321067
3. 'ls -1 *Data* | wc -l' in the data folder for the CF returned 16
4. 'du -ksh .' in the data folder for the CF returned 625G
-Naren -- Narendra Sharma Software Engineer *http://www.aeris.com* *http://narendrasharma.blogspot.com/* -- Narendra Sharma Software Engineer *http://www.aeris.com* *http://narendrasharma.blogspot.com/* -- Narendra Sharma Software Engineer *http://www.aeris.com* *http://narendrasharma.blogspot.com/*
Re: Cassandra 1.1.6 - Disk usage and Load displayed in ring doesn't match
No snapshots. I restarted the node and now the Load in ring is in sync with the disk usage. Not sure what caused it to go out of sync. However, the live SSTable count doesn't match exactly with the number of data files on disk. I am going through the Cassandra code to understand what could be the reason for the mismatch in the sstable count and also why there is no reference to some of the data files in system.log. On Mon, Dec 16, 2013 at 2:45 PM, Arindam Barua aba...@247-inc.com wrote: Do you have any snapshots on the nodes where you are seeing this issue? Snapshots will link to sstables, which will cause them not to be deleted. -Arindam *From:* Narendra Sharma [mailto:narendra.sha...@gmail.com] *Sent:* Sunday, December 15, 2013 1:15 PM *To:* user@cassandra.apache.org *Subject:* Cassandra 1.1.6 - Disk usage and Load displayed in ring doesn't match We have an 8 node cluster. Replication factor is 3. For some of the nodes the disk usage (du -ksh .) in the data directory for a CF doesn't match the Load reported by the nodetool ring command. When we expanded the cluster from 4 nodes to 8 nodes (4 weeks back), everything was okay. Over the last 2-3 weeks the disk usage has gone up. We increased the RF from 2 to 3 two weeks ago. I am not sure if increasing the RF is causing this issue. For one of the nodes that I analyzed:
1. nodetool ring reported load as 575.38 GB
2. nodetool cfstats for the CF reported: SSTable count: 28 Space used (live): 572671381955 Space used (total): 572671381955
3. 'ls -1 *Data* | wc -l' in the data folder for the CF returned 46
4. 'du -ksh .' in the data folder for the CF returned 720G
The above numbers indicate that there are some sstables that are obsolete and are still occupying space on disk. What could be wrong? Will restarting the node help? The cassandra process has been running for the last 45 days with no downtime. However, because the disk usage is high, we are not able to run a full compaction. Also, I can't find a reference to each of the sstables on disk in the system.log file. For example, I have one data file on disk as (ls -lth): 86G Nov 20 06:14 I have a system.log file with the first line: INFO [main] 2013-11-18 09:41:56,120 AbstractCassandraDaemon.java (line 101) Logging initialized The 86G file must be a result of some compaction. I see no reference to the data file in system.log between 11/18 and 11/25. What could be the reason for that? The only reference is dated 11/29, when the file was being streamed to another node (a new node). How can I identify the obsolete files and remove them? I am thinking about the following. Let me know if it makes sense.
1. Restart the node and check the state.
2. Move the oldest data files to another location (another mount point).
3. Restart the node again.
4. Run repair on the node so that it can get the missing data from its peers.
I compared the numbers against a healthy node for the same CF:
1. nodetool ring reported load as 662.95 GB
2. nodetool cfstats for the CF reported: SSTable count: 16 Space used (live): 670524321067 Space used (total): 670524321067
3. 'ls -1 *Data* | wc -l' in the data folder for the CF returned 16
4. 'du -ksh .' in the data folder for the CF returned 625G
-Naren -- Narendra Sharma Software Engineer *http://www.aeris.com* *http://narendrasharma.blogspot.com/* -- Narendra Sharma Software Engineer *http://www.aeris.com* *http://narendrasharma.blogspot.com/*
Cassandra 1.1.6 - Disk usage and Load displayed in ring doesn't match
We have an 8 node cluster. Replication factor is 3. For some of the nodes the disk usage (du -ksh .) in the data directory for a CF doesn't match the Load reported by the nodetool ring command. When we expanded the cluster from 4 nodes to 8 nodes (4 weeks back), everything was okay. Over the last 2-3 weeks the disk usage has gone up. We increased the RF from 2 to 3 two weeks ago. I am not sure if increasing the RF is causing this issue. For one of the nodes that I analyzed:
1. nodetool ring reported load as 575.38 GB
2. nodetool cfstats for the CF reported: SSTable count: 28 Space used (live): 572671381955 Space used (total): 572671381955
3. 'ls -1 *Data* | wc -l' in the data folder for the CF returned 46
4. 'du -ksh .' in the data folder for the CF returned 720G
The above numbers indicate that there are some sstables that are obsolete and are still occupying space on disk. What could be wrong? Will restarting the node help? The cassandra process has been running for the last 45 days with no downtime. However, because the disk usage is high, we are not able to run a full compaction. Also, I can't find a reference to each of the sstables on disk in the system.log file. For example, I have one data file on disk as (ls -lth): 86G Nov 20 06:14 I have a system.log file with the first line: INFO [main] 2013-11-18 09:41:56,120 AbstractCassandraDaemon.java (line 101) Logging initialized The 86G file must be a result of some compaction. I see no reference to the data file in system.log between 11/18 and 11/25. What could be the reason for that? The only reference is dated 11/29, when the file was being streamed to another node (a new node). How can I identify the obsolete files and remove them? I am thinking about the following. Let me know if it makes sense.
1. Restart the node and check the state.
2. Move the oldest data files to another location (another mount point).
3. Restart the node again.
4. Run repair on the node so that it can get the missing data from its peers.
I compared the numbers against a healthy node for the same CF:
1. nodetool ring reported load as 662.95 GB
2. nodetool cfstats for the CF reported: SSTable count: 16 Space used (live): 670524321067 Space used (total): 670524321067
3. 'ls -1 *Data* | wc -l' in the data folder for the CF returned 16
4. 'du -ksh .' in the data folder for the CF returned 625G
-Naren -- Narendra Sharma Software Engineer *http://www.aeris.com* *http://narendrasharma.blogspot.com/*
Re: Cassandra 1.1.6 - New node bootstrap not completing
I was successfully able to bootstrap the node. The issue was RF=2 (bootstrapping two nodes into the same range simultaneously). Thanks again Robert. On Wed, Oct 30, 2013 at 10:29 AM, Narendra Sharma narendra.sha...@gmail.com wrote: Thanks Robert. I didn't realize that some of the keyspaces (not all, and esp. not the biggest one I was focusing on) had RF=2. I wasted 3 days on it. Thanks again for the pointers. I will try again and share the results. On Wed, Oct 30, 2013 at 12:28 AM, Robert Coli rc...@eventbrite.com wrote: On Tue, Oct 29, 2013 at 11:45 AM, Narendra Sharma narendra.sha...@gmail.com wrote: We had a cluster of 4 nodes in AWS. The average load on each node was approx 750GB. We added 4 new nodes. It has now been more than 30 hours and the node is still in JOINING mode. Specifically I am analyzing the one with IP 10.3.1.29. There is no compaction or streaming or index building happening. If your cluster has RF=2, you are bootstrapping two nodes into the same range simultaneously. That is not supported. [1,2] The node you are having the problem with is in the range that is probably overlapping. If I were you I would:
1) stop all Joining nodes and wipe their state including the system keyspace
2) optionally removetoken any nodes which remain in cluster gossip state after stopping
3) re-start/bootstrap them one at a time, waiting for each to complete bootstrapping before starting the next one
4) (unrelated) Upgrade from 1.1.6 to the head of 1.1.x ASAP.
=Rob [1] https://issues.apache.org/jira/browse/CASSANDRA-2434 [2] https://issues.apache.org/jira/browse/CASSANDRA-2434?focusedCommentId=13091851&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13091851 -- Narendra Sharma Software Engineer *http://www.aeris.com* *http://narendrasharma.blogspot.com/* -- Narendra Sharma Software Engineer *http://www.aeris.com* *http://narendrasharma.blogspot.com/*
Cassandra 1.1.6 - New node bootstrap not completing
We had a cluster of 4 nodes in AWS. The average load on each node was approx 750GB. We added 4 new nodes. It has now been more than 30 hours and the node is still in JOINING mode. Specifically I am analyzing the one with IP 10.3.1.29. There is no compaction or streaming or index building happening.

$ ./nodetool ring
Note: Ownership information does not include topology, please specify a keyspace.
Address DC Rack Status State Load Owns Token
148873535527910577765226390751398592512
10.3.1.179 datacenter1 rack1 Up Normal 740.41 GB 25.00% 0
10.3.1.29 datacenter1 rack1 Up Joining 562.49 GB 0.00% 21267647932558653966460912964485513215
10.3.1.175 datacenter1 rack1 Up Normal 755.7 GB 25.00% 42535295865117307932921825928971026431
10.3.1.30 datacenter1 rack1 Up Joining 565.68 GB 0.00% 63802943797675961899382738893456539648
10.3.1.177 datacenter1 rack1 Up Normal 754.18 GB 25.00% 85070591730234615865843651857942052863
10.3.1.135 datacenter1 rack1 Up Normal 95.97 GB 20.87% 120580289963820081458352857409882669785
10.3.1.178 datacenter1 rack1 Up Normal 747.53 GB 4.13% 127605887595351923798765477786913079295
10.3.1.24 datacenter1 rack1 Up Joining 522.09 GB 0.00% 148873535527910577765226390751398592512

$ ./nodetool netstats
Mode: JOINING
Not sending any streams.
Nothing streaming from /10.3.1.177
Nothing streaming from /10.3.1.179
Pool Name Active Pending Completed
Commands n/a 0 82
Responses n/a 0 40135123

$ ./nodetool compactionStats
pending tasks: 0
Active compaction remaining time : n/a

$ ./nodetool info
Token: 21267647932558653966460912964485513215
Gossip active: true
Thrift active: false
Load: 562.49 GB
Generation No: 1382981644
Uptime (seconds): 90340
Heap Memory (MB): 9298.59 / 13272.00
Data Center: datacenter1
Rack: rack1
Exceptions: 2
Key Cache: size 104857584 (bytes), capacity 104857584 (bytes), 187373 hits, 94709046 requests, 0.002 recent hit rate, 14400 save period in seconds
Row Cache: size 0 (bytes), capacity 0 (bytes), 0 hits, 0 requests, NaN recent hit rate, 0 save period in seconds

The 2 exceptions in the info output are the ones that were logged when I stopped the index build to let the bootstrap complete faster. Any clue what's wrong and where I should look to further analyze the issue? I haven't restarted the Cassandra process. I am afraid the node will start bootstrapping again if I restart it. Thanks, Naren -- Narendra Sharma Software Engineer *http://www.aeris.com* *http://narendrasharma.blogspot.com/*
Re: Cassandra 1.1.6 - New node bootstrap not completing
Thanks Robert. I didn't realize that some of the keyspaces (not all, and esp. not the biggest one I was focusing on) had RF=2. I wasted 3 days on it. Thanks again for the pointers. I will try again and share the results. On Wed, Oct 30, 2013 at 12:28 AM, Robert Coli rc...@eventbrite.com wrote: On Tue, Oct 29, 2013 at 11:45 AM, Narendra Sharma narendra.sha...@gmail.com wrote: We had a cluster of 4 nodes in AWS. The average load on each node was approx 750GB. We added 4 new nodes. It has now been more than 30 hours and the node is still in JOINING mode. Specifically I am analyzing the one with IP 10.3.1.29. There is no compaction or streaming or index building happening. If your cluster has RF=2, you are bootstrapping two nodes into the same range simultaneously. That is not supported. [1,2] The node you are having the problem with is in the range that is probably overlapping. If I were you I would:
1) stop all Joining nodes and wipe their state including the system keyspace
2) optionally removetoken any nodes which remain in cluster gossip state after stopping
3) re-start/bootstrap them one at a time, waiting for each to complete bootstrapping before starting the next one
4) (unrelated) Upgrade from 1.1.6 to the head of 1.1.x ASAP.
=Rob [1] https://issues.apache.org/jira/browse/CASSANDRA-2434 [2] https://issues.apache.org/jira/browse/CASSANDRA-2434?focusedCommentId=13091851&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13091851 -- Narendra Sharma Software Engineer *http://www.aeris.com* *http://narendrasharma.blogspot.com/*
Re: Querying for rows without a particular column
This is an interesting use case. If you do implement it, you may end up getting all the rows in your cluster for certain bad queries :)... so be careful. I would ask why you want to know such rows and what you will do with them? -Naren On Mon, Feb 13, 2012 at 12:16 PM, Asankha C. Perera asan...@apache.org wrote: Hi All, I am using expiring columns in my column family, and need to search for the rows where a particular column expired (and no longer exists).. I am using the Hector client. How can I make a query to find the rows of my interest? thanks asankha -- Asankha C. Perera AdroitLogic, http://adroitlogic.org http://esbmagic.blogspot.com -- Narendra Sharma Software Engineer *http://www.aeris.com http://www.persistentsys.com* *http://narendrasharma.blogspot.com/*
Re: Implications of length of column names
It is good to have short column names. They save space all the way from network transfer to in-memory usage to storage. It is also a good idea to club immutable columns that are read together and store them as a single column. We gained significant overall performance benefits with this. -Naren On Fri, Feb 10, 2012 at 12:20 PM, Drew Kutcharian d...@venarc.com wrote: What are the implications of using short vs long column names? Is it better to use short column names or longer ones? I know for MongoDB you are better off using short field names http://www.mongodb.org/display/DOCS/Optimizing+Storage+of+Small+Objects Does this apply to Cassandra column names? -- Drew -- Narendra Sharma Software Engineer *http://www.aeris.com http://www.persistentsys.com* *http://narendrasharma.blogspot.com/*
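[Editorial note] A hypothetical CQL sketch of both suggestions in this thread (names invented for illustration): a short column name, and several immutable fields that are always read together packed into a single column, serialized client-side:

    CREATE TABLE users (
        id uuid PRIMARY KEY,
        p blob    -- packed immutable profile fields (name, created_at, ...);
                  -- one short-named column instead of many, serialized client-side
    );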
Re: Deleting a column vs setting it's value to empty
IMO deleting is always better. It is better not to store the column if there is no value associated with it. -Naren On Fri, Feb 10, 2012 at 12:15 PM, Drew Kutcharian d...@venarc.com wrote: Hi Everyone, Let's say I have the following object which I would like to save in Cassandra: class User { UUID id; //row key String name; //columnKey: name, columnValue: the name of the user String description; //columnKey: description, columnValue: the description of the user } Description can be nullable. What's the best approach when a user updates her description and sets it to null? Should I delete the description column or set it to an empty string? In addition, if I go with the delete-column strategy, since I don't know what the previous value of description was (the column might not even exist), what would happen when I delete a non-existent column? Thanks, Drew -- Narendra Sharma Software Engineer *http://www.aeris.com http://www.persistentsys.com* *http://narendrasharma.blogspot.com/*
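[Editorial note] A minimal CQL sketch of the delete-column option (the schema below just mirrors the class in the question): deleting a single column needs no read-before-write, and deleting a column that was never written simply records a tombstone without raising any error:

    CREATE TABLE users (
        id uuid PRIMARY KEY,
        name text,
        description text
    );

    -- Clearing the description: delete just that one column for the row.
    -- Safe even if the column does not exist; it only writes a tombstone.
    DELETE description FROM users WHERE id = 123e4567-e89b-12d3-a456-426614174000;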
Re: Unbalanced cluster with RandomPartitioner
I believe you need to move the nodes on the ring. What was the load on the nodes before you added 5 new nodes? It's just that you are getting more data in certain token ranges than in others. -Naren On Thu, Jan 19, 2012 at 3:22 AM, Marcel Steinbach marcel.steinb...@chors.de wrote: On 18.01.2012, at 02:19, Maki Watanabe wrote: Are there any significant differences in the number of sstables on each node? No, no significant difference there. Actually, node 8 is among those with more sstables but with the least load (20GB). On 17.01.2012, at 20:14, Jeremiah Jordan wrote: Are you deleting data or using TTLs? Expired/deleted data won't go away until the sstable holding it is compacted. So if compaction has happened on some nodes, but not on others, you will see this. The disparity is pretty big, 400GB to 20GB, so this probably isn't the issue, but with our data using TTLs, if I run major compactions a couple of times on that column family it can shrink ~30%-40%. Yes, we do delete data. But I agree, the disparity is too big to blame only the deletions. Also, initially we started out with 3 nodes and upgraded to 8 a few weeks ago. After adding the nodes, we did compactions and cleanups and didn't have a balanced cluster. So that should have removed outdated data, right? 2012/1/18 Marcel Steinbach marcel.steinb...@chors.de: We are running regular repairs, so I don't think that's the problem. And the data dir sizes match approx. the load from nodetool. Thanks for the advice, though. Our keys are digits only, and all contain a few zeros at the same offsets. I'm not that familiar with the md5 algorithm, but I doubt that it would generate 'hotspots' for those kinds of keys, right? On 17.01.2012, at 17:34, Mohit Anchlia wrote: Have you tried running repair first on each node? Also, verify using df -h on the data dirs On Tue, Jan 17, 2012 at 7:34 AM, Marcel Steinbach marcel.steinb...@chors.de wrote: Hi, we're using RP and have each node assigned the same amount of the token space. The cluster looks like that:

Address Status State Load Owns Token
205648943402372032879374446248852460236
1 Up Normal 310.83 GB 12.50% 56775407874461455114148055497453867724
2 Up Normal 470.24 GB 12.50% 78043055807020109080608968461939380940
3 Up Normal 271.57 GB 12.50% 99310703739578763047069881426424894156
4 Up Normal 282.61 GB 12.50% 120578351672137417013530794390910407372
5 Up Normal 248.76 GB 12.50% 141845999604696070979991707355395920588
6 Up Normal 164.12 GB 12.50% 163113647537254724946452620319881433804
7 Up Normal 76.23 GB 12.50% 184381295469813378912913533284366947020
8 Up Normal 19.79 GB 12.50% 205648943402372032879374446248852460236

I was under the impression the RP would distribute the load more evenly. Our row sizes are 0.5-1 KB; hence, we don't store huge rows on a single node. Should we just move the nodes so that the load is more evenly distributed, or is there something off that needs to be fixed first? Thanks Marcel
-- Narendra Sharma Software Engineer *http://www.aeris.com http://www.persistentsys.com* *http://narendrasharma.blogspot.com/*
Re: How to reliably achieve unique constraints with Cassandra?
It's very surprising that no one seems to have solved such a common use case. I would say people have solved it using the RIGHT tools for the task. On Fri, Jan 6, 2012 at 2:35 PM, Drew Kutcharian d...@venarc.com wrote: Thanks everyone for the replies. Seems like there is no easy way to handle this. It's very surprising that no one seems to have solved such a common use case. -- Drew On Jan 6, 2012, at 2:11 PM, Bryce Allen wrote: That's a good question, and I'm not sure - I'm fairly new to both ZK and Cassandra. I found this wiki page: http://wiki.apache.org/hadoop/ZooKeeper/FailureScenarios and I think the lock recipe still works, even if a stale read happens. Assuming that wiki page is correct. There is still subtlety to locking with ZK though, see (Locks based on ephemeral nodes) from the zk mailing list in October: http://mail-archives.apache.org/mod_mbox/zookeeper-user/201110.mbox/thread?0 -Bryce On Fri, 6 Jan 2012 13:36:52 -0800 Drew Kutcharian d...@venarc.com wrote: Bryce, I'm not sure about ZooKeeper, but I know if you have a partition between Hazelcast nodes, then the nodes can acquire the same lock independently in each divided partition. How does ZooKeeper handle this situation? -- Drew On Jan 6, 2012, at 12:48 PM, Bryce Allen wrote: On Fri, 6 Jan 2012 10:03:38 -0800 Drew Kutcharian d...@venarc.com wrote: I know that this can be done using a lock manager such as ZooKeeper or Hazelcast, but the issue with using either of them is that if ZooKeeper or Hazelcast is down, then you can't be sure about the reliability of the lock. So this potentially, in the very rare instance where the lock manager is down and two users are registering with the same email, can cause major issues. For most applications, if the lock manager is down, you don't acquire the lock, so you don't enter the critical section. Rather than allowing inconsistency, you become unavailable (at least to writes that require a lock). -Bryce -- Narendra Sharma Software Engineer *http://www.aeris.com http://www.persistentsys.com* *http://narendrasharma.blogspot.com/*
Re: How to reliably achieve unique constraints with Cassandra?
Instead of trying to solve the generic problem of uniqueness, I would focus on the specific problem. For example, let's consider your use case of user registration with email address as the key. You can do the following:
1. Create a CF (Users) where the row key is a UUID and which has user-info-specific columns.
2. Whenever a user registers, create a row in this CF with the user status flag set to "waiting for confirmation".
3. Send an email to the user's email address with a link that contains the UUID (or encrypted UUID).
4. When the user clicks on the link, use the UUID (or decrypted UUID) to look up the user.
5. If the user exists with the given UUID and status "waiting for confirmation", then update the status and create an entry in another CF (EmailUUIDIndex) representing the email address to UUID mapping.
6. For authentication you can look up the UUID in the index and proceed.
7. If a malicious user registers with someone else's email id, then he will never be able to confirm and will never have an entry in EmailUUIDIndex. As an additional check, if the entry for the email id already exists in EmailUUIDIndex, then the request for registration can be rejected right away.
Make sense? -Naren On Fri, Jan 6, 2012 at 4:00 PM, Drew Kutcharian d...@venarc.com wrote: So what are the common RIGHT solutions/tools for this? On Jan 6, 2012, at 2:46 PM, Narendra Sharma wrote: It's very surprising that no one seems to have solved such a common use case. I would say people have solved it using the RIGHT tools for the task. On Fri, Jan 6, 2012 at 2:35 PM, Drew Kutcharian d...@venarc.com wrote: Thanks everyone for the replies. Seems like there is no easy way to handle this. It's very surprising that no one seems to have solved such a common use case. -- Drew On Jan 6, 2012, at 2:11 PM, Bryce Allen wrote: That's a good question, and I'm not sure - I'm fairly new to both ZK and Cassandra. I found this wiki page: http://wiki.apache.org/hadoop/ZooKeeper/FailureScenarios and I think the lock recipe still works, even if a stale read happens. Assuming that wiki page is correct. There is still subtlety to locking with ZK though, see (Locks based on ephemeral nodes) from the zk mailing list in October: http://mail-archives.apache.org/mod_mbox/zookeeper-user/201110.mbox/thread?0 -Bryce On Fri, 6 Jan 2012 13:36:52 -0800 Drew Kutcharian d...@venarc.com wrote: Bryce, I'm not sure about ZooKeeper, but I know if you have a partition between Hazelcast nodes, then the nodes can acquire the same lock independently in each divided partition. How does ZooKeeper handle this situation? -- Drew On Jan 6, 2012, at 12:48 PM, Bryce Allen wrote: On Fri, 6 Jan 2012 10:03:38 -0800 Drew Kutcharian d...@venarc.com wrote: I know that this can be done using a lock manager such as ZooKeeper or Hazelcast, but the issue with using either of them is that if ZooKeeper or Hazelcast is down, then you can't be sure about the reliability of the lock. So this potentially, in the very rare instance where the lock manager is down and two users are registering with the same email, can cause major issues. For most applications, if the lock manager is down, you don't acquire the lock, so you don't enter the critical section. Rather than allowing inconsistency, you become unavailable (at least to writes that require a lock). -Bryce -- Narendra Sharma Software Engineer *http://www.aeris.com http://www.persistentsys.com/* *http://narendrasharma.blogspot.com/* -- Narendra Sharma Software Engineer *http://www.aeris.com http://www.persistentsys.com* *http://narendrasharma.blogspot.com/*
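[Editorial note] A minimal sketch of the two-CF confirmation flow above, expressed in CQL (the schemas are illustrative; the original discussion predates CQL tables):

    -- Step 1: users keyed by UUID, with a confirmation status flag.
    CREATE TABLE users (
        id uuid PRIMARY KEY,
        email text,
        status text    -- e.g. 'waiting_for_confirmation' or 'confirmed'
    );

    -- Step 5: email-to-UUID mapping, written only after confirmation, so an
    -- unconfirmed (possibly malicious) registration never appears here.
    CREATE TABLE email_uuid_index (
        email text PRIMARY KEY,
        user_id uuid
    );

    -- Step 7 pre-check at registration time:
    SELECT user_id FROM email_uuid_index WHERE email = 'user@example.com';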
Re: Cassandra memory usage
See http://wiki.apache.org/cassandra/FAQ#mmap Also, the discussion on http://comments.gmane.org/gmane.comp.db.cassandra.user/14080 Hopefully these will answer your question. -Naren On Tue, Jan 3, 2012 at 12:53 PM, Daning Wang dan...@netseer.com wrote: I have a Cassandra server which has the JVM settings -Xms4G -Xmx4G, but why does top report 15G RES memory and 11G SHR memory usage? I understand that -Xmx4G is only for the heap size, but it is strange that the OS reports 2.5 times the memory usage. Is there a lot of memory used by JNI? Please help to explain this.

cassy 2549 39.7 66.1 163805536 16324648 ? Sl Jan02 338:48 /usr/local/cassy/java/current/bin/java -ea -javaagent:./../lib/jamm-0.2.2.jar -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 *-Xms4G -Xmx4G -Xmn1G* -XX:+HeapDumpOnOutOfMemoryError -Xss128k -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=10 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote.port=7199 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dmx4jport=8085 -Djava.rmi.server.hostname=10.210.101.106 -Dlog4j.configuration=log4j-server.properties -Dlog4j.defaultInitOverride=true -Dpasswd.properties=./../conf/passwd.properties -cp ./../conf:./../build/classes/main:./../build/classes/thrift:./../lib/antlr-3.2.jar:./../lib/apache-cassandra-0.8.6.jar:./../lib/apache-cassandra-thrift-0.8.6.jar:./../lib/avro-1.4.0-fixes.jar:./../lib/avro-1.4.0-sources-fixes.jar:./../lib/commons-cli-1.1.jar:./../lib/commons-codec-1.2.jar:./../lib/commons-collections-3.2.1.jar:./../lib/commons-lang-2.4.jar:./../lib/concurrentlinkedhashmap-lru-1.1.jar:./../lib/guava-r08.jar:./../lib/high-scale-lib-1.1.2.jar:./../lib/jackson-core-asl-1.4.0.jar:./../lib/jackson-mapper-asl-1.4.0.jar:./../lib/jamm-0.2.2.jar:./../lib/jline-0.9.94.jar:./../lib/jna.jar:./../lib/json-simple-1.1.jar:./../lib/libthrift-0.6.jar:./../lib/log4j-1.2.16.jar:./../lib/mx4j-tools.jar:./../lib/servlet-api-2.5-20081211.jar:./../lib/slf4j-api-1.6.1.jar:./../lib/slf4j-log4j12-1.6.1.jar:./../lib/snakeyaml-1.6.jar org.apache.cassandra.thrift.CassandraDaemon

Top:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2549 cassy 21 0 156g *15g 11g* S 66.9 65.5 338:02.72 java

Thank you in advance, Daning -- Narendra Sharma Software Engineer *http://www.aeris.com http://www.persistentsys.com* *http://narendrasharma.blogspot.com/*
Re: How to convert start_token,end_token to real key value?
A token is an MD5 hash (a one-way hash). You cannot compute the key given a token. You can, however, compute the MD5 hash of your keys and compare them with the tokens. -Naren On Sat, Dec 31, 2011 at 2:07 PM, ravikumar visweswara talk2had...@gmail.com wrote: Hello All, I have a requirement to copy data from cassandra to hadoop from/to a specific key. This is supported in 1.0.0, but I am using cassandra version 0.7.1 and hadoop version 20.2. In my mapreduce job (InputFormat class) I have an object of TokenRange. I need to filter certain ranges based on some exclusion rules. I have a readable key range to include. Could someone help me on how to convert start_token and end_token to a readable format and compare with my input keys (range)? I know that 1.0.0 has better capabilities to specify keyRanges in hadoop mapreduce, but for now I will have to work with 0.7.1. Thanks and Regards Ravi -- Narendra Sharma Software Engineer *http://www.aeris.com http://www.persistentsys.com* *http://narendrasharma.blogspot.com/*
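[Editorial note] As an aside, in the CQL of later Cassandra versions (not available in the 0.7.1 setup discussed here) the built-in token() function exposes exactly this key-to-token mapping, which is handy for sanity-checking a range filter (table name hypothetical):

    -- token(key) returns the partitioner's token for each key.
    SELECT key, token(key) FROM my_cf LIMIT 10;

    -- Range selection by token, the same comparison the exclusion rules need:
    SELECT key FROM my_cf
    WHERE token(key) > 21267647932558653966460912964485513215
      AND token(key) <= 42535295865117307932921825928971026431;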
Re: Unable to add columns to empty row in Column family: Cassandra
Can you share the code? On Mon, May 2, 2011 at 11:34 PM, anuya joshi anu...@gmail.com wrote: Hello, I am using Cassandra for my application. My Cassandra client uses the Thrift APIs directly. The problem I am facing currently is as follows:
1) I added a row and columns in it dynamically via the Thrift API client.
2) Next, I used the command line client to delete the row, which actually deleted all the columns in it, leaving an empty row with the original row id.
3) Now, I am trying to add columns dynamically using the client program into this empty row with the same row key.
However, the columns are not being inserted. But when tried from the command line client, it worked correctly. Any pointer on this would be of great use. Thanks in advance, Regards, Anuya -- Narendra Sharma Solution Architect *http://www.persistentsys.com* *http://narendrasharma.blogspot.com/*
Re: network topology issue
My understanding is that the replication factor is for the entire ring. Even if you have 2 DCs, the nodes are part of the same ring. What you get additionally from NTS is that you can specify how many replicas to place in each DC. So RF=1 with DC1:1, DC2:1 looks incorrect to me. What is possible with NTS is the following: RF=3, DC1:1, DC2:2. I would wait for others' comments to see if my understanding is correct. -Naren On Wed, May 11, 2011 at 5:41 PM, Anurag Gujral anurag.guj...@gmail.com wrote: Thanks Sameer for your answer. I am using two DCs, DC1 and DC2, with both having one node each; my strategy_options values are DC1:1, DC2:1. I am not sure what my RF should be; should it be 1 or 2? Please Advise Thanks Anurag On Wed, May 11, 2011 at 5:27 PM, Sameer Farooqui cassandral...@gmail.com wrote: Anurag, The Cassandra ring spans datacenters, so you can't use token 0 on both nodes. Cassandra's ring is from 0 to 2**127 in size. Try assigning one node the token of 0 and the second node 8.50705917 × 10^37 (input this as a single long number). To add a new keyspace in 0.8, run this from the CLI: create keyspace KEYSPACENAME with placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy' and strategy_options = [{replication_factor:2}]; If using 0.7, run help create keyspace; from the CLI and it'll show you the correct syntax. More info on tokens: http://journal.paul.querna.org/articles/2010/09/24/cassandra-token-selection/ http://wiki.apache.org/cassandra/Operations#Token_selection On Wed, May 11, 2011 at 4:58 PM, Anurag Gujral anurag.guj...@gmail.com wrote: Hi All, I am testing the network topology strategy in cassandra. I am using two nodes, one node each in a different data center. Since the nodes are in different DCs, I assigned token 0 to both nodes. I added both nodes as seeds in cassandra.yaml and I am using PropertyFileSnitch as the endpoint snitch, where I have specified the colo details. I started the first node; then, when I restarted the second node, I got an error that token 0 is already being used. Why am I getting this error? Second question: I already have cassandra running in two different data centers. I want to add a new keyspace which uses NetworkTopologyStrategy; in the light of the above errors, how can I accomplish this? Thanks Anurag -- Narendra Sharma Solution Architect *http://www.persistentsys.com* *http://narendrasharma.blogspot.com/*
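[Editorial note] For reference, the same per-DC layout in the CQL syntax of later Cassandra versions (the CLI command above was the syntax current at the time; the keyspace name is hypothetical). The effective replication factor is simply the sum of the per-DC counts:

    -- Total RF = 1 + 2 = 3 replicas: one in DC1, two in DC2.
    CREATE KEYSPACE my_keyspace WITH replication = {
        'class': 'NetworkTopologyStrategy',
        'DC1': 1,
        'DC2': 2
    };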
Re: Newbie question
You can have only one ordering defined in a CF. A Super CF will allow you to have nested ordering, i.e. SCs can have one ordering whereas the columns within an SC can have another ordering. Note this is defined at the CF level and cannot be defined per SC. To model what you are trying to do, you can check if secondary indexes will be useful (assuming you have a standard CF). If not, you can create another CF that will just keep NAME as the column name and ID as the column value. This will ensure ordering by NAME and a pointer back to the original column (or SC, depending on your schema). The downside is you will need to run 2 queries to get the data. -Naren On Tue, May 10, 2011 at 6:33 AM, Sam Ganesan sam.gane...@motorola.com wrote: All: A newbie question to the aficionados. I understand that I can stipulate an ordering mechanism when I create a column family to reflect what I am querying in the long run. Generally I need to query a particular column space that I am constructing based on two different columns. The frequency of these queries is not that different from each other: I query based on a numerical ID or a name with equal frequency. What is the recommended way of approaching this problem? Regards Sam *__ Sam Ganesan Ph.D. Distinguished member, Technical Staff Motorola Mobility - On Demand Video 900 Chelmsford Street, Lowell, MA 01851 tel:+1 978 614-3165 (changed) mob:+1 978 328-7132 mailto: sam.gane...@motorola.com* -- Narendra Sharma Solution Architect *http://www.persistentsys.com* *http://narendrasharma.blogspot.com/*
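To make the two-CF suggestion above concrete, here is one possible layout; all CF, row key, and column names here are hypothetical:

UsersById   (Standard CF, compare_with: LongType)
    row key: <bucket>    columns: { <ID> : <user data> }    -- columns ordered by numerical ID

UsersByName (Standard CF, compare_with: UTF8Type)
    row key: <bucket>    columns: { <NAME> : <ID> }         -- columns ordered by NAME; values point back to UsersById

Reading by name then takes the two queries mentioned above: fetch the ID from UsersByName, then fetch the data from UsersById.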
Re: cassandra not reading keyspaces defined in cassandra.yaml
Look for "Where are my keyspaces?" on the following page: *http://wiki.apache.org/cassandra/StorageConfiguration* On Mon, May 9, 2011 at 5:51 PM, Anurag Gujral anurag.guj...@gmail.com wrote: Hi All, I have the following in my cassandra.yaml:

keyspaces:
    - column_families:
        - column_metadata: []
          column_type: Standard
          compare_with: BytesType
          gc_grace_seconds: 86400
          key_cache_save_period_in_seconds: 14400
          keys_cached: 0.0
          max_compaction_threshold: 32
          memtable_flush_after_mins: 1440
          memtable_operations_in_millions: 100.0
          memtable_throughput_in_mb: 256
          min_compaction_threshold: 4
          name: data
          read_repair_chance: 1.0
          row_cache_save_period_in_seconds: 0
          rows_cached: 1000
      name: offline
      replica_placement_strategy: org.apache.cassandra.locator.RackUnawareStrategy
      replication_factor: 1

Cassandra starts properly without giving any warnings/errors but does not create the keyspace offline which is defined above. Please suggest. Thanks Anurag -- Narendra Sharma Solution Architect *http://www.persistentsys.com* *http://narendrasharma.blogspot.com/*
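For context: in 0.7 the schema is managed through the API, so keyspaces listed in cassandra.yaml are not created automatically at startup; the wiki section above describes a one-time import. A minimal sketch of triggering that import over JMX (port 8080 assumes the 0.7 default JMX port; treat the MBean and operation names as assumptions to verify against your version):

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class LoadSchema {
    public static void main(String[] args) throws Exception {
        JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://localhost:8080/jmxrmi");
        JMXConnector jmxc = JMXConnectorFactory.connect(url, null);
        MBeanServerConnection mbsc = jmxc.getMBeanServerConnection();
        // One-time import of the keyspace definitions from cassandra.yaml into the system tables.
        mbsc.invoke(new ObjectName("org.apache.cassandra.service:type=StorageService"),
                    "loadSchemaFromYAML", new Object[0], new String[0]);
        jmxc.close();
    }
}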
Re: Manual Conflict Resolution in Cassandra
At t8: "The request would not start as the CL level of nodes is not available, the write would not be written to node X. The client would get an UnavailableException. In response it should connect to a new coordinator and try again." [Naren] There may (and most likely will) be a window when the CL will be satisfied while the write will still fail because the node is actually down. There are a lot of possible scenarios here. I believe Milind is talking about some extreme but likely cases. On Sat, Apr 23, 2011 at 7:31 PM, aaron morton aa...@thelastpickle.com wrote: Have not read the whole thing, just the timeline. Couple of issues... At t8 the request would not start as the CL level of nodes is not available; the write would not be written to node X. The client would get an UnavailableException. In response it should connect to a new coordinator and try again. At t12, if RR is enabled for the request, the read is sent to all UP endpoints for the key. Once CL requests have returned (including the data / non-digest request) the responses are repaired and a synchronous (to the read request) RR round is initiated. Once all the requests have responded they are compared again and an async RR process is kicked off. So it seems that in a worst case scenario two rounds of RR are possible: one to make sure the correct data is returned for the request, and another to make sure that all UP replicas agree, as it may not be the case that all UP replicas were involved in completing the request. So as written, at t8 the write would have failed and not been stored on any nodes, so the write at t7 would not be lost. I think the crux of this example is the failure mode at t8. I'm assuming Alice is connected to node X: 1) If X is disconnected before the write starts, it will not start any write that requires Quorum CL. The write fails with an Unavailable error. 2) If X disconnects from the network *after* sending the write messages, and all messages are successfully actioned (including a local write), the request will fail with a TimedOutException as fewer than CL nodes will respond. 3) If X disconnects from the cluster after sending the messages, and the messages it sends are lost but the local write succeeds, the request will fail with a TimedOutException as fewer than CL nodes will respond. In all these cases the request is considered to have failed. The client should connect to another node and try again. In the case of timeout the operation was not completed to the CL level you asked for; in the case of unavailable the operation was not started. It can look like the RR conflict resolution is a little naive here, but it's less simple when you consider another scenario. The write at t8 failed at Quorum, and in your deployment the client cannot connect to another node in the cluster, so your code drops the CL down to ONE and gets the write done. You are happy that any nodes in Alice's partition see her write, and that those in Ben's partition see his. When things get back to normal you want the most recent write to be what clients consistently see, not the most popular value. The Consistency section here http://wiki.apache.org/cassandra/ArchitectureOverview says the same: it's the most recent value. I tend to think of consistency as all clients getting the same response to the same query. Not sure if I've made things clearer; feel free to poke holes in my logic :) Hope that helps.
Aaron On 23 Apr 2011, at 09:02, Edward Capriolo wrote: On Fri, Apr 22, 2011 at 4:31 PM, Milind Parikh milindpar...@gmail.com wrote: Is there a chance of getting manual conflict resolution in Cassandra? Please see attachment for why this is important in some cases. Regards Milind I think about this often. LDAP servers like SunOne have pluggable conflict resolution. I could see the read-repair algorithm being pluggable. -- Narendra Sharma Solution Architect *http://www.persistentsys.com* *http://narendrasharma.blogspot.com/*
Re: seed faq
Here are some more details that might help: 1. You are right that seeds are referred to on startup to learn about the ring. 2. It is a good idea to have more than 1 seed. A seed is not a SPoF. Remember that Gossip also provides eventual consistency, so if a seed is missing the new node may not have the correct view of the ring; however, after talking to other nodes it will eventually have the up-to-date state of the ring. 3. In each iteration the Gossiper on a node sends a gossip message: - To a known live node (picked randomly) - To a known dead node (based on some probability) - To a seed node (based on some probability) Thanks, Naren On Wed, Apr 20, 2011 at 7:13 PM, Maki Watanabe watanabe.m...@gmail.com wrote: I made self-answered FAQs on seeds after reading the wiki and code. If I misunderstand something, please point it out to me. == What are seeds? == Seeds, or seed nodes, are the nodes which new nodes refer to on bootstrap to learn ring information. When you add a new node to the ring, you need to specify at least one live seed to contact. Once a node joins the ring, it learns about the other nodes, so it doesn't need a seed on subsequent boots. There is no special configuration for the seed node itself. In a stable and static ring, you can point to a non-seed node as a seed on bootstrap, though it is not recommended. Nodes in the ring tend to send gossip messages to seeds more often by design, so it is probable that seeds have the most recent and updated information of the ring. (Refer to [[ArchitectureGossip]] for more details.) == Does a single seed mean a single point of failure? == If you are using a replicated CF on the ring, only one seed in the ring doesn't mean a single point of failure. The ring can operate or boot without the seed. But it is recommended to have multiple seeds in a production system to maintain the ring. Thanks -- maki -- Narendra Sharma Solution Architect *http://www.persistentsys.com* *http://narendrasharma.blogspot.com/*
Re: Starting the Cassandra server from Java (without command line)
The write-up is a year old but will still give you a fair idea of how to do it: http://prettyprint.me/2010/02/14/running-cassandra-as-an-embedded-service/ Thanks, Naren On Thu, Apr 14, 2011 at 10:59 AM, sam_ amin_shar...@yahoo.com wrote: Hello there, To start the Cassandra server we can use the following command at the command prompt: cassandra -f I am wondering if it is possible to directly start the server inside a Java program using the Thrift API or a lower-level class inside the Cassandra implementation. The purpose of this is to be able to run JUnit tests that need to start the Cassandra server in setUp(), without the need to create a process and run cassandra from the command line. Thanks, Sam -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Starting-the-Cassandra-server-from-Java-without-command-line-tp6273826p6273826.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com. -- Narendra Sharma Solution Architect *http://www.persistentsys.com* *http://narendrasharma.blogspot.com/*
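A minimal sketch of the in-process approach, grounded only in the fact that org.apache.cassandra.thrift.CassandraDaemon has a main() (visible in stack traces elsewhere on this list); the config path is hypothetical, and newer 0.7 builds also ship an EmbeddedCassandraService helper that wraps the same idea:

public class EmbeddedCassandra {
    public static void main(String[] args) {
        // Point Cassandra at a test configuration (path is hypothetical).
        System.setProperty("cassandra.config", "file:///tmp/test/cassandra.yaml");
        Thread t = new Thread(new Runnable() {
            public void run() {
                // Starts the full server (Thrift service included) inside this JVM.
                org.apache.cassandra.thrift.CassandraDaemon.main(new String[0]);
            }
        }, "cassandra-daemon");
        t.setDaemon(true);
        t.start();
        // ... wait for the Thrift port to come up, then run JUnit tests against localhost.
    }
}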
Re: Cassandra 2 DC deployment
I think this is reasonable, assuming you have enough backhaul to perform reads across DCs if read requests hit DC2 (with one copy of the data) or one replica from DC1 is down. Moreover, since you clearly stated that you would prefer availability over consistency, you should be prepared for stale reads :) On Tue, Apr 12, 2011 at 8:12 AM, Raj N raj.cassan...@gmail.com wrote: Hi experts, We are planning to deploy Cassandra in 2 datacenters. Let's assume there are 3 nodes, RF=3, 2 nodes in the 1st DC and 1 node in the 2nd DC. Under normal operations, we would read and write at QUORUM. What we want to do, though, is if we lose the datacenter which has 2 nodes, DC1 in this case, we want to downgrade our consistency to ONE. Basically I am saying that whenever there is a partition, prefer availability over consistency. In order to do this we plan to catch UnavailableException and take corrective action: try QUORUM under normal circumstances; if unavailable, try ONE. My questions: Do you guys see any flaws with this approach? What happens when DC1 comes back up and we start reading/writing at QUORUM again? Will we read stale data in this case? Thanks -Raj -- Narendra Sharma Solution Architect *http://www.persistentsys.com* *http://narendrasharma.blogspot.com/*
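A hedged sketch of the fallback logic described above, using the raw 0.7 Thrift API; the helper name and the single retry are assumptions, and production code would also bound retries and handle TimedOutException separately:

import java.nio.ByteBuffer;
import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.Column;
import org.apache.cassandra.thrift.ColumnParent;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.cassandra.thrift.UnavailableException;

public class FallbackWriter {
    // Try QUORUM first; if not enough replicas are alive, degrade to ONE.
    static void writeWithFallback(Cassandra.Client client, ByteBuffer key,
                                  ColumnParent parent, Column col) throws Exception {
        try {
            client.insert(key, parent, col, ConsistencyLevel.QUORUM);
        } catch (UnavailableException e) {
            // Partition detected: prefer availability over consistency.
            client.insert(key, parent, col, ConsistencyLevel.ONE);
        }
    }
}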
Re: weird error when connecting to cassandra mbean proxy
The correct object name is org.apache.cassandra.db:type=StorageProxy. -Naren On Thu, Apr 7, 2011 at 4:36 PM, Anurag Gujral anurag.guj...@gmail.com wrote: Hi All, I have written code for connecting to the MBean server running on a cassandra node. I get the following error: Exception in thread main java.lang.reflect.UndeclaredThrowableException at $Proxy1.getReadOperations(Unknown Source) at com.smeet.cassandra.CassandraJmxHttpServerMy.init(CassandraJmxHttpServerMy.java:72) at com.smeet.cassandra.CassandraJmxHttpServerMy.main(CassandraJmxHttpServerMy.java:77) Caused by: javax.management.InstanceNotFoundException: org.apache.cassandra.service:type=StorageProxy at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getMBean(DefaultMBeanServerInterceptor.java:1118) at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:679) at com.sun.jmx.mbeanserver.JmxMBeanServer.getAttribute(JmxMBeanServer.java:672) at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1427) at javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:90) I have attached the code file. Cassandra is running on the port I am trying to connect to. Please suggest. Thanks Anurag -- Narendra Sharma Solution Architect *http://www.persistentsys.com* *http://narendrasharma.blogspot.com/*
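A short sketch of the corrected lookup; note the MBean interface lives in the org.apache.cassandra.service package even though the object is registered under the org.apache.cassandra.db domain (which is the mismatch behind the InstanceNotFoundException above):

import javax.management.JMX;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import org.apache.cassandra.service.StorageProxyMBean;

public class StorageProxyClient {
    static long readOps(MBeanServerConnection mbsc) throws Exception {
        // The registered name uses the "db" domain, not "service".
        ObjectName name = new ObjectName("org.apache.cassandra.db:type=StorageProxy");
        StorageProxyMBean proxy = JMX.newMBeanProxy(mbsc, name, StorageProxyMBean.class);
        return proxy.getReadOperations();
    }
}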
Re: old JMX code is not working with new cassandra version
I think you need to specify the port in the JMXServiceURL. The exception indicates there is no service listening on the given host and port. Also, I guess, based on 127.0.0.1, that you are running the client on the same machine as Cassandra; if that is not the case then fix the host as well. You might want to look at the cassandra-env.sh file and the comments in it. On Tue, Apr 5, 2011 at 5:56 PM, Anurag Gujral anurag.guj...@gmail.com wrote: Hi All, I had written code for cassandra 0.6.3 using JMX to call compaction; when I try to use that code to connect to 0.7.3 I get the following error: Exception in thread main java.rmi.ConnectException: Connection refused to host: 127.0.0.1; nested exception is: java.net.ConnectException: Connection refused at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:601) at sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:198) at sun.rmi.transport.tcp.TCPChannel.newConnection(TCPChannel.java:184) at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:110) at javax.management.remote.rmi.RMIServerImpl_Stub.newClient(Unknown Source) at javax.management.remote.rmi.RMIConnector.getConnection(RMIConnector.java:2327) at javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:279) at javax.management.remote.JMXConnectorFactory.connect(JMXConnectorFactory.java:248) at com.bluekai.Client.doCompaction(Client.java:51) at com.bluekai.Client.main(Client.java:41) Caused by: java.net.ConnectException: Connection refused at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333) at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195) at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366) at java.net.Socket.connect(Socket.java:525) at java.net.Socket.connect(Socket.java:475) at java.net.Socket.<init>(Socket.java:372) at java.net.Socket.<init>(Socket.java:186) at sun.rmi.transport.proxy.RMIDirectSocketFactory.createSocket(RMIDirectSocketFactory.java:22) at sun.rmi.transport.proxy.RMIMasterSocketFactory.createSocket(RMIMasterSocketFactory.java:128) at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:595 Any suggestions? Thanks Anurag I am pasting below the structure of the code I am using; it gives the above error when JMXConnectorFactory.connect is called.

JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://" + host + "/jmxrmi");
System.out.println("before connection=host:" + host);
JMXConnector jmxc = JMXConnectorFactory.connect(url, null);
System.out.println("After connection");
// Get an MBeanServerConnection
MBeanServerConnection mbsc = jmxc.getMBeanServerConnection();
// Construct the ObjectName for the MBean
ObjectName mxbeanName = new ObjectName("org.apache.cassandra.db:type=ColumnFamilyStores,keyspace=" + keyspace + ",columnfamily=" + columnfamily);
// Create a dedicated proxy for the MXBean instead of
// going directly through the MBean server connection
ColumnFamilyStores mxbeanProxy = JMX.newMXBeanProxy(mbsc, mxbeanName, ColumnFamilyStores.class);
mxbeanProxy.forceMajorCompaction();
jmxc.close();

-- Narendra Sharma Solution Architect *http://www.persistentsys.com* *http://narendrasharma.blogspot.com/*
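For example, with the 0.7 default JMX port (8080; it is configurable in cassandra-env.sh), the URL would look like:

JMXServiceURL url = new JMXServiceURL(
    "service:jmx:rmi:///jndi/rmi://" + host + ":8080/jmxrmi"); // 8080 is the 0.7 default JMX port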
Re: Understanding cfhistogram output
There are 6 columns in the output.
*- Offset*: These are the buckets, the same as the values on the X-axis in a graph. The unit is determined by the other columns.
*- SSTables*: This represents the number of sstables accessed per read. E.g. if a read operation involved accessing 3 sstables then you will find a positive count against offset 3. Most of the time the values will be against the lower offset values.
*- Write Latency*: This represents the number of operations and their latency in microseconds. If 100 operations took, say, 5 microseconds then you will find an entry against offset 5. This shows the distribution of the number of operations across a range of latencies.
*- Read Latency*: Similar to write latency. The unit is microseconds.
*- Row Size*: This represents the number of rows with a given size, i.e. how many rows of a given size exist.
*- Column Count*: Similar to row size. This represents the column count, i.e. how many rows with a given number of columns exist.
1. Note that these are estimates and not exact numbers. 2. The values of course change over time. On Fri, Apr 1, 2011 at 10:21 AM, Anurag Gujral anurag.guj...@gmail.com wrote: Hi All, I ran nodetool with cfhistograms and I don't fully understand the output. Can someone please shed some light on it? Thanks Anurag -- Narendra Sharma Solution Architect *http://www.persistentsys.com* *http://narendrasharma.blogspot.com/*
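For reference, the histogram above comes from an invocation like this (host, port, keyspace and column family names are placeholders):

nodetool -h <host> -p <jmx_port> cfhistograms <keyspace> <column_family>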
Re: Fatal error from a cassandra node
http://wiki.apache.org/cassandra/MemtableThresholds#JVM_Heap_Size On Wed, Mar 30, 2011 at 11:41 AM, Peter Schuller peter.schul...@infidyne.com wrote: I have a 6 node cassandra cluster, all set up with the same configuration. I am getting fatal exceptions in one of the nodes: ERROR [Thread-604] 2011-03-29 20:19:13,218 AbstractCassandraDaemon.java (line 114) Fatal exception in thread Thread[Thread-604,5,main] java.lang.OutOfMemoryError: Java heap space ERROR [Thread-607] 2011-03-29 19:47:29,272 AbstractCassandraDaemon.java (line 114) Fatal exception in thread Thread[Thread-607,5,main] java.lang.OutOfMemoryError: Java heap space ERROR [Thread-605] 2011-03-29 19:38:09,081 AbstractCassandraDaemon.java (line 114) Fatal exception in thread Thread[Thread-605,5,main] java.lang.OutOfMemoryError: Java heap space ERROR [MutationStage:2] 2011-03-29 19:37:16,659 DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor java.lang.OutOfMemoryError: Java heap space ERROR [GossipStage:1] 2011-03-29 20:27:29,898 AbstractCassandraDaemon.java (line 114) Fatal exception in thread Thread[GossipStage:1,5,main] java.lang.OutOfMemoryError: Java heap space All the nodes have 32 G of RAM. Every time I try to restart the failed node I get the above errors. Unless something is outright wrong, it sounds like you need to increase your JVM heap size in cassandra-env.sh. That you're getting it on start-up sounds consistent with commit log replay filling the heap in the form of memtables that are sized too big for your heap. There's a wiki page somewhere that describes the overall rule of thumb for heap sizing, but I can't find it right now. -- / Peter Schuller -- Narendra Sharma Solution Architect *http://www.persistentsys.com* *http://narendrasharma.blogspot.com/*
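For reference, the heap is set in conf/cassandra-env.sh; a minimal sketch (the sizes here are only examples, not a recommendation for this workload):

# conf/cassandra-env.sh -- sizes here are only examples
MAX_HEAP_SIZE="8G"     # total JVM heap
HEAP_NEWSIZE="800M"    # young generation size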
Re: difference between compaction, repair, clean
Short answers: - compact - Initiates an immediate full compaction; removes deleted data. - cleanup - Initiates an immediate cleanup, i.e. removes data that is deleted and data that doesn't belong to this node; internally performs a full compaction. - repair - Used to make the different copies (replicas) of data consistent by exchanging data with the other replicas. The details at the following links should be good to understand them in depth: http://www.datastax.com/docs/0.7/utilities/nodetool http://wiki.apache.org/cassandra/NodeProbe http://wiki.apache.org/cassandra/Operations Thanks, Naren On Wed, Mar 30, 2011 at 12:57 PM, Jonathan Colby jonathan.co...@gmail.com wrote: I'm a little unclear on the differences between the nodetool operations: - compaction - repair - clean I understand that compaction consolidates the SSTables and physically performs deletes by taking into account the tombstones. But what do clean and repair do then? -- Narendra Sharma Solution Architect *http://www.persistentsys.com* *http://narendrasharma.blogspot.com/*
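The corresponding nodetool invocations look like this (host, keyspace and CF names are placeholders):

nodetool -h <host> compact <keyspace> <column_family>   # immediate major compaction
nodetool -h <host> cleanup <keyspace>                   # drop data this node no longer owns
nodetool -h <host> repair <keyspace>                    # synchronize replicas (anti-entropy)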
Re: Fatal error from a cassandra node
OOM at startup with 16GB... seems like an issue. Which version are you using? Can you provide some details on the failed node? What exactly happened? That might give some clue. Also, you might want to start with the log level set to debug, to find out more about what exactly Cassandra is doing that is causing the OOM. -Naren On Wed, Mar 30, 2011 at 4:45 PM, Anurag Gujral anurag.guj...@gmail.com wrote: I am using 16G of heap space; how much more should I increase it? Please suggest. Thanks Anurag On Wed, Mar 30, 2011 at 11:43 AM, Narendra Sharma narendra.sha...@gmail.com wrote: http://wiki.apache.org/cassandra/MemtableThresholds#JVM_Heap_Size On Wed, Mar 30, 2011 at 11:41 AM, Peter Schuller peter.schul...@infidyne.com wrote: I have a 6 node cassandra cluster, all set up with the same configuration. I am getting fatal exceptions in one of the nodes: ERROR [Thread-604] 2011-03-29 20:19:13,218 AbstractCassandraDaemon.java (line 114) Fatal exception in thread Thread[Thread-604,5,main] java.lang.OutOfMemoryError: Java heap space ERROR [Thread-607] 2011-03-29 19:47:29,272 AbstractCassandraDaemon.java (line 114) Fatal exception in thread Thread[Thread-607,5,main] java.lang.OutOfMemoryError: Java heap space ERROR [Thread-605] 2011-03-29 19:38:09,081 AbstractCassandraDaemon.java (line 114) Fatal exception in thread Thread[Thread-605,5,main] java.lang.OutOfMemoryError: Java heap space ERROR [MutationStage:2] 2011-03-29 19:37:16,659 DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor java.lang.OutOfMemoryError: Java heap space ERROR [GossipStage:1] 2011-03-29 20:27:29,898 AbstractCassandraDaemon.java (line 114) Fatal exception in thread Thread[GossipStage:1,5,main] java.lang.OutOfMemoryError: Java heap space All the nodes have 32 G of RAM. Every time I try to restart the failed node I get the above errors. Unless something is outright wrong, it sounds like you need to increase your JVM heap size in cassandra-env.sh. That you're getting it on start-up sounds consistent with commit log replay filling the heap in the form of memtables that are sized too big for your heap. There's a wiki page somewhere that describes the overall rule of thumb for heap sizing, but I can't find it right now. -- / Peter Schuller -- Narendra Sharma Solution Architect *http://www.persistentsys.com* *http://narendrasharma.blogspot.com/* -- Narendra Sharma Solution Architect *http://www.persistentsys.com* *http://narendrasharma.blogspot.com/*
Re: Cassandra error Insufficient space to compact
The space referred to in the log message is disk space, not heap. So check if you are running low on disk space. Thanks, Naren On Wed, Mar 30, 2011 at 4:55 PM, Anurag Gujral anurag.guj...@gmail.com wrote: Hi All, I am getting the following message from cassandra: WARN [CompactionExecutor:1] 2011-03-30 18:46:33,272 CompactionManager.java (line 406) insufficient space to compact all requested files SSTableReader( I am using 16G of Java heap space; please let me know whether I should consider this a sign of something I need to worry about. Thanks Anurag -- Narendra Sharma Solution Architect *http://www.persistentsys.com* *http://narendrasharma.blogspot.com/*
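A quick way to check, assuming the default data directory location (adjust the path to your data_file_directories setting):

df -h /var/lib/cassandra/data   # free disk space where the SSTables live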
Re: cassandra client sample code for 0.7.3
Hope you find the following useful. It uses raw Thrift. In case you find difficulty in building and/or running the code, please reply back.

private Cassandra.Client createClient(String host, int port) throws Exception {
    TTransport framedTransport = new TFramedTransport(new TSocket(host, port));
    TProtocol framedProtocol = new TBinaryProtocol(framedTransport);
    Cassandra.Client client = new Cassandra.Client(framedProtocol);
    framedTransport.open();
    client.set_keyspace("Keyspace");
    return client;
}

private Mutation getMutation(SuperColumn sc) {
    ColumnOrSuperColumn csc = new ColumnOrSuperColumn();
    csc.setSuper_column(sc);
    csc.setSuper_columnIsSet(true);
    Mutation m = new Mutation();
    m.setColumn_or_supercolumn(csc);
    m.setColumn_or_supercolumnIsSet(true);
    return m;
}

private Mutation getMutation(Column c) {
    ColumnOrSuperColumn csc = new ColumnOrSuperColumn();
    csc.setColumn(c);
    csc.setColumnIsSet(true);
    Mutation m = new Mutation();
    m.setColumn_or_supercolumn(csc);
    m.setColumn_or_supercolumnIsSet(true);
    return m;
}

private Column createColumn(String name, String value, long time) {
    Column c = new Column();
    c.setName(name.getBytes());
    c.setValue(value.getBytes());
    c.setTimestamp(time);
    return c;
}

Cassandra.Client client = createClient(host, port);
long timeStamp = System.currentTimeMillis();

// For a Standard CF
Column col1 = createColumn("name1", "value1", timeStamp);
Column col2 = createColumn("name2", "value2", timeStamp);
Map<String, List<Mutation>> mutations = new HashMap<String, List<Mutation>>();
List<Mutation> mutation = new ArrayList<Mutation>();
mutation.add(getMutation(col1));
mutation.add(getMutation(col2));
mutations.put("StandardCF", mutation);
Map<ByteBuffer, Map<String, List<Mutation>>> mutationMap = new HashMap<ByteBuffer, Map<String, List<Mutation>>>();
mutationMap.put(ByteBuffer.wrap(getBytes("rowkey")), mutations);
client.batch_mutate(mutationMap, CL);

// For a Super CF
SuperColumn info = new SuperColumn();
info.setName("info".getBytes());
List<Column> cols = new ArrayList<Column>();
cols.add(createColumn("name1", "val1", timeStamp));
cols.add(createColumn("name2", "val2", timeStamp));
info.setColumns(cols);
Map<String, List<Mutation>> mutations = new HashMap<String, List<Mutation>>();
List<Mutation> mutation = new ArrayList<Mutation>();
mutation.add(getMutation(info));
mutations.put("SuperCF", mutation);
Map<ByteBuffer, Map<String, List<Mutation>>> mutationMap = new HashMap<ByteBuffer, Map<String, List<Mutation>>>();
mutationMap.put(ByteBuffer.wrap(getBytes("row-key")), mutations);
client.batch_mutate(mutationMap, CL);

Thanks, Naren On Thu, Mar 24, 2011 at 10:01 PM, Anurag Gujral anurag.guj...@gmail.com wrote: I am in need of sample code (basically a cassandra client) in Java using batch_mutate. If someone has some, please reply back. Thanks Anurag
Option for ordering columns by timestamp in CF
Cassandra 0.7.4. Column names in my CF are of type byte[], but I want to order the columns by timestamp. What is the best way to achieve this? Does it make sense for Cassandra to support ordering of columns by timestamp as an option for a column family, irrespective of the column name type? Thanks, Naren
Re: ParNew (promotion failed)
I think it is due to fragmentation in the old gen, due to which the survivor area cannot be promoted to the old gen. A 300MB memtable data size looks high for a 3G heap. I have learned that the in-memory overhead of a memtable can be as high as 10x of the memtable data size. So either increase the heap or reduce the memtable thresholds further, so that the old gen gets freed up faster. With 16 CFs, I would do both, i.e. increase the heap to say 4GB and reduce the memtable thresholds further. -Naren On Wed, Mar 23, 2011 at 8:18 AM, ruslan usifov ruslan.usi...@gmail.com wrote: Hello Sometimes I see the following message in the GC log:

2011-03-23T14:40:56.049+0300: 14897.104: [GC 14897.104: [ParNew (promotion failed) Desired survivor size 41943040 bytes, new threshold 2 (max 2) - age 1: 5573024 bytes, 5573024 total - age 2: 5064608 bytes, 10637632 total : 672577K->670749K(737280K), 0.1837950 secs]14897.288: [CMS: 1602487K->779310K(2326528K), 4.7525580 secs] 2270940K->779310K(3063808K), [CMS Perm : 20073K->19913K(33420K)], 4.9365810 secs] [Times: user=5.06 sys=0.00, real=4.93 secs]
Total time for which application threads were stopped: 4.9378750 seconds

How can I minimize their frequency, or disable them? My current workload is many small objects (about 200 bytes long), and the sum of my memtables is about 300 MB (16 CFs). My heap is 3G.
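In 0.7 the memtable thresholds are per column family and can be lowered from the CLI; a hedged sketch (the CF name and values are illustrative, and the exact attribute names should be verified with help update column family; on your version):

update column family <cf_name> with memtable_throughput = 32 and memtable_operations = 0.1;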
Re: ParNew (promotion failed)
I understand that. The overhead could be as high as 10x of the memtable data size, so the overall overhead for the 16 CFs collectively in your case could be 300MB * 10 = 3G. Thanks, Naren On Wed, Mar 23, 2011 at 11:18 AM, ruslan usifov ruslan.usi...@gmail.com wrote: 2011/3/23 Narendra Sharma narendra.sha...@gmail.com: I think it is due to fragmentation in the old gen, due to which the survivor area cannot be promoted to the old gen. A 300MB memtable data size looks high for a 3G heap. I have learned that the in-memory overhead of a memtable can be as high as 10x of the memtable data size. So either increase the heap or reduce the memtable thresholds further, so that the old gen gets freed up faster. With 16 CFs, I would do both, i.e. increase the heap to say 4GB and reduce the memtable thresholds further. I think you misunderstood me: 300MB is the sum of the thresholds across all 16 CFs, so each memtable threshold is about 18MB. Should I still reduce the memtable thresholds?
Re: ParNew (promotion failed)
I haven't used G1. I remember someone shared his experience with G1 in detail on this list. The bottom line is you need to test it for your deployment, and based on the test results conclude whether it will work for you. I believe for a small heap G1 will do well. -Naren On Wed, Mar 23, 2011 at 1:47 PM, ruslan usifov ruslan.usi...@gmail.com wrote: 2011/3/23 Narendra Sharma narendra.sha...@gmail.com: I understand that. The overhead could be as high as 10x of the memtable data size, so the overall overhead for the 16 CFs collectively in your case could be 300MB * 10 = 3G. And how about the G1 GC? It should prevent memory fragmentation, but some posts on this list say it is not as good as described. What do you think about it?
Re: How to find what node a key is on
The logic to find the node is not complicated. You compute the MD5 hash of the key. Create a sorted list of the tokens assigned to the nodes in the ring. Find the first token greater than the hash; this is the first node. Next in the list is the replica, which depends on the RF. Now, this is simple because it assumes SimpleStrategy for replica placement; for other strategies finding replicas is more involved. Cassandra is a distributed database. Each node is aware of the state of the cluster and the token distribution. Moving the logic into the client is possible, but the benefits are small compared to the pain, and doing it for a large cluster would be more painful still. I would discourage you from going that route. Thanks, Naren On Wed, Mar 23, 2011 at 5:16 PM, Sameer Farooqui cassandral...@gmail.com wrote: No problems with read performance, just curious about what kind of overhead was being added b/c we're doing read tests. If it's easy to figure out where the row is stored, I'd be interested in trying it. If not, don't worry about it. - Sameer On Wed, Mar 23, 2011 at 4:31 PM, aaron morton aa...@thelastpickle.com wrote: Each row is stored on RF nodes, and your read will be sent to CL number of nodes. Messages only take a single hop from the coordinator to each node the read is performed on, so the networking overhead varies with the number of nodes involved in the request. There are many factors other than networking that influence the speed of a read request. There are features available to determine which nodes hold replicas for a particular key; AFAIK they are not intended for use by clients. Are you currently having problems with read performance? Hope that helps. Aaron On 24 Mar 2011, at 11:53, Sameer Farooqui wrote: Does anybody know if it's possible to find out what node a specific key/row lives on? We have a 30 node cluster and I'm curious how much faster it'll be to read data directly from the node that stores the data. We're using random partitioner, by the way. *Sameer Farooqui *Accenture Technology Labs
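A minimal sketch of the lookup described above, for RandomPartitioner with SimpleStrategy (client-side, for curiosity only, per the caveats above; helper names are hypothetical):

import java.math.BigInteger;
import java.security.MessageDigest;
import java.util.List;

public class ReplicaLocator {
    // RandomPartitioner token: absolute value of the key's MD5 digest.
    static BigInteger tokenOf(byte[] key) throws Exception {
        return new BigInteger(MessageDigest.getInstance("MD5").digest(key)).abs();
    }

    // Given node tokens sorted ascending, return the indexes of the RF replicas
    // under SimpleStrategy: the primary owner, then the next nodes along the ring.
    static int[] replicasFor(byte[] key, List<BigInteger> sortedTokens, int rf) throws Exception {
        BigInteger t = tokenOf(key);
        int primary = 0; // wraps to the first node if no token is >= t
        for (int i = 0; i < sortedTokens.size(); i++) {
            if (sortedTokens.get(i).compareTo(t) >= 0) { primary = i; break; }
        }
        int[] replicas = new int[rf];
        for (int i = 0; i < rf; i++) {
            replicas[i] = (primary + i) % sortedTokens.size();
        }
        return replicas;
    }
}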
Re: getting exception when cassandra 0.7.3 is starting
Is this a new install or an upgrade? Thanks, Naren On Wed, Mar 16, 2011 at 11:15 PM, Anurag Gujral anurag.guj...@gmail.com wrote: I am getting an exception when starting cassandra 0.7.3: ERROR 01:10:48,321 Exception encountered during startup. java.lang.NegativeArraySizeException at org.apache.cassandra.db.ColumnFamilyStore.readSavedCache(ColumnFamilyStore.java:274) at org.apache.cassandra.db.ColumnFamilyStore.<init>(ColumnFamilyStore.java:213) at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:466) at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:447) at org.apache.cassandra.db.Table.initCf(Table.java:317) at org.apache.cassandra.db.Table.<init>(Table.java:254) at org.apache.cassandra.db.Table.open(Table.java:110) at org.apache.cassandra.db.SystemTable.checkHealth(SystemTable.java:207) at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:129) at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:316) at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:79) Exception encountered during startup. java.lang.NegativeArraySizeException at org.apache.cassandra.db.ColumnFamilyStore.readSavedCache(ColumnFamilyStore.java:274) at org.apache.cassandra.db.ColumnFamilyStore.<init>(ColumnFamilyStore.java:213) at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:466) at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:447) at org.apache.cassandra.db.Table.initCf(Table.java:317) at org.apache.cassandra.db.Table.<init>(Table.java:254) at org.apache.cassandra.db.Table.open(Table.java:110) at org.apache.cassandra.db.SystemTable.checkHealth(SystemTable.java:207) at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:129) at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:316) at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:79)
Re: Pauses of GC
What heap size are you running with? And which version of Cassandra? Thanks, Naren On Thu, Mar 17, 2011 at 3:45 AM, ruslan usifov ruslan.usi...@gmail.com wrote: Hello Sometimes I have very long GC pauses:

Total time for which application threads were stopped: 0.0303150 seconds
2011-03-17T13:19:56.476+0300: 33295.671: [GC 33295.671: [ParNew: 678855K->20708K(737280K), 0.0271230 secs] 1457643K->806795K(4112384K), 0.0273050 secs] [Times: user=0.33 sys=0.00, real=0.03 secs]
Total time for which application threads were stopped: 0.0291820 seconds
2011-03-17T13:20:32.962+0300: 33332.157: [GC 33332.157: [ParNew: 676068K->23527K(737280K), 0.0302180 secs] 1462155K->817599K(4112384K), 0.0304020 secs] [Times: user=0.31 sys=0.00, real=0.03 secs]
Total time for which application threads were stopped: 0.1270270 seconds
2011-03-17T13:21:11.908+0300: 33371.103: [GC 33371.103: [ParNew: 678887K->21564K(737280K), 0.0268160 secs] 1472959K->823191K(4112384K), 0.0270110 secs] [Times: user=0.28 sys=0.00, real=0.03 secs]
Total time for which application threads were stopped: 0.0293330 seconds
2011-03-17T13:21:50.482+0300: 33409.677: [GC 33409.677: [ParNew: 676924K->21115K(737280K), 0.0281720 secs] 1478551K->829900K(4112384K), 0.0283630 secs] [Times: user=0.27 sys=0.00, real=0.03 secs]
Total time for which application threads were stopped: 0.0339610 seconds
2011-03-17T13:22:32.849+0300: 33452.044: [GC 33452.044: [ParNew: 676475K->25948K(737280K), 0.0317600 secs] 1485260K->842061K(4112384K), 0.0319520 secs] [Times: user=0.22 sys=0.00, real=0.03 secs]
Total time for which application threads were stopped: 0.0344430 seconds
2011-03-17T13:23:14.924+0300: 33494.119: [GC 33494.119: [ParNew: 681308K->25087K(737280K), 0.0282600 secs] 1497421K->848300K(4112384K), 0.0284360 secs] [Times: user=0.32 sys=0.00, real=0.03 secs]
Total time for which application threads were stopped: 0.0309160 seconds
2011-03-17T13:23:57.192+0300: 33536.387: [GC 33536.387: [ParNew: 680447K->24805K(737280K), 0.0299910 secs] 1503660K->855829K(4112384K), 0.0301670 secs] [Times: user=0.29 sys=0.01, real=0.03 secs]
Total time for which application threads were stopped: 0.0324200 seconds
2011-03-17T13:24:01.553+0300: 33540.748: [GC 33540.749: [ParNew: 680165K->31886K(737280K), 0.0495620 secs] 1511189K->936503K(4112384K), 0.0497420 secs] [Times: user=0.57 sys=0.00, real=0.05 secs]
Total time for which application threads were stopped: 0.0507030 seconds
2011-03-17T13:37:56.009+0300: 34375.204: [GC 34375.204: [ParNew: 687246K->28727K(737280K), 0.0244720 secs] 1591863K->942459K(4112384K), 0.0246900 secs] [Times: user=0.18 sys=0.00, real=0.02 secs]
Total time for which application threads were stopped: 806.7442720 seconds
Total time for which application threads were stopped: 0.0006590 seconds
Total time for which application threads were stopped: 0.0004360 seconds
Total time for which application threads were stopped: 0.0004630 seconds
Total time for which application threads were stopped: 0.0008120 seconds
2011-03-17T13:37:59.018+0300: 34378.213: [GC 34378.213: [ParNew: 676678K->21640K(737280K), 0.0137740 secs] 1590410K->949991K(4112384K), 0.0139610 secs] [Times: user=0.13 sys=0.02, real=0.01 secs]
Total time for which application threads were stopped: 0.0145920 seconds
Total time for which application threads were stopped: 0.1036080 seconds
Total time for which application threads were stopped: 0.0585600 seconds
Total time for which application threads were stopped: 0.0600550 seconds
Total time for which application threads were stopped: 0.0008560 seconds
Total time for which application threads were stopped: 0.0006770 seconds
Total time for which application threads were stopped: 0.0005910 seconds
Total time for which application threads were stopped: 0.0351330 seconds
Total time for which application threads were stopped: 0.0329020 seconds
Total time for which application threads were stopped: 0.0728490 seconds
Total time for which application threads were stopped: 0.0480990 seconds
Total time for which application threads were stopped: 0.0804250 seconds
2011-03-17T13:38:04.394+0300: 34383.589: [GC 34383.589: [ParNew: 677000K->8375K(737280K), 0.0218310 secs] 1605351K->944271K(4112384K), 0.0220300 secs]

Following is the nodetool cfstats output on the hung node:

Keyspace: fishdom_tuenti
    Read Count: 4970999
    Read Latency: 1.0267005945887335 ms.
    Write Count: 1441619
    Write Latency: 0.013146585887117193 ms.
    Pending Tasks: 0
        Column Family: decor
        SSTable count: 3
        Space used (live): 1296203532
        Space used (total): 1302520037
        Memtable Columns Count: 1066
        Memtable Data Size: 121742
        Memtable Switch Count: 11
        Read Count: 108125
        Read Latency: 2.809 ms.
        Write Count: 11261
        Write Latency: 0.006 ms.
        Pending Tasks: 0
        Key cache capacity: 30
        Key cache size: 46470
        Key cache
Re: Pauses of GC
Depending on your memtable thresholds, the heap may be too small for the deployment. At the same time, I don't see any other log statements around the long pause you have shown in the log snippet, which looks a little odd to me. All the ParNew collections collected almost the same amount of heap and did not take a lot of time. Check if it is due to some JVM bug: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6477891 -Naren On Thu, Mar 17, 2011 at 9:47 AM, ruslan usifov ruslan.usi...@gmail.com wrote: 2011/3/17 Narendra Sharma narendra.sha...@gmail.com: What heap size are you running with? And which version of Cassandra? 4G with cassandra 0.7.4
Re: Cassandra c++ client
libcassandra isn't very active. Since we already have an object pool library, we went with raw Thrift in C++ instead of using any other library. Thanks, Naren On Wed, Mar 16, 2011 at 10:03 PM, Primal Wijesekera primalwijesek...@yahoo.com wrote: You could try this: https://github.com/posulliv/libcassandra - primal -- *From:* Anurag Gujral anurag.guj...@gmail.com *To:* user@cassandra.apache.org *Sent:* Wed, March 16, 2011 9:36:25 PM *Subject:* Cassandra c++ client Hi All, Does anyone know of a stable C++ client for cassandra? Thanks Anurag
Re: Calculate memory used for keycache
Sometime back I looked at the code to find that out. Following is the result; there will be some additional overhead for the internal data structures of the ConcurrentLinkedHashMap: keycache size * (8 bytes for the position, i.e. the value + X bytes for the key + 16 bytes for the token (RandomPartitioner) + 8 bytes for the DecoratedKey reference + 8 bytes for the descriptor reference) Thanks, Naren On Mon, Mar 14, 2011 at 1:29 PM, ruslan usifov ruslan.usi...@gmail.com wrote: Hello How is it possible to calculate this value? I think the key size, if we use RandomPartitioner, will be 16 bytes, so the key cache will take 16 * (number of keycache elements) bytes??
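A worked example of the formula above: with 1,000,000 cached keys of 32 bytes each under RandomPartitioner, the rough footprint (ignoring the ConcurrentLinkedHashMap overhead mentioned above) is:

1,000,000 * (8 + 32 + 16 + 8 + 8) bytes = 72,000,000 bytes, i.e. about 69 MB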
Re: calculating initial_token
On the same page there is a section on Load balancing that talks about a Python script to compute tokens. I believe your question is more about assigning new tokens, not computing them. 1. nodetool loadbalance will result in recomputation of tokens; it will pick tokens based on the load and not the ones assigned by you. 2. You can either use decommission and bootstrap with new tokens, OR use nodetool move. Thanks, Naren On Mon, Mar 14, 2011 at 1:18 PM, Sasha Dolgy sdo...@gmail.com wrote: Sorry for being a bit daft ... Wanted a bit of validation or rejection ... If I have a 6 node cluster, replication factor 2 (don't think this is applicable to the token decision), is the following sufficient and correct for determining the tokens:

#!/bin/bash
for nodes in {0..5}; do echo "$nodes*(2^127/5)" | bc; done

Gives me a result of:
0
34028236692093846346337460743176821145
68056473384187692692674921486353642290
102084710076281539039012382229530463435
136112946768375385385349842972707284580
170141183460469231731687303715884105725

My ring right now is:
10.0.0.2 Up Normal 225 KB 40.78% 24053088190195663439419935163232881936
10.0.0.3 Up Normal 201.21 KB 19.17% 56667357399723182105247119364967854254
10.0.0.4 Up Normal 213.15 KB 17.61% 86624712919272143003828971968762407027
10.0.0.5 Up Normal 214.54 KB 11.22% 105714724128406151241468359303513100912
10.0.0.6 Up Normal 206.39 KB 5.61% 115259729732973155360288052970888447854
10.0.0.7 Up Normal 247.68 KB 5.61% 124804735337540159479107746638263794797

If my new tokens are correct: 1. cassandra.yaml is updated on each node with the new token 2. the node is restarted and a nodetool repair is run, or is a nodetool loadbalance run? Thanks in advance ... been staring at http://wiki.apache.org/cassandra/Operations#Token_selection for too long -- Sasha Dolgy sasha.do...@gmail.com
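Note that the script above divides by 5 for a 6-node ring, so the last token it prints (roughly 2^127) wraps to the same position as token 0. For N nodes the divisor should be N; a corrected sketch for this 6-node ring:

#!/bin/bash
# token_i = i * 2^127 / N, here N = 6 nodes
for i in {0..5}; do echo "$i * (2^127 / 6)" | bc; done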
Re: calculating initial_token
The %age (Owns) is just the arc length, i.e. the percentage of tokens a node owns out of the total token space; it doesn't reflect the actual data. The size (Load) is the real current load. -Naren On Mon, Mar 14, 2011 at 2:59 PM, Sasha Dolgy sdo...@gmail.com wrote: ah, you know ... i have been reading it wrong. the output shows a nice fancy column called Owns but i've only ever seen the percentage ... the amount of data or load is even ... doh. thanks for the reply. cheers -sd On Mon, Mar 14, 2011 at 10:47 PM, Narendra Sharma narendra.sha...@gmail.com wrote: On the same page there is a section on Load balancing that talks about a Python script to compute tokens. I believe your question is more about assigning new tokens, not computing them. 1. nodetool loadbalance will result in recomputation of tokens; it will pick tokens based on the load and not the ones assigned by you. 2. You can either use decommission and bootstrap with new tokens, OR use nodetool move. Thanks, Naren On Mon, Mar 14, 2011 at 1:18 PM, Sasha Dolgy sdo...@gmail.com wrote: Sorry for being a bit daft ... Wanted a bit of validation or rejection ... If I have a 6 node cluster, replication factor 2 (don't think this is applicable to the token decision), is the following sufficient and correct for determining the tokens:

#!/bin/bash
for nodes in {0..5}; do echo "$nodes*(2^127/5)" | bc; done

Gives me a result of:
0
34028236692093846346337460743176821145
68056473384187692692674921486353642290
102084710076281539039012382229530463435
136112946768375385385349842972707284580
170141183460469231731687303715884105725

My ring right now is:
10.0.0.2 Up Normal 225 KB 40.78% 24053088190195663439419935163232881936
10.0.0.3 Up Normal 201.21 KB 19.17% 56667357399723182105247119364967854254
10.0.0.4 Up Normal 213.15 KB 17.61% 86624712919272143003828971968762407027
10.0.0.5 Up Normal 214.54 KB 11.22% 105714724128406151241468359303513100912
10.0.0.6 Up Normal 206.39 KB 5.61% 115259729732973155360288052970888447854
10.0.0.7 Up Normal 247.68 KB 5.61% 124804735337540159479107746638263794797

If my new tokens are correct: 1. cassandra.yaml is updated on each node with the new token 2. the node is restarted and a nodetool repair is run, or is a nodetool loadbalance run? Thanks in advance ... been staring at http://wiki.apache.org/cassandra/Operations#Token_selection for too long -- Sasha Dolgy sasha.do...@gmail.com -- Sasha Dolgy sasha.do...@gmail.com
Re: Does the memtable replace the old version of column with the new overwriting version or is it just a simple append ?
Multiple writes for the same key and column will result in the column being overwritten in the memtable. Basically, multiple updates for the same (key, column) are reconciled based on the column's timestamp. This happens per memtable, so if a memtable is flushed to an sstable, this rule applies to the next memtable. Note that sstables are immutable, so different sstables may have different versions of the same (key, column), and the reconciliation of those happens during the read (read repair). This is part of why reads are slower than writes: conflict resolution happens during the read. Hope this answers the question! Thanks, -Naren On Tue, Mar 8, 2011 at 10:44 PM, Aditya Narayan ady...@gmail.com wrote: Do the overwrites of newly written columns (that are present in the memtable) *replace the old column* or is it just a simple append? I am trying to understand whether, if I update these columns very frequently (while they are in the memtable), the read performance of these columns gets affected, since Cassandra will have to read so many versions of the same column. If this is just a replacement of the old column then I guess reads will be much better, since Cassandra needs to see just a single existing version of the column. Thanks Aditya Narayan
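A conceptual sketch of the last-write-wins rule described above; this is not Cassandra's actual code (the real implementation additionally breaks timestamp ties by comparing values), and the type is hypothetical:

public class Reconcile {
    static class ColumnVersion { long timestamp; byte[] value; }

    // Conceptual only: the version with the higher timestamp survives.
    static ColumnVersion reconcile(ColumnVersion a, ColumnVersion b) {
        return a.timestamp >= b.timestamp ? a : b;
    }
}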
Re: OOM exceptions
I have been through tuning for GC and OOM recently. If you can provide the cassandra.yaml, I can help. Mostly I had to play with memtable thresholds. Thanks, Naren On Fri, Mar 4, 2011 at 12:43 PM, Mark static.void@gmail.com wrote: We have 7 column families and we are not using the default key cache (20). These were our initial settings so it was not in response to anything. Would you recommend anything else? Thanks On 3/4/11 12:34 PM, Chris Burroughs wrote: - Are you using a key cache? How many keys do you have? Across how many column families You configuration is unusual both in terms of not setting min heap == max heap and the percentage of available RAM used for the heap. Did you change the heap size in response to errors or for another reason? On 03/04/2011 03:25 PM, Mark wrote: This happens during compaction and we are not using the RowsCached attribute. Our initial/max heap are 2 and 6 respectively and we have 8 gigs in these machines. Thanks On 3/4/11 12:05 PM, Chris Burroughs wrote: - Does this occur only during compaction or at seemingly random times? - How large is your heap? What jvm settings are you using? How much physical RAM do you have? - Do you have the row and/or key cache enabled? How are they configured? How large are they when the OOM is thrown? On 03/04/2011 02:38 PM, Mark Miller wrote: Other than adding more memory to the machine is there a way to solve this? Please help. Thanks ERROR [COMPACTION-POOL:1] 2011-03-04 11:11:44,891 CassandraDaemon.java (line org.apache.cassandra.thrift.CassandraDaemon$1) Uncaught exception in thread Thread[COMPACTION-POOL:1,5,main] java.lang.OutOfMemoryError: Java heap space at java.util.Arrays.copyOf(Arrays.java:2798) at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:111) at java.io.DataOutputStream.write(DataOutputStream.java:107) at java.io.FilterOutputStream.write(FilterOutputStream.java:97) at org.apache.cassandra.utils.FBUtilities.writeByteArray(FBUtilities.java:298) at org.apache.cassandra.db.ColumnSerializer.serialize(ColumnSerializer.java:66) at org.apache.cassandra.db.SuperColumnSerializer.serialize(SuperColumn.java:311) at org.apache.cassandra.db.SuperColumnSerializer.serialize(SuperColumn.java:284) at org.apache.cassandra.db.ColumnFamilySerializer.serializeForSSTable(ColumnFamilySerializer.java:87) at org.apache.cassandra.db.ColumnFamilySerializer.serializeWithIndexes(ColumnFamilySerializer.java:99) at org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.java:140) at org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.java:43) at org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:73) at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:135) at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:130) at org.apache.commons.collections.iterators.FilterIterator.setNextObject(FilterIterator.java:183) at org.apache.commons.collections.iterators.FilterIterator.hasNext(FilterIterator.java:94) at org.apache.cassandra.db.CompactionManager.doCompaction(CompactionManager.java:294) at org.apache.cassandra.db.CompactionManager$1.call(CompactionManager.java:101) at org.apache.cassandra.db.CompactionManager$1.call(CompactionManager.java:82) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:636)
Cassandra 0.7.2 - Enable/Disable HH via JMX (Jconsole)
I am unable to enable/disable HH via JMX (JConsole). Even though load is on and reads/writes are happening, I don't see the Operations component in JConsole. To clarify further, I see only JConsole -> MBeans -> org.apache.cassandra.db.StorageProxy -> Attributes; I don't see JConsole -> MBeans -> org.apache.cassandra.db.StorageProxy -> Operations. As a result I cannot invoke operations like enable/disable HH. Is this a bug, or am I missing something? Thanks, Naren
Re: New thread for : How does Cassandra handle failure during synchronous writes
You are missing the point. The coordinator node that is handling the request won't wait for all the nodes to return their copy/digest of the data; it just waits for Q (RF/2 + 1) nodes to return. This is the reason I explained two possible scenarios. Further, on what basis will Cassandra know that the data on N1 is the result of a failure? Think about it!! Also, take a look at http://wiki.apache.org/cassandra/API. Following is from the Cassandra wiki: "Because the repair replication process only requires a write to reach a single node to propagate, a write which 'fails' to meet consistency requirements will still appear eventually so long as it was written to at least one node. With W and R both using QUORUM, the best consistency we can achieve is the guarantee that we will receive the same value regardless of which nodes we read from. However, we can still perform a W=QUORUM that fails but reaches one server, perform a R=QUORUM that reads the old value, and then sometime later perform a R=QUORUM that reads the new value." Hope this makes things very clear! On Thu, Feb 24, 2011 at 4:47 AM, Anthony John chirayit...@gmail.com wrote: "c. Read with CL = QUORUM. If read hits node1 and node2/node3, new data that was written to node1 will be returned." In this case N1 will be identified as a discrepancy and the change will be discarded via read repair. "[Naren] How will Cassandra know this is a discrepancy?" Because at Q only N1 will have the new data and the other nodes won't. This lack of consistency on N1 will be detected and repaired. The value that meets Q - the values from N2/N3 - will be returned. HTH
Re: dropped mutations, UnavailableException, and long GC
1. Why 24GB of heap? Do you need such a large heap? A bigger heap can lead to longer GC cycles, but 15 min looks too long. 2. Do you have the row cache enabled? 3. How many column families do you have? 4. Enable GC logs and monitor what GC is doing to get an idea of why it is taking so long. You can add the following to cassandra-env.sh to enable the GC log:

# GC logging options -- uncomment to enable
# JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"
# JVM_OPTS="$JVM_OPTS -XX:+PrintGCTimeStamps"
# JVM_OPTS="$JVM_OPTS -XX:+PrintClassHistogram"
# JVM_OPTS="$JVM_OPTS -XX:+PrintTenuringDistribution"
# JVM_OPTS="$JVM_OPTS -XX:+PrintGCApplicationStoppedTime"
# JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"

5. Move to Cassandra 0.7.2, if possible. It has the following nice feature: "added flush_largest_memtables_at and reduce_cache_sizes_at options to cassandra.yaml as an escape valve for memory pressure" Thanks, Naren On Thu, Feb 24, 2011 at 2:21 PM, Jeffrey Wang jw...@palantir.com wrote: Hey all, Our setup is 5 machines running Cassandra 0.7.0 with 24GB of heap and 1.5TB disk each, collocated in a DC. We’re doing bulk imports from each of the nodes with RF = 2 and write consistency ANY (write perf is very important). The behavior we’re seeing is this: - Nodes often see each other as dead even though none of the nodes actually go down. I suspect this may be due to long GCs. It seems like increasing the RPC timeout could help this, but I’m not convinced this is the root of the problem. Note that in this case writes return with the UnavailableException. - As mentioned, long GCs. We see the ParNew GC doing a lot of smaller collections (few hundred MB) which are very fast (few hundred ms), but every once in a while the ConcurrentMarkSweep will take a LONG time (up to 15 min!) to collect upwards of 15GB at once. - On some nodes, we see a lot of pending MutationStages build up (e.g. 500K), which leads to the messages “Dropped X MUTATION messages in the last 5000ms,” presumably meaning that Cassandra has decided not to write one of the replicas of the data. This is not a HUGE deal, but is less than ideal. - The end result is that a bunch of writes end up failing due to the UnavailableExceptions, so not all of our data is getting into Cassandra. So my question is: what is the best way to avoid this behavior? Our memtable thresholds are fairly low (256MB) so there should be plenty of heap space to work with. We may experiment with write consistency ONE or ALL to see if the perf hit is not too bad, but I wanted to get some opinions on why this might be happening. Thanks! -Jeffrey
Changing comparators
Today it is not possible to change the comparators (compare_with and compare_subcolumns_with). I went through the discussion on the thread http://comments.gmane.org/gmane.comp.db.cassandra.user/12466. Does it make sense to at least allow a one-way change, i.e. from specific types to the generic type? For example, a change from TimeUUIDType or UTF8 to BytesType. This could be a manual process where users do the schema change and then run a major compaction on all the nodes to fix the ordering. Thanks, Naren
Re: How does Cassandra handle failure during synchronous writes
Remember the simple rule: the column with the highest timestamp is the one that will be considered correct EVENTUALLY. So consider the following case: Cluster size = 3 (say node1, node2 and node3), RF = 3, Read/Write CL = QUORUM a. QUORUM in this case requires 2 nodes. The write failed, with a successful write to only 1 node, say node1. b. Read with CL = QUORUM. If the read hits node2 and node3, old data will be returned, with read repair triggered in the background. On the next read you will get the data that was written to node1. c. Read with CL = QUORUM. If the read hits node1 and node2/node3, the new data that was written to node1 will be returned. HTH! Thanks, Naren On Wed, Feb 23, 2011 at 3:36 PM, Ritesh Tijoriwala tijoriwala.rit...@gmail.com wrote: Hi Anthony, I am not talking about the case of CL ANY. I am talking about the case where your consistency level is R + W > N and you want to write to W nodes but only succeed in writing to X (where X < W) nodes and hence fail the write to the client. thanks, Ritesh On Wed, Feb 23, 2011 at 2:48 PM, Anthony John chirayit...@gmail.com wrote: Ritesh, At CL ANY - if all endpoints are down - a HH is written. And it is a successful write - not a failed write. Now that does not guarantee a READ of the value just written - but that is a risk that you take when you use the ANY CL! HTH, -JA On Wed, Feb 23, 2011 at 4:40 PM, Ritesh Tijoriwala tijoriwala.rit...@gmail.com wrote: hi Anthony, While you stated the facts right, I don't see how it relates to the question I asked. Can you elaborate specifically on what happens in the case I mentioned above to Dave? thanks, Ritesh On Wed, Feb 23, 2011 at 1:57 PM, Anthony John chirayit...@gmail.com wrote: Seems to me that the explanations are getting incredibly complicated - while I submit the real issue is not! Salient points here:- 1. To be guaranteed data consistency - the writes and reads have to be at Quorum CL or more 2. Any W/R at a lesser CL means that the application has to handle the inconsistency, or has to be tolerant of it 3. Writing at ANY CL - a special case - means that writes will always go through (as long as any node is up), even if the destination nodes are not up. This is done via hinted handoff. But this can result in inconsistent reads, and yes that is a problem, but refer to pt 2 above 4. At QUORUM CL R/W - after Quorum is met, hinted handoffs are used to handle the case where a particular node is down and the write needs to be replicated to it. But this will not cause inconsistent reads as the hinted handoff (in this case) only applies after Quorum is met - so a Quorum R is not dependent on the down node being up, and having got the hint. Hope I state this appropriately! HTH, -JA On Wed, Feb 23, 2011 at 3:39 PM, Ritesh Tijoriwala tijoriwala.rit...@gmail.com wrote: "Read repair will probably occur at that point (depending on your config), which would cause the newest value to propagate to more replicas." Is the newest value the quorum value, which means it is the old value that will be written back to the nodes having the newer non-quorum value, or is the newest value the real new value? :) If the latter, then this seems kind of odd to me, and how will it be useful to any application? A bug? Thanks, Ritesh On Wed, Feb 23, 2011 at 12:43 PM, Dave Revell d...@meebo-inc.com wrote: Ritesh, You have seen the problem. Clients may read the newly written value even though the client performing the write saw it as a failure. When the client reads, it will use the correct number of replicas for the chosen CL, then return the newest value seen at any replica.
This newest value could be the result of a failed write. Read repair will probably occur at that point (depending on your config), which would cause the newest value to propagate to more replicas. R + W > N guarantees serial order of operations: any read at CL=R that occurs after a write at CL=W will observe the write. I don't think this property is relevant to your current question, though. Cassandra has no mechanism to roll back the partial write, other than to simply write again. This may also fail. Best, Dave On Wed, Feb 23, 2011 at 10:12 AM, tijoriwala.rit...@gmail.com wrote: Hi Dave, Thanks for your input. In the steps you mention, what happens when the client tries to read the value at step 6? Is it possible that the client may see the new value? My understanding was that if R + W > N, then the client will not see the new value, as the Quorum nodes will not agree on the new value. If that is the case, then it's alright to return failure to the client. However, if not, then it is difficult to program against, as after every failure you as a client are not sure whether the failure is a pseudo failure with some side effects or a real failure. Thanks, Ritesh Dave Revell wrote: Ritesh, There is no commit protocol. Writes may be persisted on some replicas even though the quorum fails. Here's a sequence of
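To make the failure mode discussed above concrete, here is a minimal sketch against the 0.7-style Thrift API (the column family name and the error handling are illustrative, not from this thread). A QUORUM write that times out may still be durable on one replica, and the only remedy is to write again or reconcile at read time:

try {
    // QUORUM with RF=3 needs acks from 2 of the 3 replicas
    client.insert(key, new ColumnParent("MyCF"), column, ConsistencyLevel.QUORUM);
} catch (UnavailableException e) {
    // Fewer than 2 replicas were alive; the write was rejected up front.
} catch (TimedOutException e) {
    // The coordinator gave up waiting. The write failed from the client's
    // point of view, but may still have been persisted on one replica (the
    // node1 case above) and can surface on later reads. There is no rollback;
    // retrying with the same timestamp is safe because the write is idempotent.
}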
Re: How does Cassandra handle failure during synchronous writes
c. Read with CL = QUORUM. If read hits node1 and node2/node3, new data that was written to node1 will be returned. In this case - N1 will be identified as a discrepancy and the change will be discarded via read repair [Naren] How will Cassandra know this is a discrepancy? On Wed, Feb 23, 2011 at 6:05 PM, Anthony John chirayit...@gmail.com wrote: Remember the simple rule. Column with highest timestamp is the one that will be considered correct EVENTUALLY. So consider following case: I am sorry, but that will return inconsistent results even at Quorum. Timestamps have nothing to do with this; a timestamp is just an application-provided artifact and could be anything. c. Read with CL = QUORUM. If read hits node1 and node2/node3, new data that was written to node1 will be returned. In this case - N1 will be identified as a discrepancy and the change will be discarded via read repair.
Does HH work (or make sense) for counters?
Version: Cassandra 0.7.1 (build from trunk) Setup: - Cluster of 2 nodes (say A and B) - HH enabled - Using the default Keyspace definition in cassandra.yaml - Using SuperCounter1 CF Client: - Using CL of ONE I started the two Cassandra nodes, created the schema and then shut down one of the instances (say B). Executed counter update and read operations on A with CL=ONE. Everything worked fine. All counters were returned with correct values. Then I started node B and waited for a couple of minutes. Executed only counter read operations on B with CL=ONE. Initially I got no counters for any of the rows. On the second (and subsequent) tries I got counters for only one (always the same row) out of ten rows. After doing one read with CL=QUORUM, reads with CL=ONE started returning correct data. Thanks, Naren
sstable2json for SuperCounter CF not working
Version: Cassandra 0.7.1 (build from trunk) Setup: - Cluster of 2 nodes (say A and B) - HH enabled - Using the default Keyspace definition in cassandra.yaml - Using SuperCounter1 CF Steps: - Started the two nodes, loaded schema using nodetool - Executed counter update and read operations on A with CL=ONE. Everything worked fine. All counters were returned with correct values. - Using nodetool flush, flushed the memtable to an sstable - Used sstable2json on the sstable and got the following exception: [root@msg-qelnx01-v14 bin]# ./sstable2json ../../cassandra071/data/Keyspace1/SuperCounter1-f-1-Data.db WARN 11:38:45,081 Schema definitions were defined both locally and in cassandra.yaml. Definitions in cassandra.yaml were ignored. { 62626232: { 787832: {deletedAt: -9223372036854775808, subColumns: [[616464636f756e74, Exception in thread "main" org.apache.cassandra.db.marshal.MarshalException: A long is exactly 8 bytes at org.apache.cassandra.db.marshal.CounterColumnType.getString(CounterColumnType.java:57) at org.apache.cassandra.tools.SSTableExport.serializeColumns(SSTableExport.java:100) at org.apache.cassandra.tools.SSTableExport.serializeRow(SSTableExport.java:153) at org.apache.cassandra.tools.SSTableExport.export(SSTableExport.java:296) at org.apache.cassandra.tools.SSTableExport.export(SSTableExport.java:330) at org.apache.cassandra.tools.SSTableExport.export(SSTableExport.java:343) at org.apache.cassandra.tools.SSTableExport.main(SSTableExport.java:400) Thanks, Naren
Question on max_hint_window_in_ms
As per config: # this defines the maximum amount of time a dead host will have hints # generated. After it has been dead this long, hints will be dropped. max_hint_window_in_ms: 3600000 # one hour Will this result in deletion of existing hints (from memory and disk), or will it just stop creating new hints? Thanks, Naren
EOFException in ReadStage
Version: Cassandra 0.7.1 I am seeing the following exception at regular intervals (very frequently) in Cassandra. I did a clean install of Cassandra 0.7.1 and deleted all old data. Any idea what could be the cause? The stack is the same for all the occurrences. Thanks, Naren ERROR [ReadStage:11232] 2011-01-28 20:19:09,671 AbstractCassandraDaemon.java (line 114) Fatal exception in thread Thread[ReadStage:11232,5,main] java.io.IOError: java.io.EOFException at org.apache.cassandra.db.columniterator.SSTableNamesIterator.<init>(SSTableNamesIterator.java:75) at org.apache.cassandra.db.filter.NamesQueryFilter.getSSTableColumnIterator(NamesQueryFilter.java:59) at org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:80) at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1267) at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1159) at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1088) at org.apache.cassandra.db.Table.getRow(Table.java:384) at org.apache.cassandra.db.SliceByNamesReadCommand.getRow(SliceByNamesReadCommand.java:60) at org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:69) at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:70) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) Caused by: java.io.EOFException at java.io.DataInputStream.readInt(DataInputStream.java:375) at org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:48) at org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:30) at org.apache.cassandra.io.sstable.IndexHelper.defreezeBloomFilter(IndexHelper.java:108) at org.apache.cassandra.db.columniterator.SSTableNamesIterator.read(SSTableNamesIterator.java:106) at org.apache.cassandra.db.columniterator.SSTableNamesIterator.<init>(SSTableNamesIterator.java:71)
Re: Using Cassandra for storing large objects
Thanks Anand. A few questions: - What is the size of the nodes (in terms of data)? - How long have you been running? - How's compaction treating you? Thanks, Naren On Thu, Jan 27, 2011 at 12:13 PM, Anand Somani meatfor...@gmail.com wrote: Using it for storing large immutable objects. Like Aaron was suggesting, we are splitting the blob across multiple columns. Also we are reading it a few columns at a time (for memory considerations). Currently we have only gone up to about 300-400KB size objects. We do have machines with 32GB of memory, with 8GB for Java. Row cache is disabled. There is some latency that needs to be sorted out, but overall I am positive. This is with 0.6.6; I am in the process of moving it to 0.7.
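A minimal sketch of the chunking scheme Anand describes, using the raw 0.7 Thrift API (the CF name, chunk size, and the blob/rowKey/client variables are illustrative assumptions, not from his system):

// Store one blob as N fixed-size chunk columns under a single row key.
// Zero-padded column names keep the chunks in order, so get_slice with a
// column range can read the blob back a few chunks at a time.
final int CHUNK_SIZE = 256 * 1024; // tune for your memory budget
ColumnParent parent = new ColumnParent("Blobs"); // hypothetical CF
long ts = System.currentTimeMillis() * 1000;
for (int i = 0, off = 0; off < blob.length; i++, off += CHUNK_SIZE) {
    int end = Math.min(off + CHUNK_SIZE, blob.length);
    Column chunk = new Column();
    chunk.setName(String.format("chunk-%05d", i).getBytes(Charset.forName("UTF-8")));
    chunk.setValue(Arrays.copyOfRange(blob, off, end));
    chunk.setTimestamp(ts);
    client.insert(ByteBuffer.wrap(rowKey), parent, chunk, ConsistencyLevel.QUORUM);
}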
Re: Using Cassandra for storing large objects
Thanks Anand. Let's keep exchanging our experiences. -Naren On Thu, Jan 27, 2011 at 8:50 PM, Anand Somani meatfor...@gmail.com wrote: At this point we are not in production, in the lab only. The longest test so far has been about 2-3 days; the data size at this point is about 2-3 TB per node, and we have 2 nodes. We do see spikes to high response times (and timeouts), which seem to be around the time GC kicks in. We were pushing the system as much as we can. Also, given our application we can do major compactions at night; we have not tried that on this big a data set yet. We do still have minor compactions turned on.
Using Cassandra for storing large objects
Is anyone using Cassandra for storing a large number (millions) of large (mostly immutable) objects (200KB-5MB each)? I would like to understand the experience in general, considering that Cassandra is not considered a good fit for large objects. https://issues.apache.org/jira/browse/CASSANDRA-265 Thanks, Naren
Re: get_range_slices getting deleted rows
Yes. See this: http://wiki.apache.org/cassandra/FAQ#range_ghosts -Naren On Tue, Jan 25, 2011 at 2:59 PM, Nick Santini nick.sant...@kaseya.com wrote: Hi, I'm trying a test scenario where I create 100 rows in a CF, then use get_range_slices to get all the rows, and I get 100 rows; so far so good. Then after the test I delete the rows using remove, but without a column or super column. This deletes the row - I can confirm that, because if I try to get it with get_slice using the key I get nothing. But then if I do get_range_slices again, where the range goes between new byte[0] and new byte[0] (therefore returning everything), I still get the 100 row keys. Is that expected? thanks Nicolas Santini
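For anyone else hitting this, the usual workaround from the FAQ is to skip rows that come back with no columns. A rough sketch against the 0.7-style Thrift API (the parent/predicate/range setup is assumed):

List<KeySlice> slices = client.get_range_slices(parent, predicate, range, ConsistencyLevel.QUORUM);
for (KeySlice ks : slices) {
    // A "range ghost": the key is still returned after the row delete,
    // until the tombstone is compacted away - but it carries no columns.
    if (ks.getColumns().isEmpty()) {
        continue;
    }
    // process the live row here
}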
Re: cassandra 0.7.0 noob question
The schema is not loaded from cassandra.yaml by default. You need to either load it through jconsole or define it through the CLI. Please read the following page for details: http://wiki.apache.org/cassandra/LiveSchemaUpdates Also look for "Where are my keyspaces" on the following page: http://wiki.apache.org/cassandra/StorageConfiguration Thanks, Naren On Thu, Jan 6, 2011 at 2:00 PM, felix gao gre1...@gmail.com wrote: Hi all, I started cassandra with everything untouched in the conf folder. When I examine the cassandra.yaml file, there seems to be a default keyspace defined like below. keyspaces: - name: Keyspace1 replica_placement_strategy: org.apache.cassandra.locator.SimpleStrategy replication_factor: 1 column_families: - name: Standard1 My question is: when I ran cassandra-cli and show keyspaces; only the system keyspace is there. What is going on? Thanks, Felix
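For the default Keyspace1 above, the CLI route looks roughly like this (0.7-era cassandra-cli syntax, from memory; verify against help; in your build):

connect localhost/9160;
create keyspace Keyspace1 with replication_factor = 1;
use Keyspace1;
create column family Standard1;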
Re: quick question about super columns
With the raw Thrift APIs: 1. Fetch a column from a super column:

// ColumnFamily, SuperColumnName, ColumnName and RowKey below are placeholders;
// getBytes()/getByteBuffer() are small helpers that encode them.
ColumnPath cp = new ColumnPath(ColumnFamily);
cp.setSuper_column(getBytes(SuperColumnName));
cp.setColumn(getBytes(ColumnName));
ColumnOrSuperColumn resp = client.get(getByteBuffer(RowKey), cp, ConsistencyLevel.ONE);
Column c = resp.getColumn();

2. Add a new super column:

SuperColumn superColumn = new SuperColumn();
superColumn.setName(getBytes(SuperColumnName));
List<Column> cols = new ArrayList<Column>();
Column c = new Column();
c.setName(name);
c.setValue(value);
c.setTimestamp(timeStamp);
cols.add(c);
// repeat the above 5 lines for as many columns as you want in the super column
superColumn.setColumns(cols);

List<Mutation> mutations = new ArrayList<Mutation>();
ColumnOrSuperColumn csc = new ColumnOrSuperColumn();
csc.setSuper_column(superColumn);
csc.setSuper_columnIsSet(true);
Mutation m = new Mutation();
m.setColumn_or_supercolumn(csc);
m.setColumn_or_supercolumnIsSet(true);
mutations.add(m);

// batch_mutate takes row key -> (column family -> mutations)
Map<String, List<Mutation>> allMutations = new HashMap<String, List<Mutation>>();
allMutations.put(ColumnFamilyName, mutations);
Map<ByteBuffer, Map<String, List<Mutation>>> mutationMap = new HashMap<ByteBuffer, Map<String, List<Mutation>>>();
mutationMap.put(getByteBuffer(RowKey), allMutations);
client.batch_mutate(mutationMap, ConsistencyLevel.ONE);

HTH! Thanks, Naren On Thu, Jan 6, 2011 at 10:42 PM, Arijit Mukherjee ariji...@gmail.com wrote: Thank you. And is it similar if I want to search a subcolumn within a given supercolumn? I mean I have the supercolumn key and the subcolumn key - can I fetch the particular subcolumn? Can you share a small piece of example code for both? I'm still new to this and trying to figure out the Thrift APIs. I attempted to use Hector, but got myself into more confusion. Arijit On 7 January 2011 11:44, Roshan Dawrani roshandawr...@gmail.com wrote: On Fri, Jan 7, 2011 at 11:39 AM, Arijit Mukherjee ariji...@gmail.com wrote: Hi I've a quick question about supercolumns. EventRecord = { eventKey2: { e2-ts1: {set of columns}, e2-ts2: {set of columns}, ... e2-tsn: {set of columns} } } If I want to append another e2-tsp: {set of columns} to the event record keyed by eventKey2, do I need to retrieve the entire eventKey2 map, and then append this new row and re-insert eventKey2? No, you can simply insert a new super column with its sub-columns with the rowKey that you want, and it will join the other super columns of that row. A row can have billions of super columns. Imagine fetching them all just to add one more super column. -- And when the night is cloudy, There is still a light that shines on me, Shine on until tomorrow, let it be.
Re: Any GUI for Cassandra database on Windows?
cassandra-gui doesn't work with Cassandra 0.7. It could be due to a Thrift version difference, API differences, or the default framed transport mode. Better to switch to something that works for sure. Thanks, -Naren On Mon, Dec 27, 2010 at 9:15 PM, Roshan Dawrani roshandawr...@gmail.com wrote: Sorry. Will do that. I am using Cassandra 0.7.0-rc2. I will try this DB client. Thanks. On Tue, Dec 28, 2010 at 10:41 AM, Narendra Sharma narendra.sha...@gmail.com wrote: Please do mention the Cassandra version you are using in all your queries. It helps. Try https://github.com/driftx/chiton Thanks, Naren On Mon, Dec 27, 2010 at 7:37 PM, Roshan Dawrani roshandawr...@gmail.com wrote: Hi, Is there a GUI client for a Cassandra database for a Windows-based setup? I tried the one available at http://code.google.com/p/cassandra-gui/, but it always fails to connect with the error: Cannot read. Remote side has closed. Tried to read 4 bytes, but only got 0 bytes. -- Roshan Blog: http://roshandawrani.wordpress.com/ Twitter: @roshandawrani http://twitter.com/roshandawrani Skype: roshandawrani
Re: I have TimeUUID sorted keys. Can I get the range query return rows in the same order as sorted keys?
You will need to use an order-preserving partitioner (OPP) to perform meaningful range scans. Look for Range Queries on http://wiki.apache.org/cassandra/DataModel Look at this to understand why ordered range queries are not supported for RandomPartitioner: https://issues.apache.org/jira/browse/CASSANDRA-1750 Thanks, Naren On Mon, Dec 27, 2010 at 8:35 AM, Roshan Dawrani roshandawr...@gmail.com wrote: I had seen RangeSlicesQuery, but I didn't notice that I could also give a key range there. How does a KeyRange work? Doesn't it need some sort order from the partitioner - whether that is order preserving or not? I couldn't be sure of a query that was based on the order of the rows in the column family, so I didn't explore that much. On Mon, Dec 27, 2010 at 9:55 PM, Narendra Sharma narendra.sha...@gmail.com wrote: Did you look at get_range_slices? Once you get the columns from the super column, pick the first and last to form the range and fire the get_range_slices. Thanks, -Naren On Mon, Dec 27, 2010 at 6:12 AM, Roshan Dawrani roshandawr...@gmail.com wrote: This silly question is retrieved back with apology. There couldn't be anything easier to handle at the application level. rgds, Roshan On Mon, Dec 27, 2010 at 9:04 AM, Roshan Dawrani roshandawr...@gmail.com wrote: Hi, I have the following 2 column families - one being used to store full rows for an entity, and the other is an index table holding the TimeUUID-sorted row keys. I am able to query the TimeUUID columns under the super column fine. But now I need to go to the main CF and get the data, and I want the rows in the same time order as the keys. I am using MultiGetSliceQuery to query the main entity data for the sorted keys, but the rows don't come back in the same order, which defeats the purpose of storing the time-sorted subcolumns. I suppose for each key I could fire an individual SliceQuery, but that does not look efficient to me. I do want to fire a range query. MainEntityCF { TimeUUIDKeyA: [Col1 : Val1, Col2 : Val2, Col3 : Val3] TimeUUIDKeyX: [Col1 : Val1, Col2 : Val2, Col3 : Val3] TimeUUIDKeyB: [Col1 : Val1, Col2 : Val2, Col3 : Val3] TimeUUIDKeyY: [Col1 : Val1, Col2 : Val2, Col3 : Val3] } MainEntityCF_Index { SomeSuperColumn: [TimeUUIDKeyA:null, TimeUUIDKeyB:null, TimeUUIDKeyX:null, TimeUUIDKeyY:null] } -- Roshan Blog: http://roshandawrani.wordpress.com/ Twitter: @roshandawrani http://twitter.com/roshandawrani Skype: roshandawrani -- Roshan Blog: http://roshandawrani.wordpress.com/ Twitter: @roshandawrani http://twitter.com/roshandawrani Skype: roshandawrani
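For the record, the application-level fix Roshan alludes to is simply to re-impose the key order on the multiget results. A rough sketch against the raw 0.7 Thrift API (variable names assumed; Hector's MultigetSliceQuery wraps the same call):

// orderedKeys: the row keys, already in TimeUUID order (read from the index CF)
Map<ByteBuffer, List<ColumnOrSuperColumn>> rows = client.multiget_slice(
        orderedKeys, new ColumnParent("MainEntityCF"), predicate, ConsistencyLevel.ONE);
// multiget_slice returns an unordered map; walk the key list to restore time order
for (ByteBuffer key : orderedKeys) {
    List<ColumnOrSuperColumn> row = rows.get(key);
    if (row != null && !row.isEmpty()) {
        // process the row here, now in time order
    }
}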
Re: Supercolumn Maximums
#1 - No limit #2 - If you are referring to secondary indexes then NO. Also see https://issues.apache.org/jira/browse/CASSANDRA-598 #3 - No limit Following are key limitations: 1. All data for a single row must fit (on disk) on a single machine in the cluster 2. A single column value may not be larger than 2GB. See more on: http://wiki.apache.org/cassandra/CassandraLimitations -Naren On Mon, Dec 27, 2010 at 9:01 PM, David G. Boney dbon...@semanticartifacts.com wrote: 1. What are the maximum number of supercolumns that a row can have? 2. Are supercolumns indexed? 3. What are the maximum number of subcolumns in a supercolumn? - Sincerely, David G. Boney dbon...@semanticartifacts.com http://www.semanticartifacts.com
Re: Cassandra 0.7 - Impact of row size and columns on compaction
This is very useful. Thanks Aaron! -Naren On Sun, Dec 5, 2010 at 12:35 PM, Aaron Morton aa...@thelastpickle.com wrote: AFAIK if the entire row can be read into memory the compaction will be faster. The in_memory_compaction_limit_in_mb setting is used to decide how big the row can be before it has to use a slower two-pass process. Also, my understanding is that one of the main factors for compaction is the number of over-writes for rows / columns, e.g. if the data for a row is spread over a lot of SSTables (from new columns and/or updates and/or deletes) it will take longer to compact that row. Hope that helps. Aaron On 04 Dec, 2010, at 09:23 AM, Narendra Sharma narendra.sha...@gmail.com wrote: What is the impact (performance and I/O) of row size (in bytes) on compaction? What is the impact (performance and I/O) of the number of super columns and columns on compaction? Does anyone have any details and data to share? Thanks, Naren
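For reference, the setting Aaron mentions lives in cassandra.yaml; rows larger than this threshold fall back to the slower two-pass compaction path (the value shown is the 0.7-era default, if memory serves):

# Size limit for rows compacted entirely in memory. Larger rows spill to
# a slower two-pass compaction.
in_memory_compaction_limit_in_mb: 64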
Cassandra 0.7 - Impact of row size and columns on compaction
What is the impact (performance and I/O) of row size (in bytes) on compaction? What is the impact (performance and I/O) of the number of super columns and columns on compaction? Does anyone have any details and data to share? Thanks, Naren
Fetch a SuperColumn based on value of column
Hi, My schema has a row that has thousands of super columns. The size of each super column is around 500B (20 columns). I need to query 1 super column based on the value of one of its columns. Something like: SELECT SuperColumn FROM Row WHERE SuperColumn.column = value Questions: 1. Is this possible with the current Cassandra APIs? If yes, could you please show a sample? 2. How would such a query perform if the number of super columns is high (> 10K)? Cassandra version 0.7. Thanks, Naren
Re: Fetch a SuperColumn based on value of column
Thanks Aaron! The first request requires you to know the SuperColumn name. In my case I don't know the SuperColumn name, because if I knew it I could simply read the super column. I need to find the SuperColumn that has a column with a given value for a given column name. The use case is that the application allows querying an object by two attributes. I have made one of the attributes the SuperColumn name. I need to keep the second attribute as a subcolumn in the super column. Now I need to perform a search by subcolumn. I think the only option is to maintain another CF with the column name as the second attribute, and the value as the name of the super column in the current CF. Is there any better way to handle this? Thanks, Naren On Thu, Dec 2, 2010 at 5:48 PM, Aaron Morton aa...@thelastpickle.com wrote: You can use column and super column names with the get_slice() function without 0.7 secondary indexes. I'm assuming that the original query was to test for the existence of a column by name. In the case below, retrieving the full super column would require two requests. First, test the condition: get_slice with a ColumnParent that specifies the CF and the Super Column, and a slice predicate with column_names[] containing the name of the column you want. This query would only return the one column. If you then wanted to get all columns in the super column you would make another request. If making two requests is a pain or too slow, consider changing the data model to better support the requests you need to make. AFAIK a lot of super columns will not impact performance any more than a lot of columns. There are however limitations to the number of columns in a super column: http://wiki.apache.org/cassandra/CassandraLimitations Hope that helps. Aaron On 03 Dec, 2010, at 01:10 PM, Nick Santini nick.sant...@kaseya.com wrote: actually, the solution would be something like my last mail, but pointing to the name of the super column and the row key Nicolas Santini Director of Cloud Computing Auckland - New Zealand (64) 09 914 9426 ext 2629 (64) 021 201 3672 On Fri, Dec 3, 2010 at 1:08 PM, Nick Santini nick.sant...@kaseya.com wrote: Hi, as I got answered on my mail, secondary indexes for super column families are not supported yet, so you have to implement your own. Easy way: keep another column family where the row key is the value of your field and the columns are the row keys of your super column family (inverted index) Nicolas Santini Director of Cloud Computing Auckland - New Zealand (64) 09 914 9426 ext 2629 (64) 021 201 3672
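A sketch of the inverted-index write path Nick and Naren converge on, with the raw Thrift API (the CF name, the attr2Value/superColumnName/timestamp variables, and the getBytes/getByteBuffer helpers are illustrative):

// In the index CF, the row key is the attribute value being searched on,
// and each column name is the name of a matching super column in the data CF.
Column pointer = new Column();
pointer.setName(getBytes(superColumnName)); // points back to the super column
pointer.setValue(new byte[0]);              // value unused
pointer.setTimestamp(timestamp);
ColumnOrSuperColumn csc = new ColumnOrSuperColumn();
csc.setColumn(pointer);
Mutation indexEntry = new Mutation();
indexEntry.setColumn_or_supercolumn(csc);

Map<String, List<Mutation>> byCf = new HashMap<String, List<Mutation>>();
byCf.put("ObjectsByAttr2", Arrays.asList(indexEntry)); // hypothetical index CF
Map<ByteBuffer, Map<String, List<Mutation>>> mutationMap =
        new HashMap<ByteBuffer, Map<String, List<Mutation>>>();
mutationMap.put(getByteBuffer(attr2Value), byCf);
client.batch_mutate(mutationMap, ConsistencyLevel.QUORUM);
// To search: get_slice on "ObjectsByAttr2" with row key attr2Value yields the
// matching super column names, which are then read from the data CF.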
C++ client for Cassandra
Are there any C++ clients out there similar to Hector (in terms of features) for Cassandra? I am looking for a C++ client for Cassandra 0.7. Thanks, Naren
batch_mutate vs number of write operations on CF
Hi, I am using Cassandra 0.7 beta3 and Hector. I create a mutation map. The mutation involves adding a few columns for a given row. After that I use the batch_mutate API to send the changes to Cassandra. Question: if there are multiple column writes to the same row in a mutation_map, does Cassandra show that (in the JMX write count stats for the CF) as 1 write operation, or as N write operations, where N is the number of entries in the mutation map for that row? Assume all the changes in the mutation map are for one row. Thanks, Naren
Cassandra 0.7 - documentation on Secondary Indexes
Is there any documentation available on what is possible with secondary indexes? For example: - Is it possible to define a secondary index on columns within a SuperColumn? - If I define a secondary index at run time, does Cassandra index all the existing data, or is only new data indexed? Some documentation along with examples would be highly useful. Thanks, Naren
Re: Cassandra 0.7 - documentation on Secondary Indexes
Thanks Jonathan. A couple more questions: 1. Is there any technical limit on the number of secondary indexes that can be created? 2. Is it possible to execute join queries spanning multiple secondary indexes? Thanks, Naren On Mon, Nov 29, 2010 at 6:02 PM, Jonathan Ellis jbel...@gmail.com wrote: On Mon, Nov 29, 2010 at 7:59 PM, Narendra Sharma narendra.sha...@gmail.com wrote: Is there any documentation available on what is possible with secondary indexes? Not yet. - Is it possible to define a secondary index on columns within a SuperColumn? No. - If I define a secondary index at run time, does Cassandra index all the existing data or only new data? The former. -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
Re: Cassandra 0.7 - documentation on Secondary Indexes
On Mon, Nov 29, 2010 at 9:32 PM, Jonathan Ellis jbel...@gmail.com wrote: On Mon, Nov 29, 2010 at 11:26 PM, Narendra Sharma narendra.sha...@gmail.com wrote: Thanks Jonathan. A couple more questions: 1. Is there any technical limit on the number of secondary indexes that can be created? Just as with traditional databases, the more indexes there are, the slower writes to that CF will be. 2. Is it possible to execute join queries spanning multiple secondary indexes? What do secondary indexes have to do with joins? For example, if I want to get all employees that are male and have age >= 35 years, how can secondary indexes be useful in such a scenario? -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
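For what it's worth, in 0.7 that particular example needs no join: get_indexed_slices takes one equality expression on an indexed column plus additional expressions that act as filters. A rough sketch (the CF/column names, the predicate variable, and the helper encoding age as bytes are invented):

IndexClause clause = new IndexClause();
// at least one EQ expression on an indexed column is required
clause.addToExpressions(new IndexExpression(
        getByteBuffer("sex"), IndexOperator.EQ, getByteBuffer("male")));
// further expressions narrow the result (applied as a filter)
clause.addToExpressions(new IndexExpression(
        getByteBuffer("age"), IndexOperator.GTE, ageAsBytes(35))); // hypothetical encoder
clause.setStart_key(ByteBuffer.wrap(new byte[0]));
clause.setCount(100);
List<KeySlice> matches = client.get_indexed_slices(
        new ColumnParent("Employees"), clause, predicate, ConsistencyLevel.QUORUM);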
working of get_range_slices
Hi, I am using Cassandra 0.6.5. Our application uses get_range_slices to get rows in a given range. Could someone please explain how get_range_slices works internally, especially when a count parameter (value = 1) is also specified in the SlicePredicate? Does Cassandra first search everything in the given range and then return the top 1, or does it somehow read only 1 and return it? What is the performance/I/O impact if we pass start key = end key? Will it perform better than passing a range as [Start key,] with count = 1? Thanks, Naren
Re: working of get_range_slices
Thanks Jonathan. Another related question: if I need to fetch only 1 row, what will be the difference in performance between get_slice and get_range_slices? The reason for this question is that we are using some code that uses get_range_slices. We have the option of forcing it to use count=1 with get_range_slices, or changing the code to use get_slice. What would you recommend? What will be the net gain on the Cassandra side in computing the result? Thanks, Naren On Thu, Oct 14, 2010 at 11:12 AM, Jonathan Ellis jbel...@gmail.com wrote: get_range_slices never does searching. The performance of those two predicates is equivalent, assuming the row start key actually exists. -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
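For readers comparing the two calls, the single-row range query looks roughly like this with the 0.6-era Thrift API (keyspace/CF names and the predicate are assumed; in 0.6 row keys are plain Strings):

KeyRange range = new KeyRange();
range.setStart_key("the-row-key");
range.setEnd_key("the-row-key"); // start == end: a range of exactly one key
range.setCount(1);               // cap the number of rows returned
List<KeySlice> rows = client.get_range_slices(
        "Keyspace1", new ColumnParent("MyCF"), predicate, range, ConsistencyLevel.QUORUM);
// get_slice fetches the same columns for the single key without the range machinery:
// List<ColumnOrSuperColumn> row = client.get_slice("Keyspace1", "the-row-key",
//         new ColumnParent("MyCF"), predicate, ConsistencyLevel.QUORUM);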
Retaining commit logs
Cassandra Version: 0.6.5 I am running a long-duration test and I need to keep the commit log to see the sequence of operations, to debug a few application issues. Is it possible to retain the commit logs? Apart from increasing the value of CommitLogRotationThresholdInMB, what is the other way to achieve this? The commit logs are deleted when the Memtable is flushed. Thanks, Naren
Re: Query on sstable2json - possible bug
Has anyone used sstable2json on 0.6.5 and noticed the issue I described in my email below? This doesn't look like a data corruption issue, as sstablekeys shows the keys. Thanks, Naren On Tue, Oct 5, 2010 at 8:09 PM, Narendra Sharma narendra.sha...@gmail.com wrote: 0.6.5 -Naren On Tue, Oct 5, 2010 at 6:56 PM, Jonathan Ellis jbel...@gmail.com wrote: Version? On Tue, Oct 5, 2010 at 7:28 PM, Narendra Sharma narendra.sha...@gmail.com wrote: Hi, I am using sstable2json to extract row data for debugging some application issue. I first ran sstablekeys to find the list of keys in the sstable, then used a key to fetch the row from the sstable. The sstable is from a Lucandra deployment. For some keys, the key in the JSON output does not match the key passed as the -k argument. The full commands and output are in the original post below.
Re: Retaining commit logs
Thanks Oleg! Could you please share the patch? I have built Cassandra from source before; I can definitely give it a try. -Naren On Wed, Oct 6, 2010 at 3:55 AM, Oleg Anastasyev olega...@gmail.com wrote: Is it possible to retain the commit logs? In off-the-shelf cassandra 0.6.5 this is not possible, AFAIK. I developed a patch we use internally in our company for commit log archival and replay. I can share the patch with you, if you dare patching cassandra sources by yourself ;-) PS. Are other ppl interested in this functionality? I could file it to JIRA as well...
Query on sstable2json - possible bug
Hi, I am using sstable2json to extract row data for debugging some application issue. I first ran sstablekeys to find the list of keys in the sstable. Then I use the key to fetch row from sstable. The sstable is from Lucandra deployment. I get following. -bash-3.2$ ./sstablekeys Documents-37-Data.db | more jhwKcHZx���0df5a54a-61d8-440e-94a9-b46061ba2fec jhwKcHZx���120fc562-cf9f-4204-963d-0ed0d8cd2d09 jhwKcHZx���93d78bce-7713-4ff9-bc83-b02663a1a55c jhwKcHZx���e6f6f5ef-a09f-4e84-9727-56867e81be00 jqCF6zxM���04f2f4da-724d-40f1-95bf-4799b97ade76 jqCF6zxM���917f66a6-7a95-4789-82ca-aaa511f6b56e //This returns correct data -bash-3.2$ ./sstable2json Documents-38-Data.db -k jhwKcHZx���0df5a54a-61d8-440e-94a9-b46061ba2fec { jhwKcHZx���0df5a54a-61d8-440e-94a9-b46061ba2fec: [[5f3a46514944, 30646635613534612d363164382d343430652d393461392d62343630363162613266656380, 1296272041356884, false], [5f3a504152454e54, 65373466316138632d313934652d343939652d383835362d64316536343939613862636180, 1296272041369884, false], [5f3a4944, 30646635613534612d363164382d343430652d393461392d62343630363162613266656380, 1296272041342884, false], [efbfbf4d455441efbfbf, aced0005737200136a6176612e7574696c2e41727261794c6973747881d21d99c7619d03000149000473697a6578767704000a74002d5f3a4944efbfbf30646635613534612d363164382d343430652d393461392d62343630363162613266656374002d5f3a46514944efbfbf30646635613534612d363164382d343430652d393461392d62343630363162613266656374002f5f3a504152454e54efbfbf65373466316138632d313934652d343939652d383835362d643165363439396138626361740032313a76706172656e746964efbfbf65373466316138632d313934652d343939652d383835362d64316536343939613862636174000e333a6e616d65efbfbf656d61696c740016333a7072696d61727954797065efbfbf31313a61707078, 1296272041458884, false]] } //Look at the key in the json output. 
It doesn't match the key passed as argument -bash-3.2$ ./sstable2json Documents-38-Data.db -k jhwKcHZx���120fc562-cf9f-4204-963d-0ed0d8cd2d09 { jhwKcHZx���0df5a54a-61d8-440e-94a9-b46061ba2fec: [[5f3a46514944, 30646635613534612d363164382d343430652d393461392d62343630363162613266656380, 1296272041356884, false], [5f3a504152454e54, 65373466316138632d313934652d343939652d383835362d64316536343939613862636180, 1296272041369884, false], [5f3a4944, 30646635613534612d363164382d343430652d393461392d62343630363162613266656380, 1296272041342884, false], [efbfbf4d455441efbfbf, aced0005737200136a6176612e7574696c2e41727261794c6973747881d21d99c7619d03000149000473697a6578767704000a74002d5f3a4944efbfbf30646635613534612d363164382d343430652d393461392d62343630363162613266656374002d5f3a46514944efbfbf30646635613534612d363164382d343430652d393461392d62343630363162613266656374002f5f3a504152454e54efbfbf65373466316138632d313934652d343939652d383835362d643165363439396138626361740032313a76706172656e746964efbfbf65373466316138632d313934652d343939652d383835362d64316536343939613862636174000e333a6e616d65efbfbf656d61696c740016333a7072696d61727954797065efbfbf31313a61707078, 1296272041458884, false]] } -bash-3.2$ //This returns correct data -bash-3.2$ ./sstable2json Documents-38-Data.db -k jqCF6zxM���04f2f4da-724d-40f1-95bf-4799b97ade76 { jqCF6zxM���04f2f4da-724d-40f1-95bf-4799b97ade76: [[31313a6d73732e626c6f622e73697a65, 373780, 1296278215537884, false], [31313a6d73732e6d73672e3173742e7365656e2e73656373, 3080, 1296278215526884, false], [31313a6d73732e6d73672e61727674696d65, 3132383632363630373180, 1296278215627884, false], [31313a6d73732e6d73672e626f756e6365, 66616c736580, 1296278215653884, false], [31313a6d73732e6d73672e64656c2e6e6472, 66616c736580, 1296278215543884, false], [31313a6d73732e6d73672e6578702e73656373, 3080, 1296278215549884, false], [31313a6d73732e6d73672e666c616773, 3080, 1296278215679884, false], [31313a6d73732e6d73672e6964, 30346632663464612d373234642d343066312d393562662d34373939623937616465373680, 1296278215673884, false], [31313a6d73732e6d73672e6b6579776f726473, 80, 1296278215520884, false], [31313a6d73732e6d73672e6c6173745f616363, 3080, 1296278215569884, false], [31313a6d73732e6d73672e6d756c7469706c652e6d736773, 46c2900ec3a780, 1296278215691884, false], [31313a6d73732e6d73672e7072696f72, 80, 1296278215697884, false], [31313a6d73732e6d73672e70726976617465, 66616c736580, 1296278215592884, false], [31313a6d73732e6d73672e73697a65, 3636383180, 1296278215532884, false], [31313a6d73732e6d73672e74696d65317374616363, 3080, 1296278215647884, false], [31313a6d73732e6d73672e74797065, 80, 1296278215685884, false], [31313a6d73732e6d73672e756964, 3130303480, 1296278215563884, false], [31313a6d73732e6d73672e756e72656164, 7472756580, 1296278215659884, false], [31313a6d73732e766572, 3080, 1296278215633884, false], [5f3a46514944, 30346632663464612d373234642d343066312d393562662d34373939623937616465373680, 1296278215500884, false], [5f3a504152454e54, 62646638666262622d323265392d343830302d623533612d35373032333838303436616680, 1296278215514884, false], [5f3a4944,
Re: Query on sstable2json - possible bug
0.6.5 -Naren On Tue, Oct 5, 2010 at 6:56 PM, Jonathan Ellis jbel...@gmail.com wrote: Version?
Re: Preventing Swapping.
Read "Use mlockall via JNA, if present, to prevent Linux from swapping out parts of the JVM" (https://issues.apache.org/jira/browse/CASSANDRA-1214) at the following link: http://www.riptano.com/blog/whats-new-cassandra-065 -Naren On Wed, Sep 29, 2010 at 5:21 PM, Jeremy Davis jerdavis.cassan...@gmail.com wrote: Did anyone else see this article on preventing swapping? Seems like it would also apply to Cassandra. http://jcole.us/blog/archives/2010/09/28/mysql-swap-insanity-and-the-numa-architecture/ -JD
High number of DigestMismatchException
We are seeing a high number of DigestMismatchExceptions in our Cassandra deployment. We have a cluster of 4 nodes with RF=3, and we read/write at QUORUM. I understand some DigestMismatchExceptions are normal: they are the mechanism by which Cassandra ensures consistency via read repair. In our case, even though we have 4 clients, only the client that writes the data reads it back, because of request sharding at the client end. So I would expect the replication to happen fast and the data to be consistent on the 3 copies before the read hits the cluster. The size of a column value is approx 128KB. We verified multiple times that the timestamps of all the clients are in sync. Is this something to worry about? How do we troubleshoot if this is an issue? Thanks, Naren
Cassandra client - clock sync
Hi, We have an application that uses Cassandra to store data. The application is deployed on multiple nodes that are part of an application cluster. We are at present using a single Cassandra node. We have noticed a few errors in the application, and our analysis revealed that the root cause was that the clocks on different application nodes were off by a few milliseconds (approx 3.5 ms). AFAIK all the application nodes using Cassandra should have their clocks synched. Is this understanding correct? If yes, what is the recommended way to keep the clocks in sync? Even if we use NTP, the clocks go out of sync after a few hours. Should we write a cron job to sync time every N minutes or hours? What is the recommendation in production? How are other Cassandra users handling clock sync in production environments? Thanks, Naren