Re: Cassandra Agent
Now that's a fun problem to solve!

On 11 Oct 2013, at 17:17, David Schairer dschai...@humbaba.net wrote:
http://en.wikipedia.org/wiki/List_of_children_of_Priam You've got plenty of children of Priam to go around. Doesn't anyone read the Iliad any more? :) --DRS

On Oct 11, 2013, at 6:55 AM, Edward Capriolo edlinuxg...@gmail.com wrote:
Stick sandra on the end. Restsandra.

On Friday, October 11, 2013, Ran Tavory ran...@gmail.com wrote:
Seems like the greeks are all used out; how about moving to the Japanese mythology? It's a brand new pool of names... http://en.wikipedia.org/wiki/Japanese_mythology

On Fri, Oct 11, 2013 at 8:29 AM, Blair Zajac bl...@orcaware.com wrote:
On 10/10/2013 10:28 PM, Blair Zajac wrote:
On 10/10/2013 08:53 PM, Sean McCully wrote:
On Thursday, October 10, 2013 08:30:42 PM Blair Jacuzzi wrote:
On 10/10/2013 07:54 PM, Sean McCully wrote:
Hello Cassandra Users, I've recently created a Cassandra Agent as part of Netflix's Cloud Prize competition. The submission, which I've named Hector, is largely based on Netflix's Priam. I would be very interested in getting feedback from anyone willing to give Hector (https://github.com/seanmccully/hector) a try. I am very interested in seeing if this is something the Cassandra community is interested in using with their Cassandra installs.

For one, there's a name conflict with the well-known Hector Cassandra client project: http://hector-client.github.io/hector/build/html/index.html Any suggestions on a new name?

Helenus, the twin brother of the prophetess Cassandra??? http://en.wikipedia.org/wiki/Helenus

Oops, should have Googled myself before suggesting this; they are NodeJS bindings for Cassandra: https://github.com/simplereach/helenus

Well, I'll leave it to you to find a free name ;) Blair

-- /Ran http://tavory.com
AssertionError: DecoratedKey(... ) != DecoratedKey (...)
Pardon me, now with the appropriate subject line... Hi, I have a small cluster of 1.2.6 and after some config changes I started seeing errors in the logs. Not sure that's related, but the changes I performed were to disable hinted handoff and disable auto snapshot. I'll try to revert these and see if the picture changes. But anyway, that seems like a bug, right? I see this across many nodes, not only one.

ERROR [ReplicateOnWriteStage:105] 2013-10-06 16:13:13,799 CassandraDaemon.java (line 192) Exception in thread Thread[ReplicateOnWriteStage:105,5,main]
java.lang.AssertionError: DecoratedKey(-9223372036854775808, ) != DecoratedKey(-1854619418400985942, 00033839390a4769676f707469782d3100) in /raid0/cassandra/data/test_realtime/activities_summary_realtime/test_realtime-activities_summary_realtime-ic-2-Data.db
    at org.apache.cassandra.db.columniterator.SSTableNamesIterator.read(SSTableNamesIterator.java:119)
    at org.apache.cassandra.db.columniterator.SSTableNamesIterator.<init>(SSTableNamesIterator.java:60)
    at org.apache.cassandra.db.filter.NamesQueryFilter.getSSTableColumnIterator(NamesQueryFilter.java:81)
    at org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:68)
    at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:272)
    at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:65)
    at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1391)
    at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1214)
    at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1126)
    at org.apache.cassandra.db.Table.getRow(Table.java:347)
    at org.apache.cassandra.db.SliceByNamesReadCommand.getRow(SliceByNamesReadCommand.java:64)
    at org.apache.cassandra.db.CounterMutation.makeReplicationMutation(CounterMutation.java:90)
    at org.apache.cassandra.service.StorageProxy$7$1.runMayThrow(StorageProxy.java:772)
    at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1593)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)

ERROR [ReplicateOnWriteStage:82] 2013-10-06 16:13:14,249 CassandraDaemon.java (line 192) Exception in thread Thread[ReplicateOnWriteStage:82,5,main]
java.lang.RuntimeException: java.lang.IllegalArgumentException: unable to seek to position 2171332 in /raid0/cassandra/data/test_realtime/activities_summary_realtime/test_realtime-activities_summary_realtime-ic-2-Data.db (1250125 bytes) in read-only mode
    at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1597)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.IllegalArgumentException: unable to seek to position 2171332 in /raid0/cassandra/data/test_realtime/activities_summary_realtime/test_realtime-activities_summary_realtime-ic-2-Data.db (1250125 bytes) in read-only mode
    at org.apache.cassandra.io.util.RandomAccessReader.seek(RandomAccessReader.java:306)
    at org.apache.cassandra.io.util.PoolingSegmentedFile.getSegment(PoolingSegmentedFile.java:42)
    at org.apache.cassandra.io.sstable.SSTableReader.getFileDataInput(SSTableReader.java:1054)
    at org.apache.cassandra.db.columniterator.SSTableNamesIterator.createFileDataInput(SSTableNamesIterator.java:94)
    at org.apache.cassandra.db.columniterator.SSTableNamesIterator.read(SSTableNamesIterator.java:112)
    at org.apache.cassandra.db.columniterator.SSTableNamesIterator.<init>(SSTableNamesIterator.java:60)
    at org.apache.cassandra.db.filter.NamesQueryFilter.getSSTableColumnIterator(NamesQueryFilter.java:81)
    at org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:68)
    at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:272)
    at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:65)
    at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1391)
    at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1214)
    at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1126)
    at org.apache.cassandra.db.Table.getRow(Table.java:347)
    at org.apache.cassandra.db.SliceByNamesReadCommand.getRow(SliceByNamesReadCommand.java:64)
    at org.apache.cassandra.db.CounterMutation.makeReplicationMutation(CounterMutation.java:90)
    at org.apache.cassandra.service.StorageProxy$7$1.runMayThrow(StorageProxy.java:772)
    at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1593)

-- /Ran http://tavory.com
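A side note on reading the second trace: the requested position (2171332) is past the end of the file (1250125 bytes), i.e. the index is pointing beyond the data file, which usually indicates a truncated or corrupt SSTable. A simplified, hypothetical illustration of the kind of guard that produces this message (this is not Cassandra's actual implementation; the class below is invented for illustration):

    import java.io.IOException;
    import java.io.RandomAccessFile;

    // Hypothetical reader that refuses to seek past the end of a read-only
    // file, mirroring the IllegalArgumentException seen in the log above.
    public class BoundedReader {
        private final RandomAccessFile file;
        private final String path;

        public BoundedReader(String path) throws IOException {
            this.path = path;
            this.file = new RandomAccessFile(path, "r"); // read-only mode
        }

        public void seek(long position) throws IOException {
            if (position > file.length()) {
                throw new IllegalArgumentException("unable to seek to position "
                    + position + " in " + path + " (" + file.length()
                    + " bytes) in read-only mode");
            }
            file.seek(position);
        }
    }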
com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during write query
Hi all, when using the java-driver I see this error on the client, for reads as well as for writes. Many of the ops succeed; however, I do see a significant amount of errors.

com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during write query at consistency ONE (1 replica were required but only 0 acknowledged the write)
    at com.datastax.driver.core.ResultSetFuture.convertException(ResultSetFuture.java:243)
    at com.datastax.driver.core.ResultSetFuture$ResponseCallback.onSet(ResultSetFuture.java:119)
    at com.datastax.driver.core.RequestHandler.setFinalResult(RequestHandler.java:202)
    at com.datastax.driver.core.RequestHandler.onSet(RequestHandler.java:331)
    at com.datastax.driver.core.Connection$Dispatcher.messageReceived(Connection.java:484)
    at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)

The cluster itself isn't working very hard and seems to be in good shape... CPU load is around 0.1, IO wait is below 1%, all hosts are up, nothing is flapping, and the logs don't indicate any special GC activity... So I'm a bit puzzled as to where to look next. Any hints?...

-- /Ran http://tavory.com
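For anyone hitting the same exception: a minimal sketch of catching and retrying it with the DataStax java-driver of that era (the contact point, keyspace and statement below are placeholders, and whether a retry is safe depends on the statement being idempotent):

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;
    import com.datastax.driver.core.exceptions.WriteTimeoutException;

    public class RetryOnWriteTimeout {
        public static void main(String[] args) {
            // Placeholder contact point and keyspace.
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            Session session = cluster.connect("my_keyspace");
            String stmt = "INSERT INTO my_table (id, val) VALUES (1, 'x')"; // placeholder
            int attempts = 0;
            while (true) {
                try {
                    session.execute(stmt);
                    break; // success
                } catch (WriteTimeoutException e) {
                    // The coordinator timed out waiting for replica acks; the
                    // write may or may not have been applied on some replicas.
                    if (++attempts >= 3) {
                        throw e;
                    }
                }
            }
            cluster.shutdown(); // 1.x API; later driver versions use close()
        }
    }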
Re: AssertionError: DecoratedKey(... ) != DecoratedKey (...)
Update: I've reverted hinted_handoff_enabled back to its default value of true and the errors stopped. Is this just a coincidence, or could it be related?

On Sun, Oct 6, 2013 at 7:23 PM, Ran Tavory ran...@gmail.com wrote:
Pardon me, now with the appropriate subject line... Hi, I have a small cluster of 1.2.6 and after some config changes I started seeing errors in the logs. Not sure that's related, but the changes I performed were to disable hinted handoff and disable auto snapshot. I'll try to revert these and see if the picture changes. But anyway, that seems like a bug, right? I see this across many nodes, not only one.

ERROR [ReplicateOnWriteStage:105] 2013-10-06 16:13:13,799 CassandraDaemon.java (line 192) Exception in thread Thread[ReplicateOnWriteStage:105,5,main]
java.lang.AssertionError: DecoratedKey(-9223372036854775808, ) != DecoratedKey(-1854619418400985942, 00033839390a4769676f707469782d3100) in /raid0/cassandra/data/test_realtime/activities_summary_realtime/test_realtime-activities_summary_realtime-ic-2-Data.db
...
ERROR [ReplicateOnWriteStage:82] 2013-10-06 16:13:14,249 CassandraDaemon.java (line 192) Exception in thread Thread[ReplicateOnWriteStage:82,5,main]
java.lang.RuntimeException: java.lang.IllegalArgumentException: unable to seek to position 2171332 in /raid0/cassandra/data/test_realtime/activities_summary_realtime/test_realtime-activities_summary_realtime-ic-2-Data.db (1250125 bytes) in read-only mode
...
Re: 0.7.0 mx4j, get attribute
Try adding this to the end of the URL: ?template=identity

On Thu, Feb 3, 2011 at 4:23 PM, Chris Burroughs chris.burrou...@gmail.com wrote:
On 02/02/2011 01:41 PM, Ryan King wrote:
On Wed, Feb 2, 2011 at 10:40 AM, Chris Burroughs chris.burrou...@gmail.com wrote:
I'm using 0.7.0 and experimenting with the new mx4j support. http://host:port/mbean?objectname=org.apache.cassandra.request%3Atype%3DReadStage returns a nice pretty HTML page. For purposes of monitoring I would like to get a single attribute as XML. The docs [1] describe a getattribute endpoint, but I have been unable to get anything other than a blank response from that. mx4j does not seem to include any logging for troubleshooting. Example: http://host:port/getattribute?objectname=org.apache.cassandra.request%3atype%3dReadStage&attribute=PendingTasks returns 200 OK with no data. If anyone could point out what embarrassingly simple mistake I am making I would be much obliged. [1] http://mx4j.sourceforge.net/docs/ch05.html

Note that many objects in cassandra aren't initialized until they're used for the first time. -ryan

But if I can access them through jconsole just fine, I don't see what would be stopping mx4j.

-- /Ran
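Once ?template=identity does the trick, the attribute can be pulled with any HTTP client; a small JDK-only sketch (host and port are placeholders, and the & separator between the query parameters follows the mx4j docs cited above):

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class Mx4jGetAttribute {
        public static void main(String[] args) throws Exception {
            // Placeholder host/port; objectname is URL-encoded as in the thread.
            String url = "http://localhost:8081/getattribute"
                + "?objectname=org.apache.cassandra.request%3Atype%3DReadStage"
                + "&attribute=PendingTasks"
                + "&template=identity"; // raw XML instead of the HTML view
            HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
            BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream()));
            try {
                String line;
                while ((line = in.readLine()) != null) {
                    System.out.println(line);
                }
            } finally {
                in.close();
            }
        }
    }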
Re: Do you have a site in production environment with Cassandra? What client do you use?
I use Hector, if that counts...

On Jan 14, 2011 7:25 PM, Ertio Lew ertio...@gmail.com wrote:
Hey, if you have a site in a production environment, or are considering one, what is the client that you use to interact with Cassandra? I know that there are several clients available out there according to the language you use, but I would love to know what clients are being used widely in production environments and are best to work with (support most required features, for performance). Also, preferably tell about the technology stack for your applications. Any suggestions or comments appreciated. Thanks, Ertio
Re: Do you have a site in production environment with Cassandra? What client do you use?
Java

On Jan 14, 2011 8:25 PM, Ertio Lew ertio...@gmail.com wrote:
What technology stack do you use?

On 1/14/11, Ran Tavory ran...@gmail.com wrote:
I use Hector, if that counts...

On Jan 14, 2011 7:25 PM, Ertio Lew ertio...@gmail.com wrote:
Hey, if you have a site in a production environment, or are considering one, what is the client that you use to interact with Cassandra?...
Re: maven cassandra plugin
Stephen, just FYI, Cassandra cannot be stopped cleanly; its JVM must be taken down. So the plugin would probably need to fork a JVM and kill it when it's done.

On Thursday, January 6, 2011, B. Todd Burruss bburr...@real.com wrote:
Would you like some testers? We were about to write one.

On 01/06/2011 12:43 PM, Stephen Connolly wrote:
I nearly have one ready... my plan is to have it added to contrib... if the cassandra devs agree -stephen
- Stephen --- Sent from my Android phone, so random spelling mistakes, random nonsense words and other nonsense are a direct result of using swype to type on the screen

On 6 Jan 2011 19:38, B. Todd Burruss bburr...@real.com wrote:
Has anyone created a maven plugin, like cargo for tomcat, for automating starting/stopping a cassandra instance?

-- /Ran
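A bare-bones sketch of the fork-and-kill approach described above, using ProcessBuilder (the Cassandra home path is a placeholder, and a real plugin would wrap this in start/stop mojos):

    import java.io.File;
    import java.io.IOException;

    public class CassandraForker {
        private Process cassandra;

        // Fork a Cassandra JVM in the foreground (-f) so destroy() takes it down.
        public void start(File cassandraHome) throws IOException {
            ProcessBuilder pb = new ProcessBuilder(
                new File(cassandraHome, "bin/cassandra").getPath(), "-f");
            pb.redirectErrorStream(true); // merge stderr into stdout
            cassandra = pb.start();
        }

        // There is no clean in-process shutdown, so kill the forked JVM.
        public void stop() {
            if (cassandra != null) {
                cassandra.destroy();
            }
        }
    }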
Re: Bootstrapping taking long
In storage-conf I see this comment [1], from which I understand that the recommended way to bootstrap a new node is to set AutoBootstrap=true and remove the node from its own seeds list. Moreover, I did try to set AutoBootstrap=true and have the node in its own seeds list, but it would not bootstrap. I don't recall the exact message, but it was something like "I found myself in the seeds list, therefore I'm not going to bootstrap even though AutoBootstrap is true."

[1]
<!--
 ~ Turn on to make new [non-seed] nodes automatically migrate the right data
 ~ to themselves. (If no InitialToken is specified, they will pick one
 ~ such that they will get half the range of the most-loaded node.)
 ~ If a node starts up without bootstrapping, it will mark itself bootstrapped
 ~ so that you can't subsequently accidently bootstrap a node with
 ~ data on it. (You can reset this by wiping your data and commitlog
 ~ directories.)
 ~
 ~ Off by default so that new clusters and upgraders from 0.4 don't
 ~ bootstrap immediately. You should turn this on when you start adding
 ~ new nodes to a cluster that already has data on it. (If you are upgrading
 ~ from 0.4, start your cluster with it off once before changing it to true.
 ~ Otherwise, no data will be lost but you will incur a lot of unnecessary
 ~ I/O before your cluster starts up.)
-->
<AutoBootstrap>false</AutoBootstrap>

On Wed, Jan 5, 2011 at 4:58 PM, David Boxenhorn da...@lookin2.com wrote:
If the seed list should be the same across the cluster, that means that nodes *should* have themselves as a seed. If that doesn't work for Ran, then that is the first problem, no?

On Wed, Jan 5, 2011 at 3:56 PM, Jake Luciani jak...@gmail.com wrote:
Well, your ring issues don't make sense to me; the seed list should be the same across the cluster. I'm just thinking of other things to try. Non-bootstrapped nodes should join the ring instantly, but reads will fail if you aren't using quorum.

On Wed, Jan 5, 2011 at 8:51 AM, Ran Tavory ran...@gmail.com wrote:
I haven't tried repair. Should I?

On Jan 5, 2011 3:48 PM, Jake Luciani jak...@gmail.com wrote:
Have you tried not bootstrapping but setting the token and manually calling repair?

On Wed, Jan 5, 2011 at 7:07 AM, Ran Tavory ran...@gmail.com wrote:
My conclusion is lame: I tried this on several hosts and saw the same behavior; the only way I was able to join new nodes was to first start them when they are *not in* their own seeds list and, after they finish transferring the data, restart them with themselves *in* their own seeds list. After doing that, the node would join the ring. This is either my misunderstanding or a bug, but the only place I found it documented stated that the new node should not be in its own seeds list. Version 0.6.6.

On Wed, Jan 5, 2011 at 10:35 AM, David Boxenhorn da...@lookin2.com wrote:
My nodes all have themselves in their list of seeds - always did - and everything works. (You may ask why I did this. I don't know, I must have copied it from an example somewhere.)

On Wed, Jan 5, 2011 at 9:42 AM, Ran Tavory ran...@gmail.com wrote:
I was able to make the node join the ring, but I'm confused. What I did is: first, when adding the node, this node was not in the seeds list of itself. AFAIK this is how it's supposed to be. So it was able to transfer all data to itself from other nodes, but then it stayed in the bootstrapping state. So what I did (and I don't know why it works) is add this node to the seeds list in its own storage-conf.xml file, then restart the server, and then I finally see it in the ring... If I had added the node to the seeds list of itself when first joining it, it would not join the ring, but if I do it in two phases it did work. So it's either my misunderstanding or a bug...
Re: Bootstrapping taking long
@Thibaut, wrong email? Or how is "Avoid dropping messages off the client request path" (CASSANDRA-1676) related to the bootstrap questions I had?

On Wed, Jan 5, 2011 at 5:23 PM, Thibaut Britz thibaut.br...@trendiction.com wrote:
https://issues.apache.org/jira/browse/CASSANDRA-1676 you have to use at least 0.6.7

On Wed, Jan 5, 2011 at 4:19 PM, Edward Capriolo edlinuxg...@gmail.com wrote:
On Wed, Jan 5, 2011 at 10:05 AM, Ran Tavory ran...@gmail.com wrote:
In storage-conf I see this comment [1], from which I understand that the recommended way to bootstrap a new node is to set AutoBootstrap=true and remove the node from its own seeds list...
Re: Bootstrapping taking long
OK, thanks. So I see we had the same problem (I too had multiple keyspaces, not that I know why it matters to the problem at hand), and I see that by upgrading to 0.6.7 you solved your problem (I didn't try it, I had a different workaround). But frankly, I don't understand how https://issues.apache.org/jira/browse/CASSANDRA-1676 would relate to the stuck bootstrap problem (I'm not saying that it doesn't, I'd just like to understand why...)

On Wed, Jan 5, 2011 at 5:42 PM, Thibaut Britz thibaut.br...@trendiction.com wrote:
Had the same problem a while ago. Upgrading solved the problem (don't know if you have to redeploy your cluster though). http://www.mail-archive.com/user@cassandra.apache.org/msg07106.html

On Wed, Jan 5, 2011 at 4:29 PM, Ran Tavory ran...@gmail.com wrote:
@Thibaut, wrong email? Or how is "Avoid dropping messages off the client request path" (CASSANDRA-1676) related to the bootstrap questions I had?...
Re: Bootstrapping taking long
I see. Thanks for clarifying, Jonathan.

On Wednesday, January 5, 2011, Jonathan Ellis jbel...@gmail.com wrote:
1676 says "Avoid dropping messages off the client request path." Bootstrap messages are off the client request path. So, if some of the nodes involved were loaded enough that they were dropping messages older than RPC_TIMEOUT to cope, it could lose part of the bootstrap communication permanently.

On Wed, Jan 5, 2011 at 10:01 AM, Ran Tavory ran...@gmail.com wrote:
OK, thanks. So I see we had the same problem (I too had multiple keyspaces, not that I know why it matters to the problem at hand), and I see that by upgrading to 0.6.7 you solved your problem (I didn't try it, I had a different workaround). But frankly, I don't understand how https://issues.apache.org/jira/browse/CASSANDRA-1676 would relate to the stuck bootstrap problem (I'm not saying that it doesn't, I'd just like to understand why...)

-- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com

-- /Ran
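A toy illustration of the behavior Jonathan describes (this is not Cassandra's code, just the shape of the idea): a stage that drops any queued message older than the RPC timeout, which is harmless for client requests (the client has already timed out and will retry) but can permanently lose a one-shot internal message such as bootstrap coordination:

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    public class DroppingStage {
        private static final long RPC_TIMEOUT_MS = 10000;

        private static class Message {
            final long enqueuedAt = System.currentTimeMillis();
            final Runnable work;
            Message(Runnable work) { this.work = work; }
        }

        private final BlockingQueue<Message> queue = new LinkedBlockingQueue<Message>();

        public void submit(Runnable work) {
            queue.add(new Message(work));
        }

        public void consumeLoop() throws InterruptedException {
            while (true) {
                Message m = queue.take();
                // Under load the queue backs up; anything older than the RPC
                // timeout is dropped, because the sender gave up long ago.
                if (System.currentTimeMillis() - m.enqueuedAt > RPC_TIMEOUT_MS) {
                    continue;
                }
                m.work.run();
            }
        }
    }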
Re: Bootstrapping taking long
Thanks Shimi. So indeed anticompaction was run on one of the other nodes from the same DC, but to my understanding it has already ended, a few hours ago... I saw plenty of log messages such as [1], which ended a couple of hours ago, and I've seen the new node streaming and accepting the data from the node which performed the anticompaction, and so far it was normal, so it seemed that data was at its right place. But now the new node seems sort of stuck. None of the other nodes is anticompacting right now or has been anticompacting since then. The new node's CPU is close to zero, its iostats are almost zero, so I can't find another bottleneck that would keep it hanging. On the IRC someone suggested I'd maybe retry to join this node, e.g. decommission and rejoin it again. I'll try it now...

[1]
INFO [COMPACTION-POOL:1] 2011-01-04 04:04:09,721 CompactionManager.java (line 338) AntiCompacting [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvAds-6449-Data.db')]
INFO [COMPACTION-POOL:1] 2011-01-04 04:34:18,683 CompactionManager.java (line 338) AntiCompacting [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3874-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3873-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3876-Data.db')]
INFO [COMPACTION-POOL:1] 2011-01-04 04:34:19,132 CompactionManager.java (line 338) AntiCompacting [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-951-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-976-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-978-Data.db')]
INFO [COMPACTION-POOL:1] 2011-01-04 04:34:26,486 CompactionManager.java (line 338) AntiCompacting [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvAds-6449-Data.db')]

On Tue, Jan 4, 2011 at 12:45 PM, shimi shim...@gmail.com wrote:
In my experience, most of the time it takes for a node to join the cluster is the anticompaction on the other nodes. The streaming part is very fast. Check the other nodes' logs to see if there is any node doing anticompaction. I don't remember how much data I had in the cluster when I needed to add/remove nodes; I do remember that it took a few hours. The node will join the ring only when it finishes the bootstrap. Shimi

On Tue, Jan 4, 2011 at 12:28 PM, Ran Tavory ran...@gmail.com wrote:
I asked the same question on the IRC but no luck there, everyone's asleep ;)... Using 0.6.6, I'm adding a new node to the cluster. It starts out fine but then gets stuck in the bootstrapping state for too long. More than an hour and still counting.

$ bin/nodetool -p 9004 -h localhost streams
Mode: Bootstrapping
Not sending any streams.
Not receiving any streams.

It seemed to have streamed data from other nodes, and indeed the load is non-zero, but I'm not clear what's keeping it right now from finishing.

$ bin/nodetool -p 9004 -h localhost info
51042355038140769519506191114765231716
Load : 22.49 GB
Generation No: 1294133781
Uptime (seconds) : 1795
Heap Memory (MB) : 315.31 / 6117.00

nodetool ring does not list this new node in the ring; although nodetool can happily talk to the new node, it's just not listing itself as a member of the ring. This is expected when the node is still bootstrapping, so the question is still how long the bootstrap might take and whether it is stuck. The data isn't huge, so I find it hard to believe that streaming or anticompaction are the bottlenecks. I have ~20G on each node and the new node already has just about that, so it seems that all data had already been streamed to it successfully, or at least most of the data... So what is it waiting for now? (same question, rephrased... ;)

I tried:
1. Restarting the new node. No good. All logs seem normal, but at the end the node is still in bootstrap mode.
2. As someone suggested, I increased the rpc timeout from 10k to 30k (RpcTimeoutInMillis), but that didn't seem to help. I did this only on the new node. Should I have done that on all (old) nodes as well? Or maybe only on the ones that were supposed to stream data to that node.
3. Logging level at DEBUG now, but nothing interesting is going on except for occasional messages such as [1] or [2].

So the question is: what's keeping the new node from finishing the bootstrap, and how can I check its status? Thanks

[1] DEBUG [Timer-1] 2011-01-04 05:21:24,402 LoadDisseminator.java (line 36) Disseminating load info ...
[2] DEBUG [RMI TCP Connection(22)-192.168.252.88] 2011-01-04 05:12:48,033 StorageService.java (line 1189) computing ranges for 28356863910078205288614550619314017621, 56713727820156410577229101238628035242
Re: Bootstrapping taking long
Running nodetool decommission didn't help. Actually, the node refused to decommission itself (b/c it wasn't part of the ring). So I simply stopped the process, deleted all the data directories and started it again. It worked in the sense that the node bootstrapped again, but as before, after it had finished moving the data nothing happened for a long time (I'm still waiting, but nothing seems to be happening). Any hints how to analyze a stuck bootstrapping node?? thanks

On Tue, Jan 4, 2011 at 1:51 PM, Ran Tavory ran...@gmail.com wrote:
Thanks Shimi. So indeed anticompaction was run on one of the other nodes from the same DC, but to my understanding it has already ended, a few hours ago...
Re: Bootstrapping taking long
Thanks Jake, but unfortunately the streams directory is empty, so I don't think that any of the nodes is anti-compacting data right now or has been in the past 5 hours. It seems that all the data was already transferred to the joining host, but the joining node, after having received the data, would still remain in bootstrapping mode and not join the cluster. I'm not sure that *all* data was transferred (perhaps other nodes need to transfer more data), but nothing is actually happening, so I assume all has been moved. Perhaps it's a configuration error on my part. Should I use AutoBootstrap=true? Anything else I should look out for in the configuration file or something else?

On Tue, Jan 4, 2011 at 4:08 PM, Jake Luciani jak...@gmail.com wrote:
In 0.6, locate the node doing anti-compaction and look in the streams subdirectory in the keyspace data dir to monitor the anti-compaction progress (it puts new SSTables for the bootstrapping node in there).

On Tue, Jan 4, 2011 at 8:01 AM, Ran Tavory ran...@gmail.com wrote:
Running nodetool decommission didn't help. Actually, the node refused to decommission itself (b/c it wasn't part of the ring)...
Re: Bootstrapping taking long
I'm still at a loss. I haven't been able to resolve this. I tried adding another node at a different location on the ring, but this node too remains stuck in the bootstrapping state for many hours, without any of the other nodes being busy with anticompaction or anything else. I don't know what's keeping it from finishing the bootstrap: no CPU, no IO, files were already streamed, so what is it waiting for? I read the release notes of 0.6.7 and 0.6.8 and there didn't seem to be anything addressing a similar issue, so I figured there was no point in upgrading. But let me know if you think there is. Or any other advice...

On Tuesday, January 4, 2011, Ran Tavory ran...@gmail.com wrote:
Thanks Jake, but unfortunately the streams directory is empty, so I don't think that any of the nodes is anti-compacting data right now or has been in the past 5 hours...

-- /Ran
Re: Bootstrapping taking long
The new node does not see itself as part of the ring; it sees all the others but not itself, so from that perspective the view is consistent. The only problem is that the node never finishes bootstrapping. It stays in this state for hours (it's been 20 hours now...)
$ bin/nodetool -p 9004 -h localhost streams
Mode: Bootstrapping
Not sending any streams.
Not receiving any streams.
On Wed, Jan 5, 2011 at 1:20 AM, Nate McCall n...@riptano.com wrote: Does the new node have itself in the list of seeds per chance? This could cause some issues if so.
Re: Hector version
Use 0.6.0-19 On Friday, December 31, 2010, Zhidong She zhidong@gmail.com wrote: Hi guys, We are trying Cassandra 0.6.8, and could you please kindly tell me which Hector Java client is suitable for 0.6.8? The Hector 0.7.0 says it's for Cassandra 0.7.X, and shall we use Hector 0.6.0? Thanks, Br Zhidong -- /Ran
Re: Cassandra Monitoring
FYI, I just added an mx4j section to the bottom of this page http://wiki.apache.org/cassandra/Operations On Sun, Dec 19, 2010 at 4:30 PM, Jonathan Ellis jbel...@gmail.com wrote: mx4j? https://issues.apache.org/jira/browse/CASSANDRA-1068 On Sun, Dec 19, 2010 at 8:36 AM, Peter Schuller peter.schul...@infidyne.com wrote: How / what are you monitoring? Best practices someone? I recently set up monitoring using the cassandra-munin-plugins (https://github.com/jamesgolick/cassandra-munin-plugins). However, due to various little details that wasn't too fun to integrate properly with munin-node-configure and automated configuration management. A problem is also the starting of a JVM for each use of jmxquery, which can become a problem with many column families. I like your web server idea. Something persistent that can sit there and do the JMX acrobatics, and expose something more easily consumed for stuff like munin/zabbix/etc. It would be pretty nice to have that out of the box with Cassandra, though I expect that would be considered bloat. :) -- / Peter Schuller -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com -- /Ran
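For what it's worth, the "something persistent that does the JMX acrobatics" idea can start as small as the sketch below: a plain JMX poll whose output is easy to scrape from munin/zabbix. The port (8080 was the default of this era) and the MBean/attribute names are assumptions; browse them with jconsole first.

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class CassandraJmxPoller {
    public static void main(String[] args) throws Exception {
        // Default JMX port of the era; adjust to your install.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:8080/jmxrmi");
        JMXConnector jmxc = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbsc = jmxc.getMBeanServerConnection();
            // MBean and attribute names are assumptions; verify with jconsole.
            Object load = mbsc.getAttribute(
                    new ObjectName("org.apache.cassandra.service:type=StorageService"),
                    "LoadString");
            System.out.println("load: " + load); // expose this to munin/zabbix/etc.
        } finally {
            jmxc.close();
        }
    }
}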
Re: iterate over all the rows with RP
This should be the case, yes: semantics aren't affected by the connection, and no state is kept. What might happen is that if you read/write with low consistency levels, then when you hit a different host on the ring it might have an inconsistent state in case of a partition. On Sunday, December 12, 2010, shimi shim...@gmail.com wrote: So if I use a different connection (thrift via Hector), will I get the same results? It makes sense when you use OPP, and I assume it is the same with RP. I just wanted to make sure this is the case and that there is no state being kept. Shimi On Sun, Dec 12, 2010 at 8:14 PM, Peter Schuller peter.schul...@infidyne.com wrote: Is the same connection required when iterating over all the rows with Random Partitioner, or is it possible to use a different connection for each iteration? In general, the choice of RPC connection (I assume you mean the underlying thrift connection) does not affect the semantics of the RPC calls. -- / Peter Schuller -- /Ran
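To make the stateless iteration concrete, here is a minimal paging sketch against the 0.6 Thrift API (keyspace and column family names are placeholders). Because the server keeps no cursor, each page could in principle go over a different connection:

// Page over all rows with RandomPartitioner; the first key of each page
// repeats the last key of the previous one, hence the skip.
SlicePredicate predicate = new SlicePredicate();
predicate.setSlice_range(new SliceRange(new byte[0], new byte[0], false, 100));
String start = "";
while (true) {
    KeyRange range = new KeyRange();
    range.setStart_key(start);
    range.setEnd_key("");
    range.setCount(100);
    List<KeySlice> page = client.get_range_slices("Keyspace1",
            new ColumnParent("Standard1"), predicate, range, ConsistencyLevel.ONE);
    for (KeySlice slice : page) {
        if (slice.getKey().equals(start)) continue; // overlap at page boundary
        // process slice here
    }
    if (page.size() < 100) break; // a short page means we've covered the ring
    start = page.get(page.size() - 1).getKey();
}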
Re: understanding the cassandra storage scaling
there are two numbers to look at: N, the number of hosts in the ring (cluster), and R, the number of replicas for each data item. R is configurable per column family. Typically for large clusters N > R. For very small clusters it makes sense for R to be close to N, in which case cassandra is useful so the database doesn't have a single point of failure, but not so much b/c of the size of the data. But for large clusters it rarely makes sense to have N=R; usually N > R. On Thu, Dec 9, 2010 at 12:28 PM, Jonathan Colby jonathan.co...@gmail.com wrote: I have a very basic question which I have been unable to find in the online documentation on cassandra. It seems like every node in a cassandra cluster contains all the data ever stored in the cluster (i.e., all nodes are identical). I don't understand how you can scale this on commodity servers with merely internal hard disks. In other words, if I want to store 5 TB of data, does that mean each node needs a hard disk capacity of 5 TB? With HBase, memcached and other nosql solutions it is more clear how data is split up in the cluster and replicated for fault tolerance. Again, please excuse the rather basic question. -- /Ran
Re: understanding the cassandra storage scaling
So is it not true that each node contains all the data in the cluster? No, not in the general case; in fact it is rarely the case. Usually R < N. In my case I have N=6 and R=2. You configure R per CF under ReplicationFactor (v0.6.*) or replication_factor (v0.7.*). http://wiki.apache.org/cassandra/StorageConfiguration On Thu, Dec 9, 2010 at 12:43 PM, Jonathan Colby jonathan.co...@gmail.com wrote: Thanks Ran. This helps a little, but unfortunately it's still a bit fuzzy for me. So is it not true that each node contains all the data in the cluster? I haven't come across any information on how clustered data is coordinated in cassandra. How does my query get directed to the right node? -- /Ran
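Plugging the thread's numbers in makes the arithmetic concrete (a back-of-envelope sketch, not a sizing tool):

long totalGb = 5000;              // ~5 TB of data, from the question
int n = 6, r = 2;                 // ring size and replication factor, from the reply
long perNodeGb = totalGb * r / n; // ~1666 GB per node, far less than the full 5 TB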
Re: Taking down a node in a 3-node cluster, RF=2
to me it makes sense that if hinted handoff is off then cassandra cannot satisfy 2 out of every 3 writes when one of the nodes is down, since that node is a designated replica for 2/3 of the writes. But I don't remember reading this somewhere. Does hinted handoff affect David's situation? (David, did you disable HH in your storage-config? <HintedHandoffEnabled>false</HintedHandoffEnabled>) On Sun, Nov 28, 2010 at 4:32 PM, David Boxenhorn da...@lookin2.com wrote: For the vast majority of my data usage eventual consistency is fine (i.e. CL=ONE), but I have a small amount of critical data for which I read and write using CL=QUORUM. If I have a cluster with 3 nodes and RF=2, and CL=QUORUM, does that mean that a value can be read from or written to any 2 nodes, or does it have to be the particular 2 nodes that store the data? If it is the particular 2 nodes that store the data, that means that I can't even take down one node, since it will be the mandatory 2nd node for 1/3 of my data... -- /Ran
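For reference, the quorum arithmetic behind this (the standard definition, not something spelled out in the thread):

int rf = 2;
int quorum = rf / 2 + 1; // = 2: with RF=2, QUORUM needs *both* replicas up
// So while one of a key's two replica nodes is down, QUORUM reads and writes
// fail for that key; in a 3-node RF=2 ring each node is a replica for 2/3 of
// the keys, which is where the "2 out of every 3 writes" figure comes from.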
Re: Hector question under cassandra 0.7
u...@cass to bcc. Indeed, the KeyspaceOperator isn't thread safe (and in recent revisions it was extracted to an interface at http://github.com/rantav/hector/blob/master/src/main/java/me/prettyprint/hector/api/Keyspace.java and an implementation at http://github.com/rantav/hector/blob/master/src/main/java/me/prettyprint/cassandra/model/ExecutingKeyspace.java). On Thu, Oct 21, 2010 at 12:10 AM, Ned Wolpert ned.wolp...@imemories.com wrote: I figured I'd reply to my own question in case this helps others. Talking on the IRC, having one KeyspaceOperator per thread (via ThreadLocal) makes sense. On Wed, Oct 20, 2010 at 9:13 AM, Ned Wolpert ned.wolp...@imemories.com wrote: Folks - I'm finally upgrading the grails-cassandra plugin for 0.7, and wanted to understand a bit more about the usage of the Cluster and KeyspaceOperator. Is the cluster object retrieved from HFactory.createCluster() thread safe, and is the KeyspaceOperator required to only be in one thread? Or are both thread safe objects? My assumption is I can call createCluster any time as it will only create one cluster object. I'm trying to decide if the KeyspaceOperator should be unique to each thread (threadlocal variable) or unique to each web request. Thanks -- Virtually, Ned Wolpert Settle thy studies, Faustus, and begin... --Marlowe -- Virtually, Ned Wolpert Settle thy studies, Faustus, and begin... --Marlowe -- /Ran
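A minimal sketch of the one-KeyspaceOperator-per-thread idea from the thread; the factory method below is hypothetical, standing in for however you build the operator from HFactory:

// One Keyspace per thread (Java 5/6 style anonymous subclass).
private static final ThreadLocal<Keyspace> PER_THREAD_KEYSPACE =
        new ThreadLocal<Keyspace>() {
            @Override
            protected Keyspace initialValue() {
                return createKeyspaceOperator(); // hypothetical factory
            }
        };

// Each thread then works against its own instance:
Keyspace keyspace = PER_THREAD_KEYSPACE.get();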
Cassandra users meetup in Israel
http://cassandra-il.eventbrite.com/ Hi all, I'm organizing a users meetup in Israel; if you happen to be around you're most welcome to join. Event Details: The first Cassandra users meetup in Israel will take place at outbrain, Natanya, on Tuesday Nov 16th, 4pm. Please register and get yourself a (free) ticket from eventbrite. Space is limited and we want to make sure we have enough space for everyone, so please register. Once you register, simply arrive at outbrain (directions below) and call Ran from Saifun's reception desk. We will cover: o Operations (Ran, unless someone else wants to jump in) o Internals and implementation overview (me again...) o Success stories and pain points - an open roundtable. If you have specific subjects you're interested in, or talks you're willing to propose (15-30min), we're naturally very open to suggestions. -- /Ran
Re: Cassandra 0.6.6
it's not official yet; it's in voting now. On Tue, Oct 12, 2010 at 8:41 PM, marinko pasic marinko_pa...@hotmail.com wrote: Hi there, I'm just wondering: is the Cassandra 0.6.6 release, which I found on http://people.apache.org/~eevans/, an official release? If it is not, can you tell me where I can find the official 0.6.6 Cassandra release? Thx Marinko -- /Ran
Re: Nodes getting slowed down after a few days of smooth operation
Thanks Peter, Robert and Brandon. So it seems that the only suspect by now is my excessive caching ;) I'll get a better look at the GC activity next time shit starts to happen, but in the meantime, as for the cache size (cassandra's internal cache), its row cache capacity is set to 10,000,000. I actually wanted to say 100%, but at the time there was a bug that interpreted 100% as just 1, so I used 10M instead. My motivation was that since I don't have too much data (10G each node), why don't I cache the hell out of it, so I started with a cache size of 100% and a much larger heap size (started with 12G out of the 16G ram). Over time I've learned that too much heap for the JVM is like a kid in a candy shop, it'll eat as much as it can and then throw up (the kid was GC storming), so I started lowering the max heap until I reached 6G. With 4G I ran OOM, BTW. So now I have a row cache capacity of effectively 100%, a heap size of 6G, data of 10G, and so I wonder how come the heap doesn't explode? Well, as it turns out, although I have 10G data on each node, the row cache's effective size is only about 681 * 2377203 = 1.6G (bytes):
Key cache: disabled
Row cache capacity: 10000000
Row cache size: *2377203*
Row cache hit rate: 0.7017551635100059
Compacted row minimum size: 392
Compacted row maximum size: 102961
Compacted row mean size: *681*
This strengthens what both Peter and Brandon have suggested, that the row cache is generating too much GC b/c it gets invalidated too frequently. That's certainly possible, so I'll try to set a 50% row cache size on one of the nodes (and wait about a week...) and see what happens. If this proves to be the answer, then it means that my dream of "I have so little data and so much ram, why don't I cache the hell out of it" isn't going to come true, b/c too much of the row cache gets invalidated and hence GCed, which creates too much overhead for the JVM. (Well, at least I was getting nice read performance while it lasted ;) If this is true, then how would you recommend optimizing the row cache size for maximum utility and minimum GC overhead? I've pasted here a log snippet from one of the servers while it was at high CPU and GCing: http://pastebin.com/U1cszFKv You can see a large number of pending reads as well as other pending tasks (response stage or consistency manager). GC runs every 20-40 seconds or so, and almost for the entire duration of those 20-40 secs. I'm not sure what to make of all the other numbers, such as: GC for ConcurrentMarkSweep: 22742 ms, 181335192 reclaimed leaving 6254994856 used; max is 6552551424 Thanks! On Mon, Oct 11, 2010 at 7:42 PM, Peter Schuller peter.schul...@infidyne.com wrote: 170141183460469231731687303715884105727 192.168.252.88 Up 10.07 GB Firstly, I second the point raised about the row cache size (very frequent concurrent GC:s are definitely an indicator that the JVM heap size is too small, and the row cache seems like a likely contender - especially given that you say it builds up over days). Note though that you have to look at the GCInspector's output with respect to the concurrent mark/sweep GC phases to judge the live set in your heap, rather than system memory. Attaching with jconsole or visualvm to the JVM will also give you a pretty good view of what's going on. Look for the heap usage as it appears after one of the major dips in the graph (not the regular sawtooth dips, which are young generation collections and won't help indicate actual live set).
That said, with respect to caching effects: your total data size seems to be about in the same ballpark as memory. Your maximum heap size is 6 gig; on a 16 gig machine, taking into account various overhead, maybe you've got something like 8 GB for buffer cache? It doesn't sound strange at all that there would be a significant difference between a 32 GB machine and a 16 GB machine given your ~10 GB data size, given that buffer cache size goes from slightly below data size to almost three times data size. Especially when major or almost-major compactions are triggered; on the small machine you would expect to evict everything from cache during a compaction (except that touched *during* the compaction), while on the larger machine the newly written sstables effectively fit the cache too. But note that these are two pretty different conditions; the first is about making sure your JVM heap size is appropriate. The second can be tested for by observing I/O load (iostat -x -k 1) and correlating with compactions. So e.g., what's the average utilization and queue size in iostat just before a compaction vs. just after it? That difference should be due to cache eviction (assuming you're not servicing a built-up backlog). There is also the impact of compaction itself, as it is happening, and the I/O it generates. In general, the higher your disk
Re: Nodes getting slowed down after a few days of smooth operation
Peter, you're my JVM GC hero! Thank you! On Tue, Oct 12, 2010 at 12:38 AM, Peter Schuller peter.schul...@infidyne.com wrote: My motivation was that since I don't have too much data (10G each node) then why don't I cache the hell out of it, so I started with a cache size of 100% and a much larger heap size (started with 12G out of the 16G ram). Over time I've learned that too much heap for the JVM is like a kid in a candy shop, it'll eat as much as it can and then throw up (the kid was GC storming), In general CMS will tend to gobble up the maximum heap size unless your workload is such that the heuristics really work well and don't expand the heap beyond some level, but it won't magically fill the heap with data that doesn't exist. If you were reaching the maximum heap size with 12 GB, making the heap 6 GB instead won't make it better. Also, just be sure that you're really having an issue with GC. For example, frequent young-generation GC:s are fully expected and normal. If you are seeing extremely frequent concurrent mark/sweep phases that do not free up a lot of data - that is an indication that the heap is too small. So, with respect to GC storming, a bigger heap is generally better. The bigger the heap, the more effective GC is and the less often a concurrent mark/sweep has to happen. But this does not mean you want to give it too big a heap either, since whatever is gobbled up by the heap *won't* be used by the operating system for buffer caching. Keeping a big row cache may or may not be a good idea depending on circumstances, but if you have one, that directly implies additional heap usage and the heap must be sized accordingly. The row cache is just objects in memory; there is no automatic row cache size adjustment in response to heap pressure. If 10 million rows is your entire data set, and if that dataset is 10 GB on disk (without in-memory object overhead), then I am not surprised at all that you're seeing issues after a few days of uptime. Likely the row cache is just much too big for the heap. so I started lowering the max heap until I reached 6G. with 4G I ran OOM BTW. Note that OOM and GC storming are often equivalent in terms of their cause (unless the OOM is caused by a single huge allocation or something). It's just that actually determining whether you are out of memory is difficult for the JVM, so there are heuristics involved. You may be sufficiently out of memory that you see excessive GC activity, but not so much as to trigger the threshold of GC inefficiency at which the JVM decides to actually throw an OOM. So now I have row cach capacity of effectively 100%, a heap size of 6G, data of 10G and so I wonder how come the heap doesn't explode? Well, everything up to now has suggested to me that it *is* exploding ;) But: Well, as it turns out, although I have 10G data on each node, the row cache effective size is only about 681 * 2377203 = 1.6G (bytes) Key cache: disabled Row cache capacity: 10000000 Row cache size: 2377203 Row cache hit rate: 0.7017551635100059 Compacted row minimum size: 392 Compacted row maximum size: 102961 Compacted row mean size: 681 This strengthens what both Peter and Brandon have suggested that the row cache is generating too much GC b/c it gets invalidated too frequently. Note that the compacted row size is not directly indicative of in-memory row size. I'm not sure what the overhead is expected to be, though, off hand; but you can probably assume a factor of 2 just from general fragmentation issues.
Add to that overhead from the representation in object form itself etc. 1.6x2 = 3.2. Now we're starting to get close, especially taking into account additional overhead and other things on the heap. That's certainly possible, so I'll try to set a 50% row cache size on one of the nodes (and wait about a week...) and see what happens, and if this proves to be the answer then this means that my dream of I have so little data and so much ram, why don't I cache the hell out of it isn't going to come true b/c too much of the row cache gets invalidated and hence GCed which creates too much overhead for the JVM. (well, at least I was getting nice read performance while it lasted ;) Given that you're not hitting your maximum cache size, data isn't evicted from the cache except as it is updated. Presumably that means you're actually not hitting the worst-case scenario, which is LRU eviction. Even then though, it's not as simple as it just being too much for the JVM. Especially given the rows/second that you'd expect to be evicted in Cassandra. A high rate of eviction does mean you need more margin in terms of free heap, but I seriously doubt the fundamental problem here is GC throughput vs. eviction rate. In general, I cannot
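Plugging the thread's numbers into Peter's estimate (the 2x factor is his rough fragmentation guess, not a measurement):

long rows = 2377203L;       // row cache size from nodetool
long meanRowBytes = 681L;   // compacted mean row size
double overhead = 2.0;      // rough in-memory overhead factor
double heapGb = rows * meanRowBytes * overhead / (1024.0 * 1024 * 1024);
// ~3.2 GB of a 6 GB heap for the row cache alone, before anything else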
Re: Cassandra for graph data structure
Courtney, this certainly sounds interesting, and as Nate suggested we're always looking for valuable contributions. A few things to keep in mind: - I'm curious, as Lucas has asked - is it possible to create an efficient graph API over cassandra, and what are the tradeoffs? - If the API is general enough and the functionality is reusable, then we'd be happy to add it to hector. If not, you can create a library that uses hector as a layer. On Friday, September 24, 2010, Courtney Robinson sa...@live.co.uk wrote: Nate, Lucas, thanks for the responses. Nate, I think it would be asking a bit much to suggest the hector team implement convenience methods for a graph representation. But if we went ahead and forked hector, I'd be sure to contribute back what I can and just release it as another client, or, if the final product can be merged with hector... I'd like thoughts on any features outside my own use case though, so that we can build it to handle other use cases as well. Lucas, I understand what you're saying, but I've had a quick play with neo4j and the expense we'd pay for reads offsets a lot of the setbacks I'd run into using neo4j, not to mention having to learn it... -- From: Nate McCall n...@riptano.com Sent: Friday, September 24, 2010 4:14 PM To: user@cassandra.apache.org Subject: Re: Cassandra for graph data structure My idea however was to fork hector, remove all the stuff I don't need and turn it into a graph API sitting on top of Cassandra. We are always looking for ideas and design feedback regarding Hector. Please feel free to make suggestions or fork and send pull requests. http://groups.google.com/group/hector-users
Re: Client developer mailing list
awesome, thanks, I'm subscribed :) On Mon, Aug 30, 2010 at 10:05 PM, Jeremy Hanna jeremy.hanna1...@gmail.com wrote: There has been a new mailing list created for those who are working on Cassandra clients above thrift and/or avro. You can subscribe by sending an email to client-dev-subscr...@cassandra.apache.org or using the link at the bottom of http://cassandra.apache.org The list is meant to give client authors a discussion forum as well as a place to interact with core cassandra developers about the roadmap and upcoming features. Thanks to Cliff Moon (@moonpolysoft) for starting a discussion about client quality at the Cassandra Summit.
Re: Read before Write
I haven't benchmarked, so this is purely theoretical. If there's no caching, then I'm pretty sure just writing would yield better performance. If you do cache rows/keys, it really depends on your hit ratio. Naturally, if you have a small data set, a high cache hit ratio and use row caching, I'm pretty sure it's better to read first. Although writes are an order of magnitude faster than reads, if you have a high write rate then cassandra might throttle you at different bottlenecks, depending on your hardware and data; for example, disk is many times a bottleneck (and you can tweak storage-conf to improve that), sometimes memory is pressing, and I have also seen CPU pressure, although it's less common. You also need to keep in mind that even if you write the same value but with a newer timestamp, cassandra will have to run compactions, and that's where disk/mem is usually bottlenecking. Bottom line - if you can cache (have enough mem) and there's a good hit ratio, cache entire rows and read first. If not, always write first and make sure compactions aren't killing you; if they are, tweak storage-conf to do fewer compactions. On Fri, Aug 27, 2010 at 5:44 PM, Chen Xinli chen.d...@gmail.com wrote: I think just writing all the time is much better, as most of the replacements will be done in the memtable. Also, you should set a large memtable size compared with the average row size. 2010/8/27 Daniel Doubleday daniel.double...@gmx.net Hi people, I was wondering if anyone has already benchmarked such a situation: I have: day of year (row key) - SomeId (column key) - byte[0] I need to make sure that I write SomeId, but in around 80% of the cases it will be already present (so I would essentially replace it with itself). RF will be 2. So should I rather just write all the time (given that cassandra is so fast on write) or should I read and write only if not present? Cheers, Daniel -- Best Regards, Chen Xinli
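The two strategies side by side, as a sketch against the 0.6 Thrift API (keyspace, CF and key names are placeholders; the blind write is just the single insert call on its own):

ColumnPath cp = new ColumnPath("KvStore");
cp.setColumn("someId".getBytes());
try {
    // Read-first: skip the write if the column already exists.
    client.get("Keyspace1", "2010-239", cp, ConsistencyLevel.ONE);
} catch (NotFoundException absent) {
    // Not there yet: write it (empty value, per the byte[0] schema above).
    client.insert("Keyspace1", "2010-239", cp, new byte[0],
            System.currentTimeMillis() * 1000, ConsistencyLevel.ONE);
}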
Re: Calls block when using Thrift API
did you try connecting to a real cassandra instance, not an embedded one? I use an embedded one for testing and it works, but just to narrow down your problem. On Fri, Aug 27, 2010 at 6:13 PM, Ruben de Laat ru...@logic-labs.nl wrote: Hi, I am new to cassandra, so maybe I am missing something obvious... Version: Latest nightly build (2010-08-23_13-57-40), but same results with 0.7.0b1 Server code (default configuration file): System.setProperty(cassandra.config, conf/cassandra.yaml); EmbeddedCassandraService embeddedCassandraService = new EmbeddedCassandraService(); embeddedCassandraService.init(); Client code: Socket socket = new Socket(127.0.0.1, 9160); TSocket transport = new TSocket(socket); TBinaryProtocol tBinaryProtocol = new TBinaryProtocol(transport); Client client = new Client(tBinaryProtocol); System.out.println(client.describe_cluster_name()); The problem is that it hangs/blocks on the client.describe_cluster_name() call, actually it hangs on any call I have tried. I was first trying with the Pelops client, but that one is using the Thrift API as well, so this is narrowed down. I have already tried multiple different combination of creating the client (different transports). I have also tried with thrift_framed_transport_size_in_mb: 0 disabling framed transports. Starting the client without a running server gives a proper Connection refused, so some sort of connection is definitely made. Thanks and Kind regards, Ruben
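One likely culprit in the snippet above (an educated guess, the thread never confirms it): the transport is never opened, and 0.7 defaults to the framed transport, so an unframed client can block forever on its first call. A corrected sketch:

TTransport socket = new TSocket("127.0.0.1", 9160);
TTransport transport = new TFramedTransport(socket); // 0.7 speaks framed by default
TBinaryProtocol protocol = new TBinaryProtocol(transport);
Cassandra.Client client = new Cassandra.Client(protocol);
transport.open(); // without this, calls can hang or fail
System.out.println(client.describe_cluster_name());
transport.close();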
Re: [RELEASE] 0.7.0 beta1
[cross-posting to u...@cass and hector-use...@googlegroups] Happy to announce hector's support for 0.7.0. Hector is a java client for cassandra which wraps the low level thrift interface with a nicer API, and adds monitoring, connection pooling and more. I didn't do anything... The amazing 0.7.0 work was done by Ed (thanks Ed) with support from Nate (thanks). This version includes support for a 0.7.0 cluster with feature parity to 0.6.0 (all API calls that used to work in 0.6.* now work with 0.7.0-beta1). Complete support for all 0.7.0 new features is in the works and will be available soon (the system_ calls). A good place to start with some examples is here http://github.com/rantav/hector/blob/0.7.0/src/test/java/me/prettyprint/cassandra/model/ApiV2SystemTest.java and here http://github.com/rantav/hector/blob/0.7.0/src/main/java/me/prettyprint/cassandra/examples/ExampleDaoV2.java The code is here http://github.com/rantav/hector/tree/0.7.0, a zip file with all dependencies is on the downloads page (http://github.com/rantav/hector/downloads) and here's a direct link: http://github.com/downloads/rantav/hector/hector-0.7.0-16.zip Enjoy On Fri, Aug 13, 2010 at 11:24 PM, Eric Evans eev...@rackspace.com wrote: Happy Friday the 13th. Are you feeling lucky? I know I am. Ok, first off, a disclaimer. As the suffix on the version indicates this is *beta* software. If you run off and upgrade a production server with this there is a very good chance that you are going to be sad/fired/mocked/ridiculed/laughed at/sorry. FUD aside, any help testing 0.7.0-beta1 would be very appreciated. The list of changes is enormous[1] and we want to make sure we shake as many bugs out before the final as possible. If you're coming from 0.6, there are some things to keep in mind, and they're documented in the release notes[2], so be sure to read them. If you find bugs, please file a report[3], and if you have questions, don't hesitate to ask them. Have fun! [1]: http://bit.ly/d4HOMw [2]: http://bit.ly/9fcewt [3]: https://issues.apache.org/jira/browse/CASSANDRA -- Eric Evans eev...@rackspace.com
KeyRange.token in 0.7.0
I'm a bit confused WRT KeyRange's tokens in 0.7.0. When making a range query you can either use KeyRange.key or KeyRange.token. In 0.7.0 keys were retyped as byte[]; tokens remain strings. What does this string represent in the case of RP and in the case of OPP? Did this change in 0.7.0? AFAIK in 0.6.0, if the partitioner is OPP then the tokens are actual strings, and they might just be an actual subset of the keys. When using RP, tokens are BigIntegers (keys are still strings) and I'm not actually sure you're allowed to shoot a range query using tokens... In 0.7.0, since keys are now bytes, when using OPP, how do those bytes translate to strings? I'd assume it'd just be a byte[] -> UTF-8 conversion, only that this may result in illegal UTF-8 chars when keys are just random bytes, so I guess not... Perhaps md5 hashing? But then if using OPP and keys are actual strings, I want to have the same 0.6.0 functionality in place, meaning tokens are strings like the keys. I actually tested this scenario and it looks like it works, so it seems like the String keys are translated to UTF-8, but what happens when they are invalid UTF-8? Another question is: what's the story with RP in 0.7.0? Should range queries even be supported with tokens? If so, are the tokens expected to be strings of integers? (e.g. 1234567890) Thanks.
Re: KeyRange.token in 0.7.0
On Wed, Aug 18, 2010 at 4:30 PM, Jonathan Ellis jbel...@gmail.com wrote: (a) if you're using token queries and you're not hadoop, you're doing it wrong ah, didn't know that, so I guess I'll remove support for it from hector... (b) they are expected to be of the form generated by TokenFactory.toString and fromString. You should not be generating them yourself. -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
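So a key-based (rather than token-based) range query in 0.7 would look roughly like the sketch below; names are placeholders, and the exact generated signatures (ByteBuffer vs. byte[]) depend on the thrift version, so treat this as illustrative:

client.set_keyspace("Keyspace1"); // in 0.7 the keyspace is set on the connection
SlicePredicate predicate = new SlicePredicate();
predicate.setSlice_range(new SliceRange(
        ByteBuffer.wrap(new byte[0]), ByteBuffer.wrap(new byte[0]), false, 100));
KeyRange range = new KeyRange();
range.setStart_key(ByteBuffer.wrap(new byte[0])); // empty key = start of ring
range.setEnd_key(ByteBuffer.wrap(new byte[0]));
range.setCount(100);
List<KeySlice> slices = client.get_range_slices(
        new ColumnParent("Standard1"), predicate, range, ConsistencyLevel.ONE);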
Re: File write errors but cassandra isn't crashing
I opened it as an improvement suggestion: https://issues.apache.org/jira/browse/CASSANDRA-1409 On Mon, Aug 16, 2010 at 8:26 PM, Benjamin Black b...@b3k.us wrote: Useful config option, perhaps? On Mon, Aug 16, 2010 at 8:51 AM, Jonathan Ellis jbel...@gmail.com wrote: That's a tough call -- you can also come up with scenarios where you'd rather have it read-only than completely dead. -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
Re: The entry of Cassandra
The common practice is to connect to a few hosts and send requests in round robin or another lb tactic. The hosts are symmetric so any host will do. There are also higher level libraries that help with that, as well as connection pooling and other goodies. On Mon, Aug 16, 2010 at 1:14 PM, Ying Tang ivytang0...@gmail.com wrote: After reading the docs and the thrift demo, I found that in the demo, if we want to connect to the database, we must first do TTransport tr = new TSocket(localhost, 9160). Then we operate on the database through this TTransport. But this operation assigns a fixed IP, so all requests would be directed to this IP, and the cassandra node at this IP would carry a heavy read load and proxy load. Do I understand this wrong, or does the cassandra client have another way to access cassandra that doesn't need to assign a fixed IP? -- Best regards, Ivy Tang -- Best regards, Ivy Tang
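A bare-bones version of that client-side round robin (illustrative only; libraries like Hector layer pooling and failover on top of the same idea):

import java.util.concurrent.atomic.AtomicInteger;

class RoundRobinHosts {
    private final String[] hosts = {"10.0.0.1", "10.0.0.2", "10.0.0.3"}; // placeholders
    private final AtomicInteger counter = new AtomicInteger();

    String nextHost() {
        int i = counter.getAndIncrement() & Integer.MAX_VALUE; // stay non-negative on overflow
        return hosts[i % hosts.length];
    }
}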
File write errors but cassandra isn't crashing
Due to an administrative error, one of the hosts in the cluster lost permission to write to its data directory. So I started seeing errors in the log; however, the server continued serving traffic. It wasn't able to compact and do other write operations, but it didn't crash. I was wondering whether that's by design, and if so, whether it's a good one... I guess I want to know if really bad things happen to my cluster... logs look like that... INFO [FLUSH-TIMER] 2010-08-11 07:53:14,683 ColumnFamilyStore.java (line 357) KvAds has reached its threshold; switching in a fresh Memtable at CommitLogContext(file='/outbrain/cassandra/commitlog/Commi tLog-1281505164614.log', position=88521163) INFO [FLUSH-TIMER] 2010-08-11 07:53:14,683 ColumnFamilyStore.java (line 609) Enqueuing flush of Memtable(KvAds)@851225759 INFO [FLUSH-WRITER-POOL:1] 2010-08-11 07:53:14,684 Memtable.java (line 148) Writing Memtable(KvAds)@851225759 ERROR [FLUSH-WRITER-POOL:1] 2010-08-11 07:53:14,688 DebuggableThreadPoolExecutor.java (line 94) Error in executor futuretask *java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.io.FileNotFoundException: /outbrain/cassandra/data/outbrain_kvdb/KvAds-tmp-249-Data.db (Permission denied) *at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222) at java.util.concurrent.FutureTask.get(FutureTask.java:83) at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor.afterExecute(DebuggableThreadPoolExecutor.java:86) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:888) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) *Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: /outbrain/cassandra/data/outbrain_kvdb/KvAds-tmp-249-Data.db (Permission denied) *at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) ... more
Re: RuntimeException: Cannot service reads while bootstrapping!
ok, so I don't send writes to bootstrapping or decommissioned nodes, that's cool, but what about the inconsistent ring view after nodetool move, isn't that strange? After the move, the moved node has the correct view of the ring, but all the other nodes have the old view. I waited a few minutes after the log said "Bootstrap/move completed! Now serving reads" but this didn't help; the view was still inconsistent. Only restarting the moved node helped the other nodes realize the change. On Wed, Aug 4, 2010 at 3:24 PM, Jonathan Ellis jbel...@gmail.com wrote: Don't point clients at nodes that aren't part of the ring. Cassandra rejecting requests when you do is a feature. On Wed, Aug 4, 2010 at 6:52 AM, Ran Tavory ran...@gmail.com wrote: Is this a known issue? Running 0.6.2, I moved a node to a different token and eventually saw errors in the log.
ERROR [ROW-READ-STAGE:116804] 2010-08-04 06:34:29,699 DebuggableThreadPoolExecutor.java (line 101) Error in ThreadPoolExecutor java.lang.RuntimeException: Cannot service reads while bootstrapping! at org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:66) at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:40) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619)
ERROR [ROW-READ-STAGE:116805] 2010-08-04 06:34:29,700 CassandraDaemon.java (line 82) Fatal exception in thread Thread[ROW-READ-STAGE:116805,5,main] java.lang.RuntimeException: Cannot service reads while bootstrapping! at org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:66) at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:40) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619)
... many more of those and then...
INFO [MESSAGE-DESERIALIZER-POOL:1] 2010-08-04 06:34:29,709 StorageService.java (line 181) Bootstrap/move completed! Now serving reads.
The move ended up ok, but during the operation the log was filled with those errors, and at the end of it the ring state was inconsistent. If I ask the moved node where it is in the ring, it tells me one thing, but the other nodes tell another...
(ob1124)(cassan...@cass24:apache-cassandra-0.6.2)$ nodetool -h 192.168.254.58 -p 9004 ring
Address          Status  Load      Range                                      Ring
                                   170141183460469231731687303715884105727
192.168.252.88   Up      5.7 GB    14131484407726020523932116250949797205    |--|
192.168.252.124  Up      2.44 GB   56713727820156410577229101238628035242    |   ^
192.168.254.58   Up      8.13 GB   113427455640312821154458202477256070484   v   |
192.168.254.57   Up      6.52 GB   113427455640312821154458202477256070485   |   ^
192.168.252.125  Up      6.52 GB   141784319550391026443072753096570088105   v   |
192.168.254.59   Up      1.63 GB   170141183460469231731687303715884105727   |--|
(ob1124)(cassan...@cass24:apache-cassandra-0.6.2)$ nodetool -h 192.168.252.124 -p 9004 ring
Address          Status  Load      Range                                      Ring
                                   170141183460469231731687303715884105727
192.168.252.88   Up      5.7 GB    14131484407726020523932116250949797205    |--|
192.168.252.124  Up      2.46 GB   56713727820156410577229101238628035242    |   ^
192.168.254.57   Up      6.52 GB   113427455640312821154458202477256070485   v   |
192.168.252.125  Up      6.52 GB   141784319550391026443072753096570088105   |   ^
192.168.254.58   Up      1.63 GB   141784319550391026443072753096570088106   v   |
192.168.254.59   Up      1.63 GB   170141183460469231731687303715884105727   |--|
Restarting the moved node fixes the ring view by other hosts. -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
Re: about cassandra compression
cassandra doesn't compress before storing, no. It may be beneficial to compress, depending on the size of your data, network latency, disk size and data compressibility... You'll need to test. I sometimes compress, depending on data size, but it's done in the client. On Mon, Jul 26, 2010 at 1:31 PM, john xie shanfengg...@gmail.com wrote: does cassandra compress data before it is stored? when I store data, is compression beneficial to reduce the storage space?
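Client-side compression of the sort described above can be as simple as gzipping the value before the insert; a sketch with placeholder names, using the 0.6-style Thrift insert:

import java.io.ByteArrayOutputStream;
import java.util.zip.GZIPOutputStream;

String value = "...the value to store..."; // placeholder
byte[] raw = value.getBytes("UTF-8");
ByteArrayOutputStream bos = new ByteArrayOutputStream();
GZIPOutputStream gz = new GZIPOutputStream(bos);
gz.write(raw);
gz.close(); // flushes the gzip trailer
client.insert("Keyspace1", "someKey", columnPath, bos.toByteArray(),
        System.currentTimeMillis() * 1000, ConsistencyLevel.ONE);
// Remember to gunzip on the read path; compression only pays off when the
// values are large and compressible.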
Re: CRUD test
Oleg, note that the unofficial recommendation is to use microseconds, not milliseconds. As Jonathan notes, although there isn't a real way to get microseconds in java, at the very least you should take the millis and multiply them by 1000. If you use hector, then just use Keyspace.createTimestamp() ( http://github.com/rantav/hector/blob/master/src/main/java/me/prettyprint/cassandra/service/Keyspace.java#L236 ) On Sun, Jul 25, 2010 at 8:54 AM, Oleg Tsvinev oleg.tsvi...@gmail.com wrote: Thank you guys for your help! Yes, I am using System.currentTimeMillis() in my CRUD test. Even though I'm still using it, my tests now run as expected. I do not use cassandra-cli anymore. @Ran great job on Hector, I wish there was more documentation but I managed. @Jonathan, what is the recommended time source? I use batch_mutation to insert and update multiple columns atomically. Do I have to use batch_mutation for deletion, too? On Sat, Jul 24, 2010 at 2:36 PM, Jonathan Shook jsh...@gmail.com wrote: Just to clarify, microseconds may be used, but they provide the same behavior as milliseconds if they aren't using a higher time resolution underneath. In some cases, the microseconds are generated simply as milliseconds * 1000, which doesn't actually fix any sequencing bugs. On Sat, Jul 24, 2010 at 3:46 PM, Ran Tavory ran...@gmail.com wrote: Hi Oleg, I didn't follow the entire thread, but just to let you know that the 0.6.* version of the CLI uses microseconds as the time unit for timestamps. Hector also uses micros to match that; however, previous versions of hector (as well as the CLI) used milliseconds, not micros. So if you're using hector version 0.6.0-11 or earlier, or by any chance in some other way are mixing milliseconds into your app (are you using System.currentTimeMillis() somewhere?), then the behavior you're seeing is expected. On Sat, Jul 24, 2010 at 1:06 AM, Jonathan Shook jsh...@gmail.com wrote: I think you are getting it. As far as what means what at which level, it's really about using them consistently in every case. The [row] key (or [row] key range) is a top-level argument for all of the operations, since it is the key to mapping the set of responsible nodes. The key is the part of the name of any column which most affects how the load is apportioned in the cluster, so it is used very early in request processing. On Fri, Jul 23, 2010 at 4:22 PM, Peter Minearo peter.mine...@reardencommerce.com wrote: Consequently the remove should look like: ColumnPath cp1 = new ColumnPath("Super2"); cp1.setSuper_column("Best Western".getBytes()); client.remove(KEYSPACE, "hotel", cp1, System.currentTimeMillis(), ConsistencyLevel.ONE); ColumnPath cp2 = new ColumnPath("Super2"); cp2.setSuper_column("Econolodge".getBytes()); client.remove(KEYSPACE, "hotel", cp2, System.currentTimeMillis(), ConsistencyLevel.ONE); -Original Message- From: Peter Minearo [mailto:peter.mine...@reardencommerce.com] Sent: Fri 7/23/2010 2:17 PM To: user@cassandra.apache.org Subject: RE: CRUD test CORRECTION: ColumnPath cp1 = new ColumnPath("Super2"); cp1.setSuper_column("Best Western".getBytes()); cp1.setColumn("name".getBytes()); client.insert(KEYSPACE, "hotel", cp1, "Best Western of SF".getBytes(), System.currentTimeMillis(), ConsistencyLevel.ALL); -Original Message- From: Peter Minearo [mailto:peter.mine...@reardencommerce.com] Sent: Friday, July 23, 2010 2:14 PM To: user@cassandra.apache.org Subject: RE: CRUD test Interesting!!
Let me rephrase to make sure I understood what is going on: When inserting data via the insert function/method: void insert(string keyspace, string key, ColumnPath column_path, binary value, i64 timestamp, ConsistencyLevel consistency_level) The key parameter is the actual Key to the Row, which contains SuperColumns. The 'ColumnPath' gives the path within the Key. INCORRECT: ColumnPath cp1 = new ColumnPath("Super2"); cp1.setSuper_column("hotel".getBytes()); cp1.setColumn("Best Western".getBytes()); client.insert(KEYSPACE, "name", cp1, "Best Western of SF".getBytes(), System.currentTimeMillis(), ConsistencyLevel.ALL); CORRECT: ColumnPath cp1 = new ColumnPath("Super2"); cp1.setSuper_column("name".getBytes()); cp1.setColumn("Best Western".getBytes()); client.insert(KEYSPACE, "hotel", cp1, "Best Western of SF".getBytes(), System.currentTimeMillis(), ConsistencyLevel.ALL); -Original Message- From: Jonathan Shook [mailto:jsh
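For reference, the microsecond convention recommended earlier in the thread, applied to the snippets above (same millisecond resolution underneath, but the unit matches the 0.6 CLI and Hector; KEYSPACE and cp1 are from the thread's own code):

long timestampMicros = System.currentTimeMillis() * 1000L;
client.insert(KEYSPACE, "hotel", cp1, "Best Western of SF".getBytes(),
        timestampMicros, ConsistencyLevel.ALL);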
Re: Question on Eventual Consistency
if your test case is correct then it sounds like a bug to me. With one node, unless you're writing with CL=0 you should get full consistency. On Mon, Jul 19, 2010 at 10:14 PM, Hugo h...@unitedgames.com wrote: Hi, Being fairly new to Cassandra I have a question on the eventual consistency. I'm currently performing experiments with a single-node Cassandra system and a single client. In some of my tests I perform an update to an existing subcolumn in a row and subsequently read it back from the same thread. More often than not I get back the value I've written (and expected), but sometimes it can occur that I get back the old value of the subcolumn. Is this a bug or does it fall into the eventual consistency? I'm using Hector 0.6.0-14 on Cassandra 0.6.3 on a single disk, double-core Windows machine with a Sun 1.6 JVM. All reads and writes are quorum (the default), but I don't think this matters in my setup. Groets, Hugo.
Re: How to stop Cassandra running in embedded mode
look at my pom. It has <forkMode>always</forkMode>: http://github.com/rantav/hector/blob/master/pom.xml#L95 On Wed, Jul 14, 2010 at 3:02 PM, Andriy Kopachevsky kopachev...@gmail.com wrote: Ran, I know how to run tests in their own thread with the maven surefire plugin, but I'm not sure how to do this with its own JVM for each test. How are you doing this? Thanks.
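The relevant surefire stanza, along the lines of the hector pom referenced above (a sketch; forkMode is the surefire option of that era):

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <!-- a fresh JVM per test class, so each embedded Cassandra gets its own process -->
    <forkMode>always</forkMode>
  </configuration>
</plugin>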
Re: Performance Issues
Since you're using hector, hector-users@ is a good place to ask, so u...@cassandra to bcc. operateWithFailover is one stop before sending the request over the network and waiting, so it makes a lot of sense that a significant part of the application time is spent in it. On Tue, Jul 13, 2010 at 6:22 PM, Samuru Jackson samurujack...@googlemail.com wrote: Hi, I have set up a ring with a couple of servers and wanted to run some stress tests. Unfortunately, there is some kind of bottleneck on the client side. I'm using Hector and Cassandra 0.6.1. The subsequent profile results are based on a small Java program that sequentially inserts records, with a couple of columns, into Cassandra (no multithreading or anything that increases the stress). The nodes are not too busy while inserting the records (approx. 20%-25% CPU utilization). Log level is on Info and I don't see any exceptions flying around. The client has also registered all available node IPs. According to my profiler, operateWithFailover(me.prettyprint.cassandra.service.Operation) consumes ~86% of the execution time, and further down the hierarchy the method executeAndSetResult(org.apache.cassandra.thrift.Cassandra$Client) is responsible for ~73%. I'm inserting the columns one-by-one in this way: ColumnPath cp = new ColumnPath(colFamilyName); cp.setColumn(bytes(colName)); cp.setSuper_column(bytes(superColName)); keySpace.insert(key, cp, value.getBytes()); Can anyone point out what I could look into to resolve this issue? /SJ
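One common way to cut the per-call overhead showing up in that profile is to fold many columns into a single batch_mutate call; a sketch against the 0.6 Thrift API with placeholder names (whether it helps depends on where the time actually goes):

List<Column> columns = new ArrayList<Column>();
columns.add(new Column("colName".getBytes(), "value".getBytes(),
        System.currentTimeMillis() * 1000)); // any unit works if all writers agree
SuperColumn sc = new SuperColumn("superColName".getBytes(), columns);
ColumnOrSuperColumn cosc = new ColumnOrSuperColumn();
cosc.setSuper_column(sc);
Mutation m = new Mutation();
m.setColumn_or_supercolumn(cosc);
Map<String, Map<String, List<Mutation>>> mutationMap =
        new HashMap<String, Map<String, List<Mutation>>>();
mutationMap.put("rowKey",
        Collections.singletonMap("ColFamilyName", Collections.singletonList(m)));
client.batch_mutate("Keyspace1", mutationMap, ConsistencyLevel.ONE);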
Re: Using Pelops with Cassandra 0.7.X
Hector doesn't have 0.7 support yet. On Jul 14, 2010 1:34 AM, Peter Harrison cheetah...@gmail.com wrote: I know Cassandra 0.7 isn't released yet, but I was wondering if anyone has used Pelops with the latest builds of Cassandra? I'm having some issues, but I wanted to make sure that somebody else isn't working on a branch of Pelops to support Cassandra 0.7. I have downloaded and built the latest code from GitHub, trunk of Pelops, and this works with 0.6.3, but not Cassandra trunk. Is Pelops worth updating, or should I use other client libraries for Java such as Hector?
Re: How to stop Cassandra running in embedded mode
The workaround I do is fork always. Each test pulls up its own jvm. On Jul 9, 2010 9:51 PM, Jonathan Ellis jbel...@gmail.com wrote: there's some support for this in 0.7 (see http://issues.apache.org/jira/browse/CASSANDRA-1018) but fundamentally it's not really designed to be started and stopped multiple times within the same process. On Thu, Jul 8, 2010 at 3:44 AM, Andriy Kopachevsky kopachev...@gmail.com wrote: Hi, we are tryi... -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
http://scale.metaoptimize.com/
Just found this site and thought it might be interesting to folks on this list. http://scale.metaoptimize.com/ It's a stack-overflow-style Q&A site; in their words: A community interested in scalability, high availability, data stores, NoSQL, distributed computing, parallel computing, cloud computing, elastic computing, HPC, grid computing, AWS, crawling, failover, redundancy, and concurrency.
Re: Hector Client Failover errors
A TTransportException usually happens when the server cannot respond or there's a network error. Can you send more context from your code? More context from the exception? Is the insertion rate about the same in the thrift and hector versions? If insertion with hector is faster than with thrift (connection pooling), then maybe your server is freaking out. On Sun, Jun 27, 2010 at 1:53 PM, Atul Gosain atul.gos...@gmail.com wrote: I am trying to insert the data using the hector client, using only one host in the pool, i.e. localhost, as follows: CassandraClientPool pool = CassandraClientPoolFactory.INSTANCE.get(); client = pool.borrowClient("localhost", 9160); global = client.getKeyspace(keyspace, ConsistencyLevel.ONE); After some (5-6) iterations of insertions of data (every 5 min an insertion of about 40 MB of data), the program starts emitting this:
10/06/27 09:55:28 WARN service.FailoverOperator: Got a TTransportException from localhost. Num of retries: 1
10/06/27 09:55:28 INFO service.FailoverOperator: Skipping to next host. Current host is: localhost
10/06/27 09:55:28 INFO service.FailoverOperator: Skipped host. New host is: localhost
I couldn't understand the reason for this, when I'm using only one host in the pool. If I remove the ConsistencyLevel.ONE, then the error starts from the first iteration itself. The same program through Thrift runs without any problems. Actually, I just modified the thrift program and replaced the calls to the thrift api with the corresponding Hector calls. Thanks Atul
Re: Call for input of cassandra, thrift, hector, pelops example / sample / test code snippets
these classes are from a newer version; they should not exist in version 14. On Fri, Jun 25, 2010 at 3:42 PM, Gavan Hood gavan.h...@gmail.com wrote: Hi Ran, I downloaded the git code. I think I have something up with my versioning: I have the latest build 0.6.0.14 of hector and the git download, and I have a bunch of classes that do not appear to resolve, some of them being KeyspaceOperatorFactory, ClusterFactory, Cluster. Are these classes from a newer or older version of hector maybe, or am I missing some step? Regards Gavan On Thu, Jun 24, 2010 at 11:09 PM, Ran Tavory ran...@gmail.com wrote: Thanks for this effort Gavan :) On Thu, Jun 24, 2010 at 3:47 PM, Gavan Hood gavan.h...@gmail.com wrote: Thanks Ran, I downloaded those. ReadAllKeys worked straight up, very good example, and I have already got ExampleClient working, so ditto there :-) I am searching for the definition of Command<Void> in ExampleDAO; getallkey slices and keyspace test have a few more unresolved externals like junit, mockito and other items. I tried downloading the code stack from git, but I am not sure that was a good idea, though it did have some of the files in that download. if you use git that should be straightforward, many developers have done that already. If you just downloaded one of the released versions then lmk if I forgot to include one dependency or another... I noticed a file IterateOverKeysOnly.java on the site too, but that has some issues, some undefined KeySpace entries and other syntax errors. It was contributed by another developer so I don't know. Gavan On Thu, Jun 24, 2010 at 10:18 PM, Ran Tavory ran...@gmail.com wrote: Here's what we have for hector: wiki: http://wiki.github.com/rantav/hector/ blog posts: http://prettyprint.me/2010/02/23/hector-a-java-cassandra-client/ http://prettyprint.me/2010/03/03/load-balancing-and-improved-failover-in-hector/ http://prettyprint.me/2010/04/03/jmx-in-hector/ Examples: Example DAO: http://github.com/rantav/hector/blob/master/src/main/java/me/prettyprint/cassandra/dao/ExampleDao.java Example simple client: http://github.com/rantav/hector/blob/master/src/main/java/me/prettyprint/cassandra/service/ExampleClient.java Example read all keys: http://github.com/bosyak/hector/blob/master/src/main/java/me/prettyprint/cassandra/examples/ExampleReadAllKeys.java get all key slices for… in groups: http://pastie.org/957661 KeyspaceTest: http://github.com/rantav/hector/blob/master/src/test/java/me/prettyprint/cassandra/service/KeyspaceTest.java On Thu, Jun 24, 2010 at 1:45 AM, Gavan Hood gavan.h...@gmail.com wrote: Hi all, I have been researching the samples with some success, but it's taken a while. I am very keen on Cassandra and love the work that's been done, well done everyone involved. I would like to get as many of the samples as I can organized into something that makes it easier to kick off with, for people taking the road I am on. If people on this list have code snippets, full example apps, test apps, API test functions etc, I would like to hear about them please.
My work is in Java so I really want to see those, the others are still of high interest as I will post them all out as I mention below. Ideally I would like to get a small test container set up to allow people to poke and prod API's and see what happens, but like most of us time is the challenge. If I do not get that far I would at least post the findings to page(s) that people can continue to add to, maybe if successful it could then be consumed back into the apachi wiki... If someone has already done this I would love to see the site. Let me know your thoughts, and better yet show me the code :-) Regards Gavan
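For anyone collecting snippets, here is a minimal raw-Thrift write/read pair in the spirit of ExampleClient. Treat it as a sketch against the 0.6-era generated classes (package names shifted between releases) and the stock Keyspace1/Standard1 sample schema; host and port are assumptions.

    import org.apache.cassandra.thrift.Cassandra;
    import org.apache.cassandra.thrift.ColumnOrSuperColumn;
    import org.apache.cassandra.thrift.ColumnPath;
    import org.apache.cassandra.thrift.ConsistencyLevel;
    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.transport.TSocket;
    import org.apache.thrift.transport.TTransport;

    public class MinimalThriftExample {
        public static void main(String[] args) throws Exception {
            TTransport tr = new TSocket("localhost", 9160);
            Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(tr));
            tr.open();
            try {
                ColumnPath cp = new ColumnPath("Standard1");
                cp.setColumn("name".getBytes("UTF-8"));
                // write one column, then read it back
                client.insert("Keyspace1", "key1", cp, "value1".getBytes("UTF-8"),
                        System.currentTimeMillis(), ConsistencyLevel.QUORUM);
                ColumnOrSuperColumn cosc =
                        client.get("Keyspace1", "key1", cp, ConsistencyLevel.QUORUM);
                System.out.println(new String(cosc.getColumn().getValue(), "UTF-8"));
            } finally {
                tr.close();
            }
        }
    }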
Re: Call for input of cassandra, thrift, hector, pelops example / sample / test code snippets
On Sat, Jun 26, 2010 at 4:58 PM, GH gavan.h...@gmail.com wrote:
> I believe that the code I grabbed from git is later than the latest download published on there then.
true. All the new classes you mentioned are new.
> Where would I find that code? Is it available to the public (aka me :-) ), and if not, when would that be? If it's a way off, I think I need a version of git that matches the released code base.
Check out the branch which is currently at version 14: http://github.com/rantav/hector/tree/0.6.0 (or use the sources from the packaged downloads section: http://github.com/rantav/hector/downloads)
Re: hector or pelops
on the wiki http://wiki.github.com/rantav/hector/ you can find:
Example DAO: http://github.com/rantav/hector/blob/master/src/main/java/me/prettyprint/cassandra/dao/ExampleDao.java
Example simple client: http://github.com/rantav/hector/blob/master/src/main/java/me/prettyprint/cassandra/service/ExampleClient.java
Example read all keys: http://github.com/bosyak/hector/blob/master/src/main/java/me/prettyprint/cassandra/examples/ExampleReadAllKeys.java
get all key slices for... in groups: http://pastie.org/957661
KeyspaceTest: http://github.com/rantav/hector/blob/master/src/test/java/me/prettyprint/cassandra/service/KeyspaceTest.java

On Thu, Jun 24, 2010 at 1:12 AM, Gavan Hood gavan.h...@gmail.com wrote: Hi Ran, I have been trialling hector but have not found the samples you refer to. I found your basic ExampleClient but it does not exercise many functions, for instance getSlice, fields usage etc. I want to develop a solid set of tests for each API call; do you have some code that will help me build that? If my code ends up useful I intend to publish it on my website for others to use. Regards Gavan

On Thu, Jun 24, 2010 at 4:43 AM, Ran Tavory ran...@gmail.com wrote: As the developer of hector I can only speak in favor of my child of love, and I haven't tried pelops, so take the following with a grain of salt... Hector sees wide adoption and has been coined the de-facto java client. It's been in use in production critical systems since version 0.5.0 by a few companies. The development team is responsive, accepts patches from the community, and is busy with new features and improvements all the time. There's a bug tracking system and all bugs are fixed very fast. There are two active mailing lists, one for the developers and one for the users: http://wiki.github.com/rantav/hector/mailing-lists (85 members). The project is maintained on github (http://github.com/rantav/hector) and the whole process is very transparent and open to the community. Code is well tested with an embedded version of cassandra which I contributed back to the main cassandra repository; it runs a mvn and an ant build, and all release versions are available at http://github.com/rantav/hector/downloads including source code. We love contributions and want to make it as easy as possible to contribute back. I myself have made a few contributions to cassandra core so I'm well familiar with its internals, which doesn't hurt when you write a client... ...and finally the features (just the high level):
- connection pooling
- datacenter friendly
- high level API
- all public cassandra versions in the last 6 months
- failover
- simple LB
- extensive JMX
- well documented, many examples, wiki, mailing list, team of developers and contributors.
... and of course there's also thrift if you're into hacking on it...

On Wed, Jun 23, 2010 at 5:38 PM, Serdar Irmak sir...@protel.com.tr wrote: Hi, Which java client library do you recommend, hector or pelops, and why? Best Regards, http://www.protel.com.tr/
Re: Hector vs cassandra-java-client
Hector has a pom.xml which deals with its dependencies as gracefully as it can, but the problem is that hector's dependencies, such as cassandra and libthrift, aren't in public maven repos. Any suggestions how to deal with that?

On Thu, Jun 24, 2010 at 6:00 AM, Kenneth Bartholet kennethbartho...@hotmail.com wrote: Agreed, but at what cost? It's my understanding that the big deterrent is the lack of 3rd party dependencies in maven public repos (e.g. Thrift itself). The option would be to publish a public maven repo containing all dependencies, which ends up being more responsibility than the client developers want to accept. Any volunteers? -Ken

To: user@cassandra.apache.org From: bbo...@gmail.com Subject: Re: Hector vs cassandra-java-client Date: Tue, 22 Jun 2010 17:14:53 +0200 Dop Sun su...@dopsun.com writes: Updated. The first Cassandra client lib to make it into the Maven repositories will probably end up with a big audience. :-) -Bjørn
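Until the jars land in a public repo, one workaround is an extra repository entry in the pom pointing at wherever the missing jars are published; a sketch with a placeholder URL (not a real repo):

    <!-- placeholder URL: point this at wherever the missing jars are published -->
    <repositories>
      <repository>
        <id>cassandra-third-party-deps</id>
        <url>http://example.com/maven2</url>
      </repository>
    </repositories>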
Re: Instability and memory problems
I don't have the answer, but if you provide jmap output and cfstats output that may help. Are you using mmap files? Do you see swap? GC in the logs?

On Jun 20, 2010 7:25 PM, James Golick jamesgol...@gmail.com wrote: As I alluded to in another post, we just moved from 2 to 4 nodes. Since then, the cluster has been incredibly unstable. The memory problems I've posted about before have gotten much worse and our nodes are becoming incredibly slow/unusable every 24 hours or so. Basically, the JVM reports that only 14GB is committed, but the RSS of the process is 22GB, and cassandra is completely unresponsive, but still having requests routed to it internally, so it completely destroys performance. I'm at a loss for how to diagnose this issue. In addition to that, read performance has gone way downhill, and query latency is much slower than it was with a 2 node cluster. Perhaps this was to be expected, though. We really like cassandra for the most part, but these stability issues are going to force us to abandon it. Our application is like a yoyo right now, and we can't live with that. Help resolving these issues would be greatly appreciated.
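For reference, gathering the diagnostics asked for above might look like this (pid, host and JMX port are placeholders):

    $ jmap -heap <cassandra-pid>            # JVM heap configuration and usage
    $ jmap -histo:live <cassandra-pid>      # live object histogram, per class
    $ nodetool -h <host> -p <jmx-port> cfstats
    $ nodetool -h <host> -p <jmx-port> tpstats
    $ free -m; swapon -s                    # check whether the box is swapping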
Re: Client connection and data distribution across nodes
On Thu, Jun 17, 2010 at 8:52 AM, Mubarak Seyed se...@apple.com wrote:
> Hi All, Regarding the client thrift connection: I have 4 nodes which formed a ring, but the client only knows the IP address of one node (and the thrift RPC port number). How can the client connect to any other node without getting ring information? Can we keep a load balancer and bind all four nodes, or does the client need to know the IP address of all 4 nodes?
If you use java there are higher level libraries that manage ring information for you, so they may help. If not, I guess you'll need to call the describe_ring thrift api.
> Regarding storage management: for instance, if we want to store 100k records, but 25k records on each node, something like node 1 - 25K, node 2 - 25K, node 3 - 25K, node 4 - 25K. Can we accomplish this using OrderPreservingPartitioner (OPP)? How does replication happen between nodes if we keep only 25k records in one node? Can someone please let me know. Thanks in advance. Thanks, Mubarak
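A sketch of the describe_ring suggestion above, against the 0.6-era Thrift API; the host, port and keyspace name are assumptions. Each TokenRange carries the endpoints that own it, so starting from one known node the client can learn the whole ring:

    import java.util.List;
    import org.apache.cassandra.thrift.Cassandra;
    import org.apache.cassandra.thrift.TokenRange;
    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.transport.TSocket;

    public class RingDiscovery {
        public static void main(String[] args) throws Exception {
            TSocket tr = new TSocket("10.0.0.1", 9160);
            Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(tr));
            tr.open();
            // each TokenRange lists the node addresses owning that token span
            List<TokenRange> ring = client.describe_ring("Keyspace1");
            for (TokenRange range : ring) {
                System.out.println(range.start_token + " .. " + range.end_token
                        + " -> " + range.endpoints);
            }
            tr.close();
        }
    }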
Re: Cassandra questions
On Thu, Jun 17, 2010 at 9:09 PM, F. Hugo Zwaal hzw...@yahoo.com wrote:
> Hi, Being fairly new to Cassandra I have a couple of questions: 1) Is there a way to remove multiple keys/rows in one operation (batch) or must keys be removed one by one?
yes, batch_mutate
> 2) I see API references to version 0.7, but I couldn't find an alpha or beta anywhere. Does it exist already, and if so, where can I get it? Or else, when is it planned to be public/released?
0.7 is still in development and is the trunk. Latest stable is 0.6.2. I don't know the planned date for 0.7.0, but there will also be a 0.6.3 before it.
> Thanks in advance, Hugo.
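A sketch of the batch_mutate answer to question 1: deleting one column from several rows in a single call (0.6-era Thrift API; the keyspace, column family and column names are assumptions):

    import java.util.Collections;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import org.apache.cassandra.thrift.Cassandra;
    import org.apache.cassandra.thrift.ConsistencyLevel;
    import org.apache.cassandra.thrift.Deletion;
    import org.apache.cassandra.thrift.Mutation;
    import org.apache.cassandra.thrift.SlicePredicate;

    public class BatchDelete {
        // delete the column "name" from every given row key in one round-trip
        public static void deleteRows(Cassandra.Client client, List<String> keys)
                throws Exception {
            SlicePredicate pred = new SlicePredicate();
            pred.addToColumn_names("name".getBytes("UTF-8"));

            Deletion del = new Deletion(System.currentTimeMillis());
            del.setPredicate(pred);
            Mutation m = new Mutation();
            m.setDeletion(del);

            Map<String, Map<String, List<Mutation>>> mutationMap =
                    new HashMap<String, Map<String, List<Mutation>>>();
            for (String key : keys) {
                Map<String, List<Mutation>> cfMap =
                        new HashMap<String, List<Mutation>>();
                cfMap.put("Standard1", Collections.singletonList(m));
                mutationMap.put(key, cfMap);
            }
            // one network round-trip for all keys; as the next thread notes,
            // this is NOT atomic -- some mutations may apply while others fail
            client.batch_mutate("Keyspace1", mutationMap, ConsistencyLevel.QUORUM);
        }
    }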
Re: batch_mutate atomic?
No, it's not atomic; it just shortens the roundtrip of many update requests. Some may fail and some may succeed.

On Mon, Jun 14, 2010 at 2:40 PM, Per Olesen p...@trifork.com wrote: Can I expect batch_mutate to work in what I would think of as an atomic operation? That either all the mutations in the batch_mutate call are executed or none of them are? Or can some of them fail while some of them succeed?
Re: Pelops - a new Java client library paradigm
Nice going, Dominic, having a clear API for cassandra is a big step forward :) Interestingly, at hector we came up with a similar approach, we just didn't find the time to code it, as production systems keep me busy at nights as well... We started with the implementation of BatchMutation, but the rest of the API improvements are still TODO. Keep up the good work, competition keeps us healthy ;)

On Fri, Jun 11, 2010 at 4:41 PM, Dominic Williams thedwilli...@googlemail.com wrote: Pelops is a new high quality Java client library for Cassandra. It has a design that:
* reveals the full power of Cassandra through an elegant Mutator and Selector paradigm
* generates better, cleaner, less bug prone code
* reduces the learning curve for new users
* drives rapid application development
* encapsulates advanced pooling algorithms
An article introducing Pelops can be found at http://ria101.wordpress.com/2010/06/11/pelops-the-beautiful-cassandra-database-client-for-java/ Thanks for reading. Best, Dominic
Re: cassandra out of heap space crash
Gary, FWIW I get OOM with CL.ONE quite commonly if I'm not careful with my writes.

On Jun 11, 2010 8:48 PM, Jonathan Ellis jbel...@gmail.com wrote: We give you enough rope to hang yourself. Don't use ZERO if that's not what you want. :)

On Fri, Jun 11, 2010 at 9:23 AM, William Ashley wash...@gmail.com wrote: Would it be reasonable...
Re: Passing client as parameter
You can look at http://github.com/rantav/hector/blob/master/src/main/java/me/prettyprint/cassandra/service/CassandraClientFactory.java

To close the client you can just get the transport out of the client:

    private void closeClient(CassandraClient cclient) {
      log.debug("Closing client {}", cclient);
      ((CassandraClientPoolImpl) pool).reportDestroyed(cclient);
      Cassandra.Client client = cclient.getCassandra();
      client.getInputProtocol().getTransport().close();
      client.getOutputProtocol().getTransport().close();
      cclient.markAsClosed();
    }

But to create a client you need a transport:

    private Cassandra.Client createThriftClient(String url, int port)
        throws TTransportException, TException {
      log.debug("Creating a new thrift connection to {}:{}", url, port);
      TTransport tr;
      if (useThriftFramedTransport) {
        tr = new TFramedTransport(new TSocket(url, port, timeout));
      } else {
        tr = new TSocket(url, port, timeout);
      }
      TProtocol proto = new TBinaryProtocol(tr);
      Cassandra.Client client = new Cassandra.Client(proto);
      try {
        tr.open();
      } catch (TTransportException e) {
        // Thrift exceptions aren't very good in reporting, so we have to
        // catch the exception here and add details to it.
        log.error("Unable to open transport to " + url + ":" + port, e);
        clientMonitor.incCounter(Counter.CONNECT_ERROR);
        throw new TTransportException("Unable to open transport to " + url
            + ":" + port + ", " + e.getLocalizedMessage(), e);
      }
      return client;
    }

So what you can do is, instead of passing a client to the method, pass a URL to the method. The method would open the transport, create a client, make some cassandra operations and then close the transport.

On Wed, Jun 9, 2010 at 10:35 PM, Steven Haar sh...@vintagesoftware.com wrote: C#

On Wed, Jun 9, 2010 at 2:34 PM, Ran Tavory ran...@gmail.com wrote: Some languages have higher level clients that might help you. What language are you using?

On Jun 9, 2010 9:01 PM, Steven Haar sh...@vintagesoftware.com wrote: What is the best way to pass a Cassandra client as a parameter? If you pass it as a parameter, do you also have to pass the transport in order to be able to close the connection? Is there any way to open or close the transport directly from the client? Essentially what I want to do is pass a Cassandra client to a method and then within that method be able to open the transport, execute a get or set to the Cassandra database, and then close the transport, all within the method. The only way I see to do this is to also pass the transport to the method.
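A sketch of that last suggestion: the method owns the transport's whole lifecycle, so nothing needs to be passed around (0.6-era Thrift API; keyspace and column family names are assumptions):

    import org.apache.cassandra.thrift.Cassandra;
    import org.apache.cassandra.thrift.ColumnPath;
    import org.apache.cassandra.thrift.ConsistencyLevel;
    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.transport.TSocket;
    import org.apache.thrift.transport.TTransport;

    public class ScopedClientOp {
        public static void writeColumn(String host, int port, String key,
                                       byte[] value) throws Exception {
            TTransport tr = new TSocket(host, port);
            Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(tr));
            tr.open();   // open the transport inside the method...
            try {
                ColumnPath cp = new ColumnPath("Standard1");
                cp.setColumn("name".getBytes("UTF-8"));
                client.insert("Keyspace1", key, cp, value,
                        System.currentTimeMillis(), ConsistencyLevel.QUORUM);
            } finally {
                tr.close();  // ...and close it before returning
            }
        }
    }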
Re: cassandra out of heap space crash
I can't say exactly how much memory is the correct amount, but surely 1G is very little. By replicating 3 times, your cluster now does 3 times more work than it used to, both on reads and on writes, while the readers/writers continue hammering it at the same pace. So first up your memory (try 4g; if not enough, 8g etc). If this still doesn't help, you want to look at either adding capacity or slowing down your writes. Which consistency level are you writing with? You can try ALL; this will slow down your writes just as much as needed by the cluster to catch its breath (or so I hope, I never actually tried that...)

On Fri, Jun 11, 2010 at 12:26 AM, Julie julie.su...@nextcentury.com wrote: I am running an 8 node cassandra cluster with each node on its own dedicated VM. My app very quickly populates the database with about 100,000 rows of data (each row is about 100K bytes) times the number of nodes in my cluster, so there's about 100,000 rows of data on each node (seems very evenly distributed). I have been running my app fairly successfully but today changed the replication factor from 1 to 3. (I first took down the servers, nuked their data directories, copied over the new storage-conf.xml to each node, then restarted the servers.) My app begins by populating the database with fresh data. During the writing phase, all the cassandra servers, one by one, started getting an out-of-memory exception. Here's the output from the first to die:

INFO [COMMIT-LOG-WRITER] 2010-06-10 14:18:54,609 CommitLog.java (line 407) Discarding obsolete commit log: CommitLogSegment(/var/lib/cassandra/commitlog/CommitLog-1276193883235.log)
INFO [ROW-MUTATION-STAGE:5] 2010-06-10 14:18:55,499 ColumnFamilyStore.java (line 609) Enqueuing flush of Memtable(Standard1)@19571399
INFO [GMFD:1] 2010-06-10 14:19:01,556 Gossiper.java (line 568) InetAddress /10.210.69.221 is now UP
INFO [GMFD:1] 2010-06-10 14:20:35,136 Gossiper.java (line 568) InetAddress /10.254.242.228 is now UP
INFO [GMFD:1] 2010-06-10 14:20:35,137 Gossiper.java (line 568) InetAddress /10.201.207.129 is now UP
INFO [GMFD:1] 2010-06-10 14:20:36,922 Gossiper.java (line 568) InetAddress /10.198.37.241 is now UP
INFO [GC inspection] 2010-06-10 14:19:03,722 GCInspector.java (line 110) GC for ConcurrentMarkSweep: 2164 ms, 8754168 reclaimed leaving 1070909048 used; max is 1174339584
INFO [GC inspection] 2010-06-10 14:21:09,068 GCInspector.java (line 110) GC for ConcurrentMarkSweep: 2151 ms, 78896080 reclaimed leaving 994679752 used; max is 1174339584
INFO [Timer-1] 2010-06-10 14:21:09,068 Gossiper.java (line 179) InetAddress /10.198.37.241 is now dead.
INFO [Timer-1] 2010-06-10 14:21:12,045 Gossiper.java (line 179) InetAddress /10.210.69.221 is now dead.
INFO [GMFD:1] 2010-06-10 14:21:12,046 Gossiper.java (line 568) InetAddress /10.210.203.210 is now UP
INFO [GMFD:1] 2010-06-10 14:21:12,306 Gossiper.java (line 568) InetAddress /10.210.69.221 is now UP
INFO [GMFD:1] 2010-06-10 14:21:12,306 Gossiper.java (line 568) InetAddress /10.192.218.117 is now UP
INFO [GMFD:1] 2010-06-10 14:21:12,306 Gossiper.java (line 568) InetAddress /10.198.37.241 is now UP
INFO [GMFD:1] 2010-06-10 14:21:12,307 Gossiper.java (line 568) InetAddress /10.254.138.226 is now UP
ERROR [ROW-MUTATION-STAGE:25] 2010-06-10 14:21:15,127 CassandraDaemon.java (line 78) Fatal exception in thread Thread[ROW-MUTATION-STAGE:25,5,main]
java.lang.OutOfMemoryError: Java heap space
    at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:84)
    at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:29)
    at org.apache.cassandra.db.ColumnFamilySerializer.deserializeColumns(ColumnFamilySerializer.java:117)
    at org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:108)
    at org.apache.cassandra.db.RowMutationSerializer.defreezeTheMaps(RowMutation.java:359)
    at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:369)
    at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:322)
    at org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:45)
    at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:40)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)
ERROR [ROW-MUTATION-STAGE:18] 2010-06-10 14:21:15,129 CassandraDaemon.java (line 78) Fatal exception in thread Thread[ROW-MUTATION-STAGE:18,5,main]

Within 15 minutes, all 8 nodes died while my app continued trying to populate the database. Is there something I am doing wrong? I am populating the database very quickly by writing 100 rows at once in each of 8 clients,
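The heap advice above translates to the JVM options file; a sketch assuming the 0.6-era bin/cassandra.in.sh layout (exact flags and defaults vary by release):

    # bin/cassandra.in.sh -- the stock heap is around 1G; raise it before
    # adding RF=3 write load
    JVM_OPTS="$JVM_OPTS -Xms4G -Xmx4G"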
Re: Tree Search in Cassandra
Quote from Gary: batch_mutate makes no atomicity guarantees. It's intended to help avoid many round-trips. It can fail half-way through, leaving you with a partially completed batch.

On Mon, Jun 7, 2010 at 9:39 AM, David Boxenhorn da...@lookin2.com wrote: Is batch mutate atomic? If not, can we make it so?

On Mon, Jun 7, 2010 at 4:11 AM, Tatu Saloranta tsalora...@gmail.com wrote: Yeah, or maybe just clustering, since there is no branching structure. It's quite commonly useful even on regular b-tree style storage (BDB et al), as it can reduce per-entry overhead quite a bit. And it allows very efficient compression, if entries have lots of redundancy (xml or json serialized data). I doubt this can be done reliably from the client perspective. While a good idea from a functionality perspective, the problem is that it requires some level of atomic operations or locking, since updates are multi-step operations. From the server side I guess it would be similar to the work on allowing atomic multi-part operations (like the ones being worked on to implement counters?). -+ Tatu +-

On Sun, Jun 6, 2010 at 2:19 AM, Ran Tavory ran...@gmail.com wrote: sounds interesting... btree on top of cassandra ;)

On Sun, Jun 6, 2010 at 12:16 PM, David Boxenhorn da...@lookin2.com wrote: I'm still thinking about the problem of how to handle range queries on very large sets of data, using Random Partitioning. Has anyone used tree search to solve this? What do you think? More specifically, something like this:
- Store a maximum of 1000 values per supercolumn (or some other fixed number)
- Each supercolumn has a greaterChild and a lessChild in addition to the values
- When the number of values in the supercolumn grows beyond the maximum, split it into 3 parts, with the top third going into greaterChild and the bottom third into lessChild
- To find a value, look at greaterChild and lessChild to find out whether your key is within the current range, and if not, where to look next
- Range searches mean finding the first value, then looking at greaterChild or lessChild (depending on the direction of your search) until you reach the end of the range.
Super Column Family:
index [ columnFamilyId [ firstVal : val, lastVal : val, val : dataId, lessChild : columnFamilyId, greaterChild : columnFamilyId ] ]
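A hypothetical sketch of the "find a value" step described above. None of this is an existing API: readIndexNode and the IndexNode layout are assumptions standing in for a read of one index supercolumn.

    class IndexNode {
        String firstVal, lastVal;         // range covered by this node
        String lessChild, greaterChild;   // ids of the neighbouring index nodes
        java.util.NavigableMap<String, String> values;  // val -> dataId
    }

    class TreeSearch {
        // assumed helper: fetches one index supercolumn by id and maps its
        // subcolumns onto the IndexNode fields above
        interface NodeReader { IndexNode readIndexNode(String columnFamilyId); }

        // walk lessChild/greaterChild links until the key falls inside a
        // node's [firstVal, lastVal] range
        static String find(NodeReader reader, String rootId, String key) {
            String nodeId = rootId;
            while (nodeId != null) {
                IndexNode node = reader.readIndexNode(nodeId);
                if (key.compareTo(node.firstVal) < 0) {
                    nodeId = node.lessChild;
                } else if (key.compareTo(node.lastVal) > 0) {
                    nodeId = node.greaterChild;
                } else {
                    return node.values.get(key);  // dataId, or null if absent
                }
            }
            return null;  // ran off the edge of the tree
        }
    }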
Re: Is ReplicationFactor values number of replicas or number of copies of data?
to have two copies you need RF=2. RF=0 doesn't make sense as far as I understand it.

On Mon, Jun 7, 2010 at 2:16 PM, Per Olesen p...@trifork.com wrote: Hi, I am unclear about what the ReplicationFactor value means. Does RF=1 mean that there is only one single node that has the data in the cluster (actually no replication), or does it mean that there are two copies of the data - one actual and one replica (as in replicated one time)? I noticed that I CAN start a node with RF=0, but I get UnavailableException when trying to insert, so I assume RF=0 is wrong then? Put another way: If I want my data to always live on exactly 2 nodes in the cluster, do I set RF=2 or RF=1? :-) /Per
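To make the answer concrete: in the 0.6-era storage-conf.xml the factor is declared per keyspace, and RF=2 means two copies total. A sketch using the stock sample keyspace name:

    <Keyspace Name="Keyspace1">
      <ReplicationFactor>2</ReplicationFactor>
      ...
    </Keyspace>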
Re: nodetool cleanup isn't cleaning up?
getRangeToEndpointMap is very useful, thanks, I didn't know about it... However, I've reconfigured my cluster since (moved some nodes and tokens), so now the problem is gone. I guess I'll use getRangeToEndpointMap next time I see something like this...

On Thu, Jun 3, 2010 at 9:15 AM, Jonathan Ellis jbel...@gmail.com wrote: Then the next step is to check StorageService.getRangeToEndpointMap via jmx
Re: Number of client connections
As far as I know, only the OS-level limitations, e.g. typically ~60k.

On Thu, Jun 3, 2010 at 9:34 AM, Lev Stesin lev.ste...@gmail.com wrote: Hi, Is there a limit on the number of client connections to a node? Thanks. -- Lev
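Checking and raising that OS limit, for reference (Linux; the 65535 figure is just an example):

    $ ulimit -n            # current per-process file descriptor limit
    $ ulimit -n 65535      # raise it for this shell session
    # or persistently, via an entry in /etc/security/limits.conf
    # for the user cassandra runs as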
Re: nodetool cleanup isn't cleaning up?
ok, let me try and translate your answer ;) Are you saying that the data that was left on the node is non-primary-replicas of rows from the time before the move? So this implies that when a node moves in the ring, it will affect distribution of: new keys, and old keys' primary node -- but will not affect distribution of old keys' non-primary replicas. If so, still I don't understand something... I would expect even the non-primary replicas of keys to be moved since if they don't, how would they be found? I mean upon reads the serving node should not care about whether the row is new or old, it should have a consistent and global mapping of tokens. So I guess this ruins my theory... What did you mean then? Is this deletions of non-primary replicated data? How does the replication factor affect the load on the moved host then?

On Tue, Jun 1, 2010 at 1:19 AM, Jonathan Ellis jbel...@gmail.com wrote: well, there you are then.
Re: nodetool cleanup isn't cleaning up?
I'm using RackAwareStrategy. But it still doesn't make sense I think... let's see what I missed... According to http://wiki.apache.org/cassandra/Operations - RackAwareStrategy: replica 2 is placed in the first node along the ring that belongs in *another* data center than the first; the remaining N-2 replicas, if any, are placed on the first nodes along the ring in the *same* rack as the first.

192.168.252.124  Up  803.33 MB  56713727820156410577229101238628035242  |--|
192.168.252.99   Up  352.85 MB  56713727820156410577229101238628035243  |  ^
192.168.252.125  Up  134.24 MB  85070591730234615865843651857942052863  v  |
192.168.254.57   Up  676.41 MB  113427455640312821154458202477256070485 |  ^
192.168.254.58   Up  99.74 MB   141784319550391026443072753096570088106 v  |
192.168.254.59   Up  99.94 MB   170141183460469231731687303715884105727 |--|

Alright, so I made a mistake and didn't use the alternate-datacenter suggestion on the page, so the first node of every DC is overloaded with replicas. However, the current situation still doesn't make sense to me. .252.124 will be overloaded b/c it has the first token in the .252 DC. .254.57 will also be overloaded since it has the first token in the .254 DC. But for which node does .252.99 serve as a replicator? It's not the first in the DC and it's just one single token more than its predecessor (which is in the same DC).

On Tue, Jun 1, 2010 at 4:00 PM, Jonathan Ellis jbel...@gmail.com wrote: I'm saying that .99 is getting a copy of all the data for which .124 is the primary. (If you are using RackUnawarePartitioner. If you are using RackAware it is some other node.)
HintedHandoffEnabled
In 0.6.2 I disabled hinted handoff, however the tpstats and cfstats reports seem odd. On all servers in the cluster I have:

<HintedHandoffEnabled>false</HintedHandoffEnabled>

tpstats reports 5 completed handoffs:

$ nodetool -h cass25 -p 9004 tpstats
Pool Name                    Active   Pending   Completed
FILEUTILS-DELETE-POOL             0         0           2
STREAM-STAGE                      0         0           0
RESPONSE-STAGE                    0         0     5903099
ROW-READ-STAGE                    0         0      669093
LB-OPERATIONS                     0         0           0
MESSAGE-DESERIALIZER-POOL         1         0     6595504
GMFD                              0         0       35947
LB-TARGET                         0         0           0
CONSISTENCY-MANAGER               0         0      669095
ROW-MUTATION-STAGE                0         0      644360
MESSAGE-STREAMING-POOL            0         0           0
LOAD-BALANCER-STAGE               0         0           0
FLUSH-SORTER-POOL                 0         0           0
MEMTABLE-POST-FLUSHER             0         0           7
FLUSH-WRITER-POOL                 0         0           7
AE-SERVICE-STAGE                  0         0           1
HINTED-HANDOFF-POOL               0         0           5

In data/system/* there are only LocationInfo files, so it looks like hinted handoff is indeed disabled, and cfstats does indicate there are 0 bytes. However it also indicates 32 reads, which I didn't expect (the cluster has been up for a few hours).

$ nodetool -h cass25 -p 9004 cfstats
...
Column Family: HintsColumnFamily
SSTable count: 0
Space used (live): 0
Space used (total): 0
Memtable Columns Count: 0
Memtable Data Size: 0
Memtable Switch Count: 0
Read Count: 32
Read Latency: 0.062 ms.
Write Count: 0
Write Latency: NaN ms.
Pending Tasks: 0
Key cache capacity: 1
Key cache size: 0
Key cache hit rate: NaN
Row cache: disabled
Compacted row minimum size: 0
Compacted row maximum size: 0
Compacted row mean size: 0

Any idea why this is happening?
nodetool cleanup isn't cleaning up?
I hope I understand nodetool cleanup correctly - it should clean up all data that does not (currently) belong to this node. If so, I think it might not be working correctly. Look at nodes 192.168.252.124 and 192.168.252.99 below:

192.168.252.99   Up  279.35 MB  3544607988759775661076818827414252202   |--|
192.168.252.124  Up  167.23 MB  56713727820156410577229101238628035242  |  ^
192.168.252.125  Up  82.91 MB   85070591730234615865843651857942052863  v  |
192.168.254.57   Up  366.6 MB   113427455640312821154458202477256070485 |  ^
192.168.254.58   Up  88.44 MB   141784319550391026443072753096570088106 v  |
192.168.254.59   Up  88.45 MB   170141183460469231731687303715884105727 |--|

I wanted 124 to take all the load from 99. So I issued a move command.

$ nodetool -h cass99 -p 9004 move 56713727820156410577229101238628035243

This command tells 99 to take the space b/w (56713727820156410577229101238628035242, 56713727820156410577229101238628035243], which is basically just one item in the token space, almost nothing... I wanted it to be very slim (just playing around). So, next I get this:

192.168.252.124  Up  803.33 MB  56713727820156410577229101238628035242  |--|
192.168.252.99   Up  352.85 MB  56713727820156410577229101238628035243  |  ^
192.168.252.125  Up  134.24 MB  85070591730234615865843651857942052863  v  |
192.168.254.57   Up  676.41 MB  113427455640312821154458202477256070485 |  ^
192.168.254.58   Up  99.74 MB   141784319550391026443072753096570088106 v  |
192.168.254.59   Up  99.94 MB   170141183460469231731687303715884105727 |--|

The tokens are correct, but it seems that 99 still has a lot of data. Why? OK, that might be b/c it didn't delete its moved data. So next I issued a nodetool cleanup, which should have taken care of that. Only that it didn't, the node 99 still has 352 MB of data. Why? So, you know what, I waited for 1h. Still no good, data wasn't cleaned up. I restarted the server. Still, data wasn't cleaned up... I issued a cleanup again... still no good... what's up with this node?
Re: nodetool cleanup isn't cleaning up?
Do you think it's the tombstones that take up the disk space? Shouldn't the tombstones be moved along with the data?

On Mon, May 31, 2010 at 3:29 PM, Maxim Kramarenko maxi...@trackstudio.com wrote: Hello! You likely need to wait for GCGraceSeconds seconds, or modify this param. http://spyced.blogspot.com/2010/02/distributed-deletes-in-cassandra.html
===
Thus, a delete operation can't just wipe out all traces of the data being removed immediately: if we did, and a replica did not receive the delete operation, when it becomes available again it will treat the replicas that did receive the delete as having missed a write update, and repair them! So, instead of wiping out data on delete, Cassandra replaces it with a special value called a tombstone. The tombstone can then be propagated to replicas that missed the initial remove request. ... Here, we defined a constant, GCGraceSeconds, and had each node track tombstone age locally. Once it has aged past the constant, it can be GC'd.
===
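For reference, the knob Maxim mentions lives in the 0.6-era storage-conf.xml; the value below is the usual default (10 days), shown only as a sketch:

    <!-- how long tombstones are kept before they may be GC'd -->
    <GCGraceSeconds>864000</GCGraceSeconds>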
Re: nodetool cleanup isn't cleaning up?
yes, replication factor = 2

On Mon, May 31, 2010 at 10:07 PM, Jonathan Ellis jbel...@gmail.com wrote: you have replication factor 1 ?
Re: RE: Hector samples -- where?
it's here: http://github.com/rantav/hector/blob/master/src/test/java/me/prettyprint/cassandra/service/KeyspaceTest.java

On Wed, May 26, 2010 at 8:18 AM, Nicholas Sun nick@raytheon.com wrote: Could you please provide some indication as to their location? Thanks. Nick

From: Ran Tavory [mailto:ran...@gmail.com] Sent: Tuesday, May 25, 2010 9:15 PM To: user@cassandra.apache.org Subject: Re: RE: Hector samples -- where?

The best examples are in KeyspaceTest but don't include all scenarios

On May 26, 2010 2:27 AM, Nicholas Sun nick@raytheon.com wrote: I am also interested in this. It seems like adding multiple Cols into a CF or SuperCols would be very useful. Like a dataload type capability? Nick

-----Original Message----- From: Bill de hOra [mailto:b...@dehora.net] Sent: Tuesday, May 25, 2010...
Re: Questions regarding batch mutates and transactions
The summary of your question is: is batch_mutate atomic in the general sense, meaning when used with multiple keys, multiple column families etc.? Correct?

On Wed, May 26, 2010 at 12:45 PM, Todd Nine t...@spidertracks.co.nz wrote: Hey guys, I originally asked this on the Hector group, but no one was sure of the answer. Can I get some feedback on this? I'd prefer to avoid having to use something like Cages if I can for most of our use cases. Long term I can see we'll need to use something like Cages, especially when it comes to complex operations such as billing. However, for a majority of our uses, I think it's a bit overkill. I've used transactions heavily in the workplace on SQL based app development. To be honest, a majority of applications I've built utilize optimistic locking, and only the atomic, consistent, and durable functionality of transactional ACID properties. To encapsulate all 3, I essentially need all writes to cassandra for a given business invocation to occur in a single write. With Spring, I would implement my own transaction manager which simply adds all mutates and delete ops to a batch mutate. When my transaction commits, I would execute the mutation on the given keyspace. Now this would only work if the following semantics apply. I've tried searching for details on Cassandra's batch mutate, but I'm not finding what I need. Here are 2 use cases as an example.

Case 1: Successful update: User adds new contact
Transaction Start.
Biz op 1. Row is created in contacts and all data is added via batch mutation
Biz op 2. Row is created for an SMS message for queueing through the SMS gateway
return op 2
return op 1
Transaction Commit (batch mutate executed)

Case 2: Failed update: User adds new contact
Biz op 1. Row is created in contacts
Biz op 2. Row is created for SMS message queuing. Fails due to invalid international phone number format
return op 2
return op 1
Transaction is rolled back (batch mutate never executed)

Now, here is where I can't find what I need in the doc. In case 1, if my mutation from biz op 2 were to fail during a batch mutate operation encapsulating all mutations, does the batch mutation as a whole not get executed, or would I still have the mutation from op 1 written to cassandra while the op 2 write fails? Thanks,
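A hedged sketch of the transaction-manager idea Todd describes: each business op adds its mutations to one pending batch, and "commit" is a single batch_mutate call (0.6-era Thrift API; class and schema names here are assumptions). As the thread concludes, the flush itself is not atomic, so a failure can leave a partial batch.

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import org.apache.cassandra.thrift.Cassandra;
    import org.apache.cassandra.thrift.Column;
    import org.apache.cassandra.thrift.ColumnOrSuperColumn;
    import org.apache.cassandra.thrift.ConsistencyLevel;
    import org.apache.cassandra.thrift.Mutation;

    class MutationBatch {
        // key -> column family -> mutations, the shape batch_mutate expects
        private final Map<String, Map<String, List<Mutation>>> pending =
                new HashMap<String, Map<String, List<Mutation>>>();

        void addInsert(String key, String cf, byte[] name, byte[] value, long ts) {
            ColumnOrSuperColumn cosc = new ColumnOrSuperColumn();
            cosc.setColumn(new Column(name, value, ts));
            Mutation m = new Mutation();
            m.setColumn_or_supercolumn(cosc);
            if (!pending.containsKey(key))
                pending.put(key, new HashMap<String, List<Mutation>>());
            if (!pending.get(key).containsKey(cf))
                pending.get(key).put(cf, new ArrayList<Mutation>());
            pending.get(key).get(cf).add(m);
        }

        // "commit": one round-trip; on failure the batch may be partially applied
        void commit(Cassandra.Client client, String keyspace) throws Exception {
            client.batch_mutate(keyspace, pending, ConsistencyLevel.QUORUM);
            pending.clear();
        }
    }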
Re: Error reporting Key cache hit rate with cfstats or with JMX
If I disable row cache the numbers look good - key cache hit rate is > 0, so it seems to be related to row cache. Interestingly, after running for a really long time and with both row and key caches I do start to see Key cache hit rate > 0, but the numbers are so small that it doesn't make sense. I have capacity for 10M keys and 10M rows, the number of cached keys is ~5M and very similarly the number of cached rows is also ~5M, however the hit rates are very different, 0.7 for rows and 0.006 for keys. I'd expect the keys hit rate to be identical since none of them reached the limit yet.

Key cache capacity: 1000
Key cache size: 5044097
Key cache hit rate: 0.0062089764058896576
Row cache capacity: 1000
Row cache size: 5057231
Row cache hit rate: 0.7361241352465543

On Tue, May 25, 2010 at 3:43 PM, Jonathan Ellis jbel...@gmail.com wrote: What happens if you disable row cache?

On Tue, May 25, 2010 at 4:53 AM, Ran Tavory ran...@gmail.com wrote: It seems there's an error reporting the Key cache hit rate. The value is always 0.0 and I have a feeling it's incorrect. This is seen both by using nodetool cfstats as well as accessing JMX directly (org.apache.cassandra.db:type=Caches,keyspace=outbrain_kvdb,cache=KvAdsKeyCache RecentHitRate)

<ColumnFamily CompareWith="BytesType" Name="KvAds" RowsCached="1000" KeysCached="1000"/>

Column Family: KvAds
SSTable count: 7
Space used (live): 1288942061
Space used (total): 1559831566
Memtable Columns Count: 73698
Memtable Data Size: 17121092
Memtable Switch Count: 33
Read Count: 3614433
Read Latency: 0.068 ms.
Write Count: 3503269
Write Latency: 0.024 ms.
Pending Tasks: 0
Key cache capacity: 1000
Key cache size: 619624
Key cache hit rate: 0.0
Row cache capacity: 1000
Row cache size: 447154
Row cache hit rate: 0.8460295730014572
Compacted row minimum size: 387
Compacted row maximum size: 31430
Compacted row mean size: 631

The Row cache hit rate looks good, 0.8, but Key cache hit rate always seems to be 0.0 while the number of unique keys stays about 619624 for quite a while. Is it a real caching problem or just a reporting glitch?

-- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
Re: Error reporting Key cache hit rate with cfstats or with JMX
so the row cache contains both rows and keys, and if I have a large enough row cache (in particular, if row cache size equals key cache size) then it's just wasteful to keep another key cache, and I should eliminate the key cache, correct?

On Thu, May 27, 2010 at 1:21 AM, Jonathan Ellis jbel...@gmail.com wrote: It sure sounds like you're seeing the "my row cache contains the entire hot data set, so the key cache only gets the cold reads" effect.

On Wed, May 26, 2010 at 2:54 PM, Ran Tavory ran...@gmail.com wrote: If I disable row cache the numbers look good - key cache hit rate is > 0, so it seems to be related to row cache. Interestingly, after running for a really long time and with both row and key caches I do start to see Key cache hit rate > 0, but the numbers are so small that it doesn't make sense. I have capacity for 10M keys and 10M rows, the number of cached keys is ~5M and very similarly the number of cached rows is also ~5M, however the hit rates are very different, 0.7 for rows and 0.006 for keys. I'd expect the keys hit rate to be identical since none of them reached the limit yet.

Key cache capacity: 1000
Key cache size: 5044097
Key cache hit rate: 0.0062089764058896576
Row cache capacity: 1000
Row cache size: 5057231
Row cache hit rate: 0.7361241352465543

On Tue, May 25, 2010 at 3:43 PM, Jonathan Ellis jbel...@gmail.com wrote: What happens if you disable row cache?

On Tue, May 25, 2010 at 4:53 AM, Ran Tavory ran...@gmail.com wrote: It seems there's an error reporting the Key cache hit rate. The value is always 0.0 and I have a feeling it's incorrect. This is seen both by using nodetool cfstats as well as accessing JMX directly (org.apache.cassandra.db:type=Caches,keyspace=outbrain_kvdb,cache=KvAdsKeyCache RecentHitRate)

<ColumnFamily CompareWith="BytesType" Name="KvAds" RowsCached="1000" KeysCached="1000"/>

Column Family: KvAds
SSTable count: 7
Space used (live): 1288942061
Space used (total): 1559831566
Memtable Columns Count: 73698
Memtable Data Size: 17121092
Memtable Switch Count: 33
Read Count: 3614433
Read Latency: 0.068 ms.
Write Count: 3503269
Write Latency: 0.024 ms.
Pending Tasks: 0
Key cache capacity: 1000
Key cache size: 619624
Key cache hit rate: 0.0
Row cache capacity: 1000
Row cache size: 447154
Row cache hit rate: 0.8460295730014572
Compacted row minimum size: 387
Compacted row maximum size: 31430
Compacted row mean size: 631

The Row cache hit rate looks good, 0.8, but Key cache hit rate always seems to be 0.0 while the number of unique keys stays about 619624 for quite a while. Is it a real caching problem or just a reporting glitch?

-- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
Re: Hector vs cassandra-java-client
cassandra-java-client is up to cassandra's 0.4.2 version, so you probably can't use it out of the box. Hector is active and up to the latest 0.6.1 release with a bunch of committers, contributors and users. See http://wiki.github.com/rantav/hector/ and http://groups.google.com/group/hector-users

On Tue, May 25, 2010 at 5:36 AM, Jeff Zhang zjf...@gmail.com wrote: I think hector is better, and it seems the author of cassandra-java-client is no longer working on it.

On Tue, May 25, 2010 at 10:21 AM, Peter Hsu pe...@motivecast.com wrote: Hi All, This may have been answered already, but I did a [quick] Google search and didn't find much. Which is the better Java client to use? Hector, cassandra-java-client, or neither? It seems Hector is more fully featured and more active as a project in general. What are user experiences with either library? Any advice? Thanks, Peter

-- Best Regards Jeff Zhang
Error reporting Key cache hit rate with cfstats or with JMX
It seems there's an error reporting the Key cache hit rate. The value is always 0.0 and I have a feeling it's incorrect. This is seen both by using nodetool cfstats as well as accessing JMX directly (org.apache.cassandra.db:type=Caches,keyspace=outbrain_kvdb,cache=KvAdsKeyCache RecentHitRate)

<ColumnFamily CompareWith="BytesType" Name="KvAds" RowsCached="1000" KeysCached="1000"/>

Column Family: KvAds
SSTable count: 7
Space used (live): 1288942061
Space used (total): 1559831566
Memtable Columns Count: 73698
Memtable Data Size: 17121092
Memtable Switch Count: 33
Read Count: 3614433
Read Latency: 0.068 ms.
Write Count: 3503269
Write Latency: 0.024 ms.
Pending Tasks: 0
Key cache capacity: 1000
Key cache size: 619624
Key cache hit rate: 0.0
Row cache capacity: 1000
Row cache size: 447154
Row cache hit rate: 0.8460295730014572
Compacted row minimum size: 387
Compacted row maximum size: 31430
Compacted row mean size: 631

The Row cache hit rate looks good, 0.8, but Key cache hit rate always seems to be 0.0 while the number of unique keys stays about 619624 for quite a while. Is it a real caching problem or just a reporting glitch?
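A minimal sketch of reading the same RecentHitRate attribute over plain JMX; the bean name is the one quoted above, while the JMX port is an assumption (8080 was the 0.6-era default, adjust to your setup):

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class CacheHitRateProbe {
    public static void main(String[] args) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:8080/jmxrmi");
        JMXConnector jmxc = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbs = jmxc.getMBeanServerConnection();
            // The cache MBean named in the email above.
            ObjectName cache = new ObjectName(
                    "org.apache.cassandra.db:type=Caches,keyspace=outbrain_kvdb,cache=KvAdsKeyCache");
            Object hitRate = mbs.getAttribute(cache, "RecentHitRate");
            System.out.println("RecentHitRate = " + hitRate);
        } finally {
            jmxc.close();
        }
    }
}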
Re: Key cache capacity: 1 when using KeysCached=50%
https://issues.apache.org/jira/browse/CASSANDRA-1129

On Tue, May 25, 2010 at 3:42 PM, Jonathan Ellis jbel...@gmail.com wrote: That does look like a bug. Can you create a ticket and upload a (preferably small-ish) sstable that illustrates the problem?

On Mon, May 24, 2010 at 12:07 PM, Ran Tavory ran...@gmail.com wrote: I'd like to have 100% keys cached. Sorry if my example of Super2 wasn't correct, but I do think there's a problem. Here's with my own data: When using actual numbers (in this case for RowsCached) it works as expected, however when specifying KeysCached=100% I get only 1.

<ColumnFamily CompareWith="BytesType" Name="KvAds" KeysCached="100%" RowsCached="1"/>

Column Family: KvAds
SSTable count: 7
Space used (live): 797535964
Space used (total): 797535964
Memtable Columns Count: 42292
Memtable Data Size: 10514176
Memtable Switch Count: 24
Read Count: 2563704
Read Latency: 4.590 ms.
Write Count: 1963804
Write Latency: 0.025 ms.
Pending Tasks: 0
Key cache capacity: 1
Key cache size: 1
Key cache hit rate: 0.0
Row cache capacity: 1
Row cache size: 1
Row cache hit rate: 0.2206178354382234
Compacted row minimum size: 386
Compacted row maximum size: 9808
Compacted row mean size: 616

On Mon, May 24, 2010 at 6:30 PM, Jonathan Ellis jbel...@gmail.com wrote: If you really want a cache capacity of 0 then you need to use 0 explicitly, otherwise the % versions will give you at least 1.

On Mon, May 24, 2010 at 12:34 AM, Ran Tavory ran...@gmail.com wrote: I've noticed that when defining KeysCached=50% (or KeysCached=100%, and I didn't test other values with %) then cfstats reports Key cache capacity: 1. This looks weird... is this expected? (version 0.6.1) For example, in the default configuration:

<ColumnFamily Name="Super2" ColumnType="Super" CompareWith="UTF8Type" CompareSubcolumnsWith="UTF8Type" RowsCached="1" KeysCached="50%"/>

Keyspace: Keyspace1
Read Count: 0
Read Latency: NaN ms.
Write Count: 0
Write Latency: NaN ms.
Pending Tasks: 0
Column Family: Super1
SSTable count: 0
Space used (live): 0
Space used (total): 0
Memtable Columns Count: 0
Memtable Data Size: 0
Memtable Switch Count: 0
Read Count: 0
Read Latency: NaN ms.
Write Count: 0
Write Latency: NaN ms.
Pending Tasks: 0
Key cache capacity: 20
Key cache size: 0
Key cache hit rate: NaN
Row cache: disabled
Compacted row minimum size: 0
Compacted row maximum size: 0
Compacted row mean size: 0
Column Family: Super2
SSTable count: 0
Space used (live): 0
Space used (total): 0
Memtable Columns Count: 0
Memtable Data Size: 0
Memtable Switch Count: 0
Read Count: 0
Read Latency: NaN ms.
Write Count: 0
Write Latency: NaN ms.
Pending Tasks: 0
Key cache capacity: 1
Key cache size: 0
Key cache hit rate: NaN
Row cache capacity: 1
Row cache size: 0
Row cache hit rate: NaN
Compacted row minimum size: 0
Compacted row maximum size: 0
Compacted row mean size: 0

-- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
Re: Hector samples -- where?
http://wiki.github.com/rantav/hector/examples On May 25, 2010 10:43 PM, Asaf Lahav asaf.la...@gmail.com wrote: Hi, Where can I find Hector code samples?
Re: RE: Hector samples -- where?
The best examples are in KeyspaceTest but don't include all scenarios On May 26, 2010 2:27 AM, Nicholas Sun nick@raytheon.com wrote: I am also interested in this. It seems like adding multiple Cols into a CF or SuperCols would be very useful. Like a dataload type capability? Nick -Original Message- From: Bill de hOra [mailto:b...@dehora.net] Sent: Tuesday, May 25, 2010...
setcachecapacity is forgotten
I use nodetool to set cache capacity on a certain node but the settings are forgotten after a few minutes. I run:

$ nodetool -h localhost -p 9004 setcachecapacity outbrain_kvdb KvImpressions 1000 100

If I then run nodetool cfstats immediately after, the settings are effective - I see the correct cache settings. However, after a few minutes, and I'm not sure what the trigger really is, the settings are forgotten and the host returns to the cache settings it had read when it was booted. I even updated storage-conf.xml thinking maybe the server re-reads the value from the actual file, but as it seems, it looks like it's reading values stored in its memory when booted. Of course I can just restart the server so values from the file will take effect, but I don't want to start with a cold cache again, I want to increase cache size while it's hot. ...or am I using the tool incorrectly? I'm setting the cache capacity for only one host in the ring, not all hosts. Thanks
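A quick way to watch whether the setting sticks over time (same keyspace/CF names as above; the grep pattern is just illustrative and matches both the key and row cache capacity lines in cfstats):

$ nodetool -h localhost -p 9004 setcachecapacity outbrain_kvdb KvImpressions 1000 100
$ while true; do nodetool -h localhost -p 9004 cfstats | grep 'cache capacity'; sleep 60; done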
Re: Key cache capacity: 1 when using KeysCached=50%
I'd like to have 100% keys cached. Sorry if my example of Super2 wasn't correct, but I do think there's a problem. Here's with my own data: When using actual numbers (in this case for RowsCached) it works as expected, however when specifying KeysCached=100% I get only 1.

<ColumnFamily CompareWith="BytesType" Name="KvAds" KeysCached="100%" RowsCached="1"/>

Column Family: KvAds
SSTable count: 7
Space used (live): 797535964
Space used (total): 797535964
Memtable Columns Count: 42292
Memtable Data Size: 10514176
Memtable Switch Count: 24
Read Count: 2563704
Read Latency: 4.590 ms.
Write Count: 1963804
Write Latency: 0.025 ms.
Pending Tasks: 0
Key cache capacity: 1
Key cache size: 1
Key cache hit rate: 0.0
Row cache capacity: 1
Row cache size: 1
Row cache hit rate: 0.2206178354382234
Compacted row minimum size: 386
Compacted row maximum size: 9808
Compacted row mean size: 616

On Mon, May 24, 2010 at 6:30 PM, Jonathan Ellis jbel...@gmail.com wrote: If you really want a cache capacity of 0 then you need to use 0 explicitly, otherwise the % versions will give you at least 1.

On Mon, May 24, 2010 at 12:34 AM, Ran Tavory ran...@gmail.com wrote: I've noticed that when defining KeysCached=50% (or KeysCached=100%, and I didn't test other values with %) then cfstats reports Key cache capacity: 1. This looks weird... is this expected? (version 0.6.1) For example, in the default configuration:

<ColumnFamily Name="Super2" ColumnType="Super" CompareWith="UTF8Type" CompareSubcolumnsWith="UTF8Type" RowsCached="1" KeysCached="50%"/>

Keyspace: Keyspace1
Read Count: 0
Read Latency: NaN ms.
Write Count: 0
Write Latency: NaN ms.
Pending Tasks: 0
Column Family: Super1
SSTable count: 0
Space used (live): 0
Space used (total): 0
Memtable Columns Count: 0
Memtable Data Size: 0
Memtable Switch Count: 0
Read Count: 0
Read Latency: NaN ms.
Write Count: 0
Write Latency: NaN ms.
Pending Tasks: 0
Key cache capacity: 20
Key cache size: 0
Key cache hit rate: NaN
Row cache: disabled
Compacted row minimum size: 0
Compacted row maximum size: 0
Compacted row mean size: 0
Column Family: Super2
SSTable count: 0
Space used (live): 0
Space used (total): 0
Memtable Columns Count: 0
Memtable Data Size: 0
Memtable Switch Count: 0
Read Count: 0
Read Latency: NaN ms.
Write Count: 0
Write Latency: NaN ms.
Pending Tasks: 0
Key cache capacity: 1
Key cache size: 0
Key cache hit rate: NaN
Row cache capacity: 1
Row cache size: 0
Row cache hit rate: NaN
Compacted row minimum size: 0
Compacted row maximum size: 0
Compacted row mean size: 0

-- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
Is there a way to turn HH off?
For small clusters Hinted Handoff cost is not negligible. I'd like to test its effect. Is there a way to turn it off for my cluster?
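If memory serves, the 0.6-era storage-conf.xml exposes a flag for exactly this; verify the option exists in your version before relying on this sketch:

<HintedHandoffEnabled>false</HintedHandoffEnabled>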
Re: oom in ROW-MUTATION-STAGE
Is there another solution except adding capacity? How does ConcurrentReads (default 8) affect that? If I expect to have a similar number of reads and writes, should I set ConcurrentReads equal to ConcurrentWrites (default 32)? thanks

On Sun, May 23, 2010 at 5:43 PM, Jonathan Ellis jbel...@gmail.com wrote: looks like reads are backing up, which in turn is making deserialize back up

On Sun, May 23, 2010 at 4:25 AM, Ran Tavory ran...@gmail.com wrote: Here's tpstats on a server with traffic that I think will get OOM shortly. We have 4k pending reads and 123k pending at MESSAGE-DESERIALIZER-POOL. Is there something I can do to prevent that? (other than adding RAM...)

Pool Name                   Active   Pending   Completed
FILEUTILS-DELETE-POOL            0         0          55
STREAM-STAGE                     0         0           6
RESPONSE-STAGE                   0         0           0
ROW-READ-STAGE                   8      4088     7537229
LB-OPERATIONS                    0         0           0
MESSAGE-DESERIALIZER-POOL        1    123799    22198459
GMFD                             0         0      471827
LB-TARGET                        0         0           0
CONSISTENCY-MANAGER              0         0           0
ROW-MUTATION-STAGE               0         0    14142351
MESSAGE-STREAMING-POOL           0         0          16
LOAD-BALANCER-STAGE              0         0           0
FLUSH-SORTER-POOL                0         0           0
MEMTABLE-POST-FLUSHER            0         0         128
FLUSH-WRITER-POOL                0         0         128
AE-SERVICE-STAGE                 1         1           8
HINTED-HANDOFF-POOL              0         0          10

On Sat, May 22, 2010 at 11:05 PM, Ran Tavory ran...@gmail.com wrote: The message deserializer has 10m pending tasks before the oom. What do you think makes the message deserializer blow up? I'd suspect that when it goes up to 10m pending tasks - I don't know how much mem a task actually takes up - they may consume a lot of memory. Is there a setting I need to tweak? (or am I barking up the wrong tree?). I'll add the counters from http://github.com/jbellis/cassandra-munin-plugins but I already have most of them monitored, so I attached the graphs of the ones that seemed the most suspicious in the previous email. The system keyspace and HH CF don't look too bad, I think, here they are:

Keyspace: system
Read Count: 154
Read Latency: 0.875012987012987 ms.
Write Count: 9
Write Latency: 0.20054 ms.
Pending Tasks: 0
Column Family: LocationInfo
SSTable count: 1
Space used (live): 2714
Space used (total): 2714
Memtable Columns Count: 0
Memtable Data Size: 0
Memtable Switch Count: 3
Read Count: 2
Read Latency: NaN ms.
Write Count: 9
Write Latency: 0.011 ms.
Pending Tasks: 0
Key cache capacity: 1
Key cache size: 1
Key cache hit rate: NaN
Row cache: disabled
Compacted row minimum size: 203
Compacted row maximum size: 397
Compacted row mean size: 300
Column Family: HintsColumnFamily
SSTable count: 1
Space used (live): 1457
Space used (total): 4371
Memtable Columns Count: 0
Memtable Data Size: 0
Memtable Switch Count: 0
Read Count: 152
Read Latency: 0.369 ms.
Write Count: 0
Write Latency: NaN ms.
Pending Tasks: 0
Key cache capacity: 1
Key cache size: 1
Key cache hit rate: 0.07142857142857142
Row cache: disabled
Compacted row minimum size: 829
Compacted row maximum size: 829
Compacted row mean size: 829

On Sat, May 22, 2010 at 4:14 AM, Jonathan Ellis jbel...@gmail.com wrote: Can you monitor cassandra-level metrics like the ones in http://github.com/jbellis/cassandra-munin-plugins ? the usual culprit is compaction, but your compacted row size is small. nothing else really comes to mind. (you should check system keyspace too tho, HH rows can get large)

On Fri, May 21, 2010 at 2:36 PM, Ran Tavory ran...@gmail.com wrote: I see some OOM on one of the hosts in the cluster and I wonder if there's a formula that'll help me calculate what's the required memory setting given the parameters x,y,z... In short, I need
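For reference, these knobs live in storage-conf.xml in 0.6; the values below are just the defaults mentioned in the question, not a recommendation - per the advice later in this thread, raising ConcurrentReads mainly helps when you are CPU bound with idle cores:

<ConcurrentReads>8</ConcurrentReads>
<ConcurrentWrites>32</ConcurrentWrites>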
Re: oom in ROW-MUTATION-STAGE
I am disk bound, certainly. I'll try adding more keys and row caching, but I suspect it's a short blanket: if I add more caching I'll have less free memory, so more chance to OOM again. (Is the cache using soft refs so it won't take mem from real objects?)

On Sun, May 23, 2010 at 8:15 PM, Jonathan Ellis jbel...@gmail.com wrote:

On Sun, May 23, 2010 at 10:59 AM, Ran Tavory ran...@gmail.com wrote: Is there another solution except adding capacity?

Either you need to get more performance/node or increase node count. :)

How does ConcurrentReads (default 8) affect that? If I expect to have a similar number of reads and writes, should I set ConcurrentReads equal to ConcurrentWrites (default 32)?

You should figure out where the bottleneck is, before tweaking things: http://spyced.blogspot.com/2010/01/linux-performance-basics.html Increasing CR will only help if you are (a) cpu bound and (b) have so many cores that 8 threads isn't saturating them. Sight unseen, my guess is you are disk bound. iostat can confirm this. If that's the case then you can try to reduce the disk load w/ row cache or key cache.

On Sun, May 23, 2010 at 5:43 PM, Jonathan Ellis jbel...@gmail.com wrote: looks like reads are backing up, which in turn is making deserialize back up

On Sun, May 23, 2010 at 4:25 AM, Ran Tavory ran...@gmail.com wrote: Here's tpstats on a server with traffic that I think will get OOM shortly. We have 4k pending reads and 123k pending at MESSAGE-DESERIALIZER-POOL. Is there something I can do to prevent that? (other than adding RAM...)

Pool Name                   Active   Pending   Completed
FILEUTILS-DELETE-POOL            0         0          55
STREAM-STAGE                     0         0           6
RESPONSE-STAGE                   0         0           0
ROW-READ-STAGE                   8      4088     7537229
LB-OPERATIONS                    0         0           0
MESSAGE-DESERIALIZER-POOL        1    123799    22198459
GMFD                             0         0      471827
LB-TARGET                        0         0           0
CONSISTENCY-MANAGER              0         0           0
ROW-MUTATION-STAGE               0         0    14142351
MESSAGE-STREAMING-POOL           0         0          16
LOAD-BALANCER-STAGE              0         0           0
FLUSH-SORTER-POOL                0         0           0
MEMTABLE-POST-FLUSHER            0         0         128
FLUSH-WRITER-POOL                0         0         128
AE-SERVICE-STAGE                 1         1           8
HINTED-HANDOFF-POOL              0         0          10

On Sat, May 22, 2010 at 11:05 PM, Ran Tavory ran...@gmail.com wrote: The message deserializer has 10m pending tasks before the oom. What do you think makes the message deserializer blow up? I'd suspect that when it goes up to 10m pending tasks - I don't know how much mem a task actually takes up - they may consume a lot of memory. Is there a setting I need to tweak? (or am I barking up the wrong tree?). I'll add the counters from http://github.com/jbellis/cassandra-munin-plugins but I already have most of them monitored, so I attached the graphs of the ones that seemed the most suspicious in the previous email. The system keyspace and HH CF don't look too bad, I think, here they are:

Keyspace: system
Read Count: 154
Read Latency: 0.875012987012987 ms.
Write Count: 9
Write Latency: 0.20054 ms.
Pending Tasks: 0
Column Family: LocationInfo
SSTable count: 1
Space used (live): 2714
Space used (total): 2714
Memtable Columns Count: 0
Memtable Data Size: 0
Memtable Switch Count: 3
Read Count: 2
Read Latency: NaN ms.
Write Count: 9
Write Latency: 0.011 ms.
Pending Tasks: 0
Key cache capacity: 1
Key cache size: 1
Key cache hit rate: NaN
Row cache: disabled
Compacted row minimum size: 203
Compacted row maximum size: 397
Compacted row mean size: 300
Column Family: HintsColumnFamily
SSTable count: 1
Space used (live): 1457
Space used (total): 4371
Memtable Columns Count: 0
Memtable Data Size: 0
Memtable Switch Count: 0
Read Count: 152
Read Latency: 0.369 ms.
Write Count: 0
Write Latency: NaN ms.
Pending
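To make the iostat suggestion concrete (standard Linux tooling; the interpretation in the comment is the usual rule of thumb, not anything specific to Cassandra):

$ iostat -x 5
# consistently high %util and long await on the data disks means the disk, not
# the CPU, is the bottleneck - in that case caching or more nodes help, not more
# read threads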
Re: how to decommission two slow nodes?
Thanks, I'll try that next time.

On May 21, 2010 5:23 PM, Jonathan Ellis jbel...@gmail.com wrote: There is no other way to make the cluster forget a node w/o decommission / removetoken. You could do everything up to "stop the entire cluster", and do a rolling restart instead: kill the 2 nodes you want to remove, and then do removetoken, which would still do extra i/o but at least the slow nodes would not be involved.

On Thu, May 20, 2010 at 8:54 PM, Ran Tavory ran...@gmail.com wrote: I forgot to mention that th...

-- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Ca...
Re: Disk usage doubled after nodetool decommission and node still in ring
Run nodetool streams.

On May 18, 2010 4:14 PM, Maxim Kramarenko maxi...@trackstudio.com wrote: Hi! After nodetool decommission, the data size on all nodes doubled; the node is still up and in the ring, and there are no streams or tmp SSTables now. BTW, I have an ssh connection to the server, so after running nodetool decommission I expect that once the server receives the command I can press Ctrl-C and close the shell. Is that correct? What is the best way to check the current node state, to tell whether the decommission has finished? Should the node accept new data after I run the decommission command?
Re: ConcurrentModificationException in gossiper while decommissioning another node
that sounds like it, thanks

On Tue, May 18, 2010 at 3:53 PM, roger schildmeijer schildmei...@gmail.com wrote: This is hopefully fixed in trunk (CASSANDRA-757 (revision 938597)); Replace synchronization in Gossiper with concurrent data structures and volatile fields. // Roger Schildmeijer

On Tue, May 18, 2010 at 1:55 PM, Ran Tavory ran...@gmail.com wrote: While the node 192.168.252.61 was in the process of decommissioning I see this error in two other nodes:

INFO [Timer-1] 2010-05-18 06:01:12,048 Gossiper.java (line 179) InetAddress /192.168.252.62 is now dead.
INFO [GMFD:1] 2010-05-18 06:04:00,189 Gossiper.java (line 568) InetAddress /192.168.252.62 is now UP
INFO [Timer-1] 2010-05-18 06:11:45,311 Gossiper.java (line 401) FatClient /192.168.252.61 has been silent for 360ms, removing from gossip
ERROR [Timer-1] 2010-05-18 06:11:45,315 CassandraDaemon.java (line 88) Fatal exception in thread Thread[Timer-1,5,main]
java.lang.RuntimeException: java.util.ConcurrentModificationException
        at org.apache.cassandra.gms.Gossiper$GossipTimerTask.run(Gossiper.java:97)
        at java.util.TimerThread.mainLoop(Timer.java:512)
        at java.util.TimerThread.run(Timer.java:462)
Caused by: java.util.ConcurrentModificationException
        at java.util.Hashtable$Enumerator.next(Hashtable.java:1031)
        at org.apache.cassandra.gms.Gossiper.doStatusCheck(Gossiper.java:382)
        at org.apache.cassandra.gms.Gossiper$GossipTimerTask.run(Gossiper.java:91)
        ... 2 more

.61 is the decommissioned node. .62 was under load (streams transferred to it from .61). I simply ran nodetool decommission on the 61 node and then (after an hour, I guess) I saw this error in two other live nodes. Does this ring a bell? It's either a bug, or I wasn't running decommission correctly...
Re: decommission and org.apache.thrift.TApplicationException: get_slice failed: unknown result
My decommission was progressing OK, although very slowly, but I'll send another question to the list about that... The exception must have been a hiccup; I hope I won't see it again...

On Tue, May 18, 2010 at 4:10 PM, Gary Dusbabek gdusba...@gmail.com wrote: If I had to guess, I'd say that something at the transport layer had trouble. Possibly some kind of thrift hiccup that we haven't seen before. Your description makes it sound as if the decommission is proceeding normally though. Gary.

On Tue, May 18, 2010 at 04:42, Ran Tavory ran...@gmail.com wrote: What's the correct way to remove a node from a cluster? According to this page http://wiki.apache.org/cassandra/Operations a decommission call should be enough. When decommissioning one of the nodes from my cluster I see an error in the client:

org.apache.thrift.TApplicationException: get_slice failed: unknown result
        at org.apache.cassandra.thrift.Cassandra$Client.recv_get_slice(Cassandra.java:407)
        at org.apache.cassandra.thrift.Cassandra$Client.get_slice(Cassandra.java:367)

The client isn't talking to the decommissioned node, it's connected to another node, so I'd expect all operations to continue as normal (although slower), right? I simply called nodetool -h ... decommission on the host and waited. After a while, while the node was still decommissioning, I saw the error at the client. The current state of the node is Decommissioned and it's not in the ring now. It is still moving streams to other hosts, though. I can't be sure, though, whether the error happened during the time it was Leaving the ring or whether it was already Decommissioned. The server logs don't show anything of note (no errors or warnings). What do you think?
how to decommission two slow nodes?
In my cluster setup I have two datacenters, with 5 hosts in one DC and 3 in the other. In the 5-host DC I'd like to remove two hosts so I'd get 3 and 3 in each. The two nodes I'd like to decommission have less RAM than the other 3, so they operate slower. What's the most effective way to decommission them? At first I thought I'd decommission the first and then, when it's done, decommission the second, but the problem was that when I decommissioned the first it started streaming its data to the second node (as well as others, I think), and since the second node was under heavy load, and without enough RAM, it was busy GCing and worked horribly slowly. Eventually, after almost 24h of horribly slow streaming, I gave up. This also caused the entire cluster to operate horribly slowly. So, is there a better way to decommission the two under-provisioned nodes without slowing down the cluster, or at least with minimal effect? My replication is 2 and I'm using RackAwareStrategy so (if everything is configured correctly with the EndPointSnitch) at any given time two copies of the data exist, one in each DC. Thanks
mapreduce from cassandra to cassandra
In the wordcount example the process reads from cassandra and the result is written to a local file at /tmp/word_count* Is it possible to read from cassandra and write the result back to cassandra to a specified cf/row/column? I see that there exists a ColumnFamilyInputFormat but not ColumnFamilyOutputFormat or something like that (in http://svn.apache.org/repos/asf/cassandra/trunk/src/java/org/apache/cassandra/hadoop/ ) My knowledge about hadoop and mr is pretty basic so maybe I'm missing something simple, lmk, thanks!
Re: mapreduce from cassandra to cassandra
hbase - yes. But is that reusable for cassandra? On Tue, May 18, 2010 at 12:17 PM, Jeff Zhang zjf...@gmail.com wrote: I believe it is possible to write result back to cassandra. If I remember correctly, HBase has both InputFormat and OutputFormat for hadoop. On Tue, May 18, 2010 at 5:08 PM, Ran Tavory ran...@gmail.com wrote: In the wordcount example the process reads from cassandra and the result is written to a local file at /tmp/word_count* Is it possible to read from cassandra and write the result back to cassandra to a specified cf/row/column? I see that there exists a ColumnFamilyInputFormat but not ColumnFamilyOutputFormat or something like that (in http://svn.apache.org/repos/asf/cassandra/trunk/src/java/org/apache/cassandra/hadoop/ ) My knowledge about hadoop and mr is pretty basic so maybe I'm missing something simple, lmk, thanks! -- Best Regards Jeff Zhang
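HBase's OutputFormat targets HBase, so it isn't reusable as-is; a workaround people used before a ColumnFamilyOutputFormat existed was to write back over plain Thrift from the reduce side. A rough, untested sketch along those lines against the 0.6 Thrift API - the keyspace and column family names are placeholders:

import java.io.IOException;
import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.ColumnPath;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TSocket;

public class CassandraWritingReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private TSocket socket;
    private Cassandra.Client client;

    @Override
    protected void setup(Context context) throws IOException {
        try {
            // One Thrift connection per reduce task.
            socket = new TSocket("localhost", 9160);
            client = new Cassandra.Client(new TBinaryProtocol(socket));
            socket.open();
        } catch (Exception e) {
            throw new IOException(e);
        }
    }

    @Override
    protected void reduce(Text word, Iterable<IntWritable> counts, Context context)
            throws IOException {
        int sum = 0;
        for (IntWritable c : counts) {
            sum += c.get();
        }
        try {
            // Store the count under row = word, column = "count".
            ColumnPath path = new ColumnPath();
            path.setColumn_family("WordCounts"); // placeholder CF
            path.setColumn("count".getBytes("UTF-8"));
            client.insert("Keyspace1", word.toString(), path,
                    String.valueOf(sum).getBytes("UTF-8"),
                    System.currentTimeMillis() * 1000, ConsistencyLevel.ONE);
        } catch (Exception e) {
            throw new IOException(e);
        }
    }

    @Override
    protected void cleanup(Context context) {
        socket.close();
    }
}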
decommission and org.apache.thrift.TApplicationException: get_slice failed: unknown result
What's the correct way to remove a node from a cluster? According to this page http://wiki.apache.org/cassandra/Operations a decommission call should be enough. When decommissioning one of the nodes from my cluster I see an error in the client:

org.apache.thrift.TApplicationException: get_slice failed: unknown result
        at org.apache.cassandra.thrift.Cassandra$Client.recv_get_slice(Cassandra.java:407)
        at org.apache.cassandra.thrift.Cassandra$Client.get_slice(Cassandra.java:367)

The client isn't talking to the decommissioned node, it's connected to another node, so I'd expect all operations to continue as normal (although slower), right? I simply called nodetool -h ... decommission on the host and waited. After a while, while the node was still decommissioning, I saw the error at the client. The current state of the node is Decommissioned and it's not in the ring now. It is still moving streams to other hosts, though. I can't be sure, though, whether the error happened during the time it was Leaving the ring or whether it was already Decommissioned. The server logs don't show anything of note (no errors or warnings). What do you think?
ConcurrentModificationException in gossiper while decommissioning another node
While the node 192.168.252.61 was in the process of decommissioning I see this error in two other nodes:

INFO [Timer-1] 2010-05-18 06:01:12,048 Gossiper.java (line 179) InetAddress /192.168.252.62 is now dead.
INFO [GMFD:1] 2010-05-18 06:04:00,189 Gossiper.java (line 568) InetAddress /192.168.252.62 is now UP
INFO [Timer-1] 2010-05-18 06:11:45,311 Gossiper.java (line 401) FatClient /192.168.252.61 has been silent for 360ms, removing from gossip
ERROR [Timer-1] 2010-05-18 06:11:45,315 CassandraDaemon.java (line 88) Fatal exception in thread Thread[Timer-1,5,main]
java.lang.RuntimeException: java.util.ConcurrentModificationException
        at org.apache.cassandra.gms.Gossiper$GossipTimerTask.run(Gossiper.java:97)
        at java.util.TimerThread.mainLoop(Timer.java:512)
        at java.util.TimerThread.run(Timer.java:462)
Caused by: java.util.ConcurrentModificationException
        at java.util.Hashtable$Enumerator.next(Hashtable.java:1031)
        at org.apache.cassandra.gms.Gossiper.doStatusCheck(Gossiper.java:382)
        at org.apache.cassandra.gms.Gossiper$GossipTimerTask.run(Gossiper.java:91)
        ... 2 more

.61 is the decommissioned node. .62 was under load (streams transferred to it from .61). I simply ran nodetool decommission on the 61 node and then (after an hour, I guess) I saw this error in two other live nodes. Does this ring a bell? It's either a bug, or I wasn't running decommission correctly...
Re: is it possible to trace/debug cassandra?
Add -Xdebug -Xrunjdwp:transport=dt_socket,address=8000,server=y,suspend=n to the JVM_OPTS section of cassandra.in.sh. Then connect with jdb (http://java.sun.com/j2se/1.3/docs/tooldocs/solaris/jdb.html) or your IDE as a remote process.

On Tue, May 18, 2010 at 1:18 PM, S Ahmed sahmed1...@gmail.com wrote: Would it be possible to put cassandra in debug mode, so I could actually step through, line by line, the execution flow of operations I execute against it? If yes, any help would be great.
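Concretely, the edited cassandra.in.sh line would look something like this (a sketch; your existing JVM_OPTS contents stay as they are), after which jdb can attach to the listening socket:

JVM_OPTS="$JVM_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,address=8000,server=y,suspend=n"

$ jdb -attach 8000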
Re: JMX metrics for monitoring
There are many, but here's what I found useful so far. Per CF you have:
- Recent read/write latency
- PendingTasks
- Read/Write count
Globally you have, for each of the stages (e.g. org.apache.cassandra.concurrent:type=ROW-READ-STAGE):
- PendingTasks
- ActiveCount
...and as you go you'll find more

On Tue, May 18, 2010 at 1:02 AM, Maxim Kramarenko maxi...@trackstudio.com wrote: Hi! Which JMX metrics do you use for Cassandra monitoring? Which values can be used for alerts?
Re: what/how do you guys monitor slow nodes?
There is a per-CF read and write latency JMX.

On May 12, 2010 12:55 AM, Jordan Pittier - Rezel jor...@rezel.net wrote: For sure you have to pay particular attention to memory allocation on each node; especially be sure your servers don't swap. Then you can monitor how load is balanced among your nodes (nodetool -h XX ring).

On Tue, May 11, 2010 at 11:46 PM, S Ahmed sahmed1...@gmail.com wrote: If you have 3-4 nodes,...