Re: Cassandra Agent
Now that's a fun problem to solve!

On 11 Oct 2013, at 17:17, David Schairer dschai...@humbaba.net wrote:
http://en.wikipedia.org/wiki/List_of_children_of_Priam You've got plenty of children of Priam to go around. Doesn't anyone read the Iliad any more? :) --DRS

On Oct 11, 2013, at 6:55 AM, Edward Capriolo edlinuxg...@gmail.com wrote:
Stick sandra on the end. Restsandra.

On Friday, October 11, 2013, Ran Tavory ran...@gmail.com wrote:
Seems like the greeks are all used out; how about moving to the Japanese mythology? It's a brand new pool of names... http://en.wikipedia.org/wiki/Japanese_mythology

On Fri, Oct 11, 2013 at 8:29 AM, Blair Zajac bl...@orcaware.com wrote:
On 10/10/2013 10:28 PM, Blair Zajac wrote:
On 10/10/2013 08:53 PM, Sean McCully wrote:
On Thursday, October 10, 2013 08:30:42 PM Blair Jacuzzi wrote:
On 10/10/2013 07:54 PM, Sean McCully wrote:
Hello Cassandra Users, I've recently created a Cassandra Agent as part of Netflix's Cloud Prize competition. The submission, which I've named Hector, is largely based on Netflix's Priam. I would be very interested in getting feedback from anyone willing to give Hector (https://github.com/seanmccully/hector) a try. I am very interested in seeing if this is something the Cassandra community is interested in using with their Cassandra installs.

For one, there's a name conflict with the well-known Hector Cassandra client project: http://hector-client.github.io/hector/build/html/index.html Any suggestions on a new name?

Helenus, the twin brother of the prophetess Cassandra??? http://en.wikipedia.org/wiki/Helenus

Oops, should have Googled myself before suggesting this; they are NodeJS bindings for Cassandra: https://github.com/simplereach/helenus

Well, I'll leave it to you to find a free name ;) Blair

-- /Ran http://tavory.com
AssertionError: DecoratedKey(... ) != DecoratedKey (...)
Pardon me, now with the appropriate subject line... Hi, I have a small cluster of 1.2.6 and after some config changes I started seeing errors in the logs. Not sure that's related, but the changes I performed were to disable hinted handoff and disable auto snapshot. I'll try to revert these and see if the picture changes. But anyway, that seems like a bug, right? I see this across many nodes, not only one.

ERROR [ReplicateOnWriteStage:105] 2013-10-06 16:13:13,799 CassandraDaemon.java (line 192) Exception in thread Thread[ReplicateOnWriteStage:105,5,main]
java.lang.AssertionError: DecoratedKey(-9223372036854775808, ) != DecoratedKey(-1854619418400985942, 00033839390a4769676f707469782d3100) in /raid0/cassandra/data/test_realtime/activities_summary_realtime/test_realtime-activities_summary_realtime-ic-2-Data.db
    at org.apache.cassandra.db.columniterator.SSTableNamesIterator.read(SSTableNamesIterator.java:119)
    at org.apache.cassandra.db.columniterator.SSTableNamesIterator.<init>(SSTableNamesIterator.java:60)
    at org.apache.cassandra.db.filter.NamesQueryFilter.getSSTableColumnIterator(NamesQueryFilter.java:81)
    at org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:68)
    at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:272)
    at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:65)
    at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1391)
    at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1214)
    at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1126)
    at org.apache.cassandra.db.Table.getRow(Table.java:347)
    at org.apache.cassandra.db.SliceByNamesReadCommand.getRow(SliceByNamesReadCommand.java:64)
    at org.apache.cassandra.db.CounterMutation.makeReplicationMutation(CounterMutation.java:90)
    at org.apache.cassandra.service.StorageProxy$7$1.runMayThrow(StorageProxy.java:772)
    at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1593)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)

ERROR [ReplicateOnWriteStage:82] 2013-10-06 16:13:14,249 CassandraDaemon.java (line 192) Exception in thread Thread[ReplicateOnWriteStage:82,5,main]
java.lang.RuntimeException: java.lang.IllegalArgumentException: unable to seek to position 2171332 in /raid0/cassandra/data/test_realtime/activities_summary_realtime/test_realtime-activities_summary_realtime-ic-2-Data.db (1250125 bytes) in read-only mode
    at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1597)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.IllegalArgumentException: unable to seek to position 2171332 in /raid0/cassandra/data/test_realtime/activities_summary_realtime/test_realtime-activities_summary_realtime-ic-2-Data.db (1250125 bytes) in read-only mode
    at org.apache.cassandra.io.util.RandomAccessReader.seek(RandomAccessReader.java:306)
    at org.apache.cassandra.io.util.PoolingSegmentedFile.getSegment(PoolingSegmentedFile.java:42)
    at org.apache.cassandra.io.sstable.SSTableReader.getFileDataInput(SSTableReader.java:1054)
    at org.apache.cassandra.db.columniterator.SSTableNamesIterator.createFileDataInput(SSTableNamesIterator.java:94)
    at org.apache.cassandra.db.columniterator.SSTableNamesIterator.read(SSTableNamesIterator.java:112)
    at org.apache.cassandra.db.columniterator.SSTableNamesIterator.<init>(SSTableNamesIterator.java:60)
    at org.apache.cassandra.db.filter.NamesQueryFilter.getSSTableColumnIterator(NamesQueryFilter.java:81)
    at org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:68)
    at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:272)
    at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:65)
    at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1391)
    at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1214)
    at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1126)
    at org.apache.cassandra.db.Table.getRow(Table.java:347)
    at org.apache.cassandra.db.SliceByNamesReadCommand.getRow(SliceByNamesReadCommand.java:64)
    at org.apache.cassandra.db.CounterMutation.makeReplicationMutation(CounterMutation.java:90)
    at org.apache.cassandra.service.StorageProxy$7$1.runMayThrow(StorageProxy.java:772)
    at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1593)

-- /Ran http://tavory.com
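A side note on reading the second trace: the requested position (2171332) is past the end of the file (1250125 bytes), i.e. the index is pointing beyond the data file, which usually indicates a truncated or corrupt SSTable. A simplified, hypothetical illustration of the kind of guard that produces this message (this is not Cassandra's actual implementation; the class below is invented for illustration):

    import java.io.IOException;
    import java.io.RandomAccessFile;

    // Hypothetical reader that refuses to seek past the end of a read-only
    // file, mirroring the IllegalArgumentException seen in the log above.
    public class BoundedReader {
        private final RandomAccessFile file;
        private final String path;

        public BoundedReader(String path) throws IOException {
            this.path = path;
            this.file = new RandomAccessFile(path, "r"); // read-only mode
        }

        public void seek(long position) throws IOException {
            if (position > file.length()) {
                throw new IllegalArgumentException("unable to seek to position "
                    + position + " in " + path + " (" + file.length()
                    + " bytes) in read-only mode");
            }
            file.seek(position);
        }
    }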
com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during write query
Hi all, when using the java-driver I see this error on the client, for reads as well as for writes. Many of the ops succeed; however, I do see a significant amount of errors.

com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during write query at consistency ONE (1 replica were required but only 0 acknowledged the write)
    at com.datastax.driver.core.ResultSetFuture.convertException(ResultSetFuture.java:243)
    at com.datastax.driver.core.ResultSetFuture$ResponseCallback.onSet(ResultSetFuture.java:119)
    at com.datastax.driver.core.RequestHandler.setFinalResult(RequestHandler.java:202)
    at com.datastax.driver.core.RequestHandler.onSet(RequestHandler.java:331)
    at com.datastax.driver.core.Connection$Dispatcher.messageReceived(Connection.java:484)
    at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)

The cluster itself isn't working very hard and seems to be in good shape... CPU load is around 0.1, IO wait is below 1%, all hosts are up, nothing is flapping, and the logs don't indicate any special GC activity... So I'm a bit puzzled as to where to look next. Any hints?...

-- /Ran http://tavory.com
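For anyone hitting the same exception: a minimal sketch of catching and retrying it with the DataStax java-driver of that era (the contact point, keyspace and statement below are placeholders, and whether a retry is safe depends on the statement being idempotent):

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;
    import com.datastax.driver.core.exceptions.WriteTimeoutException;

    public class RetryOnWriteTimeout {
        public static void main(String[] args) {
            // Placeholder contact point and keyspace.
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            Session session = cluster.connect("my_keyspace");
            String stmt = "INSERT INTO my_table (id, val) VALUES (1, 'x')"; // placeholder
            int attempts = 0;
            while (true) {
                try {
                    session.execute(stmt);
                    break; // success
                } catch (WriteTimeoutException e) {
                    // The coordinator timed out waiting for replica acks; the
                    // write may or may not have been applied on some replicas.
                    if (++attempts >= 3) {
                        throw e;
                    }
                }
            }
            cluster.shutdown(); // 1.x API; later driver versions use close()
        }
    }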
Re: AssertionError: DecoratedKey(... ) != DecoratedKey (...)
Update: I've reverted hinted_handoff_enabled back to its default value of true and the errors stopped. Is this just a coincidence, or could it be related?

On Sun, Oct 6, 2013 at 7:23 PM, Ran Tavory ran...@gmail.com wrote:
Pardon me, now with the appropriate subject line... Hi, I have a small cluster of 1.2.6 and after some config changes I started seeing errors in the logs. Not sure that's related, but the changes I performed were to disable hinted handoff and disable auto snapshot. I'll try to revert these and see if the picture changes. But anyway, that seems like a bug, right? I see this across many nodes, not only one.

ERROR [ReplicateOnWriteStage:105] 2013-10-06 16:13:13,799 CassandraDaemon.java (line 192) Exception in thread Thread[ReplicateOnWriteStage:105,5,main]
java.lang.AssertionError: DecoratedKey(-9223372036854775808, ) != DecoratedKey(-1854619418400985942, 00033839390a4769676f707469782d3100) in /raid0/cassandra/data/test_realtime/activities_summary_realtime/test_realtime-activities_summary_realtime-ic-2-Data.db
...
ERROR [ReplicateOnWriteStage:82] 2013-10-06 16:13:14,249 CassandraDaemon.java (line 192) Exception in thread Thread[ReplicateOnWriteStage:82,5,main]
java.lang.RuntimeException: java.lang.IllegalArgumentException: unable to seek to position 2171332 in /raid0/cassandra/data/test_realtime/activities_summary_realtime/test_realtime-activities_summary_realtime-ic-2-Data.db (1250125 bytes) in read-only mode
...
Re: 0.7.0 mx4j, get attribute
Try adding this to the end of the URL: ?template=identity

On Thu, Feb 3, 2011 at 4:23 PM, Chris Burroughs chris.burrou...@gmail.com wrote:
On 02/02/2011 01:41 PM, Ryan King wrote:
On Wed, Feb 2, 2011 at 10:40 AM, Chris Burroughs chris.burrou...@gmail.com wrote:
I'm using 0.7.0 and experimenting with the new mx4j support. http://host:port/mbean?objectname=org.apache.cassandra.request%3Atype%3DReadStage returns a nice pretty HTML page. For purposes of monitoring I would like to get a single attribute as XML. The docs [1] describe a getattribute endpoint, but I have been unable to get anything other than a blank response from that. mx4j does not seem to include any logging for troubleshooting. Example: http://host:port/getattribute?objectname=org.apache.cassandra.request%3atype%3dReadStage&attribute=PendingTasks returns 200 OK with no data. If anyone could point out what embarrassingly simple mistake I am making I would be much obliged. [1] http://mx4j.sourceforge.net/docs/ch05.html

Note that many objects in cassandra aren't initialized until they're used for the first time. -ryan

But if I can access them through jconsole just fine, I don't see what would be stopping mx4j.

-- /Ran
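Once ?template=identity does the trick, the attribute can be pulled with any HTTP client; a small JDK-only sketch (host and port are placeholders, and the & separator between the query parameters follows the mx4j docs cited above):

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class Mx4jGetAttribute {
        public static void main(String[] args) throws Exception {
            // Placeholder host/port; objectname is URL-encoded as in the thread.
            String url = "http://localhost:8081/getattribute"
                + "?objectname=org.apache.cassandra.request%3Atype%3DReadStage"
                + "&attribute=PendingTasks"
                + "&template=identity"; // raw XML instead of the HTML view
            HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
            BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream()));
            try {
                String line;
                while ((line = in.readLine()) != null) {
                    System.out.println(line);
                }
            } finally {
                in.close();
            }
        }
    }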
Re: Do you have a site in production environment with Cassandra? What client do you use?
I use Hector, if that counts...

On Jan 14, 2011 7:25 PM, Ertio Lew ertio...@gmail.com wrote:
Hey, if you have a site in a production environment, or are considering one, what is the client that you use to interact with Cassandra? I know that there are several clients available out there according to the language you use, but I would love to know what clients are being used widely in production environments and are best to work with (support most required features, for performance). Also, preferably tell about the technology stack for your applications. Any suggestions or comments appreciated. Thanks, Ertio
Re: Do you have a site in production environment with Cassandra? What client do you use?
Java

On Jan 14, 2011 8:25 PM, Ertio Lew ertio...@gmail.com wrote:
What technology stack do you use?

On 1/14/11, Ran Tavory ran...@gmail.com wrote:
I use Hector, if that counts...

On Jan 14, 2011 7:25 PM, Ertio Lew ertio...@gmail.com wrote:
Hey, if you have a site in a production environment, or are considering one, what is the client that you use to interact with Cassandra?...
Re: maven cassandra plugin
Stephen, just FYI, Cassandra cannot be stopped cleanly; its JVM must be taken down. So the plugin would probably need to fork a JVM and kill it when it's done.

On Thursday, January 6, 2011, B. Todd Burruss bburr...@real.com wrote:
Would you like some testers? We were about to write one.

On 01/06/2011 12:43 PM, Stephen Connolly wrote:
I nearly have one ready... my plan is to have it added to contrib... if the cassandra devs agree -stephen
- Stephen --- Sent from my Android phone, so random spelling mistakes, random nonsense words and other nonsense are a direct result of using swype to type on the screen

On 6 Jan 2011 19:38, B. Todd Burruss bburr...@real.com wrote:
Has anyone created a maven plugin, like cargo for tomcat, for automating starting/stopping a cassandra instance?

-- /Ran
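A bare-bones sketch of the fork-and-kill approach described above, using ProcessBuilder (the Cassandra home path is a placeholder, and a real plugin would wrap this in start/stop mojos):

    import java.io.File;
    import java.io.IOException;

    public class CassandraForker {
        private Process cassandra;

        // Fork a Cassandra JVM in the foreground (-f) so destroy() takes it down.
        public void start(File cassandraHome) throws IOException {
            ProcessBuilder pb = new ProcessBuilder(
                new File(cassandraHome, "bin/cassandra").getPath(), "-f");
            pb.redirectErrorStream(true); // merge stderr into stdout
            cassandra = pb.start();
        }

        // There is no clean in-process shutdown, so kill the forked JVM.
        public void stop() {
            if (cassandra != null) {
                cassandra.destroy();
            }
        }
    }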
Re: Bootstrapping taking long
In storage-conf I see this comment [1], from which I understand that the recommended way to bootstrap a new node is to set AutoBootstrap=true and remove the node from its own seeds list. Moreover, I did try to set AutoBootstrap=true and have the node in its own seeds list, but it would not bootstrap. I don't recall the exact message, but it was something like "I found myself in the seeds list, therefore I'm not going to bootstrap even though AutoBootstrap is true."

[1]
<!--
 ~ Turn on to make new [non-seed] nodes automatically migrate the right data
 ~ to themselves. (If no InitialToken is specified, they will pick one
 ~ such that they will get half the range of the most-loaded node.)
 ~ If a node starts up without bootstrapping, it will mark itself bootstrapped
 ~ so that you can't subsequently accidently bootstrap a node with
 ~ data on it. (You can reset this by wiping your data and commitlog
 ~ directories.)
 ~
 ~ Off by default so that new clusters and upgraders from 0.4 don't
 ~ bootstrap immediately. You should turn this on when you start adding
 ~ new nodes to a cluster that already has data on it. (If you are upgrading
 ~ from 0.4, start your cluster with it off once before changing it to true.
 ~ Otherwise, no data will be lost but you will incur a lot of unnecessary
 ~ I/O before your cluster starts up.)
-->
<AutoBootstrap>false</AutoBootstrap>

On Wed, Jan 5, 2011 at 4:58 PM, David Boxenhorn da...@lookin2.com wrote:
If the seed list should be the same across the cluster, that means that nodes *should* have themselves as a seed. If that doesn't work for Ran, then that is the first problem, no?

On Wed, Jan 5, 2011 at 3:56 PM, Jake Luciani jak...@gmail.com wrote:
Well, your ring issues don't make sense to me; the seed list should be the same across the cluster. I'm just thinking of other things to try. Non-bootstrapped nodes should join the ring instantly, but reads will fail if you aren't using quorum.

On Wed, Jan 5, 2011 at 8:51 AM, Ran Tavory ran...@gmail.com wrote:
I haven't tried repair. Should I?

On Jan 5, 2011 3:48 PM, Jake Luciani jak...@gmail.com wrote:
Have you tried not bootstrapping but setting the token and manually calling repair?

On Wed, Jan 5, 2011 at 7:07 AM, Ran Tavory ran...@gmail.com wrote:
My conclusion is lame: I tried this on several hosts and saw the same behavior; the only way I was able to join new nodes was to first start them when they are *not in* their own seeds list and, after they finish transferring the data, restart them with themselves *in* their own seeds list. After doing that, the node would join the ring. This is either my misunderstanding or a bug, but the only place I found it documented stated that the new node should not be in its own seeds list. Version 0.6.6.

On Wed, Jan 5, 2011 at 10:35 AM, David Boxenhorn da...@lookin2.com wrote:
My nodes all have themselves in their list of seeds - always did - and everything works. (You may ask why I did this. I don't know, I must have copied it from an example somewhere.)

On Wed, Jan 5, 2011 at 9:42 AM, Ran Tavory ran...@gmail.com wrote:
I was able to make the node join the ring, but I'm confused. What I did is: first, when adding the node, this node was not in the seeds list of itself. AFAIK this is how it's supposed to be. So it was able to transfer all data to itself from other nodes, but then it stayed in the bootstrapping state. So what I did (and I don't know why it works) is add this node to the seeds list in its own storage-conf.xml file, then restart the server, and then I finally see it in the ring... If I had added the node to the seeds list of itself when first joining it, it would not join the ring, but if I do it in two phases it did work. So it's either my misunderstanding or a bug...
Re: Bootstrapping taking long
@Thibaut, wrong email? Or how is "Avoid dropping messages off the client request path" (CASSANDRA-1676) related to the bootstrap questions I had?

On Wed, Jan 5, 2011 at 5:23 PM, Thibaut Britz thibaut.br...@trendiction.com wrote:
https://issues.apache.org/jira/browse/CASSANDRA-1676 you have to use at least 0.6.7

On Wed, Jan 5, 2011 at 4:19 PM, Edward Capriolo edlinuxg...@gmail.com wrote:
On Wed, Jan 5, 2011 at 10:05 AM, Ran Tavory ran...@gmail.com wrote:
In storage-conf I see this comment [1], from which I understand that the recommended way to bootstrap a new node is to set AutoBootstrap=true and remove the node from its own seeds list...
Re: Bootstrapping taking long
OK, thanks. So I see we had the same problem (I too had multiple keyspaces, not that I know why it matters to the problem at hand), and I see that by upgrading to 0.6.7 you solved your problem (I didn't try it, I had a different workaround). But frankly, I don't understand how https://issues.apache.org/jira/browse/CASSANDRA-1676 would relate to the stuck bootstrap problem (I'm not saying that it doesn't, I'd just like to understand why...)

On Wed, Jan 5, 2011 at 5:42 PM, Thibaut Britz thibaut.br...@trendiction.com wrote:
Had the same problem a while ago. Upgrading solved the problem (don't know if you have to redeploy your cluster though). http://www.mail-archive.com/user@cassandra.apache.org/msg07106.html

On Wed, Jan 5, 2011 at 4:29 PM, Ran Tavory ran...@gmail.com wrote:
@Thibaut, wrong email? Or how is "Avoid dropping messages off the client request path" (CASSANDRA-1676) related to the bootstrap questions I had?...
Re: Bootstrapping taking long
I see. Thanks for clarifying, Jonathan.

On Wednesday, January 5, 2011, Jonathan Ellis jbel...@gmail.com wrote:
1676 says "Avoid dropping messages off the client request path." Bootstrap messages are off the client request path. So, if some of the nodes involved were loaded enough that they were dropping messages older than RPC_TIMEOUT to cope, it could lose part of the bootstrap communication permanently.

On Wed, Jan 5, 2011 at 10:01 AM, Ran Tavory ran...@gmail.com wrote:
OK, thanks. So I see we had the same problem (I too had multiple keyspaces, not that I know why it matters to the problem at hand), and I see that by upgrading to 0.6.7 you solved your problem (I didn't try it, I had a different workaround). But frankly, I don't understand how https://issues.apache.org/jira/browse/CASSANDRA-1676 would relate to the stuck bootstrap problem (I'm not saying that it doesn't, I'd just like to understand why...)

-- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com

-- /Ran
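A toy illustration of the behavior Jonathan describes (this is not Cassandra's code, just the shape of the idea): a stage that drops any queued message older than the RPC timeout, which is harmless for client requests (the client has already timed out and will retry) but can permanently lose a one-shot internal message such as bootstrap coordination:

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    public class DroppingStage {
        private static final long RPC_TIMEOUT_MS = 10000;

        private static class Message {
            final long enqueuedAt = System.currentTimeMillis();
            final Runnable work;
            Message(Runnable work) { this.work = work; }
        }

        private final BlockingQueue<Message> queue = new LinkedBlockingQueue<Message>();

        public void submit(Runnable work) {
            queue.add(new Message(work));
        }

        public void consumeLoop() throws InterruptedException {
            while (true) {
                Message m = queue.take();
                // Under load the queue backs up; anything older than the RPC
                // timeout is dropped, because the sender gave up long ago.
                if (System.currentTimeMillis() - m.enqueuedAt > RPC_TIMEOUT_MS) {
                    continue;
                }
                m.work.run();
            }
        }
    }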
Re: Bootstrapping taking long
Thanks Shimi. So indeed anticompaction was run on one of the other nodes from the same DC, but to my understanding it has already ended, a few hours ago... I saw plenty of log messages such as [1], which ended a couple of hours ago, and I've seen the new node streaming and accepting the data from the node which performed the anticompaction, and so far it was normal, so it seemed that data was at its right place. But now the new node seems sort of stuck. None of the other nodes is anticompacting right now or has been anticompacting since then. The new node's CPU is close to zero, its iostats are almost zero, so I can't find another bottleneck that would keep it hanging. On the IRC someone suggested I'd maybe retry to join this node, e.g. decommission and rejoin it again. I'll try it now...

[1]
INFO [COMPACTION-POOL:1] 2011-01-04 04:04:09,721 CompactionManager.java (line 338) AntiCompacting [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvAds-6449-Data.db')]
INFO [COMPACTION-POOL:1] 2011-01-04 04:34:18,683 CompactionManager.java (line 338) AntiCompacting [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3874-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3873-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvImpressions-3876-Data.db')]
INFO [COMPACTION-POOL:1] 2011-01-04 04:34:19,132 CompactionManager.java (line 338) AntiCompacting [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-951-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-976-Data.db'),org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvRatings-978-Data.db')]
INFO [COMPACTION-POOL:1] 2011-01-04 04:34:26,486 CompactionManager.java (line 338) AntiCompacting [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassandra/data/outbrain_kvdb/KvAds-6449-Data.db')]

On Tue, Jan 4, 2011 at 12:45 PM, shimi shim...@gmail.com wrote:
In my experience, most of the time it takes for a node to join the cluster is the anticompaction on the other nodes. The streaming part is very fast. Check the other nodes' logs to see if there is any node doing anticompaction. I don't remember how much data I had in the cluster when I needed to add/remove nodes; I do remember that it took a few hours. The node will join the ring only when it finishes the bootstrap. Shimi

On Tue, Jan 4, 2011 at 12:28 PM, Ran Tavory ran...@gmail.com wrote:
I asked the same question on the IRC but no luck there, everyone's asleep ;)... Using 0.6.6, I'm adding a new node to the cluster. It starts out fine but then gets stuck in the bootstrapping state for too long. More than an hour and still counting.

$ bin/nodetool -p 9004 -h localhost streams
Mode: Bootstrapping
Not sending any streams.
Not receiving any streams.

It seemed to have streamed data from other nodes, and indeed the load is non-zero, but I'm not clear what's keeping it right now from finishing.

$ bin/nodetool -p 9004 -h localhost info
51042355038140769519506191114765231716
Load : 22.49 GB
Generation No: 1294133781
Uptime (seconds) : 1795
Heap Memory (MB) : 315.31 / 6117.00

nodetool ring does not list this new node in the ring; although nodetool can happily talk to the new node, it's just not listing itself as a member of the ring. This is expected when the node is still bootstrapping, so the question is still how long the bootstrap might take and whether it is stuck. The data isn't huge, so I find it hard to believe that streaming or anticompaction are the bottlenecks. I have ~20G on each node and the new node already has just about that, so it seems that all data had already been streamed to it successfully, or at least most of the data... So what is it waiting for now? (same question, rephrased... ;)

I tried:
1. Restarting the new node. No good. All logs seem normal, but at the end the node is still in bootstrap mode.
2. As someone suggested, I increased the rpc timeout from 10k to 30k (RpcTimeoutInMillis), but that didn't seem to help. I did this only on the new node. Should I have done that on all (old) nodes as well? Or maybe only on the ones that were supposed to stream data to that node.
3. Logging level at DEBUG now, but nothing interesting is going on except for occasional messages such as [1] or [2].

So the question is: what's keeping the new node from finishing the bootstrap, and how can I check its status? Thanks

[1] DEBUG [Timer-1] 2011-01-04 05:21:24,402 LoadDisseminator.java (line 36) Disseminating load info ...
[2] DEBUG [RMI TCP Connection(22)-192.168.252.88] 2011-01-04 05:12:48,033 StorageService.java (line 1189) computing ranges for 28356863910078205288614550619314017621, 56713727820156410577229101238628035242
Re: Bootstrapping taking long
Running nodetool decommission didn't help. Actually, the node refused to decommission itself (b/c it wasn't part of the ring). So I simply stopped the process, deleted all the data directories and started it again. It worked in the sense that the node bootstrapped again, but as before, after it had finished moving the data nothing happened for a long time (I'm still waiting, but nothing seems to be happening). Any hints how to analyze a stuck bootstrapping node?? thanks

On Tue, Jan 4, 2011 at 1:51 PM, Ran Tavory ran...@gmail.com wrote:
Thanks Shimi. So indeed anticompaction was run on one of the other nodes from the same DC, but to my understanding it has already ended, a few hours ago...
Re: Bootstrapping taking long
Thanks Jake, but unfortunately the streams directory is empty, so I don't think that any of the nodes is anti-compacting data right now or has been in the past 5 hours. It seems that all the data was already transferred to the joining host, but the joining node, after having received the data, would still remain in bootstrapping mode and not join the cluster. I'm not sure that *all* data was transferred (perhaps other nodes need to transfer more data), but nothing is actually happening, so I assume all has been moved. Perhaps it's a configuration error on my part. Should I use AutoBootstrap=true? Anything else I should look out for in the configuration file or something else?

On Tue, Jan 4, 2011 at 4:08 PM, Jake Luciani jak...@gmail.com wrote:
In 0.6, locate the node doing anti-compaction and look in the streams subdirectory in the keyspace data dir to monitor the anti-compaction progress (it puts new SSTables for the bootstrapping node in there).

On Tue, Jan 4, 2011 at 8:01 AM, Ran Tavory ran...@gmail.com wrote:
Running nodetool decommission didn't help. Actually, the node refused to decommission itself (b/c it wasn't part of the ring)...
Re: Bootstrapping taking long
I'm still at a loss. I haven't been able to resolve this. I tried adding another node at a different location on the ring, but this node too remains stuck in the bootstrapping state for many hours, without any of the other nodes being busy with anticompaction or anything else. I don't know what's keeping it from finishing the bootstrap: no CPU, no IO, files were already streamed, so what is it waiting for? I read the release notes of 0.6.7 and 0.6.8 and there didn't seem to be anything addressing a similar issue, so I figured there was no point in upgrading. But let me know if you think there is. Or any other advice...

On Tuesday, January 4, 2011, Ran Tavory ran...@gmail.com wrote:
Thanks Jake, but unfortunately the streams directory is empty, so I don't think that any of the nodes is anti-compacting data right now or has been in the past 5 hours...

-- /Ran
Re: Bootstrapping taking long
The new node does not see itself as part of the ring; it sees all the others but not itself, so from that perspective the view is consistent. The only problem is that the node never finishes bootstrapping. It stays in this state for hours (it's been 20 hours now...)
$ bin/nodetool -p 9004 -h localhost streams
Mode: Bootstrapping
Not sending any streams.
Not receiving any streams.
On Wed, Jan 5, 2011 at 1:20 AM, Nate McCall n...@riptano.com wrote: Does the new node have itself in the list of seeds per chance? This could cause some issues if so.
Re: Hector version
Use 0.6.0-19 On Friday, December 31, 2010, Zhidong She zhidong@gmail.com wrote: Hi guys, We are trying Cassandra 0.6.8, and could you please kindly tell me which Hector Java client is suitable for 0.6.8? The Hector 0.7.0 says it's for Cassandra 0.7.X, and shall we use Hector 0.6.0? Thanks, Br Zhidong -- /Ran
Re: Cassandra Monitoring
FYI, I just added an mx4j section to the bottom of this page http://wiki.apache.org/cassandra/Operations On Sun, Dec 19, 2010 at 4:30 PM, Jonathan Ellis jbel...@gmail.com wrote: mx4j? https://issues.apache.org/jira/browse/CASSANDRA-1068 On Sun, Dec 19, 2010 at 8:36 AM, Peter Schuller peter.schul...@infidyne.com wrote: How / what are you monitoring? Best practices someone? I recently set up monitoring using the cassandra-munin-plugins (https://github.com/jamesgolick/cassandra-munin-plugins). However, due to various little details that wasn't too fun to integrate properly with munin-node-configure and automated configuration management. A problem is also the starting of a JVM for each use of jmxquery, which can become a problem with many column families. I like your web server idea. Something persistent that can sit there and do the JMX acrobatics, and expose something more easily consumed for stuff like munin/zabbix/etc. It would be pretty nice to have that out of the box with Cassandra, though I expect that would be considered bloat. :) -- / Peter Schuller -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com -- /Ran
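For what it's worth, the "something persistent that does the JMX acrobatics" idea can start as small as the sketch below: a plain JMX poll whose output is easy to scrape from munin/zabbix. The port (8080 was the default of this era) and the MBean/attribute names are assumptions; browse them with jconsole first.

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class CassandraJmxPoller {
    public static void main(String[] args) throws Exception {
        // Default JMX port of the era; adjust to your install.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:8080/jmxrmi");
        JMXConnector jmxc = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbsc = jmxc.getMBeanServerConnection();
            // MBean and attribute names are assumptions; verify with jconsole.
            Object load = mbsc.getAttribute(
                    new ObjectName("org.apache.cassandra.service:type=StorageService"),
                    "LoadString");
            System.out.println("load: " + load); // expose this to munin/zabbix/etc.
        } finally {
            jmxc.close();
        }
    }
}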
Re: iterate over all the rows with RP
This should be the case, yes: semantics aren't affected by the connection, and no state is kept. What might happen is that if you read/write with low consistency levels, then when you hit a different host on the ring it might have an inconsistent state in case of a partition. On Sunday, December 12, 2010, shimi shim...@gmail.com wrote: So if I use a different connection (thrift via Hector), will I get the same results? It makes sense when you use OPP, and I assume it is the same with RP. I just wanted to make sure this is the case and that there is no state being kept. Shimi On Sun, Dec 12, 2010 at 8:14 PM, Peter Schuller peter.schul...@infidyne.com wrote: Is the same connection required when iterating over all the rows with Random Partitioner, or is it possible to use a different connection for each iteration? In general, the choice of RPC connection (I assume you mean the underlying thrift connection) does not affect the semantics of the RPC calls. -- / Peter Schuller -- /Ran
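To make the stateless iteration concrete, here is a minimal paging sketch against the 0.6 Thrift API (keyspace and column family names are placeholders). Because the server keeps no cursor, each page could in principle go over a different connection:

// Page over all rows with RandomPartitioner; the first key of each page
// repeats the last key of the previous one, hence the skip.
SlicePredicate predicate = new SlicePredicate();
predicate.setSlice_range(new SliceRange(new byte[0], new byte[0], false, 100));
String start = "";
while (true) {
    KeyRange range = new KeyRange();
    range.setStart_key(start);
    range.setEnd_key("");
    range.setCount(100);
    List<KeySlice> page = client.get_range_slices("Keyspace1",
            new ColumnParent("Standard1"), predicate, range, ConsistencyLevel.ONE);
    for (KeySlice slice : page) {
        if (slice.getKey().equals(start)) continue; // overlap at page boundary
        // process slice here
    }
    if (page.size() < 100) break; // a short page means we've covered the ring
    start = page.get(page.size() - 1).getKey();
}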
Re: understanding the cassandra storage scaling
there are two numbers to look at: N, the number of hosts in the ring (cluster), and R, the number of replicas for each data item. R is configurable per column family. Typically for large clusters N > R. For very small clusters it makes sense for R to be close to N, in which case cassandra is useful so the database doesn't have a single point of failure, but not so much b/c of the size of the data. But for large clusters it rarely makes sense to have N=R; usually N > R. On Thu, Dec 9, 2010 at 12:28 PM, Jonathan Colby jonathan.co...@gmail.com wrote: I have a very basic question which I have been unable to find in the online documentation on cassandra. It seems like every node in a cassandra cluster contains all the data ever stored in the cluster (i.e., all nodes are identical). I don't understand how you can scale this on commodity servers with merely internal hard disks. In other words, if I want to store 5 TB of data, does that mean each node needs a hard disk capacity of 5 TB? With HBase, memcached and other nosql solutions it is more clear how data is split up in the cluster and replicated for fault tolerance. Again, please excuse the rather basic question. -- /Ran
Re: understanding the cassandra storage scaling
So is it not true that each node contains all the data in the cluster? No, not in the general case; in fact it is rarely the case. Usually R < N. In my case I have N=6 and R=2. You configure R per CF under ReplicationFactor (v0.6.*) or replication_factor (v0.7.*). http://wiki.apache.org/cassandra/StorageConfiguration On Thu, Dec 9, 2010 at 12:43 PM, Jonathan Colby jonathan.co...@gmail.com wrote: Thanks Ran. This helps a little, but unfortunately it's still a bit fuzzy for me. So is it not true that each node contains all the data in the cluster? I haven't come across any information on how clustered data is coordinated in cassandra. How does my query get directed to the right node? -- /Ran
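Plugging the thread's numbers in makes the arithmetic concrete (a back-of-envelope sketch, not a sizing tool):

long totalGb = 5000;              // ~5 TB of data, from the question
int n = 6, r = 2;                 // ring size and replication factor, from the reply
long perNodeGb = totalGb * r / n; // ~1666 GB per node, far less than the full 5 TB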
Re: Taking down a node in a 3-node cluster, RF=2
to me it makes sense that if hinted handoff is off then cassandra cannot satisfy 2 out of every 3 writes when one of the nodes is down, since that node is a designated replica for 2/3 of the writes. But I don't remember reading this somewhere. Does hinted handoff affect David's situation? (David, did you disable HH in your storage-config? <HintedHandoffEnabled>false</HintedHandoffEnabled>) On Sun, Nov 28, 2010 at 4:32 PM, David Boxenhorn da...@lookin2.com wrote: For the vast majority of my data usage eventual consistency is fine (i.e. CL=ONE), but I have a small amount of critical data for which I read and write using CL=QUORUM. If I have a cluster with 3 nodes and RF=2, and CL=QUORUM, does that mean that a value can be read from or written to any 2 nodes, or does it have to be the particular 2 nodes that store the data? If it is the particular 2 nodes that store the data, that means that I can't even take down one node, since it will be the mandatory 2nd node for 1/3 of my data... -- /Ran
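For reference, the quorum arithmetic behind this (the standard definition, not something spelled out in the thread):

int rf = 2;
int quorum = rf / 2 + 1; // = 2: with RF=2, QUORUM needs *both* replicas up
// So while one of a key's two replica nodes is down, QUORUM reads and writes
// fail for that key; in a 3-node RF=2 ring each node is a replica for 2/3 of
// the keys, which is where the "2 out of every 3 writes" figure comes from.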
Re: Hector question under cassandra 0.7
u...@cass to bcc. Indeed, the KeyspaceOperator isn't thread safe (and in recent revisions it was extracted to an interface at http://github.com/rantav/hector/blob/master/src/main/java/me/prettyprint/hector/api/Keyspace.java and an implementation at http://github.com/rantav/hector/blob/master/src/main/java/me/prettyprint/cassandra/model/ExecutingKeyspace.java). On Thu, Oct 21, 2010 at 12:10 AM, Ned Wolpert ned.wolp...@imemories.com wrote: I figured I'd reply to my own question in case this helps others. Talking on the IRC, having one KeyspaceOperator per thread (via ThreadLocal) makes sense. On Wed, Oct 20, 2010 at 9:13 AM, Ned Wolpert ned.wolp...@imemories.com wrote: Folks - I'm finally upgrading the grails-cassandra plugin for 0.7, and wanted to understand a bit more about the usage of the Cluster and KeyspaceOperator. Is the cluster object retrieved from HFactory.createCluster() thread safe, and is the KeyspaceOperator required to only be in one thread? Or are both thread safe objects? My assumption is I can call createCluster any time as it will only create one cluster object. I'm trying to decide if the KeyspaceOperator should be unique to each thread (threadlocal variable) or unique to each web request. Thanks -- Virtually, Ned Wolpert Settle thy studies, Faustus, and begin... --Marlowe -- Virtually, Ned Wolpert Settle thy studies, Faustus, and begin... --Marlowe -- /Ran
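A minimal sketch of the one-KeyspaceOperator-per-thread idea from the thread; the factory method below is hypothetical, standing in for however you build the operator from HFactory:

// One Keyspace per thread (Java 5/6 style anonymous subclass).
private static final ThreadLocal<Keyspace> PER_THREAD_KEYSPACE =
        new ThreadLocal<Keyspace>() {
            @Override
            protected Keyspace initialValue() {
                return createKeyspaceOperator(); // hypothetical factory
            }
        };

// Each thread then works against its own instance:
Keyspace keyspace = PER_THREAD_KEYSPACE.get();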
Cassandra users meetup in Israel
http://cassandra-il.eventbrite.com/ Hi all, I'm organizing a users meetup in Israel; if you happen to be around you're most welcome to join. Event Details: The first Cassandra users meetup in Israel will take place at outbrain, Natanya, on Tuesday Nov 16th, 4pm. Please register and get yourself a (free) ticket from eventbrite. Space is limited and we want to make sure we have enough space for everyone, so please register. Once you register, simply arrive at outbrain (directions below) and call Ran from Saifun's reception desk. We will cover: o Operations (Ran, unless someone else wants to jump in) o Internals and implementation overview (me again...) o Success stories and pain points - an open roundtable. If you have specific subjects you're interested in, or talks you're willing to propose (15-30min), we're naturally very open to suggestions. -- /Ran
Re: Cassandra 0.6.6
it's not official yet; it's in voting now. On Tue, Oct 12, 2010 at 8:41 PM, marinko pasic marinko_pa...@hotmail.com wrote: Hi there, I'm just wondering: is the Cassandra 0.6.6 release, which I found on http://people.apache.org/~eevans/, an official release? If it is not, can you tell me where I can find the official 0.6.6 Cassandra release? Thx Marinko -- /Ran
Re: Nodes getting slowed down after a few days of smooth operation
Thanks Peter, Robert and Brandon. So it seems that the only suspect by now is my excessive caching ;) I'll get a better look at the GC activity next time shit starts to happen, but in the meantime, as for the cache size (cassandra's internal cache), its row cache capacity is set to 10,000,000. I actually wanted to say 100%, but at the time there was a bug that interpreted 100% as just 1, so I used 10M instead. My motivation was that since I don't have too much data (10G each node), why don't I cache the hell out of it, so I started with a cache size of 100% and a much larger heap size (started with 12G out of the 16G ram). Over time I've learned that too much heap for the JVM is like a kid in a candy shop, it'll eat as much as it can and then throw up (the kid was GC storming), so I started lowering the max heap until I reached 6G. With 4G I ran OOM, BTW. So now I have a row cache capacity of effectively 100%, a heap size of 6G, data of 10G, and so I wonder how come the heap doesn't explode? Well, as it turns out, although I have 10G data on each node, the row cache's effective size is only about 681 * 2377203 = 1.6G (bytes):
Key cache: disabled
Row cache capacity: 10000000
Row cache size: *2377203*
Row cache hit rate: 0.7017551635100059
Compacted row minimum size: 392
Compacted row maximum size: 102961
Compacted row mean size: *681*
This strengthens what both Peter and Brandon have suggested, that the row cache is generating too much GC b/c it gets invalidated too frequently. That's certainly possible, so I'll try to set a 50% row cache size on one of the nodes (and wait about a week...) and see what happens. If this proves to be the answer, then it means that my dream of "I have so little data and so much ram, why don't I cache the hell out of it" isn't going to come true, b/c too much of the row cache gets invalidated and hence GCed, which creates too much overhead for the JVM. (Well, at least I was getting nice read performance while it lasted ;) If this is true, then how would you recommend optimizing the row cache size for maximum utility and minimum GC overhead? I've pasted here a log snippet from one of the servers while it was at high CPU and GCing: http://pastebin.com/U1cszFKv You can see a large number of pending reads as well as other pending tasks (response stage or consistency manager). GC runs every 20-40 seconds or so, and almost for the entire duration of those 20-40 secs. I'm not sure what to make of all the other numbers, such as: GC for ConcurrentMarkSweep: 22742 ms, 181335192 reclaimed leaving 6254994856 used; max is 6552551424 Thanks! On Mon, Oct 11, 2010 at 7:42 PM, Peter Schuller peter.schul...@infidyne.com wrote: 170141183460469231731687303715884105727 192.168.252.88 Up 10.07 GB Firstly, I second the point raised about the row cache size (very frequent concurrent GC:s are definitely an indicator that the JVM heap size is too small, and the row cache seems like a likely contender - especially given that you say it builds up over days). Note though that you have to look at the GCInspector's output with respect to the concurrent mark/sweep GC phases to judge the live set in your heap, rather than system memory. Attaching with jconsole or visualvm to the JVM will also give you a pretty good view of what's going on. Look for the heap usage as it appears after one of the major dips in the graph (not the regular sawtooth dips, which are young generation collections and won't help indicate actual live set).
That said, with respect to caching effects: your total data size seems to be about in the same ballpark as memory. Your maximum heap size is 6 gig; on a 16 gig machine, taking into account various overhead, maybe you've got something like 8 GB for buffer cache? It doesn't sound strange at all that there would be a significant difference between a 32 GB machine and a 16 GB machine given your ~10 GB data size, given that buffer cache size goes from slightly below data size to almost three times data size. Especially when major or almost-major compactions are triggered; on the small machine you would expect to evict everything from cache during a compaction (except that touched *during* the compaction), while on the larger machine the newly written sstables effectively fit the cache too. But note that these are two pretty different conditions; the first is about making sure your JVM heap size is appropriate. The second can be tested for by observing I/O load (iostat -x -k 1) and correlating with compactions. So e.g., what's the average utilization and queue size in iostat just before a compaction vs. just after it? That difference should be due to cache eviction (assuming you're not servicing a built-up backlog). There is also the impact of compaction itself, as it is happening, and the I/O it generates. In general, the higher your disk
Re: Nodes getting slowed down after a few days of smooth operation
Peter, you're my JVM GC hero! Thank you! On Tue, Oct 12, 2010 at 12:38 AM, Peter Schuller peter.schul...@infidyne.com wrote: My motivation was that since I don't have too much data (10G each node) then why don't I cache the hell out of it, so I started with a cache size of 100% and a much larger heap size (started with 12G out of the 16G ram). Over time I've learned that too much heap for the JVM is like a kid in a candy shop, it'll eat as much as it can and then throw up (the kid was GC storming), In general CMS will tend to gobble up the maximum heap size unless your workload is such that the heuristics really work well and don't expand the heap beyond some level, but it won't magically fill the heap with data that doesn't exist. If you were reaching the maximum heap size with 12 GB, making the heap 6 GB instead won't make it better. Also, just be sure that you're really having an issue with GC. For example, frequent young-generation GC:s are fully expected and normal. If you are seeing extremely frequent concurrent mark/sweep phases that do not free up a lot of data - that is an indication that the heap is too small. So, with respect to GC storming, a bigger heap is generally better. The bigger the heap, the more effective GC is and the less often a concurrent mark/sweep has to happen. But this does not mean you want to give it too big a heap either, since whatever is gobbled up by the heap *won't* be used by the operating system for buffer caching. Keeping a big row cache may or may not be a good idea depending on circumstances, but if you have one, that directly implies additional heap usage and the heap must be sized accordingly. The row cache is just objects in memory; there is no automatic row cache size adjustment in response to heap pressure. If 10 million rows is your entire data set, and if that dataset is 10 GB on disk (without in-memory object overhead), then I am not surprised at all that you're seeing issues after a few days of uptime. Likely the row cache is just much too big for the heap. so I started lowering the max heap until I reached 6G. with 4G I ran OOM BTW. Note that OOM and GC storming are often equivalent in terms of their cause (unless the OOM is caused by a single huge allocation or something). It's just that actually determining whether you are out of memory is difficult for the JVM, so there are heuristics involved. You may be sufficiently out of memory that you see excessive GC activity, but not so much as to trigger the threshold of GC inefficiency at which the JVM decides to actually throw an OOM. So now I have row cach capacity of effectively 100%, a heap size of 6G, data of 10G and so I wonder how come the heap doesn't explode? Well, everything up to now has suggested to me that it *is* exploding ;) But: Well, as it turns out, although I have 10G data on each node, the row cache effective size is only about 681 * 2377203 = 1.6G (bytes) Key cache: disabled Row cache capacity: 10000000 Row cache size: 2377203 Row cache hit rate: 0.7017551635100059 Compacted row minimum size: 392 Compacted row maximum size: 102961 Compacted row mean size: 681 This strengthens what both Peter and Brandon have suggested that the row cache is generating too much GC b/c it gets invalidated too frequently. Note that the compacted row size is not directly indicative of in-memory row size. I'm not sure what the overhead is expected to be, though, off hand; but you can probably assume a factor of 2 just from general fragmentation issues.
Add to that overhead from the representation in object form itself etc. 1.6x2 = 3.2. Now we're starting to get close, especially taking into account additional overhead and other things on the heap. That's certainly possible, so I'll try to set a 50% row cache size on one of the nodes (and wait about a week...) and see what happens, and if this proves to be the answer then this means that my dream of I have so little data and so much ram, why don't I cache the hell out of it isn't going to come true b/c too much of the row cache gets invalidated and hence GCed which creates too much overhead for the JVM. (well, at least I was getting nice read performance while it lasted ;) Given that you're not hitting your maximum cache size, data isn't evicted from the cache except as it is updated. Presumably that means you're actually not hitting the worst-case scenario, which is LRU eviction. Even then though, it's not as simple as it just being too much for the JVM. Especially given the rows/second that you'd expect to be evicted in Cassandra. A high rate of eviction does mean you need more margin in terms of free heap, but I seriously doubt the fundamental problem here is GC throughput vs. eviction rate. In general, I cannot
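Plugging the thread's numbers into Peter's estimate (the 2x factor is his rough fragmentation guess, not a measurement):

long rows = 2377203L;       // row cache size from nodetool
long meanRowBytes = 681L;   // compacted mean row size
double overhead = 2.0;      // rough in-memory overhead factor
double heapGb = rows * meanRowBytes * overhead / (1024.0 * 1024 * 1024);
// ~3.2 GB of a 6 GB heap for the row cache alone, before anything else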
Re: Cassandra for graph data structure
Courtney, this certainly sounds interesting, and as Nate suggested we're always looking for valuable contributions. A few things to keep in mind: - I'm curious, as Lucas has asked - is it possible to create an efficient graph API over cassandra, and what are the tradeoffs? - If the API is general enough and the functionality is reusable, then we'd be happy to add it to hector. If not, you can create a library that uses hector as a layer. On Friday, September 24, 2010, Courtney Robinson sa...@live.co.uk wrote: Nate, Lucas, thanks for the responses. Nate, I think it would be asking a bit much to suggest the hector team implement convenience methods for a graph representation. But if we went ahead and forked hector, I'd be sure to contribute back what I can and just release it as another client, or, if the final product can be merged with hector... I'd like thoughts on any features outside my own use case though, so that we can build it to handle other use cases as well. Lucas, I understand what you're saying, but I've had a quick play with neo4j and the expense we'd pay for reads offsets a lot of the setbacks I'd run into using neo4j, not to mention having to learn it... -- From: Nate McCall n...@riptano.com Sent: Friday, September 24, 2010 4:14 PM To: user@cassandra.apache.org Subject: Re: Cassandra for graph data structure My idea however was to fork hector, remove all the stuff I don't need and turn it into a graph API sitting on top of Cassandra. We are always looking for ideas and design feedback regarding Hector. Please feel free to make suggestions or fork and send pull requests. http://groups.google.com/group/hector-users
Re: Client developer mailing list
awesome, thanks, I'm subscribed :) On Mon, Aug 30, 2010 at 10:05 PM, Jeremy Hanna jeremy.hanna1...@gmail.com wrote: There has been a new mailing list created for those who are working on Cassandra clients above thrift and/or avro. You can subscribe by sending an email to client-dev-subscr...@cassandra.apache.org or using the link at the bottom of http://cassandra.apache.org The list is meant to give client authors a discussion forum as well as a place to interact with core cassandra developers about the roadmap and upcoming features. Thanks to Cliff Moon (@moonpolysoft) for starting a discussion about client quality at the Cassandra Summit.
Re: Read before Write
I haven't benchmarked, so this is purely theoretical. If there's no caching, then I'm pretty sure just writing would yield better performance. If you do cache rows/keys, it really depends on your hit ratio. Naturally, if you have a small data set, a high cache hit ratio and use row caching, I'm pretty sure it's better to read first. Although writes are an order of magnitude faster than reads, if you have a high write rate then cassandra might throttle you at different bottlenecks, depending on your hardware and data; for example, disk is many times a bottleneck (and you can tweak storage-conf to improve that), sometimes memory is pressing, and I have also seen CPU pressure, although it's less common. You also need to keep in mind that even if you write the same value but with a newer timestamp, cassandra will have to run compactions, and that's where disk/mem is usually bottlenecking. Bottom line - if you can cache (have enough mem) and there's a good hit ratio, cache entire rows and read first. If not, always write first and make sure compactions aren't killing you; if they are, tweak storage-conf to do fewer compactions. On Fri, Aug 27, 2010 at 5:44 PM, Chen Xinli chen.d...@gmail.com wrote: I think just writing all the time is much better, as most of the replacements will be done in the memtable. Also, you should set a large memtable size compared with the average row size. 2010/8/27 Daniel Doubleday daniel.double...@gmx.net Hi people, I was wondering if anyone has already benchmarked such a situation: I have: day of year (row key) - SomeId (column key) - byte[0] I need to make sure that I write SomeId, but in around 80% of the cases it will be already present (so I would essentially replace it with itself). RF will be 2. So should I rather just write all the time (given that cassandra is so fast on write) or should I read and write only if not present? Cheers, Daniel -- Best Regards, Chen Xinli
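The two strategies side by side, as a sketch against the 0.6 Thrift API (keyspace, CF and key names are placeholders; the blind write is just the single insert call on its own):

ColumnPath cp = new ColumnPath("KvStore");
cp.setColumn("someId".getBytes());
try {
    // Read-first: skip the write if the column already exists.
    client.get("Keyspace1", "2010-239", cp, ConsistencyLevel.ONE);
} catch (NotFoundException absent) {
    // Not there yet: write it (empty value, per the byte[0] schema above).
    client.insert("Keyspace1", "2010-239", cp, new byte[0],
            System.currentTimeMillis() * 1000, ConsistencyLevel.ONE);
}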
Re: Calls block when using Thrift API
did you try connecting to a real cassandra instance, not an embedded one? I use an embedded one for testing and it works, but just to narrow down your problem. On Fri, Aug 27, 2010 at 6:13 PM, Ruben de Laat ru...@logic-labs.nl wrote: Hi, I am new to cassandra, so maybe I am missing something obvious... Version: Latest nightly build (2010-08-23_13-57-40), but same results with 0.7.0b1 Server code (default configuration file): System.setProperty(cassandra.config, conf/cassandra.yaml); EmbeddedCassandraService embeddedCassandraService = new EmbeddedCassandraService(); embeddedCassandraService.init(); Client code: Socket socket = new Socket(127.0.0.1, 9160); TSocket transport = new TSocket(socket); TBinaryProtocol tBinaryProtocol = new TBinaryProtocol(transport); Client client = new Client(tBinaryProtocol); System.out.println(client.describe_cluster_name()); The problem is that it hangs/blocks on the client.describe_cluster_name() call, actually it hangs on any call I have tried. I was first trying with the Pelops client, but that one is using the Thrift API as well, so this is narrowed down. I have already tried multiple different combination of creating the client (different transports). I have also tried with thrift_framed_transport_size_in_mb: 0 disabling framed transports. Starting the client without a running server gives a proper Connection refused, so some sort of connection is definitely made. Thanks and Kind regards, Ruben
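One likely culprit in the snippet above (an educated guess, the thread never confirms it): the transport is never opened, and 0.7 defaults to the framed transport, so an unframed client can block forever on its first call. A corrected sketch:

TTransport socket = new TSocket("127.0.0.1", 9160);
TTransport transport = new TFramedTransport(socket); // 0.7 speaks framed by default
TBinaryProtocol protocol = new TBinaryProtocol(transport);
Cassandra.Client client = new Cassandra.Client(protocol);
transport.open(); // without this, calls can hang or fail
System.out.println(client.describe_cluster_name());
transport.close();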
Re: [RELEASE] 0.7.0 beta1
[cross-posting to u...@cass and hector-use...@googlegroups] Happy to announce hector's support for 0.7.0. Hector is a java client for cassandra which wraps the low level thrift interface with a nicer API, and adds monitoring, connection pooling and more. I didn't do anything... The amazing 0.7.0 work was done by Ed (thanks Ed) with support from Nate (thanks). This version includes support for a 0.7.0 cluster with feature parity to 0.6.0 (all API calls that used to work in 0.6.* now work with 0.7.0-beta1). Complete support for all 0.7.0 new features is in the works and will be available soon (the system_ calls). A good place to start with some examples is here http://github.com/rantav/hector/blob/0.7.0/src/test/java/me/prettyprint/cassandra/model/ApiV2SystemTest.java and here http://github.com/rantav/hector/blob/0.7.0/src/main/java/me/prettyprint/cassandra/examples/ExampleDaoV2.java The code is here http://github.com/rantav/hector/tree/0.7.0, a zip file with all dependencies is on the downloads page (http://github.com/rantav/hector/downloads) and here's a direct link: http://github.com/downloads/rantav/hector/hector-0.7.0-16.zip Enjoy On Fri, Aug 13, 2010 at 11:24 PM, Eric Evans eev...@rackspace.com wrote: Happy Friday the 13th. Are you feeling lucky? I know I am. Ok, first off, a disclaimer. As the suffix on the version indicates this is *beta* software. If you run off and upgrade a production server with this there is a very good chance that you are going to be sad/fired/mocked/ridiculed/laughed at/sorry. FUD aside, any help testing 0.7.0-beta1 would be very appreciated. The list of changes is enormous[1] and we want to make sure we shake as many bugs out before the final as possible. If you're coming from 0.6, there are some things to keep in mind, and they're documented in the release notes[2], so be sure to read them. If you find bugs, please file a report[3], and if you have questions, don't hesitate to ask them. Have fun! [1]: http://bit.ly/d4HOMw [2]: http://bit.ly/9fcewt [3]: https://issues.apache.org/jira/browse/CASSANDRA -- Eric Evans eev...@rackspace.com
KeyRange.token in 0.7.0
I'm a bit confused WRT KeyRange's tokens in 0.7.0. When making a range query you can either use KeyRange.key or KeyRange.token. In 0.7.0 keys were retyped as byte[]; tokens remain strings. What does this string represent in the case of RP and in the case of OPP? Did this change in 0.7.0? AFAIK in 0.6.0, if the partitioner is OPP then the tokens are actual strings, and they might just be an actual subset of the keys. When using RP, tokens are BigIntegers (keys are still strings) and I'm not actually sure you're allowed to shoot a range query using tokens... In 0.7.0, since keys are now bytes, when using OPP, how do those bytes translate to strings? I'd assume it'd just be a byte[] -> UTF-8 conversion, only that this may result in illegal UTF-8 chars when keys are just random bytes, so I guess not... Perhaps md5 hashing? But then if using OPP and keys are actual strings, I want to have the same 0.6.0 functionality in place, meaning tokens are strings like the keys. I actually tested this scenario and it looks like it works, so it seems like the String keys are translated to UTF-8, but what happens when they are invalid UTF-8? Another question is: what's the story with RP in 0.7.0? Should range queries even be supported with tokens? If so, are the tokens expected to be strings of integers? (e.g. 1234567890) Thanks.
Re: KeyRange.token in 0.7.0
On Wed, Aug 18, 2010 at 4:30 PM, Jonathan Ellis jbel...@gmail.com wrote: (a) if you're using token queries and you're not hadoop, you're doing it wrong ah, didn't know that, so I guess I'll remove support for it from hector... (b) they are expected to be of the form generated by TokenFactory.toString and fromString. You should not be generating them yourself. -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
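So a key-based (rather than token-based) range query in 0.7 would look roughly like the sketch below; names are placeholders, and the exact generated signatures (ByteBuffer vs. byte[]) depend on the thrift version, so treat this as illustrative:

client.set_keyspace("Keyspace1"); // in 0.7 the keyspace is set on the connection
SlicePredicate predicate = new SlicePredicate();
predicate.setSlice_range(new SliceRange(
        ByteBuffer.wrap(new byte[0]), ByteBuffer.wrap(new byte[0]), false, 100));
KeyRange range = new KeyRange();
range.setStart_key(ByteBuffer.wrap(new byte[0])); // empty key = start of ring
range.setEnd_key(ByteBuffer.wrap(new byte[0]));
range.setCount(100);
List<KeySlice> slices = client.get_range_slices(
        new ColumnParent("Standard1"), predicate, range, ConsistencyLevel.ONE);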
Re: File write errors but cassandra isn't crashing
I opened it as an improvement suggestion: https://issues.apache.org/jira/browse/CASSANDRA-1409 On Mon, Aug 16, 2010 at 8:26 PM, Benjamin Black b...@b3k.us wrote: Useful config option, perhaps? On Mon, Aug 16, 2010 at 8:51 AM, Jonathan Ellis jbel...@gmail.com wrote: That's a tough call -- you can also come up with scenarios where you'd rather have it read-only than completely dead. -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
Re: The entry of Cassandra
The common practice is to connect to a few hosts and send requests in round robin or another lb tactic. The hosts are symmetric so any host will do. There are also higher level libraries that help with that, as well as connection pooling and other goodies. On Mon, Aug 16, 2010 at 1:14 PM, Ying Tang ivytang0...@gmail.com wrote: After reading the docs and the thrift demo, I found that in the demo, if we want to connect to the database, we must first do TTransport tr = new TSocket(localhost, 9160). Then we operate on the database through this TTransport. But this operation assigns a fixed IP, so all requests would be directed to this IP, and the cassandra node at this IP would carry a heavy read load and proxy load. Do I understand this wrong, or does the cassandra client have another way to access cassandra that doesn't need to assign a fixed IP? -- Best regards, Ivy Tang -- Best regards, Ivy Tang
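A bare-bones version of that client-side round robin (illustrative only; libraries like Hector layer pooling and failover on top of the same idea):

import java.util.concurrent.atomic.AtomicInteger;

class RoundRobinHosts {
    private final String[] hosts = {"10.0.0.1", "10.0.0.2", "10.0.0.3"}; // placeholders
    private final AtomicInteger counter = new AtomicInteger();

    String nextHost() {
        int i = counter.getAndIncrement() & Integer.MAX_VALUE; // stay non-negative on overflow
        return hosts[i % hosts.length];
    }
}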
File write errors but cassandra isn't crashing
Due to an administrative error, one of the hosts in the cluster lost permission to write to its data directory. So I started seeing errors in the log; however, the server continued serving traffic. It wasn't able to compact and do other write operations, but it didn't crash. I was wondering whether that's by design, and if so, whether it's a good one... I guess I want to know if really bad things happen to my cluster... logs look like that... INFO [FLUSH-TIMER] 2010-08-11 07:53:14,683 ColumnFamilyStore.java (line 357) KvAds has reached its threshold; switching in a fresh Memtable at CommitLogContext(file='/outbrain/cassandra/commitlog/Commi tLog-1281505164614.log', position=88521163) INFO [FLUSH-TIMER] 2010-08-11 07:53:14,683 ColumnFamilyStore.java (line 609) Enqueuing flush of Memtable(KvAds)@851225759 INFO [FLUSH-WRITER-POOL:1] 2010-08-11 07:53:14,684 Memtable.java (line 148) Writing Memtable(KvAds)@851225759 ERROR [FLUSH-WRITER-POOL:1] 2010-08-11 07:53:14,688 DebuggableThreadPoolExecutor.java (line 94) Error in executor futuretask *java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.io.FileNotFoundException: /outbrain/cassandra/data/outbrain_kvdb/KvAds-tmp-249-Data.db (Permission denied) *at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222) at java.util.concurrent.FutureTask.get(FutureTask.java:83) at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor.afterExecute(DebuggableThreadPoolExecutor.java:86) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:888) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) *Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: /outbrain/cassandra/data/outbrain_kvdb/KvAds-tmp-249-Data.db (Permission denied) *at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) ... more
Re: RuntimeException: Cannot service reads while bootstrapping!
ok, so I don't send writes to bootstrapping or decommissioned nodes, that's cool, but what about the inconsistent ring view after nodetool move, isn't that strange? After the move, the moved node has the correct view of the ring, but all the other nodes have the old view. I waited a few minutes after the log said "Bootstrap/move completed! Now serving reads" but this didn't help; the view was still inconsistent. Only restarting the moved node helped the other nodes realize the change. On Wed, Aug 4, 2010 at 3:24 PM, Jonathan Ellis jbel...@gmail.com wrote: Don't point clients at nodes that aren't part of the ring. Cassandra rejecting requests when you do is a feature. On Wed, Aug 4, 2010 at 6:52 AM, Ran Tavory ran...@gmail.com wrote: Is this a known issue? Running 0.6.2, I moved a node to a different token and eventually saw errors in the log.
ERROR [ROW-READ-STAGE:116804] 2010-08-04 06:34:29,699 DebuggableThreadPoolExecutor.java (line 101) Error in ThreadPoolExecutor java.lang.RuntimeException: Cannot service reads while bootstrapping! at org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:66) at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:40) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619)
ERROR [ROW-READ-STAGE:116805] 2010-08-04 06:34:29,700 CassandraDaemon.java (line 82) Fatal exception in thread Thread[ROW-READ-STAGE:116805,5,main] java.lang.RuntimeException: Cannot service reads while bootstrapping! at org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:66) at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:40) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619)
... many more of those and then...
INFO [MESSAGE-DESERIALIZER-POOL:1] 2010-08-04 06:34:29,709 StorageService.java (line 181) Bootstrap/move completed! Now serving reads.
The move ended up ok, but during the operation the log was filled with those errors, and at the end of it the ring state was inconsistent. If I ask the moved node where it is in the ring, it tells me one thing, but the other nodes tell another...
(ob1124)(cassan...@cass24:apache-cassandra-0.6.2)$ nodetool -h 192.168.254.58 -p 9004 ring
Address          Status  Load      Range                                      Ring
                                   170141183460469231731687303715884105727
192.168.252.88   Up      5.7 GB    14131484407726020523932116250949797205    |--|
192.168.252.124  Up      2.44 GB   56713727820156410577229101238628035242    |   ^
192.168.254.58   Up      8.13 GB   113427455640312821154458202477256070484   v   |
192.168.254.57   Up      6.52 GB   113427455640312821154458202477256070485   |   ^
192.168.252.125  Up      6.52 GB   141784319550391026443072753096570088105   v   |
192.168.254.59   Up      1.63 GB   170141183460469231731687303715884105727   |--|
(ob1124)(cassan...@cass24:apache-cassandra-0.6.2)$ nodetool -h 192.168.252.124 -p 9004 ring
Address          Status  Load      Range                                      Ring
                                   170141183460469231731687303715884105727
192.168.252.88   Up      5.7 GB    14131484407726020523932116250949797205    |--|
192.168.252.124  Up      2.46 GB   56713727820156410577229101238628035242    |   ^
192.168.254.57   Up      6.52 GB   113427455640312821154458202477256070485   v   |
192.168.252.125  Up      6.52 GB   141784319550391026443072753096570088105   |   ^
192.168.254.58   Up      1.63 GB   141784319550391026443072753096570088106   v   |
192.168.254.59   Up      1.63 GB   170141183460469231731687303715884105727   |--|
Restarting the moved node fixes the ring view by other hosts. -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
Re: about cassandra compression
cassandra doesn't compress before storing, no. It may be beneficial to compress, depending on the size of your data, network latency, disk size and data compressibility... You'll need to test. I sometimes compress, depending on data size, but it's done in the client. On Mon, Jul 26, 2010 at 1:31 PM, john xie shanfengg...@gmail.com wrote: does cassandra compress data before it is stored? when I store data, is compression beneficial to reduce the storage space?
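Client-side compression of the sort described above can be as simple as gzipping the value before the insert; a sketch with placeholder names, using the 0.6-style Thrift insert:

import java.io.ByteArrayOutputStream;
import java.util.zip.GZIPOutputStream;

String value = "...the value to store..."; // placeholder
byte[] raw = value.getBytes("UTF-8");
ByteArrayOutputStream bos = new ByteArrayOutputStream();
GZIPOutputStream gz = new GZIPOutputStream(bos);
gz.write(raw);
gz.close(); // flushes the gzip trailer
client.insert("Keyspace1", "someKey", columnPath, bos.toByteArray(),
        System.currentTimeMillis() * 1000, ConsistencyLevel.ONE);
// Remember to gunzip on the read path; compression only pays off when the
// values are large and compressible.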
Re: CRUD test
Oleg, note that the unofficial recommendation is to use microseconds, not milliseconds. As Jonathan notes, although there isn't a real way to get microseconds in java, at the very least you should take the millis and multiply them by 1000. If you use hector, then just use Keyspace.createTimestamp() ( http://github.com/rantav/hector/blob/master/src/main/java/me/prettyprint/cassandra/service/Keyspace.java#L236 ) On Sun, Jul 25, 2010 at 8:54 AM, Oleg Tsvinev oleg.tsvi...@gmail.com wrote: Thank you guys for your help! Yes, I am using System.currentTimeMillis() in my CRUD test. Even though I'm still using it, my tests now run as expected. I do not use cassandra-cli anymore. @Ran great job on Hector, I wish there was more documentation but I managed. @Jonathan, what is the recommended time source? I use batch_mutation to insert and update multiple columns atomically. Do I have to use batch_mutation for deletion, too? On Sat, Jul 24, 2010 at 2:36 PM, Jonathan Shook jsh...@gmail.com wrote: Just to clarify, microseconds may be used, but they provide the same behavior as milliseconds if they aren't using a higher time resolution underneath. In some cases, the microseconds are generated simply as milliseconds * 1000, which doesn't actually fix any sequencing bugs. On Sat, Jul 24, 2010 at 3:46 PM, Ran Tavory ran...@gmail.com wrote: Hi Oleg, I didn't follow the entire thread, but just to let you know that the 0.6.* version of the CLI uses microseconds as the time unit for timestamps. Hector also uses micros to match that; however, previous versions of hector (as well as the CLI) used milliseconds, not micros. So if you're using hector version 0.6.0-11 or earlier, or by any chance in some other way are mixing milliseconds into your app (are you using System.currentTimeMillis() somewhere?), then the behavior you're seeing is expected. On Sat, Jul 24, 2010 at 1:06 AM, Jonathan Shook jsh...@gmail.com wrote: I think you are getting it. As far as what means what at which level, it's really about using them consistently in every case. The [row] key (or [row] key range) is a top-level argument for all of the operations, since it is the key to mapping the set of responsible nodes. The key is the part of the name of any column which most affects how the load is apportioned in the cluster, so it is used very early in request processing. On Fri, Jul 23, 2010 at 4:22 PM, Peter Minearo peter.mine...@reardencommerce.com wrote: Consequently the remove should look like: ColumnPath cp1 = new ColumnPath("Super2"); cp1.setSuper_column("Best Western".getBytes()); client.remove(KEYSPACE, "hotel", cp1, System.currentTimeMillis(), ConsistencyLevel.ONE); ColumnPath cp2 = new ColumnPath("Super2"); cp2.setSuper_column("Econolodge".getBytes()); client.remove(KEYSPACE, "hotel", cp2, System.currentTimeMillis(), ConsistencyLevel.ONE); -Original Message- From: Peter Minearo [mailto:peter.mine...@reardencommerce.com] Sent: Fri 7/23/2010 2:17 PM To: user@cassandra.apache.org Subject: RE: CRUD test CORRECTION: ColumnPath cp1 = new ColumnPath("Super2"); cp1.setSuper_column("Best Western".getBytes()); cp1.setColumn("name".getBytes()); client.insert(KEYSPACE, "hotel", cp1, "Best Western of SF".getBytes(), System.currentTimeMillis(), ConsistencyLevel.ALL); -Original Message- From: Peter Minearo [mailto:peter.mine...@reardencommerce.com] Sent: Friday, July 23, 2010 2:14 PM To: user@cassandra.apache.org Subject: RE: CRUD test Interesting!!
Let me rephrase to make sure I understood what is going on: When inserting data via the insert function/method: void insert(string keyspace, string key, ColumnPath column_path, binary value, i64 timestamp, ConsistencyLevel consistency_level) The key parameter is the actual Key to the Row, which contains SuperColumns. The 'ColumnPath' gives the path within the Key. INCORRECT: ColumnPath cp1 = new ColumnPath("Super2"); cp1.setSuper_column("hotel".getBytes()); cp1.setColumn("Best Western".getBytes()); client.insert(KEYSPACE, "name", cp1, "Best Western of SF".getBytes(), System.currentTimeMillis(), ConsistencyLevel.ALL); CORRECT: ColumnPath cp1 = new ColumnPath("Super2"); cp1.setSuper_column("name".getBytes()); cp1.setColumn("Best Western".getBytes()); client.insert(KEYSPACE, "hotel", cp1, "Best Western of SF".getBytes(), System.currentTimeMillis(), ConsistencyLevel.ALL); -Original Message- From: Jonathan Shook [mailto:jsh
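For reference, the microsecond convention recommended earlier in the thread, applied to the snippets above (same millisecond resolution underneath, but the unit matches the 0.6 CLI and Hector; KEYSPACE and cp1 are from the thread's own code):

long timestampMicros = System.currentTimeMillis() * 1000L;
client.insert(KEYSPACE, "hotel", cp1, "Best Western of SF".getBytes(),
        timestampMicros, ConsistencyLevel.ALL);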
Re: Question on Eventual Consistency
if your test case is correct then it sounds like a bug to me. With one node, unless you're writing with CL=0 you should get full consistency. On Mon, Jul 19, 2010 at 10:14 PM, Hugo h...@unitedgames.com wrote: Hi, Being fairly new to Cassandra I have a question on the eventual consistency. I'm currently performing experiments with a single-node Cassandra system and a single client. In some of my tests I perform an update to an existing subcolumn in a row and subsequently read it back from the same thread. More often than not I get back the value I've written (and expected), but sometimes it can occur that I get back the old value of the subcolumn. Is this a bug or does it fall into the eventual consistency? I'm using Hector 0.6.0-14 on Cassandra 0.6.3 on a single disk, double-core Windows machine with a Sun 1.6 JVM. All reads and writes are quorum (the default), but I don't think this matters in my setup. Groets, Hugo.
Re: How to stop Cassandra running in embedded mode
look at my pom. It has <forkMode>always</forkMode>: http://github.com/rantav/hector/blob/master/pom.xml#L95 On Wed, Jul 14, 2010 at 3:02 PM, Andriy Kopachevsky kopachev...@gmail.com wrote: Ran, I know how to run tests in their own thread with the maven surefire plugin, but I'm not sure how to do this with its own JVM for each test. How are you doing this? Thanks.
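The relevant surefire stanza, along the lines of the hector pom referenced above (a sketch; forkMode is the surefire option of that era):

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <!-- a fresh JVM per test class, so each embedded Cassandra gets its own process -->
    <forkMode>always</forkMode>
  </configuration>
</plugin>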
Re: Performance Issues
Since you're using hector, hector-users@ is a good place to ask, so u...@cassandra to bcc. operateWithFailover is one stop before sending the request over the network and waiting, so it makes a lot of sense that a significant part of the application time is spent in it. On Tue, Jul 13, 2010 at 6:22 PM, Samuru Jackson samurujack...@googlemail.com wrote: Hi, I have set up a ring with a couple of servers and wanted to run some stress tests. Unfortunately, there is some kind of bottleneck on the client side. I'm using Hector and Cassandra 0.6.1. The subsequent profile results are based on a small Java program that sequentially inserts records, with a couple of columns, into Cassandra (no multithreading or anything that increases the stress). The nodes are not too busy while inserting the records (approx. 20%-25% CPU utilization). Log level is on Info and I don't see any exceptions flying around. The client has also registered all available node IPs. According to my profiler, operateWithFailover(me.prettyprint.cassandra.service.Operation) consumes ~86% of the execution time, and further down the hierarchy the method executeAndSetResult(org.apache.cassandra.thrift.Cassandra$Client) is responsible for ~73%. I'm inserting the columns one-by-one in this way: ColumnPath cp = new ColumnPath(colFamilyName); cp.setColumn(bytes(colName)); cp.setSuper_column(bytes(superColName)); keySpace.insert(key, cp, value.getBytes()); Can anyone point out what I could look into to resolve this issue? /SJ
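One common way to cut the per-call overhead showing up in that profile is to fold many columns into a single batch_mutate call; a sketch against the 0.6 Thrift API with placeholder names (whether it helps depends on where the time actually goes):

List<Column> columns = new ArrayList<Column>();
columns.add(new Column("colName".getBytes(), "value".getBytes(),
        System.currentTimeMillis() * 1000)); // any unit works if all writers agree
SuperColumn sc = new SuperColumn("superColName".getBytes(), columns);
ColumnOrSuperColumn cosc = new ColumnOrSuperColumn();
cosc.setSuper_column(sc);
Mutation m = new Mutation();
m.setColumn_or_supercolumn(cosc);
Map<String, Map<String, List<Mutation>>> mutationMap =
        new HashMap<String, Map<String, List<Mutation>>>();
mutationMap.put("rowKey",
        Collections.singletonMap("ColFamilyName", Collections.singletonList(m)));
client.batch_mutate("Keyspace1", mutationMap, ConsistencyLevel.ONE);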
Re: Using Pelops with Cassandra 0.7.X
Hector doesn't have 0.7 support yet. On Jul 14, 2010 1:34 AM, Peter Harrison cheetah...@gmail.com wrote: I know Cassandra 0.7 isn't released yet, but I was wondering if anyone has used Pelops with the latest builds of Cassandra? I'm having some issues, but I wanted to make sure that somebody else isn't working on a branch of Pelops to support Cassandra 0.7. I have downloaded and built the latest code from GitHub, trunk of Pelops, and this works with 0.6.3, but not Cassandra trunk. Is Pelops worth updating, or should I use other client libraries for Java such as Hector?
Re: How to stop Cassandra running in embedded mode
The workaround I do is fork always. Each test pulls up its own jvm. On Jul 9, 2010 9:51 PM, Jonathan Ellis jbel...@gmail.com wrote: there's some support for this in 0.7 (see http://issues.apache.org/jira/browse/CASSANDRA-1018) but fundamentally it's not really designed to be started and stopped multiple times within the same process. On Thu, Jul 8, 2010 at 3:44 AM, Andriy Kopachevsky kopachev...@gmail.com wrote: Hi, we are tryi... -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
http://scale.metaoptimize.com/
Just found this site and thought it might be interesting to folks on this list. http://scale.metaoptimize.com/ It's a stack-overflow-style Q&A site; in their words: A community interested in scalability, high availability, data stores, NoSQL, distributed computing, parallel computing, cloud computing, elastic computing, HPC, grid computing, AWS, crawling, failover, redundancy, and concurrency.
Re: Hector Client Failover errors
A TTransportException usually happens when the server cannot respond or there's a network error. Can you send more context from your code? More context from the exception? Is the insertion rate about the same in the thrift and hector versions? If insertion with hector is faster than with thrift (connection pooling), then maybe your server is freaking out. On Sun, Jun 27, 2010 at 1:53 PM, Atul Gosain atul.gos...@gmail.com wrote: I am trying to insert the data using the hector client, using only one host in the pool, i.e. localhost, as follows: CassandraClientPool pool = CassandraClientPoolFactory.INSTANCE.get(); client = pool.borrowClient("localhost", 9160); global = client.getKeyspace(keyspace, ConsistencyLevel.ONE); After some (5-6) iterations of insertions of data (every 5 min an insertion of about 40 MB of data), the program starts emitting this:
10/06/27 09:55:28 WARN service.FailoverOperator: Got a TTransportException from localhost. Num of retries: 1
10/06/27 09:55:28 INFO service.FailoverOperator: Skipping to next host. Current host is: localhost
10/06/27 09:55:28 INFO service.FailoverOperator: Skipped host. New host is: localhost
I couldn't understand the reason for this, when I'm using only one host in the pool. If I remove the ConsistencyLevel.ONE, then the error starts from the first iteration itself. The same program through Thrift runs without any problems. Actually, I just modified the thrift program and replaced the calls to the thrift api with the corresponding Hector calls. Thanks Atul
Re: Call for input of cassandra, thrift, hector, pelops example / sample / test code snippets
these classes are from a newer version; they should not exist in version 14. On Fri, Jun 25, 2010 at 3:42 PM, Gavan Hood gavan.h...@gmail.com wrote: Hi Ran, I downloaded the git code. I think I have something up with my versioning: I have the latest build 0.6.0.14 of hector and the git download, and I have a bunch of classes that do not appear to resolve, some of them being KeyspaceOperatorFactory, ClusterFactory, Cluster. Are these classes from a newer or older version of hector maybe, or am I missing some step? Regards Gavan On Thu, Jun 24, 2010 at 11:09 PM, Ran Tavory ran...@gmail.com wrote: Thanks for this effort Gavan :) On Thu, Jun 24, 2010 at 3:47 PM, Gavan Hood gavan.h...@gmail.com wrote: Thanks Ran, I downloaded those. ReadAllKeys worked straight up, very good example, and I have already got ExampleClient working, so ditto there :-) I am searching for the definition of Command<Void> in ExampleDAO; getallkey slices and keyspace test have a few more unresolved externals like junit, mockito and other items. I tried downloading the code stack from git, but I am not sure that was a good idea, though it did have some of the files in that download. if you use git that should be straightforward, many developers have done that already. If you just downloaded one of the released versions then lmk if I forgot to include one dependency or another... I noticed a file IterateOverKeysOnly.java on the site too, but that has some issues, some undefined KeySpace entries and other syntax errors. It was contributed by another developer so I don't know. Gavan On Thu, Jun 24, 2010 at 10:18 PM, Ran Tavory ran...@gmail.com wrote: Here's what we have for hector: wiki: http://wiki.github.com/rantav/hector/ blog posts: http://prettyprint.me/2010/02/23/hector-a-java-cassandra-client/ http://prettyprint.me/2010/03/03/load-balancing-and-improved-failover-in-hector/ http://prettyprint.me/2010/04/03/jmx-in-hector/ Examples: Example DAO: http://github.com/rantav/hector/blob/master/src/main/java/me/prettyprint/cassandra/dao/ExampleDao.java Example simple client: http://github.com/rantav/hector/blob/master/src/main/java/me/prettyprint/cassandra/service/ExampleClient.java Example read all keys: http://github.com/bosyak/hector/blob/master/src/main/java/me/prettyprint/cassandra/examples/ExampleReadAllKeys.java get all key slices for… in groups: http://pastie.org/957661 KeyspaceTest: http://github.com/rantav/hector/blob/master/src/test/java/me/prettyprint/cassandra/service/KeyspaceTest.java On Thu, Jun 24, 2010 at 1:45 AM, Gavan Hood gavan.h...@gmail.com wrote: Hi all, I have been researching the samples with some success, but it's taken a while. I am very keen on Cassandra and love the work that's been done, well done everyone involved. I would like to get as many of the samples as I can organized into something that makes it easier to kick off with, for people taking the road I am on. If people on this list have code snippets, full example apps, test apps, API test functions etc, I would like to hear about them please.
My work is in Java so I really want to see those, the others are still of high interest as I will post them all out as I mention below. Ideally I would like to get a small test container set up to allow people to poke and prod API's and see what happens, but like most of us time is the challenge. If I do not get that far I would at least post the findings to page(s) that people can continue to add to, maybe if successful it could then be consumed back into the apachi wiki... If someone has already done this I would love to see the site. Let me know your thoughts, and better yet show me the code :-) Regards Gavan
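For anyone collecting snippets, here is a minimal raw-Thrift write/read pair in the spirit of ExampleClient. Treat it as a sketch against the 0.6-era generated classes (package names shifted between releases) and the stock Keyspace1/Standard1 sample schema; host and port are assumptions.

    import org.apache.cassandra.thrift.Cassandra;
    import org.apache.cassandra.thrift.ColumnOrSuperColumn;
    import org.apache.cassandra.thrift.ColumnPath;
    import org.apache.cassandra.thrift.ConsistencyLevel;
    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.transport.TSocket;
    import org.apache.thrift.transport.TTransport;

    public class MinimalThriftExample {
        public static void main(String[] args) throws Exception {
            TTransport tr = new TSocket("localhost", 9160);
            Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(tr));
            tr.open();
            try {
                ColumnPath cp = new ColumnPath("Standard1");
                cp.setColumn("name".getBytes("UTF-8"));
                // write one column, then read it back
                client.insert("Keyspace1", "key1", cp, "value1".getBytes("UTF-8"),
                        System.currentTimeMillis(), ConsistencyLevel.QUORUM);
                ColumnOrSuperColumn cosc =
                        client.get("Keyspace1", "key1", cp, ConsistencyLevel.QUORUM);
                System.out.println(new String(cosc.getColumn().getValue(), "UTF-8"));
            } finally {
                tr.close();
            }
        }
    }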
Re: Call for input of cassandra, thrift, hector, pelops example / sample / test code snippets
On Sat, Jun 26, 2010 at 4:58 PM, GH gavan.h...@gmail.com wrote:
> I believe that the code I grabbed from git is later than the latest download published on there then.
true. All the new classes you mentioned are new.
> Where would I find that code? Is it available to the public (aka me :-) ), and if not, when would that be? If it's a way off, I think I need a version of git that matches the released code base.
Check out the branch which is currently at version 14: http://github.com/rantav/hector/tree/0.6.0 (or use the sources from the packaged downloads section: http://github.com/rantav/hector/downloads)
Re: hector or pelops
on the wiki http://wiki.github.com/rantav/hector/ you can find:
Example DAO: http://github.com/rantav/hector/blob/master/src/main/java/me/prettyprint/cassandra/dao/ExampleDao.java
Example simple client: http://github.com/rantav/hector/blob/master/src/main/java/me/prettyprint/cassandra/service/ExampleClient.java
Example read all keys: http://github.com/bosyak/hector/blob/master/src/main/java/me/prettyprint/cassandra/examples/ExampleReadAllKeys.java
get all key slices for... in groups: http://pastie.org/957661
KeyspaceTest: http://github.com/rantav/hector/blob/master/src/test/java/me/prettyprint/cassandra/service/KeyspaceTest.java

On Thu, Jun 24, 2010 at 1:12 AM, Gavan Hood gavan.h...@gmail.com wrote: Hi Ran, I have been trialling hector but have not found the samples you refer to. I found your basic ExampleClient but it does not exercise many functions, for instance getSlice, fields usage etc. I want to develop a solid set of tests for each API call; do you have some code that will help me build that? If my code ends up useful I intend to publish it on my website for others to use. Regards Gavan

On Thu, Jun 24, 2010 at 4:43 AM, Ran Tavory ran...@gmail.com wrote: As the developer of hector I can only speak in favor of my child of love, and I haven't tried pelops, so take the following with a grain of salt... Hector sees wide adoption and has been coined the de-facto java client. It's been in use in production critical systems since version 0.5.0 by a few companies. The development team is responsive, accepts patches from the community, and is busy with new features and improvements all the time. There's a bug tracking system and all bugs are fixed very fast. There are two active mailing lists, one for the developers and one for the users: http://wiki.github.com/rantav/hector/mailing-lists (85 members). The project is maintained on github (http://github.com/rantav/hector) and the whole process is very transparent and open to the community. Code is well tested with an embedded version of cassandra which I contributed back to the main cassandra repository; it runs a mvn and an ant build, and all release versions are available at http://github.com/rantav/hector/downloads including source code. We love contributions and want to make it as easy as possible to contribute back. I myself have made a few contributions to cassandra core so I'm well familiar with its internals, which doesn't hurt when you write a client... ...and finally the features (just the high level):
- connection pooling
- datacenter friendly
- high level API
- all public cassandra versions in the last 6 months
- failover
- simple LB
- extensive JMX
- well documented, many examples, wiki, mailing list, team of developers and contributors.
... and of course there's also thrift if you're into hacking on it...

On Wed, Jun 23, 2010 at 5:38 PM, Serdar Irmak sir...@protel.com.tr wrote: Hi, Which java client library do you recommend, hector or pelops, and why? Best Regards, http://www.protel.com.tr/
Re: Hector vs cassandra-java-client
Hector has a pom.xml which deals with its dependencies as gracefully as it can, but the problem is that hector's dependencies, such as cassandra and libthrift, aren't in public maven repos. Any suggestions how to deal with that?

On Thu, Jun 24, 2010 at 6:00 AM, Kenneth Bartholet kennethbartho...@hotmail.com wrote: Agreed, but at what cost? It's my understanding that the big deterrent is the lack of 3rd party dependencies in maven public repos (e.g. Thrift itself). The option would be to publish a public maven repo containing all dependencies, which ends up being more responsibility than the client developers want to accept. Any volunteers? -Ken

To: user@cassandra.apache.org From: bbo...@gmail.com Subject: Re: Hector vs cassandra-java-client Date: Tue, 22 Jun 2010 17:14:53 +0200 Dop Sun su...@dopsun.com writes: Updated. The first Cassandra client lib to make it into the Maven repositories will probably end up with a big audience. :-) -Bjørn
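Until the jars land in a public repo, one workaround is an extra repository entry in the pom pointing at wherever the missing jars are published; a sketch with a placeholder URL (not a real repo):

    <!-- placeholder URL: point this at wherever the missing jars are published -->
    <repositories>
      <repository>
        <id>cassandra-third-party-deps</id>
        <url>http://example.com/maven2</url>
      </repository>
    </repositories>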
Re: Instability and memory problems
I don't have the answer, but if you provide jmap output and cfstats output that may help. Are you using mmap files? Do you see swap? GC in the logs?

On Jun 20, 2010 7:25 PM, James Golick jamesgol...@gmail.com wrote: As I alluded to in another post, we just moved from 2 to 4 nodes. Since then, the cluster has been incredibly unstable. The memory problems I've posted about before have gotten much worse and our nodes are becoming incredibly slow/unusable every 24 hours or so. Basically, the JVM reports that only 14GB is committed, but the RSS of the process is 22GB, and cassandra is completely unresponsive, but still having requests routed to it internally, so it completely destroys performance. I'm at a loss for how to diagnose this issue. In addition to that, read performance has gone way downhill, and query latency is much slower than it was with a 2 node cluster. Perhaps this was to be expected, though. We really like cassandra for the most part, but these stability issues are going to force us to abandon it. Our application is like a yoyo right now, and we can't live with that. Help resolving these issues would be greatly appreciated.
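For reference, gathering the diagnostics asked for above might look like this (pid, host and JMX port are placeholders):

    $ jmap -heap <cassandra-pid>            # JVM heap configuration and usage
    $ jmap -histo:live <cassandra-pid>      # live object histogram, per class
    $ nodetool -h <host> -p <jmx-port> cfstats
    $ nodetool -h <host> -p <jmx-port> tpstats
    $ free -m; swapon -s                    # check whether the box is swapping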
Re: Client connection and data distribution across nodes
On Thu, Jun 17, 2010 at 8:52 AM, Mubarak Seyed se...@apple.com wrote:
> Hi All, Regarding the client thrift connection: I have 4 nodes which formed a ring, but the client only knows the IP address of one node (and the thrift RPC port number). How can the client connect to any other node without getting ring information? Can we keep a load balancer and bind all four nodes, or does the client need to know the IP address of all 4 nodes?
If you use java there are higher level libraries that manage ring information for you, so they may help. If not, I guess you'll need to call the describe_ring thrift api.
> Regarding storage management: for instance, if we want to store 100k records, but 25k records on each node, something like node 1 - 25K, node 2 - 25K, node 3 - 25K, node 4 - 25K. Can we accomplish this using OrderPreservingPartitioner (OPP)? How does replication happen between nodes if we keep only 25k records in one node? Can someone please let me know. Thanks in advance. Thanks, Mubarak
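A sketch of the describe_ring suggestion above, against the 0.6-era Thrift API; the host, port and keyspace name are assumptions. Each TokenRange carries the endpoints that own it, so starting from one known node the client can learn the whole ring:

    import java.util.List;
    import org.apache.cassandra.thrift.Cassandra;
    import org.apache.cassandra.thrift.TokenRange;
    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.transport.TSocket;

    public class RingDiscovery {
        public static void main(String[] args) throws Exception {
            TSocket tr = new TSocket("10.0.0.1", 9160);
            Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(tr));
            tr.open();
            // each TokenRange lists the node addresses owning that token span
            List<TokenRange> ring = client.describe_ring("Keyspace1");
            for (TokenRange range : ring) {
                System.out.println(range.start_token + " .. " + range.end_token
                        + " -> " + range.endpoints);
            }
            tr.close();
        }
    }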
Re: Cassandra questions
On Thu, Jun 17, 2010 at 9:09 PM, F. Hugo Zwaal hzw...@yahoo.com wrote:
> Hi, Being fairly new to Cassandra I have a couple of questions: 1) Is there a way to remove multiple keys/rows in one operation (batch) or must keys be removed one by one?
yes, batch_mutate
> 2) I see API references to version 0.7, but I couldn't find an alpha or beta anywhere. Does it exist already, and if so, where can I get it? Or else, when is it planned to be public/released?
0.7 is still in development and is the trunk. Latest stable is 0.6.2. I don't know the planned date for 0.7.0, but there will also be a 0.6.3 before it.
> Thanks in advance, Hugo.
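A sketch of the batch_mutate answer to question 1: deleting one column from several rows in a single call (0.6-era Thrift API; the keyspace, column family and column names are assumptions):

    import java.util.Collections;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import org.apache.cassandra.thrift.Cassandra;
    import org.apache.cassandra.thrift.ConsistencyLevel;
    import org.apache.cassandra.thrift.Deletion;
    import org.apache.cassandra.thrift.Mutation;
    import org.apache.cassandra.thrift.SlicePredicate;

    public class BatchDelete {
        // delete the column "name" from every given row key in one round-trip
        public static void deleteRows(Cassandra.Client client, List<String> keys)
                throws Exception {
            SlicePredicate pred = new SlicePredicate();
            pred.addToColumn_names("name".getBytes("UTF-8"));

            Deletion del = new Deletion(System.currentTimeMillis());
            del.setPredicate(pred);
            Mutation m = new Mutation();
            m.setDeletion(del);

            Map<String, Map<String, List<Mutation>>> mutationMap =
                    new HashMap<String, Map<String, List<Mutation>>>();
            for (String key : keys) {
                Map<String, List<Mutation>> cfMap =
                        new HashMap<String, List<Mutation>>();
                cfMap.put("Standard1", Collections.singletonList(m));
                mutationMap.put(key, cfMap);
            }
            // one network round-trip for all keys; as the next thread notes,
            // this is NOT atomic -- some mutations may apply while others fail
            client.batch_mutate("Keyspace1", mutationMap, ConsistencyLevel.QUORUM);
        }
    }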
Re: batch_mutate atomic?
No, it's not atomic; it just shortens the roundtrip of many update requests. Some may fail and some may succeed.

On Mon, Jun 14, 2010 at 2:40 PM, Per Olesen p...@trifork.com wrote: Can I expect batch_mutate to work in what I would think of as an atomic operation? That either all the mutations in the batch_mutate call are executed or none of them are? Or can some of them fail while some of them succeed?
Re: Pelops - a new Java client library paradigm
Nice going, Dominic, having a clear API for cassandra is a big step forward :) Interestingly, at hector we came up with a similar approach, we just didn't find the time to code it, as production systems keep me busy at nights as well... We started with the implementation of BatchMutation, but the rest of the API improvements are still TODO. Keep up the good work, competition keeps us healthy ;)

On Fri, Jun 11, 2010 at 4:41 PM, Dominic Williams thedwilli...@googlemail.com wrote: Pelops is a new high quality Java client library for Cassandra. It has a design that:
* reveals the full power of Cassandra through an elegant Mutator and Selector paradigm
* generates better, cleaner, less bug prone code
* reduces the learning curve for new users
* drives rapid application development
* encapsulates advanced pooling algorithms
An article introducing Pelops can be found at http://ria101.wordpress.com/2010/06/11/pelops-the-beautiful-cassandra-database-client-for-java/ Thanks for reading. Best, Dominic
Re: cassandra out of heap space crash
Gary, FWIW I get OOM with CL.ONE quite commonly if I'm not careful with my writes.

On Jun 11, 2010 8:48 PM, Jonathan Ellis jbel...@gmail.com wrote: We give you enough rope to hang yourself. Don't use ZERO if that's not what you want. :)

On Fri, Jun 11, 2010 at 9:23 AM, William Ashley wash...@gmail.com wrote: Would it be reasonable...
Re: Passing client as parameter
You can look at http://github.com/rantav/hector/blob/master/src/main/java/me/prettyprint/cassandra/service/CassandraClientFactory.java

To close the client you can just get the transport out of the client:

    private void closeClient(CassandraClient cclient) {
      log.debug("Closing client {}", cclient);
      ((CassandraClientPoolImpl) pool).reportDestroyed(cclient);
      Cassandra.Client client = cclient.getCassandra();
      client.getInputProtocol().getTransport().close();
      client.getOutputProtocol().getTransport().close();
      cclient.markAsClosed();
    }

But to create a client you need a transport:

    private Cassandra.Client createThriftClient(String url, int port)
        throws TTransportException, TException {
      log.debug("Creating a new thrift connection to {}:{}", url, port);
      TTransport tr;
      if (useThriftFramedTransport) {
        tr = new TFramedTransport(new TSocket(url, port, timeout));
      } else {
        tr = new TSocket(url, port, timeout);
      }
      TProtocol proto = new TBinaryProtocol(tr);
      Cassandra.Client client = new Cassandra.Client(proto);
      try {
        tr.open();
      } catch (TTransportException e) {
        // Thrift exceptions aren't very good in reporting, so we have to
        // catch the exception here and add details to it.
        log.error("Unable to open transport to " + url + ":" + port, e);
        clientMonitor.incCounter(Counter.CONNECT_ERROR);
        throw new TTransportException("Unable to open transport to " + url
            + ":" + port + ", " + e.getLocalizedMessage(), e);
      }
      return client;
    }

So what you can do is, instead of passing a client to the method, pass a URL to the method. The method would open the transport, create a client, make some cassandra operations and then close the transport.

On Wed, Jun 9, 2010 at 10:35 PM, Steven Haar sh...@vintagesoftware.com wrote: C#

On Wed, Jun 9, 2010 at 2:34 PM, Ran Tavory ran...@gmail.com wrote: Some languages have higher level clients that might help you. What language are you using?

On Jun 9, 2010 9:01 PM, Steven Haar sh...@vintagesoftware.com wrote: What is the best way to pass a Cassandra client as a parameter? If you pass it as a parameter, do you also have to pass the transport in order to be able to close the connection? Is there any way to open or close the transport directly from the client? Essentially what I want to do is pass a Cassandra client to a method and then within that method be able to open the transport, execute a get or set to the Cassandra database, and then close the transport, all within the method. The only way I see to do this is to also pass the transport to the method.
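A sketch of that last suggestion: the method owns the transport's whole lifecycle, so nothing needs to be passed around (0.6-era Thrift API; keyspace and column family names are assumptions):

    import org.apache.cassandra.thrift.Cassandra;
    import org.apache.cassandra.thrift.ColumnPath;
    import org.apache.cassandra.thrift.ConsistencyLevel;
    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.transport.TSocket;
    import org.apache.thrift.transport.TTransport;

    public class ScopedClientOp {
        public static void writeColumn(String host, int port, String key,
                                       byte[] value) throws Exception {
            TTransport tr = new TSocket(host, port);
            Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(tr));
            tr.open();   // open the transport inside the method...
            try {
                ColumnPath cp = new ColumnPath("Standard1");
                cp.setColumn("name".getBytes("UTF-8"));
                client.insert("Keyspace1", key, cp, value,
                        System.currentTimeMillis(), ConsistencyLevel.QUORUM);
            } finally {
                tr.close();  // ...and close it before returning
            }
        }
    }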
Re: cassandra out of heap space crash
I can't say exactly how much memory is the correct amount, but surely 1G is very little. By replicating 3 times, your cluster now does 3 times more work than it used to, both on reads and on writes, while the readers/writers continue hammering it at the same pace. So first up your memory (try 4g; if not enough, 8g etc). If this still doesn't help, you want to look at either adding capacity or slowing down your writes. Which consistency level are you writing with? You can try ALL; this will slow down your writes just as much as needed by the cluster to catch its breath (or so I hope, I never actually tried that...)

On Fri, Jun 11, 2010 at 12:26 AM, Julie julie.su...@nextcentury.com wrote: I am running an 8 node cassandra cluster with each node on its own dedicated VM. My app very quickly populates the database with about 100,000 rows of data (each row is about 100K bytes) times the number of nodes in my cluster, so there's about 100,000 rows of data on each node (seems very evenly distributed). I have been running my app fairly successfully but today changed the replication factor from 1 to 3. (I first took down the servers, nuked their data directories, copied over the new storage-conf.xml to each node, then restarted the servers.) My app begins by populating the database with fresh data. During the writing phase, all the cassandra servers, one by one, started getting an out-of-memory exception. Here's the output from the first to die:

INFO [COMMIT-LOG-WRITER] 2010-06-10 14:18:54,609 CommitLog.java (line 407) Discarding obsolete commit log: CommitLogSegment(/var/lib/cassandra/commitlog/CommitLog-1276193883235.log)
INFO [ROW-MUTATION-STAGE:5] 2010-06-10 14:18:55,499 ColumnFamilyStore.java (line 609) Enqueuing flush of Memtable(Standard1)@19571399
INFO [GMFD:1] 2010-06-10 14:19:01,556 Gossiper.java (line 568) InetAddress /10.210.69.221 is now UP
INFO [GMFD:1] 2010-06-10 14:20:35,136 Gossiper.java (line 568) InetAddress /10.254.242.228 is now UP
INFO [GMFD:1] 2010-06-10 14:20:35,137 Gossiper.java (line 568) InetAddress /10.201.207.129 is now UP
INFO [GMFD:1] 2010-06-10 14:20:36,922 Gossiper.java (line 568) InetAddress /10.198.37.241 is now UP
INFO [GC inspection] 2010-06-10 14:19:03,722 GCInspector.java (line 110) GC for ConcurrentMarkSweep: 2164 ms, 8754168 reclaimed leaving 1070909048 used; max is 1174339584
INFO [GC inspection] 2010-06-10 14:21:09,068 GCInspector.java (line 110) GC for ConcurrentMarkSweep: 2151 ms, 78896080 reclaimed leaving 994679752 used; max is 1174339584
INFO [Timer-1] 2010-06-10 14:21:09,068 Gossiper.java (line 179) InetAddress /10.198.37.241 is now dead.
INFO [Timer-1] 2010-06-10 14:21:12,045 Gossiper.java (line 179) InetAddress /10.210.69.221 is now dead.
INFO [GMFD:1] 2010-06-10 14:21:12,046 Gossiper.java (line 568) InetAddress /10.210.203.210 is now UP
INFO [GMFD:1] 2010-06-10 14:21:12,306 Gossiper.java (line 568) InetAddress /10.210.69.221 is now UP
INFO [GMFD:1] 2010-06-10 14:21:12,306 Gossiper.java (line 568) InetAddress /10.192.218.117 is now UP
INFO [GMFD:1] 2010-06-10 14:21:12,306 Gossiper.java (line 568) InetAddress /10.198.37.241 is now UP
INFO [GMFD:1] 2010-06-10 14:21:12,307 Gossiper.java (line 568) InetAddress /10.254.138.226 is now UP
ERROR [ROW-MUTATION-STAGE:25] 2010-06-10 14:21:15,127 CassandraDaemon.java (line 78) Fatal exception in thread Thread[ROW-MUTATION-STAGE:25,5,main]
java.lang.OutOfMemoryError: Java heap space
    at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:84)
    at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:29)
    at org.apache.cassandra.db.ColumnFamilySerializer.deserializeColumns(ColumnFamilySerializer.java:117)
    at org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:108)
    at org.apache.cassandra.db.RowMutationSerializer.defreezeTheMaps(RowMutation.java:359)
    at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:369)
    at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:322)
    at org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:45)
    at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:40)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)
ERROR [ROW-MUTATION-STAGE:18] 2010-06-10 14:21:15,129 CassandraDaemon.java (line 78) Fatal exception in thread Thread[ROW-MUTATION-STAGE:18,5,main]

Within 15 minutes, all 8 nodes died while my app continued trying to populate the database. Is there something I am doing wrong? I am populating the database very quickly by writing 100 rows at once in each of 8 clients,
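The heap advice above translates to the JVM options file; a sketch assuming the 0.6-era bin/cassandra.in.sh layout (exact flags and defaults vary by release):

    # bin/cassandra.in.sh -- the stock heap is around 1G; raise it before
    # adding RF=3 write load
    JVM_OPTS="$JVM_OPTS -Xms4G -Xmx4G"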
Re: Tree Search in Cassandra
Quote from Gary: batch_mutate makes no atomicity guarantees. It's intended to help avoid many round-trips. It can fail half-way through, leaving you with a partially completed batch.

On Mon, Jun 7, 2010 at 9:39 AM, David Boxenhorn da...@lookin2.com wrote: Is batch mutate atomic? If not, can we make it so?

On Mon, Jun 7, 2010 at 4:11 AM, Tatu Saloranta tsalora...@gmail.com wrote: Yeah, or maybe just clustering, since there is no branching structure. It's quite commonly useful even on regular b-tree style storage (BDB et al), as it can reduce per-entry overhead quite a bit. And it allows very efficient compression, if entries have lots of redundancy (xml or json serialized data). I doubt this can be done reliably from the client perspective. While a good idea from a functionality perspective, the problem is that it requires some level of atomic operations or locking, since updates are multi-step operations. From the server side I guess it would be similar to the work on allowing atomic multi-part operations (like the ones being worked on to implement counters?). -+ Tatu +-

On Sun, Jun 6, 2010 at 2:19 AM, Ran Tavory ran...@gmail.com wrote: sounds interesting... btree on top of cassandra ;)

On Sun, Jun 6, 2010 at 12:16 PM, David Boxenhorn da...@lookin2.com wrote: I'm still thinking about the problem of how to handle range queries on very large sets of data, using Random Partitioning. Has anyone used tree search to solve this? What do you think? More specifically, something like this:
- Store a maximum of 1000 values per supercolumn (or some other fixed number)
- Each supercolumn has a greaterChild and a lessChild in addition to the values
- When the number of values in the supercolumn grows beyond the maximum, split it into 3 parts, with the top third going into greaterChild and the bottom third into lessChild
- To find a value, look at greaterChild and lessChild to find out whether your key is within the current range, and if not, where to look next
- Range searches mean finding the first value, then looking at greaterChild or lessChild (depending on the direction of your search) until you reach the end of the range.
Super Column Family:
index [ columnFamilyId [ firstVal : val, lastVal : val, val : dataId, lessChild : columnFamilyId, greaterChild : columnFamilyId ] ]
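A hypothetical sketch of the "find a value" step described above. None of this is an existing API: readIndexNode and the IndexNode layout are assumptions standing in for a read of one index supercolumn.

    class IndexNode {
        String firstVal, lastVal;         // range covered by this node
        String lessChild, greaterChild;   // ids of the neighbouring index nodes
        java.util.NavigableMap<String, String> values;  // val -> dataId
    }

    class TreeSearch {
        // assumed helper: fetches one index supercolumn by id and maps its
        // subcolumns onto the IndexNode fields above
        interface NodeReader { IndexNode readIndexNode(String columnFamilyId); }

        // walk lessChild/greaterChild links until the key falls inside a
        // node's [firstVal, lastVal] range
        static String find(NodeReader reader, String rootId, String key) {
            String nodeId = rootId;
            while (nodeId != null) {
                IndexNode node = reader.readIndexNode(nodeId);
                if (key.compareTo(node.firstVal) < 0) {
                    nodeId = node.lessChild;
                } else if (key.compareTo(node.lastVal) > 0) {
                    nodeId = node.greaterChild;
                } else {
                    return node.values.get(key);  // dataId, or null if absent
                }
            }
            return null;  // ran off the edge of the tree
        }
    }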
Re: Is ReplicationFactor values number of replicas or number of copies of data?
to have two copies you need RF=2. RF=0 doesn't make sense as far as I understand it.

On Mon, Jun 7, 2010 at 2:16 PM, Per Olesen p...@trifork.com wrote: Hi, I am unclear about what the ReplicationFactor value means. Does RF=1 mean that there is only one single node that has the data in the cluster (actually no replication), or does it mean that there are two copies of the data - one actual and one replica (as in replicated one time)? I noticed that I CAN start a node with RF=0, but I get UnavailableException when trying to insert, so I assume RF=0 is wrong then? Put another way: If I want my data to always live on exactly 2 nodes in the cluster, do I set RF=2 or RF=1? :-) /Per
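To make the answer concrete: in the 0.6-era storage-conf.xml the factor is declared per keyspace, and RF=2 means two copies total. A sketch using the stock sample keyspace name:

    <Keyspace Name="Keyspace1">
      <ReplicationFactor>2</ReplicationFactor>
      ...
    </Keyspace>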
Re: nodetool cleanup isn't cleaning up?
getRangeToEndpointMap is very useful, thanks, I didn't know about it... However, I've reconfigured my cluster since (moved some nodes and tokens), so now the problem is gone. I guess I'll use getRangeToEndpointMap next time I see something like this...

On Thu, Jun 3, 2010 at 9:15 AM, Jonathan Ellis jbel...@gmail.com wrote: Then the next step is to check StorageService.getRangeToEndpointMap via jmx
Re: Number of client connections
As far as I know, only the OS-level limitations, e.g. typically ~60k.

On Thu, Jun 3, 2010 at 9:34 AM, Lev Stesin lev.ste...@gmail.com wrote: Hi, Is there a limit on the number of client connections to a node? Thanks. -- Lev
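Checking and raising that OS limit, for reference (Linux; the 65535 figure is just an example):

    $ ulimit -n            # current per-process file descriptor limit
    $ ulimit -n 65535      # raise it for this shell session
    # or persistently, via an entry in /etc/security/limits.conf
    # for the user cassandra runs as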
Re: nodetool cleanup isn't cleaning up?
ok, let me try and translate your answer ;) Are you saying that the data that was left on the node is non-primary-replicas of rows from the time before the move? So this implies that when a node moves in the ring, it will affect distribution of: new keys, and old keys' primary node -- but will not affect distribution of old keys' non-primary replicas. If so, still I don't understand something... I would expect even the non-primary replicas of keys to be moved since if they don't, how would they be found? I mean upon reads the serving node should not care about whether the row is new or old, it should have a consistent and global mapping of tokens. So I guess this ruins my theory... What did you mean then? Is this deletions of non-primary replicated data? How does the replication factor affect the load on the moved host then?

On Tue, Jun 1, 2010 at 1:19 AM, Jonathan Ellis jbel...@gmail.com wrote: well, there you are then.
Re: nodetool cleanup isn't cleaning up?
I'm using RackAwareStrategy. But it still doesn't make sense I think... let's see what I missed... According to http://wiki.apache.org/cassandra/Operations - RackAwareStrategy: replica 2 is placed in the first node along the ring that belongs in *another* data center than the first; the remaining N-2 replicas, if any, are placed on the first nodes along the ring in the *same* rack as the first.

192.168.252.124  Up  803.33 MB  56713727820156410577229101238628035242  |--|
192.168.252.99   Up  352.85 MB  56713727820156410577229101238628035243  |  ^
192.168.252.125  Up  134.24 MB  85070591730234615865843651857942052863  v  |
192.168.254.57   Up  676.41 MB  113427455640312821154458202477256070485 |  ^
192.168.254.58   Up  99.74 MB   141784319550391026443072753096570088106 v  |
192.168.254.59   Up  99.94 MB   170141183460469231731687303715884105727 |--|

Alright, so I made a mistake and didn't use the alternate-datacenter suggestion on the page, so the first node of every DC is overloaded with replicas. However, the current situation still doesn't make sense to me. .252.124 will be overloaded b/c it has the first token in the .252 DC. .254.57 will also be overloaded since it has the first token in the .254 DC. But for which node does .252.99 serve as a replicator? It's not the first in the DC and it's just one single token more than its predecessor (which is in the same DC).

On Tue, Jun 1, 2010 at 4:00 PM, Jonathan Ellis jbel...@gmail.com wrote: I'm saying that .99 is getting a copy of all the data for which .124 is the primary. (If you are using RackUnawarePartitioner. If you are using RackAware it is some other node.)
HintedHandoffEnabled
In 0.6.2 I disabled hinted handoff, however the tpstats and cfstats reports seem odd. On all servers in the cluster I have:

<HintedHandoffEnabled>false</HintedHandoffEnabled>

tpstats reports 5 completed handoffs:

$ nodetool -h cass25 -p 9004 tpstats
Pool Name                    Active   Pending   Completed
FILEUTILS-DELETE-POOL             0         0           2
STREAM-STAGE                      0         0           0
RESPONSE-STAGE                    0         0     5903099
ROW-READ-STAGE                    0         0      669093
LB-OPERATIONS                     0         0           0
MESSAGE-DESERIALIZER-POOL         1         0     6595504
GMFD                              0         0       35947
LB-TARGET                         0         0           0
CONSISTENCY-MANAGER               0         0      669095
ROW-MUTATION-STAGE                0         0      644360
MESSAGE-STREAMING-POOL            0         0           0
LOAD-BALANCER-STAGE               0         0           0
FLUSH-SORTER-POOL                 0         0           0
MEMTABLE-POST-FLUSHER             0         0           7
FLUSH-WRITER-POOL                 0         0           7
AE-SERVICE-STAGE                  0         0           1
HINTED-HANDOFF-POOL               0         0           5

In data/system/* there are only LocationInfo files, so it looks like hinted handoff is indeed disabled, and cfstats does indicate there are 0 bytes. However it also indicates 32 reads, which I didn't expect (the cluster has been up for a few hours).

$ nodetool -h cass25 -p 9004 cfstats
...
Column Family: HintsColumnFamily
SSTable count: 0
Space used (live): 0
Space used (total): 0
Memtable Columns Count: 0
Memtable Data Size: 0
Memtable Switch Count: 0
Read Count: 32
Read Latency: 0.062 ms.
Write Count: 0
Write Latency: NaN ms.
Pending Tasks: 0
Key cache capacity: 1
Key cache size: 0
Key cache hit rate: NaN
Row cache: disabled
Compacted row minimum size: 0
Compacted row maximum size: 0
Compacted row mean size: 0

Any idea why this is happening?
nodetool cleanup isn't cleaning up?
I hope I understand nodetool cleanup correctly - it should clean up all data that does not (currently) belong to this node. If so, I think it might not be working correctly. Look at nodes 192.168.252.124 and 192.168.252.99 below:

192.168.252.99   Up  279.35 MB  3544607988759775661076818827414252202   |--|
192.168.252.124  Up  167.23 MB  56713727820156410577229101238628035242  |  ^
192.168.252.125  Up  82.91 MB   85070591730234615865843651857942052863  v  |
192.168.254.57   Up  366.6 MB   113427455640312821154458202477256070485 |  ^
192.168.254.58   Up  88.44 MB   141784319550391026443072753096570088106 v  |
192.168.254.59   Up  88.45 MB   170141183460469231731687303715884105727 |--|

I wanted 124 to take all the load from 99. So I issued a move command.

$ nodetool -h cass99 -p 9004 move 56713727820156410577229101238628035243

This command tells 99 to take the space b/w (56713727820156410577229101238628035242, 56713727820156410577229101238628035243], which is basically just one item in the token space, almost nothing... I wanted it to be very slim (just playing around). So, next I get this:

192.168.252.124  Up  803.33 MB  56713727820156410577229101238628035242  |--|
192.168.252.99   Up  352.85 MB  56713727820156410577229101238628035243  |  ^
192.168.252.125  Up  134.24 MB  85070591730234615865843651857942052863  v  |
192.168.254.57   Up  676.41 MB  113427455640312821154458202477256070485 |  ^
192.168.254.58   Up  99.74 MB   141784319550391026443072753096570088106 v  |
192.168.254.59   Up  99.94 MB   170141183460469231731687303715884105727 |--|

The tokens are correct, but it seems that 99 still has a lot of data. Why? OK, that might be b/c it didn't delete its moved data. So next I issued a nodetool cleanup, which should have taken care of that. Only that it didn't, the node 99 still has 352 MB of data. Why? So, you know what, I waited for 1h. Still no good, data wasn't cleaned up. I restarted the server. Still, data wasn't cleaned up... I issued a cleanup again... still no good... what's up with this node?
Re: nodetool cleanup isn't cleaning up?
Do you think it's the tombstones that take up the disk space? Shouldn't the tombstones be moved along with the data?

On Mon, May 31, 2010 at 3:29 PM, Maxim Kramarenko maxi...@trackstudio.com wrote: Hello! You likely need to wait for GCGraceSeconds seconds, or modify this param. http://spyced.blogspot.com/2010/02/distributed-deletes-in-cassandra.html
===
Thus, a delete operation can't just wipe out all traces of the data being removed immediately: if we did, and a replica did not receive the delete operation, when it becomes available again it will treat the replicas that did receive the delete as having missed a write update, and repair them! So, instead of wiping out data on delete, Cassandra replaces it with a special value called a tombstone. The tombstone can then be propagated to replicas that missed the initial remove request. ... Here, we defined a constant, GCGraceSeconds, and had each node track tombstone age locally. Once it has aged past the constant, it can be GC'd.
===
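For reference, the knob Maxim mentions lives in the 0.6-era storage-conf.xml; the value below is the usual default (10 days), shown only as a sketch:

    <!-- how long tombstones are kept before they may be GC'd -->
    <GCGraceSeconds>864000</GCGraceSeconds>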
Re: nodetool cleanup isn't cleaning up?
yes, replication factor = 2

On Mon, May 31, 2010 at 10:07 PM, Jonathan Ellis jbel...@gmail.com wrote: you have replication factor 1 ?
Re: RE: Hector samples -- where?
it's here: http://github.com/rantav/hector/blob/master/src/test/java/me/prettyprint/cassandra/service/KeyspaceTest.java

On Wed, May 26, 2010 at 8:18 AM, Nicholas Sun nick@raytheon.com wrote: Could you please provide some indication as to their location? Thanks. Nick

From: Ran Tavory [mailto:ran...@gmail.com] Sent: Tuesday, May 25, 2010 9:15 PM To: user@cassandra.apache.org Subject: Re: RE: Hector samples -- where?

The best examples are in KeyspaceTest but don't include all scenarios

On May 26, 2010 2:27 AM, Nicholas Sun nick@raytheon.com wrote: I am also interested in this. It seems like adding multiple Cols into a CF or SuperCols would be very useful. Like a dataload type capability? Nick

-----Original Message----- From: Bill de hOra [mailto:b...@dehora.net] Sent: Tuesday, May 25, 2010...
Re: Questions regarding batch mutates and transactions
The summary of your question is: is batch_mutate atomic in the general sense, meaning when used with multiple keys, multiple column families etc.? Correct?

On Wed, May 26, 2010 at 12:45 PM, Todd Nine t...@spidertracks.co.nz wrote: Hey guys, I originally asked this on the Hector group, but no one was sure of the answer. Can I get some feedback on this? I'd prefer to avoid having to use something like Cages if I can for most of our use cases. Long term I can see we'll need to use something like Cages, especially when it comes to complex operations such as billing. However, for a majority of our uses, I think it's a bit overkill. I've used transactions heavily in the workplace on SQL based app development. To be honest, a majority of applications I've built utilize optimistic locking, and only the atomic, consistent, and durable functionality of transactional ACID properties. To encapsulate all 3, I essentially need all writes to cassandra for a given business invocation to occur in a single write. With Spring, I would implement my own transaction manager which simply adds all mutates and delete ops to a batch mutate. When my transaction commits, I would execute the mutation on the given keyspace. Now this would only work if the following semantics apply. I've tried searching for details on Cassandra's batch mutate, but I'm not finding what I need. Here are 2 use cases as an example.

Case 1: Successful update: User adds new contact
Transaction Start.
Biz op 1. Row is created in contacts and all data is added via batch mutation
Biz op 2. Row is created for an SMS message for queueing through the SMS gateway
return op 2
return op 1
Transaction Commit (batch mutate executed)

Case 2: Failed update: User adds new contact
Biz op 1. Row is created in contacts
Biz op 2. Row is created for SMS message queuing. Fails due to invalid international phone number format
return op 2
return op 1
Transaction is rolled back (batch mutate never executed)

Now, here is where I can't find what I need in the doc. In case 1, if my mutation from biz op 2 were to fail during a batch mutate operation encapsulating all mutations, does the batch mutation as a whole not get executed, or would I still have the mutation from op 1 written to cassandra while the op 2 write fails? Thanks,
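A hedged sketch of the transaction-manager idea Todd describes: each business op adds its mutations to one pending batch, and "commit" is a single batch_mutate call (0.6-era Thrift API; class and schema names here are assumptions). As the thread concludes, the flush itself is not atomic, so a failure can leave a partial batch.

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import org.apache.cassandra.thrift.Cassandra;
    import org.apache.cassandra.thrift.Column;
    import org.apache.cassandra.thrift.ColumnOrSuperColumn;
    import org.apache.cassandra.thrift.ConsistencyLevel;
    import org.apache.cassandra.thrift.Mutation;

    class MutationBatch {
        // key -> column family -> mutations, the shape batch_mutate expects
        private final Map<String, Map<String, List<Mutation>>> pending =
                new HashMap<String, Map<String, List<Mutation>>>();

        void addInsert(String key, String cf, byte[] name, byte[] value, long ts) {
            ColumnOrSuperColumn cosc = new ColumnOrSuperColumn();
            cosc.setColumn(new Column(name, value, ts));
            Mutation m = new Mutation();
            m.setColumn_or_supercolumn(cosc);
            if (!pending.containsKey(key))
                pending.put(key, new HashMap<String, List<Mutation>>());
            if (!pending.get(key).containsKey(cf))
                pending.get(key).put(cf, new ArrayList<Mutation>());
            pending.get(key).get(cf).add(m);
        }

        // "commit": one round-trip; on failure the batch may be partially applied
        void commit(Cassandra.Client client, String keyspace) throws Exception {
            client.batch_mutate(keyspace, pending, ConsistencyLevel.QUORUM);
            pending.clear();
        }
    }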
Re: Error reporting Key cache hit rate with cfstats or with JMX
If I disable row cache the numbers look good - key cache hit rate is > 0, so it seems to be related to row cache. Interestingly, after running for a really long time and with both row and key caches I do start to see Key cache hit rate > 0, but the numbers are so small that it doesn't make sense. I have capacity for 10M keys and 10M rows, the number of cached keys is ~5M and very similarly the number of cached rows is also ~5M, however the hit rates are very different, 0.7 for rows and 0.006 for keys. I'd expect the keys hit rate to be identical since none of them reached the limit yet.

Key cache capacity: 1000
Key cache size: 5044097
Key cache hit rate: 0.0062089764058896576
Row cache capacity: 1000
Row cache size: 5057231
Row cache hit rate: 0.7361241352465543

On Tue, May 25, 2010 at 3:43 PM, Jonathan Ellis jbel...@gmail.com wrote: What happens if you disable row cache?

On Tue, May 25, 2010 at 4:53 AM, Ran Tavory ran...@gmail.com wrote: It seems there's an error reporting the Key cache hit rate. The value is always 0.0 and I have a feeling it's incorrect. This is seen both by using nodetool cfstats as well as accessing JMX directly (org.apache.cassandra.db:type=Caches,keyspace=outbrain_kvdb,cache=KvAdsKeyCache RecentHitRate)

<ColumnFamily CompareWith="BytesType" Name="KvAds" RowsCached="1000" KeysCached="1000"/>

Column Family: KvAds
SSTable count: 7
Space used (live): 1288942061
Space used (total): 1559831566
Memtable Columns Count: 73698
Memtable Data Size: 17121092
Memtable Switch Count: 33
Read Count: 3614433
Read Latency: 0.068 ms.
Write Count: 3503269
Write Latency: 0.024 ms.
Pending Tasks: 0
Key cache capacity: 1000
Key cache size: 619624
Key cache hit rate: 0.0
Row cache capacity: 1000
Row cache size: 447154
Row cache hit rate: 0.8460295730014572
Compacted row minimum size: 387
Compacted row maximum size: 31430
Compacted row mean size: 631

The Row cache hit rate looks good, 0.8, but Key cache hit rate always seems to be 0.0 while the number of unique keys stays about 619624 for quite a while. Is it a real caching problem or just a reporting glitch?

-- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
Re: Error reporting Key cache hit rate with cfstats or with JMX
so the row cache contains both rows and keys, and if I have a large enough row cache (in particular, if row cache size equals key cache size) then it's just wasteful to keep another key cache, and I should eliminate the key cache, correct?

On Thu, May 27, 2010 at 1:21 AM, Jonathan Ellis jbel...@gmail.com wrote: It sure sounds like you're seeing the "my row cache contains the entire hot data set, so the key cache only gets the cold reads" effect.

On Wed, May 26, 2010 at 2:54 PM, Ran Tavory ran...@gmail.com wrote: If I disable row cache the numbers look good - key cache hit rate is > 0, so it seems to be related to row cache. Interestingly, after running for a really long time and with both row and key caches I do start to see Key cache hit rate > 0, but the numbers are so small that it doesn't make sense. I have capacity for 10M keys and 10M rows, the number of cached keys is ~5M and very similarly the number of cached rows is also ~5M, however the hit rates are very different, 0.7 for rows and 0.006 for keys. I'd expect the keys hit rate to be identical since none of them reached the limit yet.

Key cache capacity: 1000
Key cache size: 5044097
Key cache hit rate: 0.0062089764058896576
Row cache capacity: 1000
Row cache size: 5057231
Row cache hit rate: 0.7361241352465543

On Tue, May 25, 2010 at 3:43 PM, Jonathan Ellis jbel...@gmail.com wrote: What happens if you disable row cache?

On Tue, May 25, 2010 at 4:53 AM, Ran Tavory ran...@gmail.com wrote: It seems there's an error reporting the Key cache hit rate. The value is always 0.0 and I have a feeling it's incorrect. This is seen both by using nodetool cfstats as well as accessing JMX directly (org.apache.cassandra.db:type=Caches,keyspace=outbrain_kvdb,cache=KvAdsKeyCache RecentHitRate)

<ColumnFamily CompareWith="BytesType" Name="KvAds" RowsCached="1000" KeysCached="1000"/>

Column Family: KvAds
SSTable count: 7
Space used (live): 1288942061
Space used (total): 1559831566
Memtable Columns Count: 73698
Memtable Data Size: 17121092
Memtable Switch Count: 33
Read Count: 3614433
Read Latency: 0.068 ms.
Write Count: 3503269
Write Latency: 0.024 ms.
Pending Tasks: 0
Key cache capacity: 1000
Key cache size: 619624
Key cache hit rate: 0.0
Row cache capacity: 1000
Row cache size: 447154
Row cache hit rate: 0.8460295730014572
Compacted row minimum size: 387
Compacted row maximum size: 31430
Compacted row mean size: 631

The Row cache hit rate looks good, 0.8, but Key cache hit rate always seems to be 0.0 while the number of unique keys stays about 619624 for quite a while. Is it a real caching problem or just a reporting glitch?

-- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
Re: Hector vs cassandra-java-client
cassandra-java-client is up to cassandra's 0.4.2 version, so you probably can't use it out of the box. Hector is active and up to the latest 0.6.1 release with a bunch of committers, contributors and users. See http://wiki.github.com/rantav/hector/ and http://groups.google.com/group/hector-users

On Tue, May 25, 2010 at 5:36 AM, Jeff Zhang zjf...@gmail.com wrote: I think hector is better, and it seems the author of cassandra-java-client is no longer working on it.

On Tue, May 25, 2010 at 10:21 AM, Peter Hsu pe...@motivecast.com wrote: Hi All, This may have been answered already, but I did a [quick] Google search and didn't find much. Which is the better Java client to use? Hector, cassandra-java-client, or neither? It seems Hector is more fully featured and more active as a project in general. What are user experiences with either library? Any advice? Thanks, Peter

-- Best Regards Jeff Zhang
Error reporting Key cache hit rate with cfstats or with JMX
It seems there's an error reporting the Key cache hit rate. The value is always 0.0 and I have a feeling it's incorrect. This is seen both by using nodetool cfstats as well as accessing JMX directly (org.apache.cassandra.db:type=Caches,keyspace=outbrain_kvdb,cache=KvAdsKeyCache RecentHitRate)

<ColumnFamily CompareWith="BytesType" Name="KvAds" RowsCached="1000" KeysCached="1000"/>

Column Family: KvAds
SSTable count: 7
Space used (live): 1288942061
Space used (total): 1559831566
Memtable Columns Count: 73698
Memtable Data Size: 17121092
Memtable Switch Count: 33
Read Count: 3614433
Read Latency: 0.068 ms.
Write Count: 3503269
Write Latency: 0.024 ms.
Pending Tasks: 0
Key cache capacity: 1000
Key cache size: 619624
Key cache hit rate: 0.0
Row cache capacity: 1000
Row cache size: 447154
Row cache hit rate: 0.8460295730014572
Compacted row minimum size: 387
Compacted row maximum size: 31430
Compacted row mean size: 631

The Row cache hit rate looks good, 0.8, but Key cache hit rate always seems to be 0.0 while the number of unique keys stays about 619624 for quite a while. Is it a real caching problem or just a reporting glitch?
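A minimal sketch of reading the same RecentHitRate attribute over plain JMX; the bean name is the one quoted above, while the JMX port is an assumption (8080 was the 0.6-era default, adjust to your setup):

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class CacheHitRateProbe {
    public static void main(String[] args) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:8080/jmxrmi");
        JMXConnector jmxc = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbs = jmxc.getMBeanServerConnection();
            // The cache MBean named in the email above.
            ObjectName cache = new ObjectName(
                    "org.apache.cassandra.db:type=Caches,keyspace=outbrain_kvdb,cache=KvAdsKeyCache");
            Object hitRate = mbs.getAttribute(cache, "RecentHitRate");
            System.out.println("RecentHitRate = " + hitRate);
        } finally {
            jmxc.close();
        }
    }
}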
Re: Key cache capacity: 1 when using KeysCached=50%
https://issues.apache.org/jira/browse/CASSANDRA-1129

On Tue, May 25, 2010 at 3:42 PM, Jonathan Ellis jbel...@gmail.com wrote: That does look like a bug. Can you create a ticket and upload a (preferably small-ish) sstable that illustrates the problem?

On Mon, May 24, 2010 at 12:07 PM, Ran Tavory ran...@gmail.com wrote: I'd like to have 100% keys cached. Sorry if my example of Super2 wasn't correct, but I do think there's a problem. Here's with my own data: When using actual numbers (in this case for RowsCached) it works as expected, however when specifying KeysCached=100% I get only 1.

<ColumnFamily CompareWith="BytesType" Name="KvAds" KeysCached="100%" RowsCached="1"/>

Column Family: KvAds
SSTable count: 7
Space used (live): 797535964
Space used (total): 797535964
Memtable Columns Count: 42292
Memtable Data Size: 10514176
Memtable Switch Count: 24
Read Count: 2563704
Read Latency: 4.590 ms.
Write Count: 1963804
Write Latency: 0.025 ms.
Pending Tasks: 0
Key cache capacity: 1
Key cache size: 1
Key cache hit rate: 0.0
Row cache capacity: 1
Row cache size: 1
Row cache hit rate: 0.2206178354382234
Compacted row minimum size: 386
Compacted row maximum size: 9808
Compacted row mean size: 616

On Mon, May 24, 2010 at 6:30 PM, Jonathan Ellis jbel...@gmail.com wrote: If you really want a cache capacity of 0 then you need to use 0 explicitly, otherwise the % versions will give you at least 1.

On Mon, May 24, 2010 at 12:34 AM, Ran Tavory ran...@gmail.com wrote: I've noticed that when defining KeysCached=50% (or KeysCached=100%, and I didn't test other values with %) then cfstats reports Key cache capacity: 1. This looks weird... is this expected? (version 0.6.1) For example, in the default configuration:

<ColumnFamily Name="Super2" ColumnType="Super" CompareWith="UTF8Type" CompareSubcolumnsWith="UTF8Type" RowsCached="1" KeysCached="50%"/>

Keyspace: Keyspace1
Read Count: 0
Read Latency: NaN ms.
Write Count: 0
Write Latency: NaN ms.
Pending Tasks: 0
Column Family: Super1
SSTable count: 0
Space used (live): 0
Space used (total): 0
Memtable Columns Count: 0
Memtable Data Size: 0
Memtable Switch Count: 0
Read Count: 0
Read Latency: NaN ms.
Write Count: 0
Write Latency: NaN ms.
Pending Tasks: 0
Key cache capacity: 20
Key cache size: 0
Key cache hit rate: NaN
Row cache: disabled
Compacted row minimum size: 0
Compacted row maximum size: 0
Compacted row mean size: 0
Column Family: Super2
SSTable count: 0
Space used (live): 0
Space used (total): 0
Memtable Columns Count: 0
Memtable Data Size: 0
Memtable Switch Count: 0
Read Count: 0
Read Latency: NaN ms.
Write Count: 0
Write Latency: NaN ms.
Pending Tasks: 0
Key cache capacity: 1
Key cache size: 0
Key cache hit rate: NaN
Row cache capacity: 1
Row cache size: 0
Row cache hit rate: NaN
Compacted row minimum size: 0
Compacted row maximum size: 0
Compacted row mean size: 0

-- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
Re: Hector samples -- where?
http://wiki.github.com/rantav/hector/examples On May 25, 2010 10:43 PM, Asaf Lahav asaf.la...@gmail.com wrote: Hi, Where can I find Hector code samples?
Re: RE: Hector samples -- where?
The best examples are in KeyspaceTest but don't include all scenarios On May 26, 2010 2:27 AM, Nicholas Sun nick@raytheon.com wrote: I am also interested in this. It seems like adding multiple Cols into a CF or SuperCols would be very useful. Like a dataload type capability? Nick -Original Message- From: Bill de hOra [mailto:b...@dehora.net] Sent: Tuesday, May 25, 2010...
setcachecapacity is forgotten
I use nodetool to set cache capacity on a certain node but the settings are forgotten after a few minutes. I run:

$ nodetool -h localhost -p 9004 setcachecapacity outbrain_kvdb KvImpressions 1000 100

If I then run nodetool cfstats immediately after, the settings are effective - I see the correct cache settings. However, after a few minutes, and I'm not sure what the trigger really is, the settings are forgotten and the host returns to the cache settings it had read when it was booted. I even updated storage-conf.xml thinking maybe the server re-reads the value from the actual file, but as it seems, it looks like it's reading values stored in its memory when booted. Of course I can just restart the server so values from the file will take effect, but I don't want to start with a cold cache again, I want to increase cache size while it's hot. ...or am I using the tool incorrectly? I'm setting the cache capacity for only one host in the ring, not all hosts. Thanks
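A quick way to watch whether the setting sticks over time (same keyspace/CF names as above; the grep pattern is just illustrative and matches both the key and row cache capacity lines in cfstats):

$ nodetool -h localhost -p 9004 setcachecapacity outbrain_kvdb KvImpressions 1000 100
$ while true; do nodetool -h localhost -p 9004 cfstats | grep 'cache capacity'; sleep 60; done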
Re: Key cache capacity: 1 when using KeysCached=50%
I'd like to have 100% keys cached. Sorry if my example of Super2 wasn't correct, but I do think there's a problem. Here's with my own data: When using actual numbers (in this case for RowsCached) it works as expected, however when specifying KeysCached=100% I get only 1.

<ColumnFamily CompareWith="BytesType" Name="KvAds" KeysCached="100%" RowsCached="1"/>

Column Family: KvAds
SSTable count: 7
Space used (live): 797535964
Space used (total): 797535964
Memtable Columns Count: 42292
Memtable Data Size: 10514176
Memtable Switch Count: 24
Read Count: 2563704
Read Latency: 4.590 ms.
Write Count: 1963804
Write Latency: 0.025 ms.
Pending Tasks: 0
Key cache capacity: 1
Key cache size: 1
Key cache hit rate: 0.0
Row cache capacity: 1
Row cache size: 1
Row cache hit rate: 0.2206178354382234
Compacted row minimum size: 386
Compacted row maximum size: 9808
Compacted row mean size: 616

On Mon, May 24, 2010 at 6:30 PM, Jonathan Ellis jbel...@gmail.com wrote: If you really want a cache capacity of 0 then you need to use 0 explicitly, otherwise the % versions will give you at least 1.

On Mon, May 24, 2010 at 12:34 AM, Ran Tavory ran...@gmail.com wrote: I've noticed that when defining KeysCached=50% (or KeysCached=100%, and I didn't test other values with %) then cfstats reports Key cache capacity: 1. This looks weird... is this expected? (version 0.6.1) For example, in the default configuration:

<ColumnFamily Name="Super2" ColumnType="Super" CompareWith="UTF8Type" CompareSubcolumnsWith="UTF8Type" RowsCached="1" KeysCached="50%"/>

Keyspace: Keyspace1
Read Count: 0
Read Latency: NaN ms.
Write Count: 0
Write Latency: NaN ms.
Pending Tasks: 0
Column Family: Super1
SSTable count: 0
Space used (live): 0
Space used (total): 0
Memtable Columns Count: 0
Memtable Data Size: 0
Memtable Switch Count: 0
Read Count: 0
Read Latency: NaN ms.
Write Count: 0
Write Latency: NaN ms.
Pending Tasks: 0
Key cache capacity: 20
Key cache size: 0
Key cache hit rate: NaN
Row cache: disabled
Compacted row minimum size: 0
Compacted row maximum size: 0
Compacted row mean size: 0
Column Family: Super2
SSTable count: 0
Space used (live): 0
Space used (total): 0
Memtable Columns Count: 0
Memtable Data Size: 0
Memtable Switch Count: 0
Read Count: 0
Read Latency: NaN ms.
Write Count: 0
Write Latency: NaN ms.
Pending Tasks: 0
Key cache capacity: 1
Key cache size: 0
Key cache hit rate: NaN
Row cache capacity: 1
Row cache size: 0
Row cache hit rate: NaN
Compacted row minimum size: 0
Compacted row maximum size: 0
Compacted row mean size: 0

-- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
Is there a way to turn HH off?
For small clusters Hinted Handoff cost is not negligible. I'd like to test its effect. Is there a way to turn it off for my cluster?
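If memory serves, the 0.6-era storage-conf.xml exposes a flag for exactly this; verify the option exists in your version before relying on this sketch:

<HintedHandoffEnabled>false</HintedHandoffEnabled>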
Re: oom in ROW-MUTATION-STAGE
Is there another solution except adding capacity? How does ConcurrentReads (default 8) affect that? If I expect to have a similar number of reads and writes, should I set ConcurrentReads equal to ConcurrentWrites (default 32)? thanks

On Sun, May 23, 2010 at 5:43 PM, Jonathan Ellis jbel...@gmail.com wrote: looks like reads are backing up, which in turn is making deserialize back up

On Sun, May 23, 2010 at 4:25 AM, Ran Tavory ran...@gmail.com wrote: Here's tpstats on a server with traffic that I think will get OOM shortly. We have 4k pending reads and 123k pending at MESSAGE-DESERIALIZER-POOL. Is there something I can do to prevent that? (other than adding RAM...)

Pool Name                   Active   Pending   Completed
FILEUTILS-DELETE-POOL            0         0          55
STREAM-STAGE                     0         0           6
RESPONSE-STAGE                   0         0           0
ROW-READ-STAGE                   8      4088     7537229
LB-OPERATIONS                    0         0           0
MESSAGE-DESERIALIZER-POOL        1    123799    22198459
GMFD                             0         0      471827
LB-TARGET                        0         0           0
CONSISTENCY-MANAGER              0         0           0
ROW-MUTATION-STAGE               0         0    14142351
MESSAGE-STREAMING-POOL           0         0          16
LOAD-BALANCER-STAGE              0         0           0
FLUSH-SORTER-POOL                0         0           0
MEMTABLE-POST-FLUSHER            0         0         128
FLUSH-WRITER-POOL                0         0         128
AE-SERVICE-STAGE                 1         1           8
HINTED-HANDOFF-POOL              0         0          10

On Sat, May 22, 2010 at 11:05 PM, Ran Tavory ran...@gmail.com wrote: The message deserializer has 10m pending tasks before the oom. What do you think makes the message deserializer blow up? I'd suspect that when it goes up to 10m pending tasks - I don't know how much mem a task actually takes up - they may consume a lot of memory. Is there a setting I need to tweak? (or am I barking up the wrong tree?). I'll add the counters from http://github.com/jbellis/cassandra-munin-plugins but I already have most of them monitored, so I attached the graphs of the ones that seemed the most suspicious in the previous email. The system keyspace and HH CF don't look too bad, I think, here they are:

Keyspace: system
Read Count: 154
Read Latency: 0.875012987012987 ms.
Write Count: 9
Write Latency: 0.20054 ms.
Pending Tasks: 0
Column Family: LocationInfo
SSTable count: 1
Space used (live): 2714
Space used (total): 2714
Memtable Columns Count: 0
Memtable Data Size: 0
Memtable Switch Count: 3
Read Count: 2
Read Latency: NaN ms.
Write Count: 9
Write Latency: 0.011 ms.
Pending Tasks: 0
Key cache capacity: 1
Key cache size: 1
Key cache hit rate: NaN
Row cache: disabled
Compacted row minimum size: 203
Compacted row maximum size: 397
Compacted row mean size: 300
Column Family: HintsColumnFamily
SSTable count: 1
Space used (live): 1457
Space used (total): 4371
Memtable Columns Count: 0
Memtable Data Size: 0
Memtable Switch Count: 0
Read Count: 152
Read Latency: 0.369 ms.
Write Count: 0
Write Latency: NaN ms.
Pending Tasks: 0
Key cache capacity: 1
Key cache size: 1
Key cache hit rate: 0.07142857142857142
Row cache: disabled
Compacted row minimum size: 829
Compacted row maximum size: 829
Compacted row mean size: 829

On Sat, May 22, 2010 at 4:14 AM, Jonathan Ellis jbel...@gmail.com wrote: Can you monitor cassandra-level metrics like the ones in http://github.com/jbellis/cassandra-munin-plugins ? the usual culprit is compaction, but your compacted row size is small. nothing else really comes to mind. (you should check system keyspace too tho, HH rows can get large)

On Fri, May 21, 2010 at 2:36 PM, Ran Tavory ran...@gmail.com wrote: I see some OOM on one of the hosts in the cluster and I wonder if there's a formula that'll help me calculate what's the required memory setting given the parameters x,y,z... In short, I need
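For reference, these knobs live in storage-conf.xml in 0.6; the values below are just the defaults mentioned in the question, not a recommendation - per the advice later in this thread, raising ConcurrentReads mainly helps when you are CPU bound with idle cores:

<ConcurrentReads>8</ConcurrentReads>
<ConcurrentWrites>32</ConcurrentWrites>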
Re: oom in ROW-MUTATION-STAGE
I am disk bound, certainly. I'll try adding more keys and row caching, but I suspect it's a short blanket: if I add more caching I'll have less free memory, so more chance to OOM again. (Is the cache using soft refs so it won't take mem from real objects?)

On Sun, May 23, 2010 at 8:15 PM, Jonathan Ellis jbel...@gmail.com wrote:

On Sun, May 23, 2010 at 10:59 AM, Ran Tavory ran...@gmail.com wrote: Is there another solution except adding capacity?

Either you need to get more performance/node or increase node count. :)

How does ConcurrentReads (default 8) affect that? If I expect to have a similar number of reads and writes, should I set ConcurrentReads equal to ConcurrentWrites (default 32)?

You should figure out where the bottleneck is, before tweaking things: http://spyced.blogspot.com/2010/01/linux-performance-basics.html Increasing CR will only help if you are (a) cpu bound and (b) have so many cores that 8 threads isn't saturating them. Sight unseen, my guess is you are disk bound. iostat can confirm this. If that's the case then you can try to reduce the disk load w/ row cache or key cache.

On Sun, May 23, 2010 at 5:43 PM, Jonathan Ellis jbel...@gmail.com wrote: looks like reads are backing up, which in turn is making deserialize back up

On Sun, May 23, 2010 at 4:25 AM, Ran Tavory ran...@gmail.com wrote: Here's tpstats on a server with traffic that I think will get OOM shortly. We have 4k pending reads and 123k pending at MESSAGE-DESERIALIZER-POOL. Is there something I can do to prevent that? (other than adding RAM...)

Pool Name                   Active   Pending   Completed
FILEUTILS-DELETE-POOL            0         0          55
STREAM-STAGE                     0         0           6
RESPONSE-STAGE                   0         0           0
ROW-READ-STAGE                   8      4088     7537229
LB-OPERATIONS                    0         0           0
MESSAGE-DESERIALIZER-POOL        1    123799    22198459
GMFD                             0         0      471827
LB-TARGET                        0         0           0
CONSISTENCY-MANAGER              0         0           0
ROW-MUTATION-STAGE               0         0    14142351
MESSAGE-STREAMING-POOL           0         0          16
LOAD-BALANCER-STAGE              0         0           0
FLUSH-SORTER-POOL                0         0           0
MEMTABLE-POST-FLUSHER            0         0         128
FLUSH-WRITER-POOL                0         0         128
AE-SERVICE-STAGE                 1         1           8
HINTED-HANDOFF-POOL              0         0          10

On Sat, May 22, 2010 at 11:05 PM, Ran Tavory ran...@gmail.com wrote: The message deserializer has 10m pending tasks before the oom. What do you think makes the message deserializer blow up? I'd suspect that when it goes up to 10m pending tasks - I don't know how much mem a task actually takes up - they may consume a lot of memory. Is there a setting I need to tweak? (or am I barking up the wrong tree?). I'll add the counters from http://github.com/jbellis/cassandra-munin-plugins but I already have most of them monitored, so I attached the graphs of the ones that seemed the most suspicious in the previous email. The system keyspace and HH CF don't look too bad, I think, here they are:

Keyspace: system
Read Count: 154
Read Latency: 0.875012987012987 ms.
Write Count: 9
Write Latency: 0.20054 ms.
Pending Tasks: 0
Column Family: LocationInfo
SSTable count: 1
Space used (live): 2714
Space used (total): 2714
Memtable Columns Count: 0
Memtable Data Size: 0
Memtable Switch Count: 3
Read Count: 2
Read Latency: NaN ms.
Write Count: 9
Write Latency: 0.011 ms.
Pending Tasks: 0
Key cache capacity: 1
Key cache size: 1
Key cache hit rate: NaN
Row cache: disabled
Compacted row minimum size: 203
Compacted row maximum size: 397
Compacted row mean size: 300
Column Family: HintsColumnFamily
SSTable count: 1
Space used (live): 1457
Space used (total): 4371
Memtable Columns Count: 0
Memtable Data Size: 0
Memtable Switch Count: 0
Read Count: 152
Read Latency: 0.369 ms.
Write Count: 0
Write Latency: NaN ms.
Pending
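To make the iostat suggestion concrete (standard Linux tooling; the interpretation in the comment is the usual rule of thumb, not anything specific to Cassandra):

$ iostat -x 5
# consistently high %util and long await on the data disks means the disk, not
# the CPU, is the bottleneck - in that case caching or more nodes help, not more
# read threads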
Re: how to decommission two slow nodes?
Thanks, I'll try that next time.

On May 21, 2010 5:23 PM, Jonathan Ellis jbel...@gmail.com wrote: There is no other way to make the cluster forget a node w/o decommission / removetoken. You could do everything up to "stop the entire cluster", and do a rolling restart instead: kill the 2 nodes you want to remove, and then do removetoken, which would still do extra i/o but at least the slow nodes would not be involved.

On Thu, May 20, 2010 at 8:54 PM, Ran Tavory ran...@gmail.com wrote: I forgot to mention that th...

-- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Ca...
Re: Disk usage doubled after nodetool decommission and node still in ring
Run nodetool streams.

On May 18, 2010 4:14 PM, Maxim Kramarenko maxi...@trackstudio.com wrote: Hi! After nodetool decommission, the data size on all nodes doubled; the node is still up and in the ring, and there are no streams or tmp SSTables now. BTW, I have an ssh connection to the server, so after running nodetool decommission I expect that once the server receives the command I can press Ctrl-C and close the shell. Is that correct? What is the best way to check the current node state, to tell whether the decommission has finished? Should the node accept new data after I run the decommission command?
Re: ConcurrentModificationException in gossiper while decommissioning another node
that sounds like it, thanks

On Tue, May 18, 2010 at 3:53 PM, roger schildmeijer schildmei...@gmail.com wrote: This is hopefully fixed in trunk (CASSANDRA-757 (revision 938597)); Replace synchronization in Gossiper with concurrent data structures and volatile fields. // Roger Schildmeijer

On Tue, May 18, 2010 at 1:55 PM, Ran Tavory ran...@gmail.com wrote: While the node 192.168.252.61 was in the process of decommissioning I see this error in two other nodes:

INFO [Timer-1] 2010-05-18 06:01:12,048 Gossiper.java (line 179) InetAddress /192.168.252.62 is now dead.
INFO [GMFD:1] 2010-05-18 06:04:00,189 Gossiper.java (line 568) InetAddress /192.168.252.62 is now UP
INFO [Timer-1] 2010-05-18 06:11:45,311 Gossiper.java (line 401) FatClient /192.168.252.61 has been silent for 360ms, removing from gossip
ERROR [Timer-1] 2010-05-18 06:11:45,315 CassandraDaemon.java (line 88) Fatal exception in thread Thread[Timer-1,5,main]
java.lang.RuntimeException: java.util.ConcurrentModificationException
        at org.apache.cassandra.gms.Gossiper$GossipTimerTask.run(Gossiper.java:97)
        at java.util.TimerThread.mainLoop(Timer.java:512)
        at java.util.TimerThread.run(Timer.java:462)
Caused by: java.util.ConcurrentModificationException
        at java.util.Hashtable$Enumerator.next(Hashtable.java:1031)
        at org.apache.cassandra.gms.Gossiper.doStatusCheck(Gossiper.java:382)
        at org.apache.cassandra.gms.Gossiper$GossipTimerTask.run(Gossiper.java:91)
        ... 2 more

.61 is the decommissioned node. .62 was under load (streams transferred to it from .61). I simply ran nodetool decommission on the 61 node and then (after an hour, I guess) I saw this error in two other live nodes. Does this ring a bell? It's either a bug, or I wasn't running decommission correctly...
Re: decommission and org.apache.thrift.TApplicationException: get_slice failed: unknown result
My decommission was progressing OK, although very slowly, but I'll send another question to the list about that... The exception must have been a hiccup; I hope I won't see it again...

On Tue, May 18, 2010 at 4:10 PM, Gary Dusbabek gdusba...@gmail.com wrote: If I had to guess, I'd say that something at the transport layer had trouble. Possibly some kind of thrift hiccup that we haven't seen before. Your description makes it sound as if the decommission is proceeding normally though. Gary.

On Tue, May 18, 2010 at 04:42, Ran Tavory ran...@gmail.com wrote: What's the correct way to remove a node from a cluster? According to this page http://wiki.apache.org/cassandra/Operations a decommission call should be enough. When decommissioning one of the nodes from my cluster I see an error in the client:

org.apache.thrift.TApplicationException: get_slice failed: unknown result
        at org.apache.cassandra.thrift.Cassandra$Client.recv_get_slice(Cassandra.java:407)
        at org.apache.cassandra.thrift.Cassandra$Client.get_slice(Cassandra.java:367)

The client isn't talking to the decommissioned node, it's connected to another node, so I'd expect all operations to continue as normal (although slower), right? I simply called nodetool -h ... decommission on the host and waited. After a while, while the node was still decommissioning, I saw the error at the client. The current state of the node is Decommissioned and it's not in the ring now. It is still moving streams to other hosts, though. I can't be sure, though, whether the error happened during the time it was Leaving the ring or whether it was already Decommissioned. The server logs don't show anything of note (no errors or warnings). What do you think?
how to decommission two slow nodes?
In my cluster setup I have two datacenters, with 5 hosts in one DC and 3 in the other. In the 5-host DC I'd like to remove two hosts so I'd get 3 and 3 in each. The two nodes I'd like to decommission have less RAM than the other 3, so they operate slower. What's the most effective way to decommission them? At first I thought I'd decommission the first and then, when it's done, decommission the second, but the problem was that when I decommissioned the first it started streaming its data to the second node (as well as others, I think), and since the second node was under heavy load, and without enough RAM, it was busy GCing and worked horribly slowly. Eventually, after almost 24h of horribly slow streaming, I gave up. This also caused the entire cluster to operate horribly slowly. So, is there a better way to decommission the two under-provisioned nodes without slowing down the cluster, or at least with minimal effect? My replication is 2 and I'm using RackAwareStrategy so (if everything is configured correctly with the EndPointSnitch) at any given time two copies of the data exist, one in each DC. Thanks
mapreduce from cassandra to cassandra
In the wordcount example the process reads from cassandra and the result is written to a local file at /tmp/word_count* Is it possible to read from cassandra and write the result back to cassandra to a specified cf/row/column? I see that there exists a ColumnFamilyInputFormat but not ColumnFamilyOutputFormat or something like that (in http://svn.apache.org/repos/asf/cassandra/trunk/src/java/org/apache/cassandra/hadoop/ ) My knowledge about hadoop and mr is pretty basic so maybe I'm missing something simple, lmk, thanks!
Re: mapreduce from cassandra to cassandra
hbase - yes. But is that reusable for cassandra? On Tue, May 18, 2010 at 12:17 PM, Jeff Zhang zjf...@gmail.com wrote: I believe it is possible to write result back to cassandra. If I remember correctly, HBase has both InputFormat and OutputFormat for hadoop. On Tue, May 18, 2010 at 5:08 PM, Ran Tavory ran...@gmail.com wrote: In the wordcount example the process reads from cassandra and the result is written to a local file at /tmp/word_count* Is it possible to read from cassandra and write the result back to cassandra to a specified cf/row/column? I see that there exists a ColumnFamilyInputFormat but not ColumnFamilyOutputFormat or something like that (in http://svn.apache.org/repos/asf/cassandra/trunk/src/java/org/apache/cassandra/hadoop/ ) My knowledge about hadoop and mr is pretty basic so maybe I'm missing something simple, lmk, thanks! -- Best Regards Jeff Zhang
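HBase's OutputFormat targets HBase, so it isn't reusable as-is; a workaround people used before a ColumnFamilyOutputFormat existed was to write back over plain Thrift from the reduce side. A rough, untested sketch along those lines against the 0.6 Thrift API - the keyspace and column family names are placeholders:

import java.io.IOException;
import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.ColumnPath;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TSocket;

public class CassandraWritingReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private TSocket socket;
    private Cassandra.Client client;

    @Override
    protected void setup(Context context) throws IOException {
        try {
            // One Thrift connection per reduce task.
            socket = new TSocket("localhost", 9160);
            client = new Cassandra.Client(new TBinaryProtocol(socket));
            socket.open();
        } catch (Exception e) {
            throw new IOException(e);
        }
    }

    @Override
    protected void reduce(Text word, Iterable<IntWritable> counts, Context context)
            throws IOException {
        int sum = 0;
        for (IntWritable c : counts) {
            sum += c.get();
        }
        try {
            // Store the count under row = word, column = "count".
            ColumnPath path = new ColumnPath();
            path.setColumn_family("WordCounts"); // placeholder CF
            path.setColumn("count".getBytes("UTF-8"));
            client.insert("Keyspace1", word.toString(), path,
                    String.valueOf(sum).getBytes("UTF-8"),
                    System.currentTimeMillis() * 1000, ConsistencyLevel.ONE);
        } catch (Exception e) {
            throw new IOException(e);
        }
    }

    @Override
    protected void cleanup(Context context) {
        socket.close();
    }
}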
decommission and org.apache.thrift.TApplicationException: get_slice failed: unknown result
What's the correct way to remove a node from a cluster? According to this page http://wiki.apache.org/cassandra/Operations a decommission call should be enough. When decommissioning one of the nodes from my cluster I see an error in the client:

org.apache.thrift.TApplicationException: get_slice failed: unknown result
        at org.apache.cassandra.thrift.Cassandra$Client.recv_get_slice(Cassandra.java:407)
        at org.apache.cassandra.thrift.Cassandra$Client.get_slice(Cassandra.java:367)

The client isn't talking to the decommissioned node, it's connected to another node, so I'd expect all operations to continue as normal (although slower), right? I simply called nodetool -h ... decommission on the host and waited. After a while, while the node was still decommissioning, I saw the error at the client. The current state of the node is Decommissioned and it's not in the ring now. It is still moving streams to other hosts, though. I can't be sure, though, whether the error happened during the time it was Leaving the ring or whether it was already Decommissioned. The server logs don't show anything of note (no errors or warnings). What do you think?
ConcurrentModificationException in gossiper while decommissioning another node
While the node 192.168.252.61 was in the process of decommissioning I see this error in two other nodes:

INFO [Timer-1] 2010-05-18 06:01:12,048 Gossiper.java (line 179) InetAddress /192.168.252.62 is now dead.
INFO [GMFD:1] 2010-05-18 06:04:00,189 Gossiper.java (line 568) InetAddress /192.168.252.62 is now UP
INFO [Timer-1] 2010-05-18 06:11:45,311 Gossiper.java (line 401) FatClient /192.168.252.61 has been silent for 360ms, removing from gossip
ERROR [Timer-1] 2010-05-18 06:11:45,315 CassandraDaemon.java (line 88) Fatal exception in thread Thread[Timer-1,5,main]
java.lang.RuntimeException: java.util.ConcurrentModificationException
        at org.apache.cassandra.gms.Gossiper$GossipTimerTask.run(Gossiper.java:97)
        at java.util.TimerThread.mainLoop(Timer.java:512)
        at java.util.TimerThread.run(Timer.java:462)
Caused by: java.util.ConcurrentModificationException
        at java.util.Hashtable$Enumerator.next(Hashtable.java:1031)
        at org.apache.cassandra.gms.Gossiper.doStatusCheck(Gossiper.java:382)
        at org.apache.cassandra.gms.Gossiper$GossipTimerTask.run(Gossiper.java:91)
        ... 2 more

.61 is the decommissioned node. .62 was under load (streams transferred to it from .61). I simply ran nodetool decommission on the 61 node and then (after an hour, I guess) I saw this error in two other live nodes. Does this ring a bell? It's either a bug, or I wasn't running decommission correctly...
Re: is it possible to trace/debug cassandra?
Add -Xdebug -Xrunjdwp:transport=dt_socket,address=8000,server=y,suspend=n to the JVM_OPTS section of cassandra.in.sh. Then connect with jdb (http://java.sun.com/j2se/1.3/docs/tooldocs/solaris/jdb.html) or your IDE as a remote process.

On Tue, May 18, 2010 at 1:18 PM, S Ahmed sahmed1...@gmail.com wrote: Would it be possible to put cassandra in debug mode, so I could actually step through, line by line, the execution flow of operations I execute against it? If yes, any help would be great.
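Concretely, the edited cassandra.in.sh line would look something like this (a sketch; your existing JVM_OPTS contents stay as they are), after which jdb can attach to the listening socket:

JVM_OPTS="$JVM_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,address=8000,server=y,suspend=n"

$ jdb -attach 8000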
Re: JMX metrics for monitoring
There are many, but here's what I found useful so far. Per CF you have:
- Recent read/write latency
- PendingTasks
- Read/Write count
Globally you have, for each of the stages (e.g. org.apache.cassandra.concurrent:type=ROW-READ-STAGE):
- PendingTasks
- ActiveCount
...and as you go you'll find more

On Tue, May 18, 2010 at 1:02 AM, Maxim Kramarenko maxi...@trackstudio.com wrote: Hi! Which JMX metrics do you use for Cassandra monitoring? Which values can be used for alerts?
Re: what/how do you guys monitor slow nodes?
There is a per-CF read and write latency JMX.

On May 12, 2010 12:55 AM, Jordan Pittier - Rezel jor...@rezel.net wrote: For sure you have to pay particular attention to memory allocation on each node; especially be sure your servers don't swap. Then you can monitor how load is balanced among your nodes (nodetool -h XX ring).

On Tue, May 11, 2010 at 11:46 PM, S Ahmed sahmed1...@gmail.com wrote: If you have 3-4 nodes,...