Re: SuperColumns

2010-04-15 Thread Vijay
Yes, a super column can only have columns in it.

Regards,
/VJ



On Wed, Apr 14, 2010 at 10:28 PM, Christian Torres chtor...@gmail.com wrote:

 I'm defining a ColumnFamily (table) of type Super. Is it possible to have a
 SuperColumn inside another SuperColumn, or can SuperColumns only have normal
 columns?

 --
 Christian Torres * Desarrollador Web * Guegue.com *
 Celular: +505 84 65 92 62 * Loving of the Programming
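
For context, the nesting rules are fixed by the column family's type in storage-conf.xml: a Standard column family holds plain columns, and a Super one holds super columns whose children must be plain columns; super columns cannot nest. A hypothetical sketch (the CF names are placeholders):

```xml
<!-- Standard CF: row -> columns -->
<ColumnFamily Name="Users" CompareWith="UTF8Type" />

<!-- Super CF: row -> super columns -> plain columns (no deeper nesting) -->
<ColumnFamily Name="UserIndexes" ColumnType="Super"
              CompareWith="UTF8Type" CompareSubcolumnsWith="UTF8Type" />
```

Deeper hierarchies are usually simulated by encoding the extra levels into the row key or the column names.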



Row key: string or binary (byte[])?

2010-04-15 Thread Roland Hänel
Is there any effort ongoing to make the row key a binary (byte[]) instead of
a string? In the current cassandra.thrift file (0.6.0), I find:

const string VERSION = "2.1.0"
[...]
struct KeySlice {
1: required *string* key,
2: required list<ColumnOrSuperColumn> columns,
}

while on the current (?) SVN
https://svn.apache.org/repos/asf/cassandra/trunk/interface/cassandra.thrift
it reads:

const string VERSION = "4.0.0"
[...]
struct KeySlice {
1: required *binary* key,
2: required list<ColumnOrSuperColumn> columns,
}

Thanks for enlightening me. :-)

Greetings,
Roland


AssertionError: DecoratedKey(...) != DecoratedKey(...)

2010-04-15 Thread Ran Tavory
When restarting one of the nodes in my cluster I found this error in the
log. What does this mean?

 INFO [GC inspection] 2010-04-15 05:03:04,898 GCInspector.java (line 110) GC
for ConcurrentMarkSweep: 712 ms, 11149016 reclaimed leaving 442336680 used;
max is 4432068608
ERROR [HINTED-HANDOFF-POOL:1] 2010-04-15 05:03:17,948
DebuggableThreadPoolExecutor.java (line 94) Error in executor futuretask
java.util.concurrent.ExecutionException: java.lang.AssertionError:
DecoratedKey(163143070370570938845670096830182058073,
1K2i35+B8RuuRDP7Gwz3Xw==) !=
DecoratedKey(163143368384879375649994309361429628039,
4k54mGvj7JoT5rBH68K+9A==) in
/outbrain/cassandra/data/outbrain/DocumentMapping-305-Data.db
at
java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
at java.util.concurrent.FutureTask.get(FutureTask.java:83)
at
org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor.afterExecute(DebuggableThreadPoolExecutor.java:86)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:888)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
Caused by: java.lang.AssertionError:
DecoratedKey(163143070370570938845670096830182058073,
1K2i35+B8RuuRDP7Gwz3Xw==) !=
DecoratedKey(163143368384879375649994309361429628039,
4k54mGvj7JoT5rBH68K+9A==) in
/outbrain/cassandra/data/outbrain/DocumentMapping-305-Data.db
at
org.apache.cassandra.db.filter.SSTableSliceIterator$ColumnGroupReader.init(SSTableSliceIterator.java:127)
at
org.apache.cassandra.db.filter.SSTableSliceIterator.init(SSTableSliceIterator.java:59)
at
org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(SliceQueryFilter.java:63)
at
org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:830)
at
org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:750)
at
org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:719)
at
org.apache.cassandra.db.HintedHandOffManager.sendMessage(HintedHandOffManager.java:122)
at
org.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpoint(HintedHandOffManager.java:250)
at
org.apache.cassandra.db.HintedHandOffManager.access$100(HintedHandOffManager.java:80)
at
org.apache.cassandra.db.HintedHandOffManager$2.runMayThrow(HintedHandOffManager.java:280)
at
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
... 2 more


Re: Time-series data model

2010-04-15 Thread Jean-Pierre Bergamin

Am 14.04.2010 15:22, schrieb Ted Zlatanov:

On Wed, 14 Apr 2010 15:02:29 +0200 Jean-Pierre Bergamin ja...@ractive.ch
wrote:

JB  The metrics are stored together with a timestamp. The queries we want to
JB  perform are:
JB   * The last value of a specific metric of a device
JB   * The values of a specific metric of a device between two timestamps t1 
and
JB  t2

Make your key devicename-metricname-YYYYMMDD-HHMM (with whatever time
sharding makes sense to you; I use UTC by-hours and by-day in my
environment).  Then your supercolumn is the collection time as a
LongType and your columns inside the supercolumn can express the metric
in detail (collector agent, detailed breakdown, etc.).
   
Just for my understanding: what is time sharding? I couldn't find an 
explanation anywhere. Do you mean that the time-series data is rolled 
up into 5-minute, 1-hour, 1-day etc. slices?


So this would be defined as:
<ColumnFamily Name="measurements" ColumnType="Super"
CompareWith="UTF8Type" CompareSubcolumnsWith="LongType" />


So when I want to read all values of one metric between two timestamps 
t0 and t1, I'd have to read the supercolumns that match a key range 
(device1:metric1:t0 - device1:metric1:t1) and then all the supercolumns 
for this key?



Regards
James
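
Ted's sharding scheme can be made concrete with a small sketch. This is my own Python illustration (not from the thread): it builds one row per device/metric/UTC day and lists the rows a time-window query must touch; the separator and date format here are arbitrary choices.

```python
from datetime import datetime, timedelta

def row_key(device, metric, ts):
    # One row per device/metric/UTC day; inside the row, the
    # supercolumn name would be the collection time as a long.
    return "%s-%s-%s" % (device, metric, ts.strftime("%Y%m%d"))

def keys_for_range(device, metric, t0, t1):
    # All day-sharded row keys touched by the window [t0, t1].
    day, keys = t0.date(), []
    while day <= t1.date():
        keys.append("%s-%s-%s" % (device, metric, day.strftime("%Y%m%d")))
        day += timedelta(days=1)
    return keys

keys = keys_for_range("device1", "metric1",
                      datetime(2010, 4, 14, 22, 0),
                      datetime(2010, 4, 15, 2, 0))
# The window crosses midnight, so two rows must be sliced.
```

Within each row, a get_slice on the supercolumn names (the LongType collection times) then narrows the result to [t0, t1].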


Re: AssertionError: DecoratedKey(...) != DecoratedKey(...)

2010-04-15 Thread Gary Dusbabek
Ran,

It looks like you're seeing
https://issues.apache.org/jira/browse/CASSANDRA-866.  It's fixed in
0.6.1.

Gary

On Thu, Apr 15, 2010 at 04:06, Ran Tavory ran...@gmail.com wrote:
 When restarting one of the nodes in my cluster I found this error in the
 log. What does this mean?

 [stack trace snipped -- identical to the original message above]



How to implement TOP TEN in Cassandra

2010-04-15 Thread Allen He
Hi , all

How would one implement a *TOP TEN* list in Cassandra?

For example, the *top ten stories on Digg.com*.

How should this be modeled?

Thanks


Get super-columns using SimpleCassie

2010-04-15 Thread Yésica Rey

I'm using SimpleCassie as my Cassandra client.
I have a question: can I get all the super-columns that exist in one 
column-family?
If so, how can I do it?


Regards!


Re: TException: Error: TSocket: timed out reading 1024 bytes from 10.1.1.27:9160

2010-04-15 Thread Jonathan Ellis
sounds like https://issues.apache.org/jira/browse/THRIFT-347

On Wed, Apr 14, 2010 at 11:58 PM, richard yao richard.yao2...@gmail.com wrote:
 I am having a try on cassandra, and I use php to access cassandra by thrift
 API.
 I got an error like this:
     TException:  Error: TSocket: timed out reading 1024 bytes from
 10.1.1.27:9160
 What's wrong?
 Thanks.


Re: TException: Error: TSocket: timed out reading 1024 bytes from 10.1.1.27:9160

2010-04-15 Thread richard yao
Thank you!


Re: AssertionError: DecoratedKey(...) != DecoratedKey(...)

2010-04-15 Thread Ran Tavory
yes, this looks like the same issue, thanks Gary.

Other than seeing the errors in the log I haven't seen any other
irregularities. (maybe there are, but they haven't surfaced). Does this
assertion mean data corruption or something else that's worth waiting to
0.6.1 for?

On Thu, Apr 15, 2010 at 2:00 PM, Gary Dusbabek gdusba...@gmail.com wrote:

 Ran,

 It looks like you're seeing
 https://issues.apache.org/jira/browse/CASSANDRA-866.  It's fixed in
 0.6.1.

 Gary

 On Thu, Apr 15, 2010 at 04:06, Ran Tavory ran...@gmail.com wrote:
  When restarting one of the nodes in my cluster I found this error in the
  log. What does this mean?
 
  [stack trace snipped -- identical to the original message above]
 



Re: AssertionError: DecoratedKey(...) != DecoratedKey(...)

2010-04-15 Thread Gary Dusbabek
No data corruption.  There was a bug in the way that the index was
scanned that was manifesting itself when the index got bigger
than 2GB.

Gary.


On Thu, Apr 15, 2010 at 08:03, Ran Tavory ran...@gmail.com wrote:
 yes, this looks like the same issue, thanks Gary.
 Other than seeing the errors in the log I haven't seen any other
 irregularities. (maybe there are, but they haven't surfaced). Does this
 assertion mean data corruption or something else that's worth waiting to
 0.6.1 for?

 On Thu, Apr 15, 2010 at 2:00 PM, Gary Dusbabek gdusba...@gmail.com wrote:

 Ran,

 It looks like you're seeing
 https://issues.apache.org/jira/browse/CASSANDRA-866.  It's fixed in
 0.6.1.

 Gary

 On Thu, Apr 15, 2010 at 04:06, Ran Tavory ran...@gmail.com wrote:
  When restarting one of the nodes in my cluster I found this error in the
  log. What does this mean?
 
  [stack trace snipped -- identical to the original message above]
 




Re: timestamp not found

2010-04-15 Thread Mike Malone
Looks like the timestamp, in this case, is 0. Does Cassandra allow zero
timestamps? Could be a bug in Cassandra doing an implicit boolean coercion
in a conditional where it shouldn't.

Mike

On Thu, Apr 15, 2010 at 8:39 AM, Lee Parker l...@socialagency.com wrote:

 We are currently migrating about 70G of data from mysql to cassandra.  I am
 occasionally getting the following error:

 Required field 'timestamp' was not found in serialized data! Struct:
 Column(name:74 65 78 74, value:44 61 73 20 6C 69 65 62 20 69 63 68 20 76 6F
 6E 20 23 49 6E 61 3A 20 68 74 74 70 3A 2F 2F 77 77 77 2E 79 6F 75 74 75 62
 65 2E 63 6F 6D 2F 77 61 74 63 68 3F 76 3D 70 75 38 4B 54 77 79 64 56 77 6B
 26 66 65 61 74 75 72 65 3D 72 65 6C 61 74 65 64 20 40 70 6A 80 01 00 01 00,
 timestamp:0)

 The loop which is building out the mutation map for the batch_mutate call
 is adding a timestamp to each column.  I have verified that the time stamp
 is there for several calls and I feel like if the logic was bad, i would see
 the error more frequently.  Does anyone have suggestions as to what may be
 causing this?

 Lee Parker
 l...@spredfast.com

 [image: Spredfast]



Re: timestamp not found

2010-04-15 Thread Lee Parker
When I am verifying the columns in the mutation map before sending it to
cassandra, none of the timestamps are 0.  I have had a difficult time
recreating the error in a controlled environment so I can see the mutation
map that was actually sent.

Lee Parker
l...@spredfast.com

[image: Spredfast]
On Thu, Apr 15, 2010 at 10:45 AM, Mike Malone m...@simplegeo.com wrote:

 Looks like the timestamp, in this case, is 0. Does Cassandra allow zero
 timestamps? Could be a bug in Cassandra doing an implicit boolean coercion
 in a conditional where it shouldn't.

 Mike


 On Thu, Apr 15, 2010 at 8:39 AM, Lee Parker l...@socialagency.com wrote:

 We are currently migrating about 70G of data from mysql to cassandra.  I
 am occasionally getting the following error:

 Required field 'timestamp' was not found in serialized data! Struct:
 Column(name:74 65 78 74, value:44 61 73 20 6C 69 65 62 20 69 63 68 20 76 6F
 6E 20 23 49 6E 61 3A 20 68 74 74 70 3A 2F 2F 77 77 77 2E 79 6F 75 74 75 62
 65 2E 63 6F 6D 2F 77 61 74 63 68 3F 76 3D 70 75 38 4B 54 77 79 64 56 77 6B
 26 66 65 61 74 75 72 65 3D 72 65 6C 61 74 65 64 20 40 70 6A 80 01 00 01 00,
 timestamp:0)

 The loop which is building out the mutation map for the batch_mutate call
 is adding a timestamp to each column.  I have verified that the time stamp
 is there for several calls and I feel like if the logic was bad, i would see
 the error more frequently.  Does anyone have suggestions as to what may be
 causing this?

 Lee Parker
 l...@spredfast.com

 [image: Spredfast]





Re: timestamp not found

2010-04-15 Thread Lee Parker
I'm actually using PHP.  I do have several PHP processes running, but each
one should have its own Thrift connection.

Lee Parker
l...@spredfast.com

[image: Spredfast]
On Thu, Apr 15, 2010 at 10:53 AM, Jonathan Ellis jbel...@gmail.com wrote:

 Looks like you are using C++ and not setting the isset flag on the
 timestamp field, so it's getting the default value for a Java long (0).

 If it works most of the time then possibly you are using a Thrift
 connection from multiple threads at the same time, which is not safe.


 On Thu, Apr 15, 2010 at 10:39 AM, Lee Parker l...@socialagency.com wrote:

 We are currently migrating about 70G of data from mysql to cassandra.  I
 am occasionally getting the following error:

 Required field 'timestamp' was not found in serialized data! Struct:
 Column(name:74 65 78 74, value:44 61 73 20 6C 69 65 62 20 69 63 68 20 76 6F
 6E 20 23 49 6E 61 3A 20 68 74 74 70 3A 2F 2F 77 77 77 2E 79 6F 75 74 75 62
 65 2E 63 6F 6D 2F 77 61 74 63 68 3F 76 3D 70 75 38 4B 54 77 79 64 56 77 6B
 26 66 65 61 74 75 72 65 3D 72 65 6C 61 74 65 64 20 40 70 6A 80 01 00 01 00,
 timestamp:0)

 The loop which is building out the mutation map for the batch_mutate call
 is adding a timestamp to each column.  I have verified that the time stamp
 is there for several calls and I feel like if the logic was bad, i would see
 the error more frequently.  Does anyone have suggestions as to what may be
 causing this?

 Lee Parker
 l...@spredfast.com

 [image: Spredfast]
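
One client-side detail worth checking in cases like this: the Thrift Column.timestamp field is an i64, and the dumps in this thread show PHP floats being supplied. Generating timestamps as plain integers (microseconds since the epoch is a common convention) avoids both precision loss and an accidentally unset field defaulting to 0. A Python sketch of the idea (the same applies in PHP with integer time() arithmetic):

```python
import time

def column_timestamp():
    # Cassandra's Thrift API takes the timestamp as an i64; a common
    # convention is microseconds since the epoch. Sending a float, or
    # leaving the field unset so it defaults to 0, causes trouble.
    return int(time.time() * 1_000_000)

ts = column_timestamp()
```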





Re: RackAware and replication strategy

2010-04-15 Thread Benjamin Black
Have a look at locator/DatacenterShardStrategy.java.

On Thu, Apr 15, 2010 at 8:16 AM, Ran Tavory ran...@gmail.com wrote:
 I'm reading this on this
 page http://wiki.apache.org/cassandra/ArchitectureInternals :

 AbstractReplicationStrategy controls what nodes get secondary, tertiary,
 etc. replicas of each key range. Primary replica is always determined by the
 token ring (in TokenMetadata) but you can do a lot of variation with the
 others. RackUnaware just puts replicas on the next N-1 nodes in the ring.
 RackAware puts the first non-primary replica in the next node in the ring in
 ANOTHER data center than the primary; then the remaining replicas in the
 same as the primary.

 So I just want to make sure I got this right and that documentation is up to
 date.
 I have two data centers and rack-aware.
 When replication factor is 2: is it always the case that the primary replica
 goes to one DC and the second replica to the second DC?
 When replication factor is 3: First replica in DC1, second in DC2 and third
 in DC1
 When replication factor is 4: First replica in DC1, second in DC2, third in
 DC1, fourth in DC1 etc
 If I have 4 hosts in each DC, which replication factors make sense?
 N=1 - When I don't care about losing data, cool
 N=2 - When I want to make sure each DC has a copy; useful for local fast
 access and allows recovery if only one host down.
 N=3 - If I want to make sure each DC has a copy plus recovery can be made
 faster in certain cases, and more resilient to two hosts down.
 N=4 - Like N=3 but even more resilient. etc
 Say I want to have two replicas in each DC, can this be done?
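
The wiki's placement rule quoted above can be simulated to answer these questions. Below is a toy Python sketch of RackAware placement as described (my own illustration, not Cassandra's actual code; real placement also depends on your ring's token order):

```python
def rack_aware_replicas(ring, datacenters, primary_index, n):
    # ring: node names in token order; datacenters: node -> DC name.
    # Primary replica on the primary node; the first non-primary replica
    # goes to the next node in ANOTHER DC; remaining replicas go to the
    # next nodes in the ring, per the wiki description.
    primary = ring[primary_index]
    replicas = [primary]
    # walk the ring once, starting just after the primary
    order = [ring[(primary_index + i) % len(ring)] for i in range(1, len(ring))]
    # second replica: first node in a different DC
    other_dc = next(node for node in order
                    if datacenters[node] != datacenters[primary])
    replicas.append(other_dc)
    # remaining replicas: next nodes in ring order, skipping ones already used
    for node in order:
        if len(replicas) == n:
            break
        if node not in replicas:
            replicas.append(node)
    return replicas

ring = ["a1", "b1", "a2", "b2"]  # alternating DCs around the ring
dcs = {"a1": "DC1", "a2": "DC1", "b1": "DC2", "b2": "DC2"}
print(rack_aware_replicas(ring, dcs, 0, 3))  # -> ['a1', 'b1', 'a2']
```

For this alternating two-DC ring the sketch confirms the reading in the question: RF=2 puts one replica in each DC, and RF=3 puts two in the primary's DC and one in the other.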



busy thread on IncomingStreamReader ?

2010-04-15 Thread Ingram Chen
Hi all,

 We set up two nodes and simply set replication factor=2 for a test run.

After both nodes, say node A and node B, have served for several hours, we found
that node A constantly sits at 300% CPU usage
(the other node is under 100% CPU, which is normal).

thread dump on node A shows that there are 3 busy threads related to
IncomingStreamReader:

==

Thread-66 prio=10 tid=0x2aade4018800 nid=0x69e7 runnable
[0x4030a000]
   java.lang.Thread.State: RUNNABLE
at sun.misc.Unsafe.setMemory(Native Method)
at sun.nio.ch.Util.erase(Util.java:202)
at
sun.nio.ch.FileChannelImpl.transferFromArbitraryChannel(FileChannelImpl.java:560)
at sun.nio.ch.FileChannelImpl.transferFrom(FileChannelImpl.java:603)
at
org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:62)
at
org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:66)

Thread-65 prio=10 tid=0x2aade4017000 nid=0x69e6 runnable
[0x4d44b000]
   java.lang.Thread.State: RUNNABLE
at sun.misc.Unsafe.setMemory(Native Method)
at sun.nio.ch.Util.erase(Util.java:202)
at
sun.nio.ch.FileChannelImpl.transferFromArbitraryChannel(FileChannelImpl.java:560)
at sun.nio.ch.FileChannelImpl.transferFrom(FileChannelImpl.java:603)
at
org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:62)
at
org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:66)

Thread-62 prio=10 tid=0x2aade4014800 nid=0x4150 runnable
[0x4d34a000]
   java.lang.Thread.State: RUNNABLE
at sun.nio.ch.FileChannelImpl.size0(Native Method)
at sun.nio.ch.FileChannelImpl.size(FileChannelImpl.java:309)
- locked 0x2aaac450dcd0 (a java.lang.Object)
at sun.nio.ch.FileChannelImpl.transferFrom(FileChannelImpl.java:597)
at
org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:62)
at
org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:66)

===

Has anyone experienced a similar issue?

environments:

OS   --- CentOS 5.4, Linux 2.6.18-164.15.1.el5 SMP x86_64 GNU/Linux
Java --- build 1.6.0_16-b01, Java HotSpot(TM) 64-Bit Server VM (build
14.2-b01, mixed mode)
Cassandra --- 0.6.0
Node configuration --- node A and node B. both nodes use node A as Seed
client --- Java thrift clients pick one node randomly to do read and write.


-- 
Ingram Chen
online share order: http://dinbendon.net
blog: http://www.javaworld.com.tw/roller/page/ingramchen


Re: BMT flush on windows?

2010-04-15 Thread Sonny Heer
From jconsole, I go under
ColumnFamilyStores > CF1 > Column1 > Operations and clicked force
flush().

I'm getting an Operation return value null OK message box.  What am I
doing wrong?


On Tue, Apr 13, 2010 at 3:12 PM, Jonathan Ellis jbel...@gmail.com wrote:
 you have three options

 (a) connect with jconsole or another jmx client and invoke flush that way
 (b) run org.apache.cassandra.tools.NodeCmd manually
 (c) write a bat file for NodeCmd like the nodetool shell script in bin/

 On Tue, Apr 13, 2010 at 5:08 PM, Sonny Heer sonnyh...@gmail.com wrote:
 Is there any way to run a keyspace flush on a windows box?




Re: Recovery from botched compaction

2010-04-15 Thread Jonathan Ellis
On Tue, Apr 13, 2010 at 3:59 PM, Anthony Molinaro
antho...@alumni.caltech.edu wrote:
 I actually got lucky: while it hovered at 91-95% full, compaction
 finished and it's now at 60%.  However, I still have around a dozen or so
 data files.  I thought 'nodeprobe compact' did a major compaction, and
 that a major compaction would shrink to one file?

2 possibilities, probably both of which are affecting you:

1. If there isn't enough disk space to compact everything, cassandra
will remove files from the to-compact list until it has room to do
what you asked it to do.  (But you can still run out of space if
you write enough data while the compaction happens.)

2. 0.5's minor compactions don't combine as many sstables as they
should automatically.  This is fixed in 0.6.

 Okay, sounds good, I may leave it for the moment, as last time I tried
 any sort of move/decommission with 0.5.x I was unable to figure out if
 anything was happening, so I may just wait and revisit when I upgrade.

Yes, 0.5 sucks there.  0.6 is still a little opaque but you can at
least see what is happening if you know where to look:
http://wiki.apache.org/cassandra/Streaming

-Jonathan
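
The first point, dropping sstables from the to-compact list until the job fits in the available disk space, can be sketched like this (an illustration of the strategy, not Cassandra's actual code):

```python
def trim_compaction_bucket(sstable_sizes, free_space):
    # Drop the largest candidates until the merged result is expected
    # to fit: a compaction needs roughly the combined size of its
    # inputs as scratch room on disk.
    bucket = sorted(sstable_sizes)
    while bucket and sum(bucket) > free_space:
        bucket.pop()  # discard the largest remaining sstable
    return bucket

# Four sstables but only room to merge the three smallest:
print(trim_compaction_bucket([10, 40, 25, 5], 50))  # -> [5, 10, 25]
```

This is also why such a "major" compaction can leave more than one data file behind, as observed in the question.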


Re: batch_mutate silently failing

2010-04-15 Thread Jonathan Ellis
Could you create a ticket for us to return an error message in this
situation?

-Jonathan

On Tue, Apr 13, 2010 at 4:24 PM, Lee Parker l...@socialagency.com wrote:

 nevermind.  I figured out what the problem was.  I was not putting the
 column inside a ColumnOrSuperColumn container.


 Lee Parker
 l...@spredfast.com

 [image: Spredfast]
 On Tue, Apr 13, 2010 at 4:19 PM, Lee Parker l...@socialagency.com wrote:

 I upgraded my dev environment to 0.6.0 today in expectation of upgrading
 our prod environment soon.  I am trying to rewrite some of our code to use
 batch_mutate with the Thrift PHP library directly.  I'm not getting any
 result back, not even an exception or failure message, but the data is never
 showing up in the single node cassandra setup.  Here is a dump of my
 mutation map:

  array(1) {
    ["testkey"]=>
    array(1) {
      ["StreamItems"]=>
      array(2) {
        [0]=>
        object(cassandra_Mutation)#156 (2) {
          ["column_or_supercolumn"]=>
          object(cassandra_Column)#157 (3) {
            ["name"]=>
            string(4) "test"
            ["value"]=>
            string(14) "this is a test"
            ["timestamp"]=>
            float(1271193181943.1)
          }
          ["deletion"]=>
          NULL
        }
        [1]=>
        object(cassandra_Mutation)#158 (2) {
          ["column_or_supercolumn"]=>
          object(cassandra_Column)#159 (3) {
            ["name"]=>
            string(5) "test2"
            ["value"]=>
            string(19) "Another test column"
            ["timestamp"]=>
            float(1271193181943.2)
          }
          ["deletion"]=>
          NULL
        }
      }
    }
  }

  When I pass this into client->batch_mutate, nothing seems to happen.  Any
 ideas about what could be going on?  I have been able to insert data using
 cassandra-cli without issue.

 Lee Parker
 l...@spredfast.com

 [image: Spredfast]





Re: batch_mutate silently failing

2010-04-15 Thread Lee Parker
The entire thing was completely my own fault.  I was making an invalid
request and, somewhere in the code, I was catching the exception and not
handling it at all.  So it only appeared to be silent when in reality it was
throwing a nice descriptive exception.

Lee Parker
l...@spredfast.com

[image: Spredfast]
On Thu, Apr 15, 2010 at 12:28 PM, Jonathan Ellis jbel...@gmail.com wrote:

 Could you create a ticket for us to return an error message in this
 situation?

 -Jonathan


 On Tue, Apr 13, 2010 at 4:24 PM, Lee Parker l...@socialagency.com wrote:

 nevermind.  I figured out what the problem was.  I was not putting the
 column inside a ColumnOrSuperColumn container.


 Lee Parker
 l...@spredfast.com

 [image: Spredfast]
 On Tue, Apr 13, 2010 at 4:19 PM, Lee Parker l...@socialagency.com wrote:

 I upgraded my dev environment to 0.6.0 today in expectation of upgrading
 our prod environment soon.  I am trying to rewrite some of our code to use
 batch_mutate with the Thrift PHP library directly.  I'm not getting any
 result back, not even an exception or failure message, but the data is never
 showing up in the single node cassandra setup.  Here is a dump of my
 mutation map:

  [mutation map dump snipped -- identical to the earlier message]

  When I pass this into client->batch_mutate, nothing seems to happen.  Any
 ideas about what could be going on?  I have been able to insert data using
 cassandra-cli without issue.

 Lee Parker
 l...@spredfast.com

 [image: Spredfast]






Re: New User: OSX vs. Debian on Cassandra 0.5.0 with Thrift

2010-04-15 Thread Jonathan Ellis
You're right, to get those numbers on Debian something is very wrong.

Have you looked at
http://spyced.blogspot.com/2010/01/linux-performance-basics.html ?
What is the bottleneck on the linux machines?

With the kind of speed you are seeing I wouldn't be surprised if it is swapping.

-Jonathan

On Tue, Apr 13, 2010 at 11:38 PM, Heath Oderman he...@526valley.com wrote:
 Hi,
 I wrote a few days ago and got a few good suggestions.  I'm still seeing
 dramatic differences between Cassandra 0.5.0 on OSX vs. Debian Linux.
 I've tried on Debian with the Sun JRE and the Open JDK with nearly identical
 results. I've tried a mix of hardware.
 Attached are some graphs I've produced of my results which show that in OSX,
 Cassandra takes longer with a greater load but is wicked fast (expected).
 In the SunJDK or Open JDK on Debian I get amazingly consistent time taken to
 do the writes, regardless of the load and the times are always ridiculously
 high.  It's insanely slow.
 I genuinely believe that I must be doing something very wrong in my Debian
 setups, but they are all vanilla installs, both 64 bit and 32 bit machines,
 64bit and 32 bit installs.  Cassandra packs taken from
 http://www.apache.org/dist/cassandra/debian.
 I am using Thrift, and I'm using a c# client because that's how I intend to
 actually use Cassandra and it seems pretty sensible.
 An example of what I'm seeing is:
 5 Threads Each writing 100,000 Simple Entries
 OSX: 1 min 16 seconds ~ 6515 Entries / second
 Debian: 1 hour 15 seconds ~ 138 Records / second
 15 Threads Each writing 100,000 Simple Entries
 OSX: 2 min 30 seconds writing ~10,000 Entries / second
 Debian: 1 hour 1.5 minutes ~406 Entries / second
 20 Threads Each Writing 100,000 Simple Entries
 OSX: 3min 19 seconds ~ 10,050 Entries / second
 Debian: 1 hour 20 seconds ~ 492 Entries / second
 If anyone has any suggestions or pointers I'd be glad to hear them.
 Thanks,
 Stu
 Attached:
 1. CassLoadTesting.ods (all my results and graphs in OpenOffice format
 downloaded from Google Docs)
 2. OSX Records per Second - a graph of how many entries get written per
 second for 10,000 & 100,000 entries as thread count is increased in OSX.
 3. Open JDK Records per Second - the same graph but of Open JDK on Debian
 4. Open JDK Total Time By Thread - the total time taken from test start to
 finish (all threads completed) to write 10,000 & 100,000 entries as thread
 count is increased in Debian with Open JDK
 5. OSX Total time by Thread - same as 4, but for OSX.




Re: batch_mutate silently failing

2010-04-15 Thread Jonathan Ellis
Ah, I see.  Glad you resolved that. :)

On Thu, Apr 15, 2010 at 12:31 PM, Lee Parker l...@socialagency.com wrote:

 The entire thing was completely my own fault.  I was making an invalid
 request and, somewhere in the code, I was catching the exception and not
 handling it at all.  So it only appeared to be silent when in reality it was
 throwing a nice descriptive exception.


 Lee Parker
 l...@spredfast.com

 [image: Spredfast]
 On Thu, Apr 15, 2010 at 12:28 PM, Jonathan Ellis jbel...@gmail.comwrote:

 Could you create a ticket for us to return an error message in this
 situation?

 -Jonathan


 On Tue, Apr 13, 2010 at 4:24 PM, Lee Parker l...@socialagency.com wrote:

 nevermind.  I figured out what the problem was.  I was not putting the
 column inside a ColumnOrSuperColumn container.


 Lee Parker
 l...@spredfast.com

 [image: Spredfast]
 On Tue, Apr 13, 2010 at 4:19 PM, Lee Parker l...@socialagency.comwrote:

 I upgraded my dev environment to 0.6.0 today in expectation of upgrading
 our prod environment soon.  I am trying to rewrite some of our code to use
 batch_mutate with the Thrift PHP library directly.  I'm not getting any
 result back, not even an exception or failure message, but the data is 
 never
 showing up in the single node cassandra setup.  Here is a dump of my
 mutation map:

  array(1) {
    ["testkey"]=>
    array(1) {
      ["StreamItems"]=>
      array(2) {
        [0]=>
        object(cassandra_Mutation)#156 (2) {
          ["column_or_supercolumn"]=>
          object(cassandra_Column)#157 (3) {
            ["name"]=>
            string(4) "test"
            ["value"]=>
            string(14) "this is a test"
            ["timestamp"]=>
            float(1271193181943.1)
          }
          ["deletion"]=>
          NULL
        }
        [1]=>
        object(cassandra_Mutation)#158 (2) {
          ["column_or_supercolumn"]=>
          object(cassandra_Column)#159 (3) {
            ["name"]=>
            string(5) "test2"
            ["value"]=>
            string(19) "Another test column"
            ["timestamp"]=>
            float(1271193181943.2)
          }
          ["deletion"]=>
          NULL
        }
      }
    }
  }

  When I pass this into client->batch_mutate, nothing seems to happen.
  Any ideas about what could be going on?  I have been able to insert data
 using cassandra-cli without issue.

 Lee Parker
 l...@spredfast.com

 [image: Spredfast]
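The resolution in this thread — the Column has to be wrapped in a ColumnOrSuperColumn container before it goes into the Mutation — can be sketched in Python. The classes below are simplified stand-ins for the Thrift-generated ones (the real generated code has more fields and validation), so treat this as an illustration of the required nesting, not working client code:

```python
import time

# Simplified stand-ins for the Thrift-generated classes (illustration only).
class Column:
    def __init__(self, name, value, timestamp):
        self.name, self.value, self.timestamp = name, value, timestamp

class ColumnOrSuperColumn:
    def __init__(self, column=None, super_column=None):
        self.column, self.super_column = column, super_column

class Mutation:
    def __init__(self, column_or_supercolumn=None, deletion=None):
        self.column_or_supercolumn = column_or_supercolumn
        self.deletion = deletion

def make_mutation(name, value):
    # The key point: a bare Column in the mutation map is invalid; it must be
    # wrapped in a ColumnOrSuperColumn, which is then attached to the Mutation.
    col = Column(name, value, int(time.time() * 1_000_000))  # integer microseconds
    return Mutation(column_or_supercolumn=ColumnOrSuperColumn(column=col))

# key -> column family -> list of mutations, mirroring the dump above
mutation_map = {
    "testkey": {
        "StreamItems": [
            make_mutation("test", "this is a test"),
            make_mutation("test2", "Another test column"),
        ]
    }
}
# A real client would now call batch_mutate with this map.
```

Note the integer timestamp: the Thrift timestamp field is an i64, so an integer (microseconds by convention) is safer than the float values shown in the dump above.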







Re: server crash - how to investigate

2010-04-15 Thread Jonathan Ellis
There's a few things it could be:

Out of memory: usually it can log the exception before dying but not
always.  there will be a java_$pid.hprof file with the heap dumped.

JVM crash: there will be hs_err$pid.log file

OS bug or hardware problem: sometimes your OS will log something

-Jonathan

On Wed, Apr 14, 2010 at 6:04 AM, Ran Tavory ran...@gmail.com wrote:
 I'm running a 0.6.0 cluster with four nodes and one of them just crashed.
 The logs all seem normal and I haven't seen anything special in the jmx
 counters before the crash.
 I have one client writing and reading using 10 threads and using 3 different
 column families: KvAds, KvImpressions and KvUsers
 the client had got a few UnavailableException, TimedOutException and
 TTransportException but was able to complete the read/write operation by
 failing over to another available host. I can't tell if the exceptions were
 from the crashed host or from other hosts in the ring.
 Any hints how to investigate this are greatly appreciated. So far I'm
 lost...
 Here's a snippet from the log just before it went down. It doesn't seem to
 have anything special in it, everything is INFO level.
 The only thing that seems a bit strange is that last message: Compacting [].
 This message usually comes with things inside the [], such as Compacting
 [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassdata/data/system/LocationInfo-1-Data.db'),...]
 but this time it was just empty.
 However, this is not the only place in the log were I see an empty
 Compacting []. There are other places and they didn't end up in a crash, so
 I don't know if it's related.
 here's the log:
  INFO [ROW-MUTATION-STAGE:6] 2010-04-14 05:55:07,014 ColumnFamilyStore.java
 (line 357) KvImpressions has reached its threshold; switching in a fresh
 Memtable at
 CommitLogContext(file='/outbrain/cassdata/commitlog/CommitLog-1271238432773.log',
 position=68606651)
  INFO [ROW-MUTATION-STAGE:6] 2010-04-14 05:55:07,015 ColumnFamilyStore.java
 (line 609) Enqueuing flush of Memtable(KvImpressions)@258729366
  INFO [FLUSH-WRITER-POOL:1] 2010-04-14 05:55:07,015 Memtable.java (line 148)
 Writing Memtable(KvImpressions)@258729366
  INFO [FLUSH-WRITER-POOL:1] 2010-04-14 05:55:10,130 Memtable.java (line 162)
 Completed flushing
 /outbrain/cassdata/data/outbrain_kvdb/KvImpressions-24-Data.db
  INFO [COMMIT-LOG-WRITER] 2010-04-14 05:55:10,154 CommitLog.java (line 407)
 Discarding obsolete commit
 log:CommitLogSegment(/outbrain/cassdata/commitlog/CommitLog-1271238049425.log)
  INFO [SSTABLE-CLEANUP-TIMER] 2010-04-14 05:55:28,415
 SSTableDeletingReference.java (line 104) Deleted
 /outbrain/cassdata/data/outbrain_kvdb/KvImpressions-16-Data.db
  INFO [SSTABLE-CLEANUP-TIMER] 2010-04-14 05:55:28,440
 SSTableDeletingReference.java (line 104) Deleted
 /outbrain/cassdata/data/outbrain_kvdb/KvAds-8-Data.db
  INFO [SSTABLE-CLEANUP-TIMER] 2010-04-14 05:55:28,454
 SSTableDeletingReference.java (line 104) Deleted
 /outbrain/cassdata/data/outbrain_kvdb/KvAds-10-Data.db
  INFO [SSTABLE-CLEANUP-TIMER] 2010-04-14 05:55:28,526
 SSTableDeletingReference.java (line 104) Deleted
 /outbrain/cassdata/data/outbrain_kvdb/KvImpressions-5-Data.db
  INFO [SSTABLE-CLEANUP-TIMER] 2010-04-14 05:55:28,585
 SSTableDeletingReference.java (line 104) Deleted
 /outbrain/cassdata/data/outbrain_kvdb/KvImpressions-11-Data.db
  INFO [SSTABLE-CLEANUP-TIMER] 2010-04-14 05:55:28,602
 SSTableDeletingReference.java (line 104) Deleted
 /outbrain/cassdata/data/outbrain_kvdb/KvAds-11-Data.db
  INFO [SSTABLE-CLEANUP-TIMER] 2010-04-14 05:55:28,614
 SSTableDeletingReference.java (line 104) Deleted
 /outbrain/cassdata/data/outbrain_kvdb/KvAds-9-Data.db
  INFO [SSTABLE-CLEANUP-TIMER] 2010-04-14 05:55:28,682
 SSTableDeletingReference.java (line 104) Deleted
 /outbrain/cassdata/data/outbrain_kvdb/KvImpressions-21-Data.db
  INFO [COMMIT-LOG-WRITER] 2010-04-14 05:55:52,254 CommitLogSegment.java
 (line 50) Creating new commitlog segment
 /outbrain/cassdata/commitlog/CommitLog-1271238952254.log
  INFO [ROW-MUTATION-STAGE:16] 2010-04-14 05:56:25,347 ColumnFamilyStore.java
 (line 357) KvImpressions has reached its threshold; switching in a fresh
 Memtable at
 CommitLogContext(file='/outbrain/cassdata/commitlog/CommitLog-1271238952254.log',
 position=47568158)
  INFO [ROW-MUTATION-STAGE:16] 2010-04-14 05:56:25,348 ColumnFamilyStore.java
 (line 609) Enqueuing flush of Memtable(KvImpressions)@1955587316
  INFO [FLUSH-WRITER-POOL:1] 2010-04-14 05:56:25,348 Memtable.java (line 148)
 Writing Memtable(KvImpressions)@1955587316
  INFO [FLUSH-WRITER-POOL:1] 2010-04-14 05:56:30,572 Memtable.java (line 162)
 Completed flushing
 /outbrain/cassdata/data/outbrain_kvdb/KvImpressions-25-Data.db
  INFO [COMMIT-LOG-WRITER] 2010-04-14 05:57:26,790 CommitLogSegment.java
 (line 50) Creating new commitlog segment
 /outbrain/cassdata/commitlog/CommitLog-1271239046790.log
  INFO [ROW-MUTATION-STAGE:7] 2010-04-14 05:57:59,513 ColumnFamilyStore.java
 (line 357) KvImpressions has reached its 

Re: New User: OSX vs. Debian on Cassandra 0.5.0 with Thrift

2010-04-15 Thread Heath Oderman
I upgraded to 0.6 yesterday and it's bang on the same.  I'll go read up on
py_stress and give it a try.

On Thu, Apr 15, 2010 at 1:57 PM, Jonathan Ellis jbel...@gmail.com wrote:

 What kind of numbers do you get from contrib/py_stress?

 (that's located somewhere else in 0.5, but you should really be using
 0.6 anyway.)

 On Thu, Apr 15, 2010 at 12:53 PM, Heath Oderman he...@526valley.com
 wrote:
  So checking it out quickly:
  vmstat -
  Never swaps.  si and so stay at 0 during the load.
  iostat -x
  the %util never climbs above 0.00, but the avgrq-sz jumps between samples
  from 0 -> 30 -> 90 -> 0 (5 second intervals)
  top shows the cpu barely working and mem utilization is below 20%.
  Still slow.  :(
  Thanks for the suggestions.  In your article on your blog it'd be awesome to
  include some implications, like "avgrq-sz over 250 may mean XXX".  Even if
  it's utterly hardware and system dependent it'd give a guy like me an idea
  if what I was seeing was bad or good. :D
  Thanks again,
  Heath
 
  On Thu, Apr 15, 2010 at 1:34 PM, Heath Oderman he...@526valley.com
 wrote:
 
  Thanks Jonathan, I'll check this out right away.
 
  On Thu, Apr 15, 2010 at 1:32 PM, Jonathan Ellis jbel...@gmail.com
 wrote:
 
  You're right, to get those numbers on debian something is very wrong.
 
  Have you looked at
  http://spyced.blogspot.com/2010/01/linux-performance-basics.html ?
  What is the bottleneck on the linux machines?
 
  With the kind of speed you are seeing I wouldn't be surprised if it is
  swapping.
 
  -Jonathan
 
  On Tue, Apr 13, 2010 at 11:38 PM, Heath Oderman he...@526valley.com
  wrote:
   Hi,
   I wrote a few days ago and got a few good suggestions.  I'm still
   seeing
   dramatic differences between Cassandra 0.5.0 on OSX vs. Debian Linux.
   I've tried on Debian with the Sun JRE and the Open JDK with nearly
   identical
   results. I've tried a mix of hardware.
   Attached are some graphs I've produced of my results which show that
 in
   OSX,
   Cassandra takes longer with a greater load but is wicked fast
   (expected).
   In the SunJDK or Open JDK on Debian I get amazingly consistent time
   taken to
   do the writes, regardless of the load and the times are always
   ridiculously
   high.  It's insanely slow.
   I genuinely believe that I must be doing something very wrong in my
   Debian
   setups, but they are all vanilla installs, both 64 bit and 32 bit
   machines,
   64bit and 32 bit installs.  Cassandra packs taken from
   http://www.apache.org/dist/cassandra/debian.
   I am using Thrift, and I'm using a c# client because that's how I
   intend to
   actually use Cassandra and it seems pretty sensible.
   An example of what I'm seeing is:
   5 Threads Each writing 100,000 Simple Entries
   OSX: 1 min 16 seconds ~ 6515 Entries / second
   Debian: 1 hour 15 seconds ~ 138 Records / second
   15 Threads Each writing 100,000 Simple Entries
    OSX: 2 min 30 seconds writing ~10,000 Entries / second
   Debian: 1 hour 1.5 minutes ~406 Entries / second
   20 Threads Each Writing 100,000 Simple Entries
   OSX: 3min 19 seconds ~ 10,050 Entries / second
   Debian: 1 hour 20 seconds ~ 492 Entries / second
   If anyone has any suggestions or pointers I'd be glad to hear them.
   Thanks,
   Stu
   Attached:
   1. CassLoadTesting.ods (all my results and graphs in OpenOffice
 format
   downloaded from Google Docs)
   2. OSX Records per Second - a graph of how many entries get written
 per
  second for 10,000 & 100,000 entries as thread count is increased in
   OSX.
   3. Open JDK Records per Second - the same graph but of Open JDK on
   Debian
   4. Open JDK Total Time By Thread - the total time taken from test
 start
   to
   finish (all threads completed) to write 10,000 & 100,000 entries as
   thread
   count is increased in Debian with Open JDK
   5. OSX Total time by Thread - same as 4, but for OSX.
  
  
 
 
 



Re: Time-series data model

2010-04-15 Thread Dan Di Spaltro
This is actually fairly similar to how we store metrics at Cloudkick.
Below has a much more in depth explanation of some of that

https://www.cloudkick.com/blog/2010/mar/02/4_months_with_cassandra/

So we store each natural point in the NumericArchive table.

<ColumnFamily CompareWith="LongType"
  Name="NumericArchive" />

<ColumnFamily CompareWith="LongType" Name="Rollup5m"
ColumnType="Super" CompareSubcolumnsWith="BytesType" />
<ColumnFamily CompareWith="LongType" Name="Rollup20m"
ColumnType="Super" CompareSubcolumnsWith="BytesType" />
<ColumnFamily CompareWith="LongType" Name="Rollup30m"
ColumnType="Super" CompareSubcolumnsWith="BytesType" />
<ColumnFamily CompareWith="LongType" Name="Rollup60m"
ColumnType="Super" CompareSubcolumnsWith="BytesType" />
<ColumnFamily CompareWith="LongType" Name="Rollup4h"
ColumnType="Super" CompareSubcolumnsWith="BytesType" />
<ColumnFamily CompareWith="LongType" Name="Rollup12h"
ColumnType="Super" CompareSubcolumnsWith="BytesType" />
<ColumnFamily CompareWith="LongType" Name="Rollup1d"
ColumnType="Super" CompareSubcolumnsWith="BytesType" />

our keys look like:
serviceuuid.metric-name

Anyways, this has been working out very well for us.

2010/4/15 Ted Zlatanov t...@lifelogs.com:
 On Thu, 15 Apr 2010 11:27:47 +0200 Jean-Pierre Bergamin ja...@ractive.ch 
 wrote:

 JB Am 14.04.2010 15:22, schrieb Ted Zlatanov:
 On Wed, 14 Apr 2010 15:02:29 +0200 Jean-Pierre Bergamin ja...@ractive.ch
  wrote:

 JB The metrics are stored together with a timestamp. The queries we want to
 JB perform are:
 JB * The last value of a specific metric of a device
 JB * The values of a specific metric of a device between two timestamps t1 
 and
 JB t2

 Make your key devicename-metricname-MMDD-HHMM (with whatever time
 sharding makes sense to you; I use UTC by-hours and by-day in my
 environment).  Then your supercolumn is the collection time as a
 LongType and your columns inside the supercolumn can express the metric
 in detail (collector agent, detailed breakdown, etc.).

 JB Just for my understanding. What is time sharding? I couldn't find an
 JB explanation somewhere. Do you mean that the time-series data is rolled
 JB up in 5 minues, 1 hour, 1 day etc. slices?

 Yes.  The usual meaning of shard in RDBMS world is to segment your
 database by some criteria, e.g. US vs. Europe in Amazon AWS because
 their data centers are laid out so.  I was taking a linguistic shortcut
 to mean break down your rows by some convenient criteria.  You can
 actually set up your Partitioner in Cassandra to literally shard your
 keyspace rows based on the key, but I just meant slice in my note.

 JB So this would be defined as:
 JB <ColumnFamily Name="measurements" ColumnType="Super"
 JB CompareWith="UTF8Type" CompareSubcolumnsWith="LongType" />

 JB So when i want to read all values of one metric between two timestamps
 JB t0 and t1, I'd have to read the supercolumns that match a key range
 JB (device1:metric1:t0 - device1:metric1:t1) and then all the
 JB supercolumns for this key?

 Yes.  This is a single multiget if you can construct the key range
 explicitly.  Cassandra loads a lot of this in memory already and filters
 it after the fact, that's why it pays to slice your keys and to stitch
 them together on the client side if you have to go across a time
 boundary.  You'll also get better key load balancing with deeper slicing
 if you use the randomizing partitioner.

 In the result set, you'll get each matching supercolumn with all the
 columns inside it.  You may have to page through supercolumns.

 Ted





-- 
Dan Di Spaltro
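The time-sharded key scheme discussed in this thread (device-metric-time row keys, a single multiget across the covering shards, stitched client-side) can be sketched as follows. The exact key format and the hourly bucket are assumptions for illustration; shard_key and keys_between are hypothetical helpers, not part of any client library:

```python
from datetime import datetime, timedelta

def shard_key(device, metric, ts):
    # One possible layout of the devicename-metricname-time key; adapt the
    # strftime pattern to whatever time sharding makes sense (here: by hour).
    return "%s-%s-%s" % (device, metric, ts.strftime("%Y%m%d-%H"))

def keys_between(device, metric, t0, t1):
    # All hourly-shard row keys needed to cover [t0, t1]; these would go into
    # a single multiget, with column slicing on the LongType timestamps inside
    # each row, and the results stitched together across the hour boundaries.
    keys, t = [], t0.replace(minute=0, second=0, microsecond=0)
    while t <= t1:
        keys.append(shard_key(device, metric, t))
        t += timedelta(hours=1)
    return keys

keys = keys_between("device1", "metric1",
                    datetime(2010, 4, 15, 10, 30),
                    datetime(2010, 4, 15, 13, 5))
# -> four row keys, one per hour from 10:00 through 13:00
```

Deeper slicing like this also spreads the rows across nodes under the randomizing partitioner, as noted in the quoted reply.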


Re: framed transport

2010-04-15 Thread Nathan McCall
FWIW, We just exposed this as an option in hector.

-Nate

On Thu, Apr 15, 2010 at 8:38 AM, Miguel Verde miguelitov...@gmail.com wrote:
 On Thu, Apr 15, 2010 at 10:22 AM, Eric Evans eev...@rackspace.com wrote:

 But, if you've enabled framing on the server, you will not
 be able to use C# clients (last I checked, there was no framed transport
 for C#).


 There *are* many clients that don't have framed transports, but the C#
 client had it added in November:
 https://issues.apache.org/jira/browse/THRIFT-210


Re: BMT flush on windows?

2010-04-15 Thread Sonny Heer
Hmmm. Same code runs on ubuntu, and I'm able to flush using the nodetool.

What is the difference between inserting data using :
StorageProxy.mutateBlocking vs. sending oneway message using the
MessagingService?

On Thu, Apr 15, 2010 at 10:14 AM, Jonathan Ellis jbel...@gmail.com wrote:
 probably because there is nothing to flush.

 On Thu, Apr 15, 2010 at 11:53 AM, Sonny Heer sonnyh...@gmail.com wrote:
 From the jconsole, I go under
 ColumnFamilyStores -> CF1 -> Column1 -> Operations and clicked force
 flush().

 I'm getting an "Operation return value null" OK message box.  What am I
 doing wrong?


 On Tue, Apr 13, 2010 at 3:12 PM, Jonathan Ellis jbel...@gmail.com wrote:
 you have three options

 (a) connect with jconsole or another jmx client and invoke flush that way
 (b) run org.apache.cassandra.tools.NodeCmd manually
 (c) write a bat file for NodeCmd like the nodetool shell script in bin/

 On Tue, Apr 13, 2010 at 5:08 PM, Sonny Heer sonnyh...@gmail.com wrote:
 Is there any way to run a keyspace flush on a windows box?






Re: timestamp not found

2010-04-15 Thread Lee Parker
I have done more error checking and I am relatively certain that I am
sending a valid timestamp to the thrift library.  I was testing a switch to
the Framed Transport instead of Buffered Transport and I am getting fewer
errors, but now the cassandra server dies when this happens.  It is starting
to feel like this is a bug in Thrift or the Cassandra Thrift interface.  Can
anyone offer any other insight?  I'm using the current stable release of
Thrift 0.2.0, and Cassandra 0.6.0.

It seems to happen more under heavy load. I don't know if that is meaningful
or not.

Lee Parker

On Thu, Apr 15, 2010 at 11:00 AM, Lee Parker l...@socialagency.com wrote:

 I'm actually using PHP.  I do have several php processes running, but each
 one should have it's own Thrift connection.


 Lee Parker
 l...@spredfast.com

 [image: Spredfast]
 On Thu, Apr 15, 2010 at 10:53 AM, Jonathan Ellis jbel...@gmail.comwrote:

 Looks like you are using C++ and not setting the isset flag on the
 timestamp field, so it's getting the default value for a Java long (0).

 If it works most of the time then possibly you are using a Thrift
 connection from multiple threads at the same time, which is not safe.


 On Thu, Apr 15, 2010 at 10:39 AM, Lee Parker l...@socialagency.comwrote:

 We are currently migrating about 70G of data from mysql to cassandra.  I
 am occasionally getting the following error:

 Required field 'timestamp' was not found in serialized data! Struct:
 Column(name:74 65 78 74, value:44 61 73 20 6C 69 65 62 20 69 63 68 20 76 6F
 6E 20 23 49 6E 61 3A 20 68 74 74 70 3A 2F 2F 77 77 77 2E 79 6F 75 74 75 62
 65 2E 63 6F 6D 2F 77 61 74 63 68 3F 76 3D 70 75 38 4B 54 77 79 64 56 77 6B
 26 66 65 61 74 75 72 65 3D 72 65 6C 61 74 65 64 20 40 70 6A 80 01 00 01 00,
 timestamp:0)

 The loop which is building out the mutation map for the batch_mutate call
 is adding a timestamp to each column.  I have verified that the time stamp
 is there for several calls and I feel like if the logic was bad, i would see
 the error more frequently.  Does anyone have suggestions as to what may be
 causing this?

 Lee Parker
 l...@spredfast.com

 [image: Spredfast]






json2sstable

2010-04-15 Thread Lee Parker
Has anyone used json2sstable to migrate a large amount of data into
cassandra?  What was your methodology?  I assume that this will be much
faster than stepping through my data and doing writes via PHP/Thrift.

Lee Parker


Re: framed transport

2010-04-15 Thread Lee Parker
It appears that after some testing, the buffered transport seems more
stable.  I am occasionally getting a missing timestamp error during
batch_mutate calls.  It happens both on framed and buffered transports, but
when it happens on a framed transport, the server crashes.  Is this typical?

Lee Parker
On Thu, Apr 15, 2010 at 1:12 PM, Nathan McCall n...@vervewireless.comwrote:

 FWIW, We just exposed this as an option in hector.

 -Nate

 On Thu, Apr 15, 2010 at 8:38 AM, Miguel Verde miguelitov...@gmail.com
 wrote:
  On Thu, Apr 15, 2010 at 10:22 AM, Eric Evans eev...@rackspace.com
 wrote:
 
  But, if you've enabled framing on the server, you will not
  be able to use C# clients (last I checked, there was no framed transport
  for C#).
 
 
  There *are* many clients that don't have framed transports, but the C#
  client had it added in November:
  https://issues.apache.org/jira/browse/THRIFT-210



Re: framed transport

2010-04-15 Thread Jonathan Ellis
Have you tried other client machines?

It sounds like your client is generating garbage, which is Bad.

https://issues.apache.org/jira/browse/THRIFT-601

On Thu, Apr 15, 2010 at 4:20 PM, Lee Parker l...@socialagency.com wrote:
 It appears that after some testing, the buffered transport seems more
 stable.  I am occasionally getting a missing timestamp error during
 batch_mutate calls.  It happens both on framed and buffered transports, but
 when it happens on a framed transport, the server crashes.  Is this typical?

 Lee Parker

 On Thu, Apr 15, 2010 at 1:12 PM, Nathan McCall n...@vervewireless.com
 wrote:

 FWIW, We just exposed this as an option in hector.

 -Nate

 On Thu, Apr 15, 2010 at 8:38 AM, Miguel Verde miguelitov...@gmail.com
 wrote:
  On Thu, Apr 15, 2010 at 10:22 AM, Eric Evans eev...@rackspace.com
  wrote:
 
  But, if you've enabled framing on the server, you will not
  be able to use C# clients (last I checked, there was no framed
  transport
  for C#).
 
 
  There *are* many clients that don't have framed transports, but the C#
  client had it added in November:
  https://issues.apache.org/jira/browse/THRIFT-210




Data model question - column names sort

2010-04-15 Thread Sonny Heer
Need a way to have two different types of indexes.

Key: aTextKey
ColumnName: aTextColumnName:55
Value: 

Key: aTextKey
ColumnName: 55:aTextColumnName
Value: 

All the valuable information is stored in the column name itself.
Above two can be in different column families...

Queries:
Given a key, page me a list of numerical values sorted on aTextColumnName
Given a key, page me a list of text values sorted on a numerical value

This approach would require left-padding the numeric value for the
second index so Cassandra can sort on column names correctly.

Is there any other way to accomplish this?
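The left-padding approach described above can be made concrete with a small sketch; the pad width of 12 digits is an assumption and must exceed the widest number ever stored:

```python
PAD = 12  # assumed maximum digit width for the numeric component

def text_first(text, number):
    # First index: plain concatenation sorts lexicographically by the text part.
    return "%s:%d" % (text, number)

def number_first(text, number):
    # Second index: zero-pad the number so byte-wise column-name comparison
    # (e.g. UTF8Type/BytesType) agrees with numeric order.
    return "%0*d:%s" % (PAD, number, text)

pairs = [("apple", 55), ("pear", 7), ("zebra", 555)]
names = sorted(number_first(t, n) for t, n in pairs)
# -> 7 sorts before 55 before 555, regardless of the text component
```

Without the padding, byte-wise comparison would put "555" before "7", which is exactly the problem the padding avoids.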


Clarification on Ring operations in Cassandra 0.5.1

2010-04-15 Thread Anthony Molinaro
Hi,

  I have a cluster running on ec2, and would like to do some ring
management.  Specifically, I'd like to replace an existing node
with another node (I want to change the instance type).

  I was looking over http://wiki.apache.org/cassandra/Operations
and it seems like I could do something like.

1) shutdown cassandra on instance I want to replace
2) create a new instance, start cassandra with AutoBootstrap = true
3) run nodeprobe removetoken against the token of the instance I am
   replacing

Then according to the 'Handling failure' the new instance will find the
appropriate position automatically.  However, it's not clear to me
if this means it will take the same range as the shutdown node or not,
because normally AutoBootstrap == true means it will take half the keys
from the node with the most disk space used. (from the 'Bootstrap' section).

So will the process I describe above result in what I want, a new node
replacing an old one?

Also, if the new instance takes over the range of the old instance how
does removetoken know which instance to remove, does it remove the Down
instance?

Another hopefully minor question, if I bring up a new node with
AutoBootstrap = false, what happens?
Does it join the ring but without data and without token range?
Can I then 'nodeprobe move token for range I want to take over', and
achieve the same as step 2 above?

Thanks,

-Anthony

-- 

Anthony Molinaro   antho...@alumni.caltech.edu


Re: Is it possible to get all records in a CF?

2010-04-15 Thread Gary Dusbabek
You'll have to scan the CF.  If you're using
OrderPreservingPartitioner please see 'get_range_slices'
(http://wiki.apache.org/cassandra/API).  It would help if you had an
idea of where the key might be, so you would know where to start
scanning.

Gary.

On Thu, Apr 15, 2010 at 21:01, Jared Laprise ja...@webonyx.com wrote:
 If you do not have the key for SuperColumn in a ColumnFamily is it not
 possible to browse all the data in the ColumnFamily? Thus far I’ve only been
 able to find a way to pull out data if I know the key.
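The scan Gary describes boils down to paging get_range_slices with a moving start key. A sketch of the paging loop, where fetch_page is a hypothetical stand-in for the Thrift get_range_slices call — it must return rows ordered by key, starting at the given key inclusive, which is why the repeated first row of each subsequent page is dropped:

```python
def scan_all_rows(fetch_page, page_size=100):
    # Generator over every (key, columns) row in a column family.
    start, last = "", None
    while True:
        rows = fetch_page(start, page_size)
        for key, cols in rows:
            if key != last:        # skip the inclusive-start repeat
                yield key, cols
        if len(rows) < page_size:  # short page -> end of the range
            return
        last = rows[-1][0]
        start = last               # resume from the last key seen

# Demo with an in-memory "column family" standing in for the real call.
data = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}

def fetch_page(start, count):
    keys = sorted(k for k in data if k >= start)
    return [(k, data[k]) for k in keys[:count]]

all_rows = list(scan_all_rows(fetch_page, page_size=2))
# -> every row exactly once, in key order
```

As the reply notes, key-ordered paging like this only makes sense under OrderPreservingPartitioner; with the random partitioner the "order" is token order, not key order.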




Re: json2sstable

2010-04-15 Thread 孔令华
I tried that and found that it cannot handle large files at present.
But you can write a tool based on it:
e.g. first sort your data file according to its hash key; second, write
to an SSTable directly.

On Fri, Apr 16, 2010 at 4:47 AM, Lee Parker l...@socialagency.com wrote:

 Has anyone used json2sstable to migrate a large amount of data into
 cassandra?  What was your methodology?  I assume that this will be much
 faster than stepping through my data and doing writes via PHP/Thrift.

 Lee Parker
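The "sort by hash key first" step matches how the RandomPartitioner orders rows: by a token derived from the MD5 of the key. A sketch, with the caveat that converting the hex digest to an int here is an approximation of Cassandra's BigIntegerToken computation, enough to convey the ordering idea:

```python
import hashlib

def token(key):
    # RandomPartitioner places rows by an MD5-derived token, so SSTables must
    # be written in token order for that partitioner, not in raw key order.
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

keys = ["alpha", "beta", "gamma", "delta"]
in_sstable_order = sorted(keys, key=token)
# rows would then be appended to the SSTable in this token order
```

Under OrderPreservingPartitioner the raw key order would suffice and this hashing step could be skipped.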



Re: json2sstable

2010-04-15 Thread Brandon Williams
On Thu, Apr 15, 2010 at 3:47 PM, Lee Parker l...@socialagency.com wrote:

 Has anyone used json2sstable to migrate a large amount of data into
 cassandra?  What was your methodology?  I assume that this will be much
 faster than stepping through my data and doing writes via PHP/Thrift.


If you're looking to do a bulk import, peek at contrib/bmt_example.

-Brandon