Re: SuperColumns
Yes, a super column can only have columns in it. Regards, /VJ On Wed, Apr 14, 2010 at 10:28 PM, Christian Torres chtor...@gmail.com wrote: I'm defining a ColumnFamily (Table) of type Super. Is it possible to have a SuperColumn inside another SuperColumn, or can SuperColumns only have normal columns? -- Christian Torres * Web Developer * Guegue.com * Mobile: +505 84 65 92 62 * Lover of Programming
Row key: string or binary (byte[])?
Is there any ongoing effort to make the row key a binary (byte[]) instead of a string? In the current cassandra.thrift file (0.6.0), I find: const string VERSION = "2.1.0" [...] struct KeySlice { 1: required *string* key, 2: required list&lt;ColumnOrSuperColumn&gt; columns, } while on the current (?) SVN https://svn.apache.org/repos/asf/cassandra/trunk/interface/cassandra.thrift it reads: const string VERSION = "4.0.0" [...] struct KeySlice { 1: required *binary* key, 2: required list&lt;ColumnOrSuperColumn&gt; columns, } Thanks for enlightening me. :-) Greetings, Roland
AssertionError: DecoratedKey(...) != DecoratedKey(...)
When restarting one of the nodes in my cluster I found this error in the log. What does this mean?

INFO [GC inspection] 2010-04-15 05:03:04,898 GCInspector.java (line 110) GC for ConcurrentMarkSweep: 712 ms, 11149016 reclaimed leaving 442336680 used; max is 4432068608
ERROR [HINTED-HANDOFF-POOL:1] 2010-04-15 05:03:17,948 DebuggableThreadPoolExecutor.java (line 94) Error in executor futuretask
java.util.concurrent.ExecutionException: java.lang.AssertionError: DecoratedKey(163143070370570938845670096830182058073, 1K2i35+B8RuuRDP7Gwz3Xw==) != DecoratedKey(163143368384879375649994309361429628039, 4k54mGvj7JoT5rBH68K+9A==) in /outbrain/cassandra/data/outbrain/DocumentMapping-305-Data.db
    at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
    at java.util.concurrent.FutureTask.get(FutureTask.java:83)
    at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor.afterExecute(DebuggableThreadPoolExecutor.java:86)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:888)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)
Caused by: java.lang.AssertionError: DecoratedKey(163143070370570938845670096830182058073, 1K2i35+B8RuuRDP7Gwz3Xw==) != DecoratedKey(163143368384879375649994309361429628039, 4k54mGvj7JoT5rBH68K+9A==) in /outbrain/cassandra/data/outbrain/DocumentMapping-305-Data.db
    at org.apache.cassandra.db.filter.SSTableSliceIterator$ColumnGroupReader.<init>(SSTableSliceIterator.java:127)
    at org.apache.cassandra.db.filter.SSTableSliceIterator.<init>(SSTableSliceIterator.java:59)
    at org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(SliceQueryFilter.java:63)
    at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:830)
    at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:750)
    at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:719)
    at org.apache.cassandra.db.HintedHandOffManager.sendMessage(HintedHandOffManager.java:122)
    at org.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpoint(HintedHandOffManager.java:250)
    at org.apache.cassandra.db.HintedHandOffManager.access$100(HintedHandOffManager.java:80)
    at org.apache.cassandra.db.HintedHandOffManager$2.runMayThrow(HintedHandOffManager.java:280)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    ... 2 more
Re: Time-series data model
On 14.04.2010 15:22, Ted Zlatanov wrote: On Wed, 14 Apr 2010 15:02:29 +0200 Jean-Pierre Bergamin ja...@ractive.ch wrote: JB The metrics are stored together with a timestamp. The queries we want to JB perform are: JB * The last value of a specific metric of a device JB * The values of a specific metric of a device between two timestamps t1 and JB t2 Make your key devicename-metricname-MMDD-HHMM (with whatever time sharding makes sense to you; I use UTC by-hours and by-day in my environment). Then your supercolumn is the collection time as a LongType, and your columns inside the supercolumn can express the metric in detail (collector agent, detailed breakdown, etc.). Just for my understanding: what is time sharding? I couldn't find an explanation anywhere. Do you mean that the time-series data is rolled up in 5-minute, 1-hour, 1-day etc. slices? So this would be defined as: <ColumnFamily Name="measurements" ColumnType="Super" CompareWith="UTF8Type" CompareSubcolumnsWith="LongType" /> So when I want to read all values of one metric between two timestamps t0 and t1, I'd have to read the supercolumns that match a key range (device1:metric1:t0 - device1:metric1:t1) and then all the supercolumns for each key? Regards James
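The time-sharded key scheme Ted describes can be sketched as a small helper. This is an illustrative Python sketch only: the `device:metric:bucket` layout, the separator, and the bucket formats are assumptions for the example, not anything Cassandra mandates.

```python
from datetime import datetime, timezone

def row_key(device, metric, epoch_seconds, bucket="hour"):
    """Build a time-sharded row key (hypothetical device:metric:shard layout).

    Each row holds one time bucket of a metric; the supercolumns inside it
    would be the exact collection timestamps, per the advice above."""
    dt = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    if bucket == "hour":
        shard = dt.strftime("%Y%m%d-%H00")
    elif bucket == "day":
        shard = dt.strftime("%Y%m%d")
    else:
        raise ValueError("unsupported bucket size")
    return f"{device}:{metric}:{shard}"
```

A range query between t0 and t1 then enumerates the buckets covering that interval and slices the supercolumns inside the first and last bucket by timestamp.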
Re: AssertionError: DecoratedKey(...) != DecoratedKey(...)
Ran, It looks like you're seeing https://issues.apache.org/jira/browse/CASSANDRA-866. It's fixed in 0.6.1. Gary On Thu, Apr 15, 2010 at 04:06, Ran Tavory ran...@gmail.com wrote: When restarting one of the nodes in my cluster I found this error in the log. What does this mean? [...]
How to implement TOP TEN in Cassandra
Hi all, how does one implement *TOP TEN* in Cassandra? For example, *top ten stories* on Digg.com. How should this be modeled? Thanks
Get super-columns using SimpleCassie
I'm using SimpleCassie as my Cassandra client. I have a question: can I get all the super-columns that are in one column family? If yes, how can I do it? Regards!
Re: TException: Error: TSocket: timed out reading 1024 bytes from 10.1.1.27:9160
sounds like https://issues.apache.org/jira/browse/THRIFT-347 On Wed, Apr 14, 2010 at 11:58 PM, richard yao richard.yao2...@gmail.com wrote: I am trying out Cassandra, and I use PHP to access it via the Thrift API. I got an error like this: TException: Error: TSocket: timed out reading 1024 bytes from 10.1.1.27:9160 What's wrong? Thanks.
Re: TException: Error: TSocket: timed out reading 1024 bytes from 10.1.1.27:9160
Thank you!
Re: AssertionError: DecoratedKey(...) != DecoratedKey(...)
yes, this looks like the same issue, thanks Gary. Other than seeing the errors in the log I haven't seen any other irregularities (maybe there are, but they haven't surfaced). Does this assertion mean data corruption, or something else that makes it worth waiting for 0.6.1? On Thu, Apr 15, 2010 at 2:00 PM, Gary Dusbabek gdusba...@gmail.com wrote: Ran, It looks like you're seeing https://issues.apache.org/jira/browse/CASSANDRA-866. It's fixed in 0.6.1. Gary [...]
Re: AssertionError: DecoratedKey(...) != DecoratedKey(...)
No data corruption. There was a bug in the way the index was scanned that was manifesting itself when the index got bigger than 2GB. Gary. On Thu, Apr 15, 2010 at 08:03, Ran Tavory ran...@gmail.com wrote: yes, this looks like the same issue, thanks Gary. Other than seeing the errors in the log I haven't seen any other irregularities. Does this assertion mean data corruption, or something else that makes it worth waiting for 0.6.1? [...]
Re: timestamp not found
Looks like the timestamp, in this case, is 0. Does Cassandra allow zero timestamps? Could be a bug in Cassandra doing an implicit boolean coercion in a conditional where it shouldn't. Mike On Thu, Apr 15, 2010 at 8:39 AM, Lee Parker l...@socialagency.com wrote: We are currently migrating about 70G of data from mysql to cassandra. I am occasionally getting the following error: Required field 'timestamp' was not found in serialized data! Struct: Column(name:74 65 78 74, value:44 61 73 20 6C 69 65 62 20 69 63 68 20 76 6F 6E 20 23 49 6E 61 3A 20 68 74 74 70 3A 2F 2F 77 77 77 2E 79 6F 75 74 75 62 65 2E 63 6F 6D 2F 77 61 74 63 68 3F 76 3D 70 75 38 4B 54 77 79 64 56 77 6B 26 66 65 61 74 75 72 65 3D 72 65 6C 61 74 65 64 20 40 70 6A 80 01 00 01 00, timestamp:0) The loop which is building out the mutation map for the batch_mutate call is adding a timestamp to each column. I have verified that the timestamp is there for several calls, and I feel like if the logic were bad, I would see the error more frequently. Does anyone have suggestions as to what may be causing this? Lee Parker l...@spredfast.com
Re: timestamp not found
When I am verifying the columns in the mutation map before sending it to cassandra, none of the timestamps are 0. I have had a difficult time recreating the error in a controlled environment, so I can't see the mutation map that was actually sent. Lee Parker l...@spredfast.com On Thu, Apr 15, 2010 at 10:45 AM, Mike Malone m...@simplegeo.com wrote: Looks like the timestamp, in this case, is 0. Does Cassandra allow zero timestamps? Could be a bug in Cassandra doing an implicit boolean coercion in a conditional where it shouldn't. Mike [...]
Re: timestamp not found
I'm actually using PHP. I do have several PHP processes running, but each one should have its own Thrift connection. Lee Parker l...@spredfast.com On Thu, Apr 15, 2010 at 10:53 AM, Jonathan Ellis jbel...@gmail.com wrote: Looks like you are using C++ and not setting the isset flag on the timestamp field, so it's getting the default value for a Java long (0). If it works most of the time then possibly you are using a Thrift connection from multiple threads at the same time, which is not safe. On Thu, Apr 15, 2010 at 10:39 AM, Lee Parker l...@socialagency.com wrote: We are currently migrating about 70G of data from mysql to cassandra. I am occasionally getting the following error: Required field 'timestamp' was not found in serialized data! [...]
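Whatever the client language, the `timestamp` field of the Thrift Column struct is an i64, so the value a client sends should be an integer. A minimal sketch (in Python, for illustration; the microseconds-since-epoch convention is the common one, not something the thread above specifies) of generating it defensively:

```python
import time

def column_timestamp():
    """Integer microseconds since the epoch for a Column's i64 timestamp.

    Forcing int here matters: a float timestamp (e.g. what PHP's
    microtime(true) returns, as in the dumps elsewhere in this thread)
    loses precision and is not a valid value for an i64 field."""
    return int(time.time() * 1_000_000)
```
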
Re: RackAware and replication strategy
Have a look at locator/DatacenterShardStrategy.java. On Thu, Apr 15, 2010 at 8:16 AM, Ran Tavory ran...@gmail.com wrote: I'm reading this on this page http://wiki.apache.org/cassandra/ArchitectureInternals : AbstractReplicationStrategy controls what nodes get secondary, tertiary, etc. replicas of each key range. The primary replica is always determined by the token ring (in TokenMetadata) but you can do a lot of variation with the others. RackUnaware just puts replicas on the next N-1 nodes in the ring. RackAware puts the first non-primary replica on the next node in the ring in ANOTHER data center than the primary; then the remaining replicas in the same DC as the primary. So I just want to make sure I got this right and that the documentation is up to date. I have two data centers and rack-aware placement. When the replication factor is 2: is it always the case that the primary replica goes to one DC and the second replica to the second DC? When the replication factor is 3: first replica in DC1, second in DC2 and third in DC1. When the replication factor is 4: first replica in DC1, second in DC2, third in DC1, fourth in DC1. Etc. If I have 4 hosts in each DC, which replication factors make sense? N=1 - When I don't care about losing data, cool. N=2 - When I want to make sure each DC has a copy; useful for fast local access, and allows recovery if only one host is down. N=3 - If I want to make sure each DC has a copy, plus recovery can be faster in certain cases, and more resilient to two hosts down. N=4 - Like N=3 but even more resilient. Etc. Say I want to have two replicas in each DC - can this be done?
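The placement rule quoted from the wiki can be turned into a toy model to check the RF=2/3/4 expectations in the question. This is a Python sketch of the *described* rule only, not Cassandra's actual RackAware code; the ring representation is an assumption for the example.

```python
def rack_aware_replicas(ring, primary_index, n):
    """Toy model of the quoted RackAware rule.

    ring: list of (node, datacenter) pairs in token order.
    Primary comes from the token ring; the first extra replica goes to the
    next ring node in ANOTHER data center; remaining replicas go to the
    next ring nodes in the primary's own data center."""
    size = len(ring)
    primary, primary_dc = ring[primary_index]
    replicas = [primary]
    # first non-primary replica: next node on the ring in a different DC
    if n > 1:
        for i in range(1, size):
            node, dc = ring[(primary_index + i) % size]
            if dc != primary_dc:
                replicas.append(node)
                break
    # remaining replicas: next ring nodes in the same DC as the primary
    for i in range(1, size):
        if len(replicas) >= n:
            break
        node, dc = ring[(primary_index + i) % size]
        if dc == primary_dc and node not in replicas:
            replicas.append(node)
    return replicas
```

With two alternating DCs this reproduces the pattern in the question: RF=2 gives one replica per DC, RF=3 gives DC1/DC2/DC1, and RF=4 gives DC1/DC2/DC1/DC1 - i.e., two-per-DC is not what this strategy produces at RF=4.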
busy thread on IncomingStreamReader ?
Hi all, We set up two nodes and simply set replication factor=2 for a test run. After both nodes, say node A and node B, had been serving for several hours, we found that node A always stays at 300% CPU usage (the other node is under 100% CPU, which is normal). A thread dump on node A shows 3 busy threads related to IncomingStreamReader:

==
Thread-66 prio=10 tid=0x2aade4018800 nid=0x69e7 runnable [0x4030a000]
  java.lang.Thread.State: RUNNABLE
    at sun.misc.Unsafe.setMemory(Native Method)
    at sun.nio.ch.Util.erase(Util.java:202)
    at sun.nio.ch.FileChannelImpl.transferFromArbitraryChannel(FileChannelImpl.java:560)
    at sun.nio.ch.FileChannelImpl.transferFrom(FileChannelImpl.java:603)
    at org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:62)
    at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:66)

Thread-65 prio=10 tid=0x2aade4017000 nid=0x69e6 runnable [0x4d44b000]
  java.lang.Thread.State: RUNNABLE
    at sun.misc.Unsafe.setMemory(Native Method)
    at sun.nio.ch.Util.erase(Util.java:202)
    at sun.nio.ch.FileChannelImpl.transferFromArbitraryChannel(FileChannelImpl.java:560)
    at sun.nio.ch.FileChannelImpl.transferFrom(FileChannelImpl.java:603)
    at org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:62)
    at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:66)

Thread-62 prio=10 tid=0x2aade4014800 nid=0x4150 runnable [0x4d34a000]
  java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.FileChannelImpl.size0(Native Method)
    at sun.nio.ch.FileChannelImpl.size(FileChannelImpl.java:309)
    - locked 0x2aaac450dcd0 (a java.lang.Object)
    at sun.nio.ch.FileChannelImpl.transferFrom(FileChannelImpl.java:597)
    at org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:62)
    at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:66)
===

Has anyone experienced a similar issue?
environments:
OS --- CentOS 5.4, Linux 2.6.18-164.15.1.el5 SMP x86_64 GNU/Linux
Java --- build 1.6.0_16-b01, Java HotSpot(TM) 64-Bit Server VM (build 14.2-b01, mixed mode)
Cassandra --- 0.6.0
Node configuration --- node A and node B; both nodes use node A as Seed
client --- Java thrift clients pick one node randomly to do reads and writes.
-- Ingram Chen online share order: http://dinbendon.net blog: http://www.javaworld.com.tw/roller/page/ingramchen
Re: BMT flush on windows?
From jconsole, I went under ColumnFamilyStores-CF1-Column1-Operations and clicked force flush(). I'm getting an "Operation return value null" OK message box. What am I doing wrong? On Tue, Apr 13, 2010 at 3:12 PM, Jonathan Ellis jbel...@gmail.com wrote: you have three options (a) connect with jconsole or another jmx client and invoke flush that way (b) run org.apache.cassandra.tools.NodeCmd manually (c) write a bat file for NodeCmd like the nodetool shell script in bin/ On Tue, Apr 13, 2010 at 5:08 PM, Sonny Heer sonnyh...@gmail.com wrote: Is there any way to run a keyspace flush on a windows box?
Re: Recovery from botched compaction
On Tue, Apr 13, 2010 at 3:59 PM, Anthony Molinaro antho...@alumni.caltech.edu wrote: I actually got lucky: while it hovered at 91-95% full, compaction finished and it's now at 60%. However, I still have around a dozen or so data files. I thought 'nodeprobe compact' did a major compaction, and that a major compaction would shrink to one file? 2 possibilities, probably both of which are affecting you: 1. If there isn't enough disk space to compact everything, cassandra will remove files from the to-compact list until it has room to do what you asked it to do. (But you can still run out of space if you write enough data while the compaction happens.) 2. 0.5's minor compactions don't automatically combine as many sstables as they should. This is fixed in 0.6. Okay, sounds good, I may leave it for the moment; last time I tried any sort of move/decommission with 0.5.x I was unable to figure out if anything was happening, so I may just wait and revisit when I upgrade. Yes, 0.5 sucks there. 0.6 is still a little opaque but you can at least see what is happening if you know where to look: http://wiki.apache.org/cassandra/Streaming -Jonathan
Re: batch_mutate silently failing
Could you create a ticket for us to return an error message in this situation? -Jonathan On Tue, Apr 13, 2010 at 4:24 PM, Lee Parker l...@socialagency.com wrote: nevermind. I figured out what the problem was. I was not putting the column inside a ColumnOrSuperColumn container. Lee Parker l...@spredfast.com On Tue, Apr 13, 2010 at 4:19 PM, Lee Parker l...@socialagency.com wrote: I upgraded my dev environment to 0.6.0 today in expectation of upgrading our prod environment soon. I am trying to rewrite some of our code to use batch_mutate with the Thrift PHP library directly. I'm not getting any result back, not even an exception or failure message, and the data is never showing up in the single-node cassandra setup. Here is a dump of my mutation map:

array(1) {
  ["testkey"]=> array(1) {
    ["StreamItems"]=> array(2) {
      [0]=> object(cassandra_Mutation)#156 (2) {
        ["column_or_supercolumn"]=> object(cassandra_Column)#157 (3) {
          ["name"]=> string(4) "test"
          ["value"]=> string(14) "this is a test"
          ["timestamp"]=> float(1271193181943.1)
        }
        ["deletion"]=> NULL
      }
      [1]=> object(cassandra_Mutation)#158 (2) {
        ["column_or_supercolumn"]=> object(cassandra_Column)#159 (3) {
          ["name"]=> string(5) "test2"
          ["value"]=> string(19) "Another test column"
          ["timestamp"]=> float(1271193181943.2)
        }
        ["deletion"]=> NULL
      }
    }
  }
}

When I pass this into $client->batch_mutate, nothing seems to happen. Any ideas about what could be going on? I have been able to insert data using cassandra-cli without issue. Lee Parker l...@spredfast.com
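The fix Lee describes - wrapping each Column in a ColumnOrSuperColumn - determines the nesting batch_mutate expects: key -> column family -> list of Mutations. A Python sketch modeling that shape with plain dicts (the struct and field names mirror the 0.6 Thrift interface, but the dict encoding itself is purely illustrative):

```python
import time

def make_mutation(name, value):
    """One Mutation: the Column sits inside a ColumnOrSuperColumn wrapper,
    which is the layer the failing code above was missing."""
    return {
        "column_or_supercolumn": {
            "column": {
                "name": name,
                "value": value,
                # i64 in Thrift, so an int, not a float as in the dump above
                "timestamp": int(time.time() * 1_000_000),
            }
        },
        "deletion": None,
    }

# key -> column family -> [Mutation, ...]
mutation_map = {
    "testkey": {
        "StreamItems": [
            make_mutation(b"test", b"this is a test"),
            make_mutation(b"test2", b"Another test column"),
        ]
    }
}
```
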
Re: batch_mutate silently failing
The entire thing was completely my own fault. I was making an invalid request and, somewhere in the code, I was catching the exception and not handling it at all. So it only appeared to be silent, when in reality it was throwing a nice descriptive exception. Lee Parker l...@spredfast.com On Thu, Apr 15, 2010 at 12:28 PM, Jonathan Ellis jbel...@gmail.com wrote: Could you create a ticket for us to return an error message in this situation? -Jonathan [...]
Re: New User: OSX vs. Debian on Cassandra 0.5.0 with Thrift
You're right, to get those numbers on Debian something is very wrong. Have you looked at http://spyced.blogspot.com/2010/01/linux-performance-basics.html ? What is the bottleneck on the linux machines? With the kind of speed you are seeing I wouldn't be surprised if it is swapping. -Jonathan On Tue, Apr 13, 2010 at 11:38 PM, Heath Oderman he...@526valley.com wrote: Hi, I wrote a few days ago and got a few good suggestions. I'm still seeing dramatic differences between Cassandra 0.5.0 on OSX vs. Debian Linux. I've tried Debian with the Sun JRE and the OpenJDK with nearly identical results. I've tried a mix of hardware. Attached are some graphs I've produced of my results, which show that on OSX, Cassandra takes longer under greater load but is wicked fast (expected). With the Sun JDK or OpenJDK on Debian I get amazingly consistent times for the writes, regardless of the load, and the times are always ridiculously high. It's insanely slow. I genuinely believe that I must be doing something very wrong in my Debian setups, but they are all vanilla installs, both 64-bit and 32-bit machines, 64-bit and 32-bit installs. Cassandra packages taken from http://www.apache.org/dist/cassandra/debian. I am using Thrift, and I'm using a C# client because that's how I intend to actually use Cassandra and it seems pretty sensible. An example of what I'm seeing is:
5 Threads Each writing 100,000 Simple Entries - OSX: 1 min 16 seconds ~ 6515 Entries / second; Debian: 1 hour 15 seconds ~ 138 Entries / second
15 Threads Each writing 100,000 Simple Entries - OSX: 2 min 30 seconds ~ 10,000 Entries / second; Debian: 1 hour 1.5 minutes ~ 406 Entries / second
20 Threads Each Writing 100,000 Simple Entries - OSX: 3 min 19 seconds ~ 10,050 Entries / second; Debian: 1 hour 20 seconds ~ 492 Entries / second
If anyone has any suggestions or pointers I'd be glad to hear them. Thanks, Stu Attached: 1. CassLoadTesting.ods (all my results and graphs in OpenOffice format, downloaded from Google Docs) 2. OSX Records per Second - a graph of how many entries get written per second for 10,000 to 100,000 entries as thread count is increased on OSX. 3. Open JDK Records per Second - the same graph but for OpenJDK on Debian 4. Open JDK Total Time By Thread - the total time taken from test start to finish (all threads completed) to write 10,000 to 100,000 entries as thread count is increased on Debian with OpenJDK 5. OSX Total Time by Thread - same as 4, but for OSX.
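As a quick sanity check on the figures quoted above, the per-second rates follow from total entries written over wall-clock time. A trivial illustrative helper (not from the original mail):

```python
def entries_per_second(threads, entries_per_thread, elapsed_seconds):
    # aggregate write throughput across all client threads
    return threads * entries_per_thread / elapsed_seconds
```

For example, 15 threads writing 100,000 entries each in 2 min 30 s works out to 10,000 entries/second, matching the OSX figure.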
Re: batch_mutate silently failing
Ah, I see. Glad you resolved that. :) On Thu, Apr 15, 2010 at 12:31 PM, Lee Parker l...@socialagency.com wrote: The entire thing was completely my own fault. I was making an invalid request and, somewhere in the code, I was catching the exception and not handling it at all. [...]
Re: server crash - how to investigate
There's a few things it could be: Out of memory: usually it can log the exception before dying but not always. there will be a java_$pid.hprof file with the heap dumped. JVM crash: there will be hs_err$pid.log file OS bug or hardware problem: sometimes your OS will log something -Jonathan On Wed, Apr 14, 2010 at 6:04 AM, Ran Tavory ran...@gmail.com wrote: I'm running a 0.6.0 cluster with four nodes and one of them just crashed. The logs all seem normal and I haven't seen anything special in the jmx counters before the crash. I have one client writing and reading using 10 threads and using 3 different column families: KvAds, KvImpressions and KvUsers the client had got a few UnavailableException, TimedOutException and TTransportException but was able to complete the read/write operation by failing over to another available host. I can't tell if the exceptions were from the crashed host or from other hosts in the ring. Any hints how to investigate this are greatly appreciated. So far I'm lost... Here's a snippet from the log just before it went down. It doesn't seem to have anything special in it, everything is INFO level. The only thing that seems a bit strange is that last message: Compacting []. This message usually comes with things inside the [], such as Compacting [org.apache.cassandra.io.SSTableReader(path='/outbrain/cassdata/data/system/LocationInfo-1-Data.db'),...] but this time it was just empty. However, this is not the only place in the log were I see an empty Compacting []. There are other places and they didn't end up in a crash, so I don't know if it's related. 
here's the log: INFO [ROW-MUTATION-STAGE:6] 2010-04-14 05:55:07,014 ColumnFamilyStore.java (line 357) KvImpressions has reached its threshold; switching in a fresh Memtable at CommitLogContext(file='/outbrain/cassdata/commitlog/CommitLog-1271238432773.log', position=68606651) INFO [ROW-MUTATION-STAGE:6] 2010-04-14 05:55:07,015 ColumnFamilyStore.java (line 609) Enqueuing flush of Memtable(KvImpressions)@258729366 INFO [FLUSH-WRITER-POOL:1] 2010-04-14 05:55:07,015 Memtable.java (line 148) Writing Memtable(KvImpressions)@258729366 INFO [FLUSH-WRITER-POOL:1] 2010-04-14 05:55:10,130 Memtable.java (line 162) Completed flushing /outbrain/cassdata/data/outbrain_kvdb/KvImpressions-24-Data.db INFO [COMMIT-LOG-WRITER] 2010-04-14 05:55:10,154 CommitLog.java (line 407) Discarding obsolete commit log:CommitLogSegment(/outbrain/cassdata/commitlog/CommitLog-1271238049425.log) INFO [SSTABLE-CLEANUP-TIMER] 2010-04-14 05:55:28,415 SSTableDeletingReference.java (line 104) Deleted /outbrain/cassdata/data/outbrain_kvdb/KvImpressions-16-Data.db INFO [SSTABLE-CLEANUP-TIMER] 2010-04-14 05:55:28,440 SSTableDeletingReference.java (line 104) Deleted /outbrain/cassdata/data/outbrain_kvdb/KvAds-8-Data.db INFO [SSTABLE-CLEANUP-TIMER] 2010-04-14 05:55:28,454 SSTableDeletingReference.java (line 104) Deleted /outbrain/cassdata/data/outbrain_kvdb/KvAds-10-Data.db INFO [SSTABLE-CLEANUP-TIMER] 2010-04-14 05:55:28,526 SSTableDeletingReference.java (line 104) Deleted /outbrain/cassdata/data/outbrain_kvdb/KvImpressions-5-Data.db INFO [SSTABLE-CLEANUP-TIMER] 2010-04-14 05:55:28,585 SSTableDeletingReference.java (line 104) Deleted /outbrain/cassdata/data/outbrain_kvdb/KvImpressions-11-Data.db INFO [SSTABLE-CLEANUP-TIMER] 2010-04-14 05:55:28,602 SSTableDeletingReference.java (line 104) Deleted /outbrain/cassdata/data/outbrain_kvdb/KvAds-11-Data.db INFO [SSTABLE-CLEANUP-TIMER] 2010-04-14 05:55:28,614 SSTableDeletingReference.java (line 104) Deleted /outbrain/cassdata/data/outbrain_kvdb/KvAds-9-Data.db INFO 
[SSTABLE-CLEANUP-TIMER] 2010-04-14 05:55:28,682 SSTableDeletingReference.java (line 104) Deleted /outbrain/cassdata/data/outbrain_kvdb/KvImpressions-21-Data.db INFO [COMMIT-LOG-WRITER] 2010-04-14 05:55:52,254 CommitLogSegment.java (line 50) Creating new commitlog segment /outbrain/cassdata/commitlog/CommitLog-1271238952254.log INFO [ROW-MUTATION-STAGE:16] 2010-04-14 05:56:25,347 ColumnFamilyStore.java (line 357) KvImpressions has reached its threshold; switching in a fresh Memtable at CommitLogContext(file='/outbrain/cassdata/commitlog/CommitLog-1271238952254.log', position=47568158) INFO [ROW-MUTATION-STAGE:16] 2010-04-14 05:56:25,348 ColumnFamilyStore.java (line 609) Enqueuing flush of Memtable(KvImpressions)@1955587316 INFO [FLUSH-WRITER-POOL:1] 2010-04-14 05:56:25,348 Memtable.java (line 148) Writing Memtable(KvImpressions)@1955587316 INFO [FLUSH-WRITER-POOL:1] 2010-04-14 05:56:30,572 Memtable.java (line 162) Completed flushing /outbrain/cassdata/data/outbrain_kvdb/KvImpressions-25-Data.db INFO [COMMIT-LOG-WRITER] 2010-04-14 05:57:26,790 CommitLogSegment.java (line 50) Creating new commitlog segment /outbrain/cassdata/commitlog/CommitLog-1271239046790.log INFO [ROW-MUTATION-STAGE:7] 2010-04-14 05:57:59,513 ColumnFamilyStore.java (line 357) KvImpressions has reached its
Re: New User: OSX vs. Debian on Cassandra 0.5.0 with Thrift
I upgraded to 0.6 yesterday and it's bang on the same. I'll go read up on py_stress and give it a try. On Thu, Apr 15, 2010 at 1:57 PM, Jonathan Ellis jbel...@gmail.com wrote: What kind of numbers do you get from contrib/py_stress? (that's located somewhere else in 0.5, but you should really be using 0.6 anyway.) On Thu, Apr 15, 2010 at 12:53 PM, Heath Oderman he...@526valley.com wrote: So checking it out quickly: vmstat - Never swaps. si and so stay at 0 during the load. iostat -x the %util never climbs above 0.00, but the avgrq-sz jumps between samples from 0 - 30 - 90 - 0 (5 second intervals) top shows the cpu barely working and mem utilization is below 20%. Still slow. :( Thanks for the suggestions. In your article on your blog it'd be awesome to include some implications, like avgrq-sz over 250 may mean XXX Even if it's utterly hardware and system dependent it'd give a guy like me an idea if what I was seeing was bad or good. :D Thanks again, Heath On Thu, Apr 15, 2010 at 1:34 PM, Heath Oderman he...@526valley.com wrote: Thanks Jonathan, I'll check this out right away. On Thu, Apr 15, 2010 at 1:32 PM, Jonathan Ellis jbel...@gmail.com wrote: You're right, to get those numbers on debian something is very wrong. Have you looked at http://spyced.blogspot.com/2010/01/linux-performance-basics.html ? What is the bottleneck on the linux machines? With the kind of speed you are seeing I wouldn't be surprised if it is swapping. -Jonathan On Tue, Apr 13, 2010 at 11:38 PM, Heath Oderman he...@526valley.com wrote: Hi, I wrote a few days ago and got a few good suggestions. I'm still seeing dramatic differences between Cassandra 0.5.0 on OSX vs. Debian Linux. I've tried on Debian with the Sun JRE and the Open JDK with nearly identical results. I've tried a mix of hardware. Attached are some graphs I've produced of my results which show that in OSX, Cassandra takes longer with a greater load but is wicked fast (expected).
In the SunJDK or Open JDK on Debian I get amazingly consistent time taken to do the writes, regardless of the load and the times are always ridiculously high. It's insanely slow. I genuinely believe that I must be doing something very wrong in my Debian setups, but they are all vanilla installs, both 64 bit and 32 bit machines, 64bit and 32 bit installs. Cassandra packs taken from http://www.apache.org/dist/cassandra/debian. I am using Thrift, and I'm using a c# client because that's how I intend to actually use Cassandra and it seems pretty sensible. An example of what I'm seeing is: 5 Threads Each writing 100,000 Simple Entries OSX: 1 min 16 seconds ~ 6515 Entries / second Debian: 1 hour 15 seconds ~ 138 Records / second 15 Threads Each writing 100,000 Simple Entries OSX: 2min 30 seconds seconds writing ~10,000 Entries / second Debian: 1 hour 1.5 minutes ~406 Entries / second 20 Threads Each Writing 100,000 Simple Entries OSX: 3min 19 seconds ~ 10,050 Entries / second Debian: 1 hour 20 seconds ~ 492 Entries / second If anyone has any suggestions or pointers I'd be glad to hear them. Thanks, Stu Attached: 1. CassLoadTesting.ods (all my results and graphs in OpenOffice format downloaded from Google Docs) 2. OSX Records per Second - a graph of how many entries get written per second for 10,000 100,000 entries as thread count is increased in OSX. 3. Open JDK Records per Second - the same graph but of Open JDK on Debian 4. Open JDK Total Time By Thread - the total time taken from test start to finish (all threads completed) to write 10,000 100,000 entries as thread count is increased in Debian with Open JDK 5. OSX Total time by Thread - same as 4, but for OSX.
Re: Time-series data model
This is actually fairly similar to how we store metrics at Cloudkick. The post below has a much more in-depth explanation of some of that: https://www.cloudkick.com/blog/2010/mar/02/4_months_with_cassandra/ So we store each natural point in the NumericArchive table. <ColumnFamily CompareWith="LongType" Name="NumericArchive" /> <ColumnFamily CompareWith="LongType" Name="Rollup5m" ColumnType="Super" CompareSubcolumnsWith="BytesType" /> <ColumnFamily CompareWith="LongType" Name="Rollup20m" ColumnType="Super" CompareSubcolumnsWith="BytesType" /> <ColumnFamily CompareWith="LongType" Name="Rollup30m" ColumnType="Super" CompareSubcolumnsWith="BytesType" /> <ColumnFamily CompareWith="LongType" Name="Rollup60m" ColumnType="Super" CompareSubcolumnsWith="BytesType" /> <ColumnFamily CompareWith="LongType" Name="Rollup4h" ColumnType="Super" CompareSubcolumnsWith="BytesType" /> <ColumnFamily CompareWith="LongType" Name="Rollup12h" ColumnType="Super" CompareSubcolumnsWith="BytesType" /> <ColumnFamily CompareWith="LongType" Name="Rollup1d" ColumnType="Super" CompareSubcolumnsWith="BytesType" /> our keys look like: serviceuuid.metric-name Anyways, this has been working out very well for us. 2010/4/15 Ted Zlatanov t...@lifelogs.com: On Thu, 15 Apr 2010 11:27:47 +0200 Jean-Pierre Bergamin ja...@ractive.ch wrote: JB On 14.04.2010 15:22, Ted Zlatanov wrote: On Wed, 14 Apr 2010 15:02:29 +0200 Jean-Pierre Bergamin ja...@ractive.ch wrote: JB The metrics are stored together with a timestamp. The queries we want to JB perform are: JB * The last value of a specific metric of a device JB * The values of a specific metric of a device between two timestamps t1 and JB t2 Make your key devicename-metricname-MMDD-HHMM (with whatever time sharding makes sense to you; I use UTC by-hours and by-day in my environment). Then your supercolumn is the collection time as a LongType and your columns inside the supercolumn can express the metric in detail (collector agent, detailed breakdown, etc.). JB Just for my understanding. What is time sharding?
I couldn't find an JB explanation somewhere. Do you mean that the time-series data is rolled JB up in 5 minutes, 1 hour, 1 day etc. slices? Yes. The usual meaning of shard in the RDBMS world is to segment your database by some criteria, e.g. US vs. Europe in Amazon AWS because their data centers are laid out so. I was taking a linguistic shortcut to mean break down your rows by some convenient criteria. You can actually set up your Partitioner in Cassandra to literally shard your keyspace rows based on the key, but I just meant slice in my note. JB So this would be defined as: JB <ColumnFamily Name="measurements" ColumnType="Super" CompareWith="UTF8Type" CompareSubcolumnsWith="LongType" /> JB So when I want to read all values of one metric between two timestamps JB t0 and t1, I'd have to read the supercolumns that match a key range JB (device1:metric1:t0 - device1:metric1:t1) and then all the JB supercolumns for this key? Yes. This is a single multiget if you can construct the key range explicitly. Cassandra loads a lot of this in memory already and filters it after the fact, that's why it pays to slice your keys and to stitch them together on the client side if you have to go across a time boundary. You'll also get better key load balancing with deeper slicing if you use the randomizing partitioner. In the result set, you'll get each matching supercolumn with all the columns inside it. You may have to page through supercolumns. Ted -- Dan Di Spaltro
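Ted's time-sharded key scheme can be made concrete with a small sketch. The exact key layout here (device-metric-YYYYMMDD-HH00, by-hour in UTC) is an assumption for illustration; the thread leaves the sharding granularity up to you:

```python
from datetime import datetime, timedelta, timezone

def shard_key(device: str, metric: str, t: datetime) -> str:
    # Hypothetical key layout following the by-hour sharding idea from
    # the thread: device-metric-YYYYMMDD-HH00.
    return f"{device}-{metric}-{t:%Y%m%d-%H}00"

def keys_for_range(device: str, metric: str, t0: datetime, t1: datetime):
    """Enumerate the hour-shard keys covering [t0, t1]; the client then
    slices each key and stitches the results back together."""
    keys, t = [], t0.replace(minute=0, second=0, microsecond=0)
    while t <= t1:
        keys.append(shard_key(device, metric, t))
        t += timedelta(hours=1)
    return keys

t0 = datetime(2010, 4, 15, 10, 30, tzinfo=timezone.utc)
t1 = datetime(2010, 4, 15, 12, 5, tzinfo=timezone.utc)
window_keys = keys_for_range("device1", "cpu", t0, t1)  # 10:00, 11:00, 12:00 shards
```

Because the whole key range is computable from t0 and t1, the read is a single multiget over these keys, as described above.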
Re: framed transport
FWIW, We just exposed this as an option in hector. -Nate On Thu, Apr 15, 2010 at 8:38 AM, Miguel Verde miguelitov...@gmail.com wrote: On Thu, Apr 15, 2010 at 10:22 AM, Eric Evans eev...@rackspace.com wrote: But, if you've enabled framing on the server, you will not be able to use C# clients (last I checked, there was no framed transport for C#). There *are* many clients that don't have framed transports, but the C# client had it added in November: https://issues.apache.org/jira/browse/THRIFT-210
Re: BMT flush on windows?
Hmmm. Same code runs on ubuntu, and I'm able to flush using the nodetool. What is the difference between inserting data using StorageProxy.mutateBlocking vs. sending a oneway message using the MessagingService? On Thu, Apr 15, 2010 at 10:14 AM, Jonathan Ellis jbel...@gmail.com wrote: probably because there is nothing to flush. On Thu, Apr 15, 2010 at 11:53 AM, Sonny Heer sonnyh...@gmail.com wrote: From the jconsole, I go under ColumnFamilyStores-CF1-Column1-Operations and clicked force flush(). I'm getting an Operation return value null OK message box. What am I doing wrong? On Tue, Apr 13, 2010 at 3:12 PM, Jonathan Ellis jbel...@gmail.com wrote: you have three options (a) connect with jconsole or another jmx client and invoke flush that way (b) run org.apache.cassandra.tools.NodeCmd manually (c) write a bat file for NodeCmd like the nodetool shell script in bin/ On Tue, Apr 13, 2010 at 5:08 PM, Sonny Heer sonnyh...@gmail.com wrote: Is there any way to run a keyspace flush on a windows box?
Re: timestamp not found
I have done more error checking and I am relatively certain that I am sending a valid timestamp to the thrift library. I was testing a switch to the Framed Transport instead of Buffered Transport and I am getting fewer errors, but now the cassandra server dies when this happens. It is starting to feel like this is a bug in Thrift or the Cassandra Thrift interface. Can anyone offer any other insight? I'm using the current stable release of Thrift 0.2.0, and Cassandra 0.6.0. It seems to happen more under heavy load. I don't know if that is meaningful or not. Lee Parker On Thu, Apr 15, 2010 at 11:00 AM, Lee Parker l...@socialagency.com wrote: I'm actually using PHP. I do have several php processes running, but each one should have it's own Thrift connection. Lee Parker l...@spredfast.com [image: Spredfast] On Thu, Apr 15, 2010 at 10:53 AM, Jonathan Ellis jbel...@gmail.comwrote: Looks like you are using C++ and not setting the isset flag on the timestamp field, so it's getting the default value for a Java long (0). If it works most of the time then possibly you are using a Thrift connection from multiple threads at the same time, which is not safe. On Thu, Apr 15, 2010 at 10:39 AM, Lee Parker l...@socialagency.comwrote: We are currently migrating about 70G of data from mysql to cassandra. I am occasionally getting the following error: Required field 'timestamp' was not found in serialized data! Struct: Column(name:74 65 78 74, value:44 61 73 20 6C 69 65 62 20 69 63 68 20 76 6F 6E 20 23 49 6E 61 3A 20 68 74 74 70 3A 2F 2F 77 77 77 2E 79 6F 75 74 75 62 65 2E 63 6F 6D 2F 77 61 74 63 68 3F 76 3D 70 75 38 4B 54 77 79 64 56 77 6B 26 66 65 61 74 75 72 65 3D 72 65 6C 61 74 65 64 20 40 70 6A 80 01 00 01 00, timestamp:0) The loop which is building out the mutation map for the batch_mutate call is adding a timestamp to each column. I have verified that the time stamp is there for several calls and I feel like if the logic was bad, i would see the error more frequently. 
Does anyone have suggestions as to what may be causing this? Lee Parker l...@spredfast.com [image: Spredfast]
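One hedge worth adding to this thread: in Cassandra 0.6 the timestamp is a client-supplied i64 (microseconds since the epoch is the common convention, not something the server enforces), so it is safer to build it as an integer than as a float like the values visible in the earlier var_dump. A minimal sketch:

```python
import time

def cassandra_timestamp() -> int:
    # Client-supplied i64; microseconds since the epoch is the usual
    # convention. Building it as an int avoids float rounding and any
    # ambiguity when the Thrift layer serializes the field.
    return int(time.time() * 1_000_000)

ts = cassandra_timestamp()
```

Whether float timestamps explain the 'timestamp not found' error here is speculation; the point is only that an integer i64 is the unambiguous form to send.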
json2sstable
Has anyone used json2sstable to migrate a large amount of data into cassandra? What was your methodology? I assume that this will be much faster than stepping through my data and doing writes via PHP/Thrift. Lee Parker
Re: framed transport
It appears that after some testing, the buffered transport seems more stable. I am occasionally getting a missing timestamp error during batch_mutate calls. It happens both on framed and buffered transports, but when it happens on a framed transport, the server crashes. Is this typical? Lee Parker On Thu, Apr 15, 2010 at 1:12 PM, Nathan McCall n...@vervewireless.com wrote: FWIW, We just exposed this as an option in hector. -Nate On Thu, Apr 15, 2010 at 8:38 AM, Miguel Verde miguelitov...@gmail.com wrote: On Thu, Apr 15, 2010 at 10:22 AM, Eric Evans eev...@rackspace.com wrote: But, if you've enabled framing on the server, you will not be able to use C# clients (last I checked, there was no framed transport for C#). There *are* many clients that don't have framed transports, but the C# client had it added in November: https://issues.apache.org/jira/browse/THRIFT-210
Re: framed transport
Have you tried other client machines? It sounds like your client is generating garbage, which is Bad. https://issues.apache.org/jira/browse/THRIFT-601 On Thu, Apr 15, 2010 at 4:20 PM, Lee Parker l...@socialagency.com wrote: It appears that after some testing, the buffered transport seems more stable. I am occasionally getting a missing timestamp error during batch_mutate calls. It happens both on framed and buffered transports, but when it happens on a framed transport, the server crashes. Is this typical? Lee Parker On Thu, Apr 15, 2010 at 1:12 PM, Nathan McCall n...@vervewireless.com wrote: FWIW, We just exposed this as an option in hector. -Nate On Thu, Apr 15, 2010 at 8:38 AM, Miguel Verde miguelitov...@gmail.com wrote: On Thu, Apr 15, 2010 at 10:22 AM, Eric Evans eev...@rackspace.com wrote: But, if you've enabled framing on the server, you will not be able to use C# clients (last I checked, there was no framed transport for C#). There *are* many clients that don't have framed transports, but the C# client had it added in November: https://issues.apache.org/jira/browse/THRIFT-210
Data model question - column names sort
Need a way to have two different types of indexes. Key: aTextKey ColumnName: aTextColumnName:55 Value: Key: aTextKey ColumnName: 55:aTextColumnName Value: All the valuable information is stored in the column name itself. Above two can be in different column families... Queries: Given a key, page me a list of numerical values sorted on aTextColumnName Given a key, page me a list of text values sorted on a numerical value This approach would require left padding the numeric value for the second index so cassandra can sort on column names correctly. Is there any other way to accomplish this?
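The left-padding idea in the question can be sketched as follows: pad the numeric part to a fixed width so that lexicographic order (what a UTF8Type/BytesType comparator gives you) coincides with numeric order. The width of 20 digits is an assumption sized for 64-bit values:

```python
WIDTH = 20  # enough digits for any unsigned 64-bit value; size to your data

def pad(n: int) -> str:
    # Left-pad with zeros so string sort order matches numeric order,
    # e.g. 55 -> "00000000000000000055".
    return str(n).zfill(WIDTH)

nums = [55, 7, 123, 9]
padded = sorted(pad(n) for n in nums)
assert [int(p) for p in padded] == sorted(nums)

# Without padding, string sort breaks numeric order:
assert sorted(str(n) for n in nums) != [str(n) for n in sorted(nums)]
```

For the purely numeric index, an alternative worth considering is a column family with CompareWith="LongType", which sorts 8-byte longs natively and avoids padding; the padded-string trick is mainly useful when the number is embedded in a composite column name as in the question.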
Clarification on Ring operations in Cassandra 0.5.1
Hi, I have a cluster running on ec2, and would like to do some ring management. Specifically, I'd like to replace an existing node with another node (I want to change the instance type). I was looking over http://wiki.apache.org/cassandra/Operations and it seems like I could do something like: 1) shutdown cassandra on the instance I want to replace 2) create a new instance, start cassandra with AutoBootstrap = true 3) run nodeprobe removetoken against the token of the instance I am replacing Then according to the 'Handling failure' section the new instance will find the appropriate position automatically. However, it's not clear to me if this means it will take the same range as the shutdown node or not, because normally AutoBootstrap == true means it will take half the keys from the node with the most disk space used (from the 'Bootstrap' section). So will the process I describe above result in what I want, a new node replacing an old one? Also, if the new instance takes over the range of the old instance how does removetoken know which instance to remove, does it remove the Down instance? Another hopefully minor question, if I bring up a new node with AutoBootstrap = false, what happens? Does it join the ring but without data and without a token range? Can I then 'nodeprobe move token for range I want to take over', and achieve the same as step 2 above? Thanks, -Anthony -- Anthony Molinaro antho...@alumni.caltech.edu
Re: Is it possible to get all records in a CF?
You'll have to scan the CF. If you're using OrderPreservingPartitioner please see 'get_range_slices' (http://wiki.apache.org/cassandra/API). It would help if you had an idea of where the key might be, so you would know where to start scanning. Gary. On Thu, Apr 15, 2010 at 21:01, Jared Laprise ja...@webonyx.com wrote: If you do not have the key for SuperColumn in a ColumnFamily is it not possible to browse all the data in the ColumnFamily? Thus far I’ve only been able to find a way to pull out data if I know the key.
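The paging contract of get_range_slices under an order-preserving partitioner can be modeled with a toy in-memory version: each page starts at the last key of the previous page, so that key comes back twice and must be de-duplicated client-side. This is a sketch of the paging pattern, not a Thrift call:

```python
def page_keys(sorted_keys, start, count):
    """Toy model of key-range paging over an order-preserving
    partitioner: return up to `count` keys >= `start`. The next page
    begins at the last key returned, which therefore repeats."""
    hits = [k for k in sorted_keys if k >= start]
    return hits[:count]

keys = sorted(["a", "b", "c", "d", "e"])
page1 = page_keys(keys, "", 3)         # first page from the empty key
page2 = page_keys(keys, page1[-1], 3)  # next page; page1[-1] repeats
```

With the random partitioner the same loop works but the iteration order is token order rather than key order, so range scans are only meaningful for full-CF sweeps.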
Re: json2sstable
I tried that and found that it cannot handle large files at present. But you can write a tool based on it, e.g.: first sort your data file by its hash key; second, write an SSTable directly. On Fri, Apr 16, 2010 at 4:47 AM, Lee Parker l...@socialagency.com wrote: Has anyone used json2sstable to migrate a large amount of data into cassandra? What was your methodology? I assume that this will be much faster than stepping through my data and doing writes via PHP/Thrift. Lee Parker
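The pre-sorting step suggested above follows from RandomPartitioner ordering rows by an MD5-derived token of the key, and SSTables having to be written in that order. A rough sketch of the ordering (Cassandra's actual token arithmetic differs in detail, e.g. sign handling):

```python
from hashlib import md5

def row_token(key: str) -> int:
    # RandomPartitioner orders rows by an MD5-derived token; this sketch
    # treats the raw digest as a big unsigned integer, which is close
    # enough to show why a plain key sort is the wrong order.
    return int.from_bytes(md5(key.encode("utf-8")).digest(), "big")

rows = ["banana", "apple", "cherry"]
sstable_order = sorted(rows, key=row_token)  # token order, not lexical order
```

With OrderPreservingPartitioner the token order is just the key order, so a lexical sort of the input file would suffice there.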
Re: json2sstable
On Thu, Apr 15, 2010 at 3:47 PM, Lee Parker l...@socialagency.com wrote: Has anyone used json2sstable to migrate a large amount of data into cassandra? What was your methodology? I assume that this will be much faster than stepping through my data and doing writes via PHP/Thrift. If you're looking to do a bulk import, peek at contrib/bmt_example. -Brandon