Re: repair never finishing 1.0.7
Hi Andras,

I am not using a VPN. The system had been running successfully in this configuration for a couple of weeks before I noticed that repair was not working.

What happens is that I configure iptables on each Cassandra node to redirect packets addressed to any of the IPs in the other DC (on ports 7000, 9160 and 7199) to the gateway IP. The gateway does the NAT, forwarding the packets on the other side to the real destination IP and replacing the source IP with the initial sender's IP (at least in my understanding of it).

What might be the problem given this configuration, and how can I fix it?

Cheers,
Alex

On Mon, Jun 25, 2012 at 12:47 PM, Andras Szerdahelyi andras.szerdahe...@ignitionone.com wrote:

The DCs are communicating over a gateway where I do NAT for ports 7000, 9160 and 7199.

Ah, that sounds familiar. You don't mention if you are VPN'd or not; I'll assume you are not. So, your nodes are behind network address translation. Is that to say they advertise (broadcast) their internal or their translated/forwarded IP to each other?

Setting up a Cassandra ring across NAT (without a VPN) is impossible in my experience. Either the nodes on your local network won't be able to communicate with each other, because they broadcast their translated (public) address, which is normally (depending on router configuration) not routable from within the local network; or the nodes broadcast their internal IPs, in which case the outside nodes are helpless in trying to connect into the local net.

On the DC2 nodes (the node you issue the repair on), check for any sockets being opened to the internal addresses of the nodes in DC1.

regards,
Andras

On 25 Jun 2012, at 11:57, Alexandru Sicoe wrote:

Hello everyone,

I have a 2-DC (DC1:3 and DC2:6) Cassandra 1.0.7 setup, with about 300 GB per node in DC2. The DCs are communicating over a gateway where I do NAT for ports 7000, 9160 and 7199.

I did a nodetool repair on a node in DC2 without any external load on the system. It took 5 hrs to finish the Merkle tree calculations (which is fine for me), but then in the streaming phase nothing happens (0% seen in nodetool netstats) and it stays like that forever. Note: it has to stream to/from nodes in DC1! I tried another time and still the same.

Looking around I found this thread http://www.mail-archive.com/user@cassandra.apache.org/msg22167.html which seems to describe the same problem. The thread gives 2 suggestions:
- a full cluster restart allows the first attempted repair to complete (haven't tested yet; this is not practical even if it works)
- issue https://issues.apache.org/jira/browse/CASSANDRA-4223 can be the problem

Questions:
1) How can I make sure that the JIRA issue above is my real problem? (I see no errors or warnings in the logs, and no other activity.)
2) What should I do to make the repairs work? (If the JIRA issue is the problem, then I see there is a fix for it in version 1.0.11, which is not released yet.)

Thanks,
Alex
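[For concreteness, the per-node redirect Alex describes would look something like the rule below. All addresses are hypothetical (10.0.1.11 standing in for a remote-DC node, 10.0.0.254 for the local gateway), so this is a sketch of the shape of the rule, not his actual configuration:

    # Run on each local node, once per remote-DC node IP: rewrite the
    # destination of outbound Cassandra traffic so it goes via the gateway.
    iptables -t nat -A OUTPUT -d 10.0.1.11 -p tcp \
        -m multiport --dports 7000,9160,7199 \
        -j DNAT --to-destination 10.0.0.254

With rules like this in place, Andras's question still applies: what matters is which address each node gossips to its peers, not which address the packets physically take.]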
Secondary index data gone after restart (1.1.1)
Hi,

I am running into some problems with secondary indexes that I am unable to track down. When I restart the cassandra service, the secondary index data won't load and I get the following error during startup:

    INFO 08:29:42,127 Opening /var/myproject/cassandra/data/mykeyspace/group_admin/mykeyspace-group_admin.group_admin_groupId_idx-hd-1 (20808 bytes)
    ERROR 08:29:42,159 Exception in thread Thread[SSTableBatchOpen:1,5,main]
    java.lang.ClassCastException: java.math.BigInteger cannot be cast to java.nio.ByteBuffer
        at org.apache.cassandra.db.marshal.TimeUUIDType.compare(TimeUUIDType.java:37)
        at org.apache.cassandra.dht.LocalToken.compareTo(LocalToken.java:45)
        at org.apache.cassandra.db.DecoratedKey.compareTo(DecoratedKey.java:89)
        at org.apache.cassandra.db.DecoratedKey.compareTo(DecoratedKey.java:38)
        at java.util.TreeMap.getEntry(TreeMap.java:328)
        at java.util.TreeMap.containsKey(TreeMap.java:209)
        at java.util.TreeSet.contains(TreeSet.java:217)
        at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:396)
        at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:187)
        at org.apache.cassandra.io.sstable.SSTableReader$1.run(SSTableReader.java:225)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)

When the service starts I can still select data from the column family, but not using the secondary index. After I execute nodetool rebuild_index, the secondary index works fine again until the next restart. The error only seems to occur on the column groupId (TimeUUIDType); the other index, on userId, seems to work.

I have the following column family definition:

    create column family group_admin
      with comparator = UTF8Type
      and key_validation_class = UTF8Type
      and column_metadata = [
        {column_name: id, validation_class: UTF8Type},
        {column_name: added, validation_class: LongType},
        {column_name: userId, validation_class: BytesType, index_type: KEYS},
        {column_name: requestMessage, validation_class: UTF8Type},
        {column_name: status, validation_class: LongType},
        {column_name: groupId, validation_class: TimeUUIDType, index_type: KEYS}
      ];

Thank you very much for your help!

Ivo
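[For reference, the workaround Ivo mentions looks something like the following, with the keyspace, column family and index names taken from the definition above. The exact argument format of rebuild_index varies between versions, so treat this as a sketch and check nodetool help on your own install:

    # Rebuild the groupId secondary index after a restart (a workaround, not a fix)
    nodetool -h localhost rebuild_index mykeyspace group_admin group_admin_groupId_idx
]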
Re: Secondary index data gone after restart (1.1.1)
Hi,

Please refer to the JDK's java.nio ByteBuffer: I don't think a BigInteger can be cast to a ByteBuffer directly. It seems you need to do some conversion before putting it into a ByteBuffer.

Thanks,
Fei

On Tue, Jun 26, 2012 at 12:07 AM, Ivo Meißner i...@overtronic.com wrote:

Hi, I am running into some problems with secondary indexes that I am unable to track down. When I restart the cassandra service, the secondary index data won't load and I get the following error during startup: [...]
How to use row caching to enable faster retrieval of rows in Cassandra
Dear all,

I am trying to understand whether I can speed up the retrieval process using the cache. Can you please help me write the code for setting the cache properties in Cassandra?

Thanks and Regards,
Prakrati
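[As a starting point, a sketch assuming Cassandra 1.1, where per-column-family cache behaviour is the caching attribute (valid values NONE, KEYS_ONLY, ROWS_ONLY, ALL) and the global capacities are key_cache_size_in_mb / row_cache_size_in_mb in cassandra.yaml. The keyspace and column family names below are made up; on 1.0.x the row cache is instead sized per column family with the rows_cached attribute:

    $ cassandra-cli -h localhost
    [default@unknown] use mykeyspace;
    [default@mykeyspace] update column family mycf with caching = 'ALL';

'ALL' enables both the key cache and the row cache for that column family; whether the row cache actually helps depends heavily on row size and access pattern, so measure before and after.]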
Re: Secondary index data gone after restart (1.1.1)
Hi,

But if the data must be converted, that is something that should be fixed inside Cassandra… Is this a bug? Should I file a bug report? Or is there some kind of setting I can change to make it work for now?

Maybe it is related to this issue, but that should have been fixed in 1.1.0: https://issues.apache.org/jira/browse/CASSANDRA-3954

Thanks,
Ivo

On 26.06.2012 at 09:26, Fei Shan wrote:

Hi, please refer to the JDK's java.nio ByteBuffer: I don't think a BigInteger can be cast to a ByteBuffer directly. It seems you need to do some conversion before putting it into a ByteBuffer. [...]
Re: Request Timeout with Composite Columns and CQL3
On Mon, Jun 25, 2012 at 11:10 PM, Henning Kropp kr...@nurago.com wrote:

Hi, I am running into timeout issues using composite columns in cassandra 1.1.1 and cql 3. My keyspace and table are defined as follows:

    create keyspace bn_logs
      with strategy_options = [{replication_factor:1}]
      and placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy';

    CREATE TABLE logs (
      id text,
      ref text,
      time bigint,
      datum text,
      PRIMARY KEY (id, ref, time)
    );

I import some data into the table using a combination of the thrift interface and the hector Composite class, using its serialization as the column name:

    Column col = new Column(composite.serialize());

This all seems to work fine until I try to execute the following query, which leads to a request timeout:

    SELECT datum FROM logs WHERE id='861' and ref = 'raaf' and time > '3000';

If it timeouts, the likely reason is that this query selects more data than the machine is able to fetch before the timeout. You can either add a limit to the query, or increase the timeout. If that doesn't seem to fix it, it might be worth checking the server log to see if there isn't an error.

I really would like to figure out why running this query on my laptop (single node, for development) will not finish. I also would like to know if the following query would actually work:

    SELECT datum FROM logs WHERE id='861' and ref = 'raaf*' and time > '3000';

It won't. You can perform the following query:

    SELECT datum FROM logs WHERE id='861' and ref >= 'raaf';

which will select every datum whose ref starts with 'raaf' (and anything sorting after it), but then you cannot restrict the time parameter, so you will also get results where the time is <= 3000. Of course you can always filter client side if that is an option.

Or how else is there a way to define a range for the second component of the column key?

As described above, you can define a range on the second component, but then you won't be able to restrict the 3rd component.

Any thoughts? Thanks in advance and kind regards

Henning
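[To make the rule concrete, a hypothetical cqlsh session against the logs table above (on 1.1, cqlsh needs the -3 flag for CQL 3; exact error wording varies by version):

    $ cqlsh -3 localhost
    cqlsh> USE bn_logs;
    cqlsh:bn_logs> -- fine: equality on id and ref, range on time, the last used component
    cqlsh:bn_logs> SELECT datum FROM logs WHERE id='861' AND ref='raaf' AND time > 3000 LIMIT 100;
    cqlsh:bn_logs> -- fine: range on ref, but then time cannot also be restricted
    cqlsh:bn_logs> SELECT datum FROM logs WHERE id='861' AND ref >= 'raaf';
    cqlsh:bn_logs> -- rejected: a range on ref combined with a restriction on time
    cqlsh:bn_logs> SELECT datum FROM logs WHERE id='861' AND ref >= 'raaf' AND time > 3000;
]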
Re: Removing a counter columns using Thrift interface
On Mon, Jun 25, 2012 at 9:28 AM, Sylvain Lebresne sylv...@datastax.com wrote:

On Mon, Jun 25, 2012 at 9:06 AM, Patrik Modesto patrik.mode...@gmail.com wrote:

I'm used to using Mutation for everything, so the first thing I tried was a Deletion on a Counter column. Well, nothing happened: no error, and the Counter column was still there.

That shouldn't happen.

The second try was the remove_counter() method. When I set just the column_family of ColumnPath, nothing happened: no error, and the Counter column was still there. I supposed it would work like the remove() method, which would remove the whole row.

It should. If it doesn't, that would be a bug. If you can reproduce such a bug, then please do open a ticket.

I've tried again today and found one bug in my test program. Now Deletion works as expected. The remove_counter() works as well; I had misinterpreted the results.

Regards,
P.
Create column family fail
Hi,

I create this column family:

    CREATE COLUMN FAMILY Clients
      WITH column_type='Super'
      AND key_validation_class = LongType -- master_id
      AND comparator = LongType -- client_id
      AND subcomparator = UTF8Type
      AND column_metadata = [
        {column_name: client_name, validation_class: UTF8Type}
      ];

But the column metadata is not saved, as you can see with cassandra-cli:

    create column family Clients
      with column_type = 'Super'
      and comparator = 'BytesType'
      and subcomparator = 'BytesType'
      and default_validation_class = 'BytesType'
      and key_validation_class = 'LongType'
      and read_repair_chance = 0.1
      and dclocal_read_repair_chance = 0.0
      and gc_grace = 864000
      and min_compaction_threshold = 4
      and max_compaction_threshold = 32
      and replicate_on_write = true
      and compaction_strategy = 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'
      and caching = 'KEYS_ONLY'
      and compression_options = {'sstable_compression' : 'org.apache.cassandra.io.compress.SnappyCompressor'}

It only happens with that column family and I don't know why. The comparator and subcomparator look wrong too. Can anyone help, please?

--
Juan Ezquerro LLanes
Sofistic Team
Telf: 618349107/964051479
Re: Create column family fail
Ok, the '--' was the problem ... LOL

2012/6/26 Juan Ezquerro j...@sofistic.net

Hi, I create this column family: [...]

--
Juan Ezquerro LLanes
Sofistic Team
Telf: 618349107/964051479
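[For anyone hitting the same thing: the inline '--' comments appear to have commented out part of the statement, which is why the comparator fell back to BytesType and the column metadata was lost. The same definition with the comments removed (simply the original statement minus the '--' parts):

    CREATE COLUMN FAMILY Clients
      WITH column_type = 'Super'
      AND key_validation_class = LongType
      AND comparator = LongType
      AND subcomparator = UTF8Type
      AND column_metadata = [
        {column_name: client_name, validation_class: UTF8Type}
      ];
]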
AW: Request Timeout with Composite Columns and CQL3
Thanks for the reply. I should have thought about looking into the log files sooner. An AssertionError happens at execution time; I haven't figured out yet why. Any input is very much appreciated:

    ERROR [ReadStage:1] 2012-06-26 15:49:54,481 AbstractCassandraDaemon.java (line 134) Exception in thread Thread[ReadStage:1,5,main]
    java.lang.AssertionError: Added column does not sort as the last column
        at org.apache.cassandra.db.ArrayBackedSortedColumns.addColumn(ArrayBackedSortedColumns.java:130)
        at org.apache.cassandra.db.AbstractColumnContainer.addColumn(AbstractColumnContainer.java:107)
        at org.apache.cassandra.db.AbstractColumnContainer.addColumn(AbstractColumnContainer.java:102)
        at org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:141)
        at org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:139)
        at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:283)
        at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:63)
        at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1321)
        at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1183)
        at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1118)
        at org.apache.cassandra.db.Table.getRow(Table.java:374)
        at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:69)
        at org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:816)
        at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1250)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)

BTW: I really would love to understand why the combined comparator will not allow ranges to be specified for two key parts. Obviously I still lack a profound enough understanding of Cassandra's architecture to have a clue. And while client-side filtering might seem like a valid option, I am still trying to get my head around a Cassandra data model that would allow this.

best regards

From: Sylvain Lebresne [sylv...@datastax.com]
Sent: Tuesday, 26 June 2012 10:21
To: user@cassandra.apache.org
Subject: Re: Request Timeout with Composite Columns and CQL3

On Mon, Jun 25, 2012 at 11:10 PM, Henning Kropp kr...@nurago.com wrote:

Hi, I am running into timeout issues using composite columns in cassandra 1.1.1 and cql 3. [...]
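[Not an authoritative answer to the "why", but the usual intuition, sketched under the assumption that the table above stores each id as one wide row: within that row, the columns are kept in a single sorted order by (ref, time), and a query can only read one contiguous slice of that order. Hypothetical column names for id='861':

    ('raaf', 1000)   ('raaf', 5000)   ('raag', 2000)   ('raag', 9000)

A range on ref combined with a restriction on time (say ref >= 'raaf' and time > 3000) would need ('raaf', 5000) and ('raag', 9000) but not ('raag', 2000), which sits between them: a non-contiguous set that a single slice cannot express. That is why only the last restricted component of the composite may be a range.]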
Cassandra and massive TTL expirations cause HEAP issue
Hello,

I am evaluating Cassandra for a log retrieval application. My ring consists of 3 m2.xlarge instances (17.1 GB memory, 6.5 ECU (2 virtual cores with 3.25 EC2 Compute Units each), 420 GB of local instance storage, 64-bit platform) and I am writing at roughly 220 writes/sec. Per day I am adding roughly 60 GB of data. All of this sounds simple and easy, and all three nodes are humming along with basically no load.

The issue is that I am writing all my data with a TTL of 10 days. After 10 days my cluster crashes due to a java.lang.OutOfMemoryError during compaction of the big column family that contains roughly 95% of the data. So basically after 10 days my data set is 600 GB, and from then on Cassandra has to tombstone and purge 60 GB of data per day at the same rate of roughly 220 deletes/second. I am not sure whether Cassandra should be able to handle this, whether I should take a partitioning approach (one CF per day), or whether there are simply some tweaks I need to make in the yaml file.

I have tried:
1. Decreasing flush_largest_memtables_at to 0.4
2. Setting reduce_cache_sizes_at and reduce_cache_capacity_to to 1

The issue remains the same:

    WARN [ScheduledTasks:1] 2012-06-11 19:39:42,017 GCInspector.java (line 145) Heap is 0.9920103380107628 full. You may need to reduce memtable and/or cache sizes. Cassandra will now flush up to the two largest memtables to free up memory. Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically.

Eventually it will just die with this message. This affects all nodes in the cluster, not just one.

    Dump file is incomplete: file size limit
    ERROR 19:39:39,695 Exception in thread Thread[ReadStage:134,5,main]
    java.lang.OutOfMemoryError: Java heap space
    ERROR 19:39:39,724 Exception in thread Thread[MutationStage:57,5,main]
    java.lang.OutOfMemoryError: Java heap space
        at org.apache.cassandra.utils.FBUtilities.hashToBigInteger(FBUtilities.java:213)
        at org.apache.cassandra.dht.RandomPartitioner.getToken(RandomPartitioner.java:154)
        at org.apache.cassandra.dht.RandomPartitioner.decorateKey(RandomPartitioner.java:47)
        at org.apache.cassandra.db.RowPosition.forKey(RowPosition.java:54)

Any help is highly appreciated. It would be cool to tweak it in a way that lets me keep a moving window of 10 days in Cassandra while dropping the old data… If there is any other recommended way to deal with such sliding time windows, I am open to ideas. Thank you for your help!
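[Since the mail above mentions the partitioning approach, here is a rough sketch of what a one-CF-per-day sliding window could look like; the keyspace and column family names are made up. Dropping a whole column family avoids the per-row tombstone churn entirely (note that with auto_snapshot enabled, a snapshot of the dropped data may be kept until you clear it):

    $ cassandra-cli -h localhost
    [default@unknown] use logs;
    [default@logs] create column family events_20120626;
    [default@logs] drop column family events_20120616;

Writers would target the column family for the current day and readers fan out over the last ten; the daily create/drop can be driven by cron.]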
Multi datacenter, WAN hiccups and replication
My Cassandra ring spans two DCs. I use local quorum with replication factor=3. I do a write in DC1 with local quorum, and the data gets written to multiple nodes in DC1. For the same write to propagate to DC2, only one copy is sent from the coordinator node in DC1 to a coordinator node in DC2, to optimize traffic over the WAN (from what I have read in the Cassandra documentation).

Will a WAN hiccup result in a Hinted Handoff (HH) being created on DC1's coordinator for DC2, to be delivered when the WAN link is up again?
Re: Multi datacenter, WAN hiccups and replication
On Tue, Jun 26, 2012 at 7:52 AM, Karthik N karthik@gmail.com wrote:

My Cassandra ring spans two DCs. I use local quorum with replication factor=3. [...] Will a WAN hiccup result in a Hinted Handoff (HH) being created on DC1's coordinator for DC2, to be delivered when the WAN link is up again?

I have seen hinted handoff messages in the log files when the remote DC is unreachable. But this mechanism is only used for the time window defined in the cassandra.yaml file.
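[For reference, the settings being referred to live in cassandra.yaml. The names exist in 1.0/1.1; the values below are the defaults as best remembered, so verify against your own file:

    $ grep -E 'hinted_handoff_enabled|max_hint_window_in_ms' conf/cassandra.yaml
    hinted_handoff_enabled: true
    max_hint_window_in_ms: 3600000

After a target has been down longer than max_hint_window_in_ms, no new hints are stored for it, and repair is needed to bring it back in sync.]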
Re: Request Timeout with Composite Columns and CQL3
On Tue, Jun 26, 2012 at 4:00 PM, Henning Kropp kr...@nurago.com wrote:

Thanks for the reply. I should have thought about looking into the log files sooner. An AssertionError happens at execution time; I haven't figured out yet why. Any input is very much appreciated:

    ERROR [ReadStage:1] 2012-06-26 15:49:54,481 AbstractCassandraDaemon.java (line 134) Exception in thread Thread[ReadStage:1,5,main]
    java.lang.AssertionError: Added column does not sort as the last column [...]

Obviously that shouldn't happen. You didn't happen to change the comparator for the column family or something like that from the hector side? Are you able to reproduce from a blank DB?

--
Sylvain
Re: Multi datacenter, WAN hiccups and replication
Since Cassandra optimizes and sends only one copy over the WAN, can I opt in to HH only for WAN replication and avoid HH for the local quorum? (Since I know I have more copies locally.)

On Tuesday, June 26, 2012, Mohit Anchlia wrote:

I have seen hinted handoff messages in the log files when the remote DC is unreachable. But this mechanism is only used for the time window defined in the cassandra.yaml file. [...]

--
Thanks, Karthik
Re: Multi datacenter, WAN hiccups and replication
On Tue, Jun 26, 2012 at 8:16 AM, Karthik N karthik@gmail.com wrote:

Since Cassandra optimizes and sends only one copy over the WAN, can I opt in to HH only for WAN replication and avoid HH for the local quorum? (Since I know I have more copies locally.) [...]

I am not sure I understand your question. In general I don't think you can selectively decide on HH. Besides, HH should only be used when the outage is measured in minutes; for longer outages, using HH would only create memory pressure.
Re: Consistency Problem with Quorum consistencyLevel configuration
Hi,

After enabling the Cassandra debug log, I got the following output. It shows the delete mutation being sent to the other two nodes rather than the local node, and then the read command coming to the local node, where the mismatch is found. But I don't know why the local node returns the local dirty data. Isn't it supposed to repair the data and return the correct version?

192.168.0.6:

    DEBUG [MutationStage:61] 2012-06-26 23:09:00,036 RowMutationVerbHandler.java (line 60) RowMutation(keyspace='drc', key='33323130537570657254616e6730', modifications=[ColumnFamily(queue -deleted at 1340723340044000- [])]) applied. Sending response to 3555@/192.168.0.5

192.168.0.4:

    DEBUG [MutationStage:40] 2012-06-26 23:09:00,041 RowMutationVerbHandler.java (line 60) RowMutation(keyspace='drc', key='33323130537570657254616e6730', modifications=[ColumnFamily(queue -deleted at 1340723340044000- [])]) applied. Sending response to 3556@/192.168.0.5

192.168.0.5 (the local one):

    DEBUG [pool-2-thread-20] 2012-06-26 23:09:00,105 StorageProxy.java (line 705) Digest mismatch: org.apache.cassandra.service.DigestMismatchException: Mismatch for key DecoratedKey(7649972972837658739074639933581556, 33323130537570657254616e6730) (b20ac6ec0d29393d70e200027c094d13 vs d41d8cd98f00b204e9800998ecf8427e)

2012/6/25 Jason Tang ares.t...@gmail.com

Hi,

I met a consistency problem when using Quorum for both reads and writes. I use MultigetSubSliceQuery to query rows from a super column, limit size 100, then read them, then delete them, and then start another round. But I found that a row which should have been deleted by the last query still shows up in the next round's query. Also, for a normal column family, I updated the value of one column from status='FALSE' to status='TRUE', and the next time I queried it, the status was still 'FALSE'.

More detail:
- It does not happen every time (about 1 in 10,000)
- The time between two rounds of queries is around 500 ms (but we found two queries separated by 2 seconds that still had this consistency problem)
- We use ntp as our cluster time synchronization solution
- We have 6 nodes, and the replication factor is 3

Some people say Cassandra is expected to have such problems, because a read may happen before a write completes inside Cassandra. But for two seconds?! And if so, it is meaningless to have Quorum or other consistency level configurations.

So first of all: is this the correct behavior for Cassandra? And if not, what data do we need to analyze for further investigation?

BRs,
Ares
Re: Multi datacenter, WAN hiccups and replication
Let me attempt to articulate my question a little better.

Say I choose LOCAL_QUORUM with a replication factor of 3. Cassandra stores three copies in my local datacenter. Therefore the cost associated with losing one node is not very high locally, and I usually disable HH and rely on read repair/nodetool repair instead.

However, over the WAN, network blips are quite normal and HH really helps. More so because for WAN replication Cassandra sends only one copy to a coordinator in the remote datacenter.

Therefore I was wondering if Cassandra already intelligently optimizes for HH-over-WAN (since this is common) or, alternately, if there is a way to enable HH for WAN replication only? Thank you.

On Tue, Jun 26, 2012 at 9:22 AM, Mohit Anchlia mohitanch...@gmail.com wrote:

I am not sure I understand your question. In general I don't think you can selectively decide on HH. Besides, HH should only be used when the outage is measured in minutes; for longer outages, using HH would only create memory pressure. [...]
Re: Multi datacenter, WAN hiccups and replication
I re-read my last post and didn't think I had done a good job articulating. Sorry! I'll try again...

Say I choose LOCAL_QUORUM with a replication factor of 3. Cassandra stores three copies in my local datacenter. Therefore the cost associated with losing one node is not very high locally, and I usually disable HH and rely on read repair/nodetool repair instead.

However, over the WAN, network blips are quite normal and HH really helps. More so because for WAN replication Cassandra sends only one copy to a coordinator in the remote datacenter, and it is rather vital for that copy to make it over to keep the two datacenters in sync.

Therefore I was wondering if Cassandra already intelligently special-cases HH-over-WAN (since this is common) even if HH is disabled, or, alternately, if there is a way to enable HH for WAN replication only while disabling it for the LOCAL_QUORUM?

Thank you.

Thanks, Karthik

On Tue, Jun 26, 2012 at 10:14 AM, Karthik N karthik@gmail.com wrote:

Let me attempt to articulate my question a little better. [...]
Re: cassandra 1.0.9 error - Read an invalid frame size of 0
I have seen this as well. Is it a known issue?

On 18/06/2012 19:38, Gurpreet Singh wrote:

I found a fix for this one, or rather a workaround. I changed the rpc_server_type in cassandra.yaml from hsha to sync, and the error went away. I guess there is some issue with the thrift nonblocking server.

Thanks,
Gurpreet

On Wed, May 16, 2012 at 7:04 PM, Gurpreet Singh gurpreet.si...@gmail.com wrote:

Thanks Aaron, will do!

On Mon, May 14, 2012 at 1:14 PM, aaron morton aa...@thelastpickle.com wrote:

Are you using framed transport on the client side? Try the Hector user list for hector-specific help: https://groups.google.com/forum/?fromgroups#!searchin/hector-users

Cheers
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 12/05/2012, at 5:44 AM, Gurpreet Singh wrote:

This is hampering our testing of cassandra a lot, and our move to cassandra 1.0.9. Has anyone seen this before? Should I be trying a different version of cassandra?
/G

On Thu, May 10, 2012 at 11:29 PM, Gurpreet Singh gurpreet.si...@gmail.com wrote:

Hi, I have created a 1-node cluster of cassandra 1.0.9, set up for testing reads/writes. I am seeing the following error in the server system.log:

    ERROR [Selector-Thread-7] 2012-05-10 22:44:02,607 TNonblockingServer.java (line 467) Read an invalid frame size of 0. Are you using TFramedTransport on the client side?

Initially I was using an old hector 0.7.x, but even after switching to hector 1.0-5 and thrift version 0.6.1, I still see this error. I am using 20 threads writing to/reading from cassandra. The max write batch size is 10, with the payload size constant at 600 bytes per key. On the client side, I see Hector exceptions coinciding with these messages on the server. Any ideas why these errors are happening?

Thanks,
Gurpreet
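[The workaround Gurpreet describes is a one-line cassandra.yaml change followed by a node restart; both values are named in the thread above:

    # cassandra.yaml: use the one-thread-per-connection Thrift server
    # instead of the nonblocking (hsha) one
    rpc_server_type: sync
]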
Ball is rolling on High Performance Cassandra Cookbook second edition
Hello all,

It has not been very long since the first book was published, but several things have been added to Cassandra and a few things have changed. I am putting together a list of changed content, for example features like the old per-column-family memtable flush settings versus the new system with the global variable.

My editors have given me the green light to grow the second edition from ~200 pages currently up to 300 pages! This gives us the ability to add more items/sections to the text. Some things were missing from the first edition, such as Hector support; Nate has offered to help me in this area.

Please feel free to contact me with any ideas and suggestions of recipes you would like to see in the book. Also get in touch if you want to write a recipe. Several people added content to the first edition and it would be great to see that type of participation again.

Thank you,
Edward
Amazingly bad compaction performance
We occasionally see fairly poor compaction performance on random nodes in our 7-node cluster, and I have no idea why. This is one example from the log:

    [CompactionExecutor:45] 2012-06-26 13:40:18,721 CompactionTask.java (line 221) Compacted to [/raid00/cassandra_data/main/basic/main-basic.basic_id_index-hd-160-Data.db,]. 26,632,210 to 26,679,667 (~100% of original) bytes for 2 keys at 0.006250MB/s. Time: 4,071,163ms.

That particular event took over an hour to compact only 25 megabytes. During that time, there was very little disk IO, and the java process (OpenJDK 7) was pegged at 200% CPU. The node was also completely unresponsive to network requests until the compaction was finished. Most compactions run just over 7 MB/s; this is an extreme outlier, but users definitely notice the hit when it occurs.

I grabbed a sample of the process using jstack, and this was the only thread in CompactionExecutor:

    CompactionExecutor:54 daemon prio=1 tid=41247522816 nid=0x99a5ff740 runnable [140737253617664]
    java.lang.Thread.State: RUNNABLE
        at org.xerial.snappy.SnappyNative.rawCompress(Native Method)
        at org.xerial.snappy.Snappy.rawCompress(Snappy.java:358)
        at org.apache.cassandra.io.compress.SnappyCompressor.compress(SnappyCompressor.java:80)
        at org.apache.cassandra.io.compress.CompressedSequentialWriter.flushData(CompressedSequentialWriter.java:89)
        at org.apache.cassandra.io.util.SequentialWriter.flushInternal(SequentialWriter.java:196)
        at org.apache.cassandra.io.util.SequentialWriter.reBuffer(SequentialWriter.java:260)
        at org.apache.cassandra.io.util.SequentialWriter.writeAtMost(SequentialWriter.java:128)
        at org.apache.cassandra.io.util.SequentialWriter.write(SequentialWriter.java:112)
        at java.io.DataOutputStream.write(DataOutputStream.java:107)
        - locked 36527862064 (a java.io.DataOutputStream)
        at org.apache.cassandra.db.compaction.PrecompactedRow.write(PrecompactedRow.java:142)
        at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:156)
        at org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:159)
        at org.apache.cassandra.db.compaction.CompactionManager$1.runMayThrow(CompactionManager.java:150)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:722)

Is it possible that there is an issue with snappy compression? Based on the lousy compression ratio, I think we could get by without it just fine. Can compression be changed or disabled on-the-fly with cassandra?

- .Dustin
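[On the last question: compression settings can be changed on a live column family. A sketch, assuming from the data path above that the keyspace is "main" and the column family is "basic"; whether an empty sstable_compression value disables compression on your exact version is worth verifying before relying on it, and existing sstables are only rewritten by nodetool upgradesstables or by normal compaction:

    $ cassandra-cli -h localhost
    [default@unknown] use main;
    [default@main] update column family basic
        with compression_options = {'sstable_compression': ''};
    $ nodetool -h localhost upgradesstables main basic
]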
bulk load problem
Dear all,

I am trying to use sstableloader in cassandra 1.1.1 to bulk load some data into a single-node cluster. I am running the following command:

    bin/sstableloader -d 192.168.100.1 /data/ssTable/tpch/tpch/

from another node (other than the node on which cassandra is running); the data should be loaded into a keyspace named tpch. I made sure that the second node, from which I run sstableloader, has the same copy of cassandra.yaml as the destination node. I have put

    tpch-cf0-hd-1-Data.db
    tpch-cf0-hd-1-Index.db

under the path I passed to sstableloader. But I am getting the following error:

    Could not retrieve endpoint ranges:

Any hints? Thanks in advance,

James
Re: Migrate keyspace from version 1.0.8 to 1.1.1
There is nothing listed in the NEWS file: https://github.com/apache/cassandra/blob/cassandra-1.1/NEWS.txt

Cheers
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 26/06/2012, at 3:16 AM, Thierry Templier wrote:

Hello,

What is the correct way to migrate a keyspace from version 1.0.8 to 1.1.1? Is there documentation on this subject? Thanks for your help.

Thierry
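[NEWS.txt is indeed the canonical place for upgrade notes. As a general sketch of a rolling upgrade, the usual sequence rather than advice specific to 1.0.8 to 1.1.1, so check NEWS.txt for your versions before relying on it:

    # On each node in turn:
    nodetool -h localhost drain        # flush memtables, stop accepting writes
    # stop the cassandra service, install the new binaries, re-apply your
    # cassandra.yaml settings, then start the service again
    nodetool -h localhost upgradesstables   # optionally rewrite sstables in the new format
]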