Exception in logs using LCS
Hello,

Recently we changed the compaction strategy for a table to LCS from the default STCS. While bootstrapping a node, we are getting the following exception in the logs:

ERROR [CompactionExecutor:81] 2016-05-11 13:48:54,694 CassandraDaemon.java (line 258) Exception in thread Thread[CompactionExecutor:81,1,main]
java.lang.RuntimeException: Last written key DecoratedKey(-2711050270696519088, 623330333a313a35) >= current key DecoratedKey(-8631371593982690738, 623437393a313a30) writing into /cassandra/data/main/myCF/myCF-tmp-jb-326-Data.db
    at org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:143)
    at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:166)
    at org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:167)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
    at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:60)
    at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59)
    at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:198)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
INFO [CompactionExecutor:68] 2016-05-11 14:10:28,957 ColumnFamilyStore.java (line 795) Enqueuing flush of Memtable-compactions_in_progress@541886922(0/0 serialized/live bytes, 1 ops)

Questions:
1. Are these exceptions due to data corruption?
2. I am unable to reproduce the problem. How can I reproduce the exception? Is there any specific case in which such exceptions are raised?
3. Without reproducing the exception, how can I test the patch available at the related JIRA: https://issues.apache.org/jira/browse/CASSANDRA-9935

Thanks,
Prakash Chauhan
RE: Exception in logs using LCS
Hi Paulo,

Thanks for the reply. Running scrub every time the exception occurs is not a feasible solution for us: we don't have a log-patrolling system in place that verifies the logs on a regular basis.

I have some queries based on your reply:
1. You mentioned a race condition. Can you please elaborate on what race condition may actually be happening in this scenario?
2. What if this happens in production? What would be the impact? Can we live with it?
3. The test system where the problem was reported has been scrapped, so we can't run a user-defined compaction now. Any suggestions on how to reproduce it on a new system?

From: Paulo Motta [mailto:pauloricard...@gmail.com]
Sent: Tuesday, June 28, 2016 5:43 PM
To: user@cassandra.apache.org
Subject: Re: Exception in logs using LCS

1. Not necessarily data corruption, but it seems compaction is trying to write data in the wrong order, most likely due to a temporary race condition/bug a la #9935. Since the compaction fails, your original data is probably safe (you can try running scrub to verify/fix corruptions).
2. This is pretty tricky to reproduce because it depends on which sstables were picked for compaction at a particular instant, but you could try running a user-defined compaction or scrub on the sstables that contain this key; see CASSANDRA-11337 and https://gist.github.com/jeromatron/e238e5795b3e79866b83
3. Clone the Cassandra repository of your current version, git cherry-pick the commit of CASSANDRA-9935, run "ant jar", replace the Cassandra jar with the generated SNAPSHOT jar, and restart the node.
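Paulo's point 3 (clone, cherry-pick the fix, build, swap the jar) can be sketched as a shell script. This is only a sketch: the branch name, commit hash, and install path below are placeholders I introduced, not values from the thread — look up the actual fix commit on the CASSANDRA-9935 JIRA page and adapt the paths to your packaging.

```shell
#!/bin/sh
# Sketch: build and deploy a Cassandra jar patched with the CASSANDRA-9935 fix.
# PLACEHOLDERS (assumptions, not from the thread): branch name, commit hash,
# and the jar install path all depend on your version and packaging.

build_patched_jar() {
    branch="cassandra-2.0"            # hypothetical: branch of your running version
    fix_commit="<commit-from-9935>"   # hypothetical: fix commit listed on the JIRA
    git clone https://github.com/apache/cassandra.git
    cd cassandra || return 1
    git checkout "$branch"
    git cherry-pick "$fix_commit"     # apply only the fix on top of your version
    ant jar                           # produces build/apache-cassandra-*-SNAPSHOT.jar
}

install_patched_jar() {
    # Replace the deployed jar and restart the node (path is an assumption).
    cp build/apache-cassandra-*-SNAPSHOT.jar /usr/share/cassandra/
    service cassandra restart
}
```

Run `build_patched_jar` on a build host, then `install_patched_jar` on the affected node — and try this on a test cluster before touching production.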
Slowness in C* cluster after implementing multiple network interface configuration.
Hi All,

Need help!

Setup details: Cassandra 2.0.14, geo-redundant setup
* DC1 - 3 nodes
* DC2 - 3 nodes

We were trying to implement multiple network interfaces with Cassandra 2.0.14. After doing all the steps mentioned in the DataStax doc http://docs.datastax.com/en/archived/cassandra/2.0/cassandra/configuration/configMultiNetworks.html, we observed that the nodes were not able to see each other (checked using nodetool status).

To resolve this issue, we followed the comment https://issues.apache.org/jira/browse/CASSANDRA-9748?focusedCommentId=14903515&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14903515 on JIRA CASSANDRA-9748.

The exact steps that we followed are:
1. Stop Cassandra.
2. Add a rule to iptables to forward all packets on the public interface to the private interface.
   COMMAND: # iptables -t nat -A PREROUTING -p tcp -m tcp -d --dport 7000 -j DNAT --to-destination :7000
3. In cassandra.yaml, add the property "broadcast_address".
4. In cassandra.yaml, change "listen_address" to the private IP.
5. Clear the data from the "peers" directory.
6. Change the snitch to GossipingPropertyFileSnitch.
7. Append the following property to "/etc/cassandra/conf/cassandra-env.sh" to purge gossip state:
   JVM_OPTS="$JVM_OPTS -Dcassandra.load_ring_state=false"
8. Start Cassandra.
9. After the node has started, remove the property added in step 7 from "/etc/cassandra/conf/cassandra-env.sh":
   JVM_OPTS="$JVM_OPTS -Dcassandra.load_ring_state=false"
10. Delete the file "/etc/cassandra/conf/cassandra-topology.properties".

We have now observed that after multiple restarts of Cassandra on multiple nodes, slowness appears in the cluster. The problem is resolved when we revert the steps above.

Do you think any of these steps could cause the problem? We suspect step 2 (the iptables rule) but are not sure about it.

Regards,
Prakash Chauhan
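The steps above can be sketched as a single script. Everything concrete here is a placeholder: the original mail omits the actual addresses in the iptables command, so PUBLIC_IP and PRIVATE_IP below are hypothetical values I introduced, and the sed edits assume a stock cassandra.yaml layout — adapt before use.

```shell
#!/bin/sh
# Sketch of the multi-interface setup steps above (Cassandra 2.0.x era).
# PUBLIC_IP / PRIVATE_IP are hypothetical placeholders -- the original
# message does not include the concrete addresses.
PUBLIC_IP="203.0.113.10"   # assumption: this node's public address
PRIVATE_IP="10.0.0.10"     # assumption: this node's private address
CONF_DIR="/etc/cassandra/conf"

setup_multi_interface() {
    service cassandra stop
    # Step 2: forward storage-port traffic hitting the public IP to the private one.
    iptables -t nat -A PREROUTING -p tcp -m tcp -d "$PUBLIC_IP" --dport 7000 \
        -j DNAT --to-destination "$PRIVATE_IP:7000"
    # Steps 3-4: listen on the private interface, advertise the public one.
    echo "broadcast_address: $PUBLIC_IP" >> "$CONF_DIR/cassandra.yaml"
    sed -i "s/^listen_address:.*/listen_address: $PRIVATE_IP/" "$CONF_DIR/cassandra.yaml"
    # Step 6: switch the snitch.
    sed -i "s/^endpoint_snitch:.*/endpoint_snitch: GossipingPropertyFileSnitch/" \
        "$CONF_DIR/cassandra.yaml"
    # Step 7: purge gossip state on the next start only (remove this line afterwards).
    echo 'JVM_OPTS="$JVM_OPTS -Dcassandra.load_ring_state=false"' \
        >> "$CONF_DIR/cassandra-env.sh"
    service cassandra start
}
```

Clearing the system `peers` data (step 5) and the post-restart cleanup (steps 9-10) are left manual in this sketch, since they depend on when the node has fully rejoined the ring.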
RE: Slowness in C* cluster after implementing multiple network interface configuration.
Hi All,

We have a new observation. Earlier, while implementing multiple network interfaces, we were deleting cassandra-topology.properties in the last step (the steps are in the mail trail below). The rationale was that because we are using an altogether new endpoint_snitch, we don't need the cassandra-topology.properties file anymore.

We have now observed that if we don't delete cassandra-topology.properties, the slowness does not appear in the cluster (even with multiple restarts).

Is there some relationship between GossipingPropertyFileSnitch and cassandra-topology.properties? To my knowledge, the cassandra-topology.properties file is only used as a fallback while migrating snitches. If that's the case, why does Cassandra become slow over time (and after multiple restarts) once cassandra-topology.properties is deleted?

Regards,
Prakash Chauhan

From: Cogumelos Maravilha [mailto:cogumelosmaravi...@sapo.pt]
Sent: Wednesday, May 24, 2017 12:15 AM
To: user@cassandra.apache.org
Subject: Re: Slowness in C* cluster after implementing multiple network interface configuration.

Hi,

I never used version 2.0.x, but I think port 7000 isn't enough. Try enabling:
7000 inter-node
7001 SSL inter-node
9042 CQL
9160 Thrift (enabled in that version)

And in cassandra.yaml, add the property "broadcast_address" (= local IPv4) and change "listen_address" to the private IP (= local IPv4), as a starting point.

Cheers.
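When chasing this kind of snitch-related slowness, a few standard nodetool commands show whether the nodes agree on topology and schema. A minimal checklist sketch, assuming `nodetool` is on PATH on each node (the subcommands below are standard ones from the Cassandra 2.0 era):

```shell
#!/bin/sh
# Sketch: quick per-node checks when diagnosing snitch/topology slowness.
check_cluster_health() {
    nodetool status          # every node should show UN (Up/Normal) in each DC
    nodetool describecluster # snitch and schema versions should match cluster-wide
    nodetool gossipinfo      # per-endpoint gossip state (DC/rack as seen by peers)
}
```

Disagreeing DC/rack values in `gossipinfo` across nodes would point at the snitch change (and the deleted fallback file) rather than the iptables rule.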