Exception in logs using LCS

2016-06-28 Thread Prakash Chauhan
Hello,

Recently we changed the compaction strategy for a table from the default STCS 
to LCS. While bootstrapping a node, we are getting the following exception in 
the logs:

ERROR [CompactionExecutor:81] 2016-05-11 13:48:54,694 CassandraDaemon.java 
(line 258) Exception in thread Thread[CompactionExecutor:81,1,main]
java.lang.RuntimeException: Last written key DecoratedKey(-2711050270696519088, 
623330333a313a35) >= current key DecoratedKey(-8631371593982690738, 
623437393a313a30) writing into /cassandra/data/main/myCF/myCF-tmp-jb-326-Data.db
at 
org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:143)
at 
org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:166)
at 
org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:167)
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
at 
org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:60)
at 
org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59)
at 
org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:198)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
INFO [CompactionExecutor:68] 2016-05-11 14:10:28,957 ColumnFamilyStore.java 
(line 795) Enqueuing flush of Memtable-compactions_in_progress@541886922(0/0 
serialized/live bytes, 1 ops)

Questions:

1.   Are these exceptions due to data corruption?

2.   I am unable to reproduce the problem. How can I reproduce the 
exception? Is there any specific case in which such exceptions are raised?

3.   Without reproducing the exception, how can I test the patch available 
at the related JIRA: https://issues.apache.org/jira/browse/CASSANDRA-9935



Thanks,
Prakash Chauhan.









RE: Exception in logs using LCS

2016-06-30 Thread Prakash Chauhan
Hi Paulo,

Thanks for the reply. Running scrub every time the exception occurs is not a 
feasible solution for us: we don't have a log-patrolling system in place that 
verifies the logs on a regular basis.

I have some queries based on your reply:

1.   You mentioned a race condition. Can you please elaborate a little more 
on what actual race condition may be happening in this scenario?

2.   What if this happens in production? What would be the impact? Can we 
live with it?

3.   The test system where the problem was reported has been scrapped, so we 
can't run a user-defined compaction now. Any suggestions to reproduce it on a 
new system?



From: Paulo Motta [mailto:pauloricard...@gmail.com]
Sent: Tuesday, June 28, 2016 5:43 PM
To: user@cassandra.apache.org
Subject: Re: Exception in logs using LCS .

1. Not necessarily data corruption; it seems compaction is trying to write 
data in the wrong order, most likely due to a temporary race condition/bug a la 
#9935, but since the compaction fails your original data is probably safe (you 
can try running scrub to verify/fix corruptions).
2. This is pretty tricky to reproduce because it depends on which sstables 
were picked for compaction at a particular instant, but you could try running a 
user-defined compaction or scrub on the sstables that contain this key; see 
CASSANDRA-11337 and https://gist.github.com/jeromatron/e238e5795b3e79866b83
3. Clone the Cassandra repository at your current version, git cherry-pick the 
commit of CASSANDRA-9935, run `ant jar`, replace the Cassandra jar with the 
generated SNAPSHOT jar, and restart the node.
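Step 3 can be sketched as a short shell sequence. The commit hash below is a placeholder, not a value from this thread (look up the real one on the CASSANDRA-9935 JIRA page), and the function only prints the commands so they can be reviewed before running:

```shell
# Sketch of the patch-test build steps; the commit hash is a placeholder.
# Prints the commands rather than executing them, for review.
build_patched_jar() {
    version_tag=$1   # release tag matching the node's running version
    fix_commit=$2    # commit hash of the CASSANDRA-9935 fix (placeholder)
    cat <<EOF
git clone https://github.com/apache/cassandra.git
cd cassandra && git checkout $version_tag
git cherry-pick $fix_commit
ant jar   # writes the patched jar under build/
EOF
}

build_patched_jar cassandra-2.0.14 '<9935-commit-hash>'
```

After building, replace the Cassandra jar on the node with the generated one and restart, as described above.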

2016-06-28 7:55 GMT-03:00 Prakash Chauhan <prakash.chau...@ericsson.com>:
[quoted message trimmed]

Slowness in C* cluster after implementing multiple network interface configuration.

2017-05-22 Thread Prakash Chauhan
Hi All,

Need Help!

Setup Details:
Cassandra 2.0.14
Geo Red setup

* DC1 - 3 nodes

* DC2 - 3 nodes


We were trying to implement multiple network interfaces with Cassandra 2.0.14. 
After doing all the steps mentioned in the DataStax doc 
http://docs.datastax.com/en/archived/cassandra/2.0/cassandra/configuration/configMultiNetworks.html,
 we observed that the nodes were not able to see each other (checked using 
nodetool status).

To resolve this issue, we followed the comment 
(https://issues.apache.org/jira/browse/CASSANDRA-9748?focusedCommentId=14903515=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14903515)
 on JIRA CASSANDRA-9748 (https://issues.apache.org/jira/browse/CASSANDRA-9748).

The exact steps that we followed are:


1.   Stop Cassandra

2.   Add a rule to "iptables" to forward all packets arriving on the public 
interface to the private interface (where <public IP> and <private IP> stand 
for the node's own addresses):


COMMAND: # iptables -t nat -A PREROUTING -p tcp -m tcp -d <public IP> --dport 
7000 -j DNAT --to-destination <private IP>:7000



3.   In cassandra.yaml, add the property "broadcast_address".

4.   In cassandra.yaml, change "listen_address" to the private IP.

5.   Clear the data from the "peers" directory.

6.   Change the snitch to GossipingPropertyFileSnitch.

7.   Append the following property to the file 
"/etc/cassandra/conf/cassandra-env.sh" to purge gossip state:

JVM_OPTS="$JVM_OPTS -Dcassandra.load_ring_state=false"



8.   Start Cassandra

9.   After the node has started, remove the following property from the file 
"/etc/cassandra/conf/cassandra-env.sh" (previously added in step 7):

JVM_OPTS="$JVM_OPTS -Dcassandra.load_ring_state=false"

10.   Delete the file "/etc/cassandra/conf/cassandra-topology.properties"
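For reference, steps 3, 4 and 6 amount to three edits to cassandra.yaml. A minimal sketch of those edits as a shell function, where the IP addresses and the file path are hypothetical examples, not values from this setup:

```shell
# Sketch of the cassandra.yaml edits in steps 3, 4 and 6. The IPs passed in
# are illustrative; substitute the node's own private/public addresses.
patch_cassandra_yaml() {
    yaml=$1        # path to a cassandra.yaml (edit a copy first)
    private_ip=$2
    public_ip=$3
    # step 4: bind inter-node traffic to the private interface
    sed -i "s/^listen_address:.*/listen_address: $private_ip/" "$yaml"
    # step 3: advertise the public address to the remote datacenter
    printf 'broadcast_address: %s\n' "$public_ip" >> "$yaml"
    # step 6: switch to GossipingPropertyFileSnitch
    sed -i "s/^endpoint_snitch:.*/endpoint_snitch: GossipingPropertyFileSnitch/" "$yaml"
}

# patch_cassandra_yaml /etc/cassandra/conf/cassandra.yaml 10.0.0.5 203.0.113.5
```

Running it against a copy of the file first makes it easy to diff the result before touching the live config.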


We have now observed that, after multiple restarts of Cassandra on multiple 
nodes, slowness appears in the cluster.
The problem is resolved when we revert the steps mentioned above.

Do you think any of these steps could cause the problem?
We suspect step 2 (the iptables rule) but are not very sure about it.


Regards,
Prakash Chauhan.


RE: Slowness in C* cluster after implementing multiple network interface configuration.

2017-05-24 Thread Prakash Chauhan
Hi All,

We have a new observation.

Earlier, when implementing multiple network interfaces, we were deleting 
cassandra-topology.properties in the last step (the steps are listed in the 
mail trail).
The rationale was that, because we are using an altogether new endpoint_snitch, 
we don't require the cassandra-topology.properties file anymore.

We have now observed that if we don't delete cassandra-topology.properties, 
there is no slowness in the cluster (even with multiple restarts).

Is there some relationship between GossipingPropertyFileSnitch and 
cassandra-topology.properties?

As far as I know, the cassandra-topology.properties file is only used as a 
fallback while doing a snitch migration. If that's the case, why does Cassandra 
become slow over time (and after multiple restarts) once 
cassandra-topology.properties is deleted?
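For context: with GossipingPropertyFileSnitch, each node reads its own datacenter and rack from a local cassandra-rackdc.properties and gossips that location to the rest of the cluster; cassandra-topology.properties, if present, is consulted as a PropertyFileSnitch-compatible fallback for nodes whose location has not yet been learned via gossip. A hypothetical file for a DC1 node (illustrative values, not from this cluster):

```properties
# cassandra-rackdc.properties -- this node's own location, gossiped to peers
dc=DC1
rack=RAC1
```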




Regards,
Prakash Chauhan.

From: Cogumelos Maravilha [mailto:cogumelosmaravi...@sapo.pt]
Sent: Wednesday, May 24, 2017 12:15 AM
To: user@cassandra.apache.org
Subject: Re: Slowness in C* cluster after implementing multiple network 
interface configuration.


Hi,

I never used version 2.0.x, but I think port 7000 isn't enough.

Try enabling:

7000 inter-node

7001 SSL inter-node

9042 CQL

9160 Thrift (still enabled in that version)



And

In cassandra.yaml, add the property "broadcast_address" = local IPv4.

In cassandra.yaml, change "listen_address" to the private IP = local IPv4.



As a starting point.
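The ports above can be opened with firewall rules along these lines. This is a sketch assuming an iptables INPUT chain with a default-drop policy; the function prints the rules rather than executing them, so they can be reviewed first:

```shell
# Sketch: ACCEPT rules for the Cassandra ports listed above. Printed rather
# than applied, so the rules can be inspected before running them as root.
open_cassandra_ports() {
    for port in 7000 7001 9042 9160; do
        echo "iptables -A INPUT -p tcp --dport $port -j ACCEPT"
    done
}

open_cassandra_ports
# open_cassandra_ports | sudo sh   # apply after reviewing
```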



Cheers.

On 22-05-2017 12:36, Prakash Chauhan wrote:
[quoted message trimmed]