Markus created CASSANDRA-9562:
---------------------------------

             Summary: Queries are not possible anymore after some time
                 Key: CASSANDRA-9562
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9562
             Project: Cassandra
          Issue Type: Bug
          Components: Core
            Reporter: Markus


We run a Cassandra 2.1.5 installation with 3 nodes on separate Debian wheezy 
VMs. After some days of operation without problems, selects and inserts 
against a table suddenly stop working. In the logs we see the following error 
message:

com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency QUORUM (2 responses were required but only 1 replica responded)
    at com.datastax.driver.core.exceptions.ReadTimeoutException.copy(ReadTimeoutException.java:69)

When I execute nodetool status on each node, it shows that all nodes are in 
Status=Up and State=Normal.
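
For reference, the "2 responses were required" in the timeout above matches 
the usual quorum rule of floor(RF / 2) + 1 replicas. The report does not state 
the replication factor, so the value 3 below is an assumption (a replication 
factor of 2 would also require 2 responses). The function names are 
illustrative only and are not part of the driver. A minimal sketch:

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that must answer for a QUORUM read or write."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor // 2 + 1


def quorum_met(replication_factor: int, responses: int) -> bool:
    """True if enough replicas answered to satisfy QUORUM."""
    return responses >= quorum(replication_factor)


# Assumption: replication factor 3 (not stated in the report).
required = quorum(3)
print(f"required={required}, satisfied with 1 response: {quorum_met(3, 1)}")
```

Under that assumption, a single responding replica cannot satisfy QUORUM, 
which suggests that two replicas did not answer in time even though nodetool 
status reports them as up.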

If I run nodetool repair on a node, I see thousands of exceptions like:

2015-06-03 16:40:58,023 ERROR [AntiEntropySessions:17] RepairSession.java:303 - [repair #858c8470-09fe-11e5-930b-d16ee278cb3a] session completed with the following error
java.io.IOException: Failed during snapshot creation.
    at org.apache.cassandra.repair.RepairSession.failedSnapshot(RepairSession.java:344) ~[apache-cassandra-2.1.5.jar:2.1.5]
    at org.apache.cassandra.repair.RepairJob$2.onFailure(RepairJob.java:146) ~[apache-cassandra-2.1.5.jar:2.1.5]
    at com.google.common.util.concurrent.Futures$4.run(Futures.java:1172) ~[guava-16.0.jar:na]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_45]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_45]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45]
in the log. To recover, I need to restart the Cassandra daemon on every node 
and then run nodetool repair on every node (which works without throwing the 
exception after the restart). The cluster then works again for 2-3 days until 
the same issue reappears.
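
To put a number on the repair failures, the log can be scanned for the 
"session completed with the following error" line shown above. The sketch 
below is only an illustration: the function name and the regular expression 
are my own, and it assumes each log entry sits on a single line in the log 
file, as in the excerpt above.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the RepairSession error line and captures the repair session id.
REPAIR_ERROR = re.compile(
    r"\[repair #([0-9a-f-]+)\] session completed with the following error"
)


def count_failed_sessions(lines: Iterable[str]) -> Counter:
    """Count failed repair sessions per session id in the given log lines."""
    failures = Counter()
    for line in lines:
        match = REPAIR_ERROR.search(line)
        if match:
            failures[match.group(1)] += 1
    return failures


# The log line from this report, used as sample input.
sample = [
    "2015-06-03 16:40:58,023 ERROR [AntiEntropySessions:17] "
    "RepairSession.java:303 - [repair #858c8470-09fe-11e5-930b-d16ee278cb3a] "
    "session completed with the following error",
    "java.io.IOException: Failed during snapshot creation.",
]
print(sum(count_failed_sessions(sample).values()), "failed repair session(s)")
```

Passing an open file handle for the node's system log instead of the sample 
list gives the count for a real node.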




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
