[
https://issues.apache.org/jira/browse/CASSANDRA-6210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13850912#comment-13850912
]
Yuki Morishita commented on CASSANDRA-6210:
-------------------------------------------
I followed the steps below:
{code}
Node 1 up dc1
Stress
Node 2 up dc2
Alter keyspace
repair on node1
{code}
With auto_bootstrap: false, I got the following error and the repair hung:
{code}
ERROR [AntiEntropyStage:1] 2013-12-17 15:03:08,945 CassandraDaemon.java (line 187) Exception in thread Thread[AntiEntropyStage:1,5,main]
java.lang.AssertionError: Unknown keyspace Keyspace1
	at org.apache.cassandra.db.Keyspace.<init>(Keyspace.java:262)
	at org.apache.cassandra.db.Keyspace.open(Keyspace.java:110)
	at org.apache.cassandra.db.Keyspace.open(Keyspace.java:88)
	at org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:46)
	at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:60)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:744)
{code}
We should at least add a catch-all in RepairMessageVerbHandler to prevent the hang.
That said, this is not the same exception as the one originally reported.
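A minimal sketch of the catch-all idea, assuming a simplified handler interface. `VerbHandler`, `catchAll` and the `onFailure` callback are hypothetical names for illustration, not Cassandra's actual verb handler API. Any Throwable escaping the wrapped handler, including an AssertionError like the one above, is handed to a failure callback instead of silently ending the task on AntiEntropyStage.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class CatchAllVerbHandlerSketch {

    // Hypothetical, simplified stand-in for a message verb handler.
    interface VerbHandler {
        void doVerb(String keyspace);
    }

    // Hypothetical wrapper: runs the delegate and routes any Throwable
    // (including AssertionError) to onFailure instead of letting it escape.
    static VerbHandler catchAll(VerbHandler delegate, Consumer<Throwable> onFailure) {
        return keyspace -> {
            try {
                delegate.doVerb(keyspace);
            } catch (Throwable t) {
                onFailure.accept(t);
            }
        };
    }

    public static void main(String[] args) {
        List<Throwable> failures = new ArrayList<>();
        // Simulates the failure from the log: opening a keyspace this node does not know.
        VerbHandler unsafe = keyspace -> {
            throw new AssertionError("Unknown keyspace " + keyspace);
        };
        catchAll(unsafe, failures::add).doVerb("Keyspace1");
        System.out.println(failures.get(0).getMessage()); // prints "Unknown keyspace Keyspace1"
    }
}
{code}

Catching Throwable is deliberately broad here so that AssertionError is covered. A real fix would presumably also send a failure reply to the repair coordinator so it can abort the session; this sketch only models that as a callback.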
[~rspitzer], can you reproduce this with
'log4j.logger.org.apache.cassandra.streaming=DEBUG' set in your
log4j-server.properties and attach the log here?
> Repair hangs when a new datacenter is added to a cluster
> --------------------------------------------------------
>
> Key: CASSANDRA-6210
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6210
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Environment: Amazon Ec2
> 2 M1.large nodes
> Reporter: Russell Alexander Spitzer
> Assignee: Yuki Morishita
>
> Attempting to add a new datacenter to a cluster seems to break repair
> operations. I've been reproducing this with ~20 node clusters, but I can
> also reproduce it reliably on 2 node setups.
> {code}
> ##Basic Steps to reproduce
> #Node 1 is started using GossipingPropertyFileSnitch as dc1
> #Cassandra-stress is used to insert a minimal amount of data
> $CASSANDRA_STRESS -t 100 -R org.apache.cassandra.locator.NetworkTopologyStrategy \
>   --num-keys=1000 --columns=10 --consistency-level=LOCAL_QUORUM \
>   --average-size-values --compaction-strategy='LeveledCompactionStrategy' \
>   -O dc1:1 --operation=COUNTER_ADD
> #Alter "Keyspace1"
> ALTER KEYSPACE "Keyspace1" WITH replication = {'class':
> 'NetworkTopologyStrategy', 'dc1': 1 , 'dc2': 1 };
> #Add node 2 using GossipingPropertyFileSnitch as dc2
> run repair on node 1
> run repair on node 2
> {code}
> The repair task on node 1 never completes. Although there are no exceptions
> in node 1's logs, nodetool netstats reports the following repair tasks:
> {code}
> Mode: NORMAL
> Repair 4e71a250-36b4-11e3-bedc-1d1bb5c9abab
> Repair 6c64ded0-36b4-11e3-bedc-1d1bb5c9abab
> Read Repair Statistics:
> Attempted: 0
> Mismatch (Blocking): 0
> Mismatch (Background): 0
> Pool Name                    Active   Pending      Completed
> Commands                        n/a         0          10239
> Responses                       n/a         0           3839
> {code}
> On node 2, we see the following exceptions:
> {code}
> ERROR [STREAM-IN-/10.171.122.130] 2013-10-16 22:42:58,961 StreamSession.java (line 410) [Stream #4e71a250-36b4-11e3-bedc-1d1bb5c9abab] Streaming error occurred
> java.lang.NullPointerException
> 	at org.apache.cassandra.streaming.ConnectionHandler.sendMessage(ConnectionHandler.java:174)
> 	at org.apache.cassandra.streaming.StreamSession.prepare(StreamSession.java:436)
> 	at org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:358)
> 	at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:293)
> 	at java.lang.Thread.run(Thread.java:724)
> ...
> ERROR [STREAM-IN-/10.171.122.130] 2013-10-16 22:43:49,214 StreamSession.java (line 410) [Stream #6c64ded0-36b4-11e3-bedc-1d1bb5c9abab] Streaming error occurred
> java.lang.NullPointerException
> 	at org.apache.cassandra.streaming.ConnectionHandler.sendMessage(ConnectionHandler.java:174)
> 	at org.apache.cassandra.streaming.StreamSession.prepare(StreamSession.java:436)
> 	at org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:358)
> 	at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:293)
> 	at java.lang.Thread.run(Thread.java:724)
> {code}
> nodetool netstats on node 2 reports:
> {code}
> automaton@ip-10-171-15-234:~$ nodetool netstats
> Mode: NORMAL
> Repair 4e71a250-36b4-11e3-bedc-1d1bb5c9abab
> Read Repair Statistics:
> Attempted: 0
> Mismatch (Blocking): 0
> Mismatch (Background): 0
> Pool Name                    Active   Pending      Completed
> Commands                        n/a         0           2562
> Responses                       n/a         0           4284
> {code}
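To illustrate how a streaming error on node 2 can leave the repair on node 1 listed indefinitely, here is a sketch that assumes the coordinator waits on a per-session completion signal which the remote side only completes on success. The class and method names (`runStreamTask`, `awaitRepair`, `propagateErrors`) are hypothetical; this models the symptom, not Cassandra's streaming code.

{code:java}
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class RepairSessionFailureSketch {

    // Hypothetical remote streaming task: the coordinator only learns the
    // outcome through the future it is handed.
    static void runStreamTask(CompletableFuture<Void> done, boolean propagateErrors) {
        try {
            // Simulates the NullPointerException seen in ConnectionHandler.sendMessage.
            throw new NullPointerException("simulated streaming error");
        } catch (RuntimeException e) {
            if (propagateErrors) {
                done.completeExceptionally(e); // coordinator is told the session failed
            }
            // Otherwise the error is only "logged" and the future is never completed.
        }
    }

    // Returns a short description of what the coordinator observed.
    static String awaitRepair(boolean propagateErrors) {
        CompletableFuture<Void> done = new CompletableFuture<>();
        runStreamTask(done, propagateErrors);
        try {
            done.get(200, TimeUnit.MILLISECONDS);
            return "completed";
        } catch (ExecutionException e) {
            return "failed: " + e.getCause().getClass().getSimpleName();
        } catch (TimeoutException e) {
            // A coordinator waiting without a timeout would block here forever.
            return "still waiting";
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        }
    }

    public static void main(String[] args) {
        System.out.println(awaitRepair(false)); // prints "still waiting"
        System.out.println(awaitRepair(true));  // prints "failed: NullPointerException"
    }
}
{code}

With propagateErrors set to false, only the 200 ms timeout in the sketch ends the wait; without it, the session would stay open, which is consistent with the repair sessions still listed by nodetool netstats above.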
--
This message was sent by Atlassian JIRA
(v6.1.4#6159)