[
https://issues.apache.org/jira/browse/CASSANDRA-6210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798282#comment-13798282
]
Brandon Williams commented on CASSANDRA-6210:
---------------------------------------------
Can't repro on the 2.0 branch with these steps.
bq. Moving the alter of the keyspace to after bringing up the second Node fixes
this.
That's because there's nothing to repair without the second replica defined.
> Repair hangs when a new datacenter is added to a cluster
> --------------------------------------------------------
>
> Key: CASSANDRA-6210
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6210
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Environment: Amazon Ec2
> 2 M1.large nodes
> Reporter: Russell Alexander Spitzer
> Assignee: Yuki Morishita
>
> Attempting to add a new datacenter to a cluster seems to cause repair
> operations to break. I've been reproducing this with 20~ node clusters but
> can get it to reliably occur on 2 node setups.
> {code}
> ##Basic Steps to reproduce
> #Node 1 is started using GossipingPropertyFileSnitch as dc1
> #Cassandra-stress is used to insert a minimal amount of data
> $CASSANDRA_STRESS -t 100 -R
> org.apache.cassandra.locator.NetworkTopologyStrategy --num-keys=1000
> --columns=10 --consistency-level=LOCAL_QUORUM --average-size-values -
> -compaction-strategy='LeveledCompactionStrategy' -O dc1:1
> --operation=COUNTER_ADD
> #Alter "Keyspace1"
> ALTER KEYSPACE "Keyspace1" WITH replication = {'class':
> 'NetworkTopologyStrategy', 'dc1': 1 , 'dc2': 1 };
> #Add node 2 using GossipingPropertyFileSnitch as dc2
> run repair on node 1
> run repair on node 2
> {code}
> The repair task on node 1 never completes and while there are no exceptions
> in the logs of node1, netstat reports the following repair tasks
> {code}
> Mode: NORMAL
> Repair 4e71a250-36b4-11e3-bedc-1d1bb5c9abab
> Repair 6c64ded0-36b4-11e3-bedc-1d1bb5c9abab
> Read Repair Statistics:
> Attempted: 0
> Mismatch (Blocking): 0
> Mismatch (Background): 0
> Pool Name Active Pending Completed
> Commands n/a 0 10239
> Responses n/a 0 3839
> {code}
> Checking on node 2 we see the following exceptions
> {code}
> ERROR [STREAM-IN-/10.171.122.130] 2013-10-16 22:42:58,961 StreamSession.java
> (line 410) [Stream #4e71a250-36b4-11e3-bedc-1d1bb5c9abab] Streaming error
> occurred
> java.lang.NullPointerException
> at
> org.apache.cassandra.streaming.ConnectionHandler.sendMessage(ConnectionHandler.java:174)
> at
> org.apache.cassandra.streaming.StreamSession.prepare(StreamSession.java:436)
> at
> org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:358)
> at
> org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:293)
> at java.lang.Thread.run(Thread.java:724)
> ...
> ERROR [STREAM-IN-/10.171.122.130] 2013-10-16 22:43:49,214 StreamSession.java
> (line 410) [Stream #6c64ded0-36b4-11e3-bedc-1d1bb5c9abab] Streaming error
> occurred
> java.lang.NullPointerException
> at
> org.apache.cassandra.streaming.ConnectionHandler.sendMessage(ConnectionHandler.java:174)
> at
> org.apache.cassandra.streaming.StreamSession.prepare(StreamSession.java:436)
> at
> org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:358)
> at
> org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:293)
> at java.lang.Thread.run(Thread.java:724)
> {code}
> Netstats on node 2 reports
> {code}
> automaton@ip-10-171-15-234:~$ nodetool netstats
> Mode: NORMAL
> Repair 4e71a250-36b4-11e3-bedc-1d1bb5c9abab
> Read Repair Statistics:
> Attempted: 0
> Mismatch (Blocking): 0
> Mismatch (Background): 0
> Pool Name Active Pending Completed
> Commands n/a 0 2562
> Responses n/a 0 4284
> {code}
--
This message was sent by Atlassian JIRA
(v6.1#6144)