[jira] [Commented] (CASSANDRA-10288) Inconsistent behaviours on repair when a node in RF is missing

Yuki Morishita (JIRA) Tue, 08 Sep 2015 14:46:07 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-10288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735678#comment-14735678
 ]


Yuki Morishita commented on CASSANDRA-10288:
--------------------------------------------

The behavior is actually the same across versions (2.1 with -inc -par behaves 
the same as the others).
Incremental repair can hang when fewer nodes than RF are alive, since the 
coordinator sends prepare messages to all nodes (dead or alive) and waits for 
their responses.

I will fix this by checking live nodes beforehand.
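
To illustrate the idea of checking live nodes beforehand, here is a minimal Python sketch. It is not Cassandra's code: `Endpoint`, `DeadNeighborError` and `prepare_repair` are hypothetical names, and liveness is modelled as a plain boolean rather than taken from a failure detector. The point is only that a dead replica is rejected before any prepare message is sent, so the coordinator never waits on a node that cannot answer.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Endpoint:
    """Hypothetical stand-in for a replica; `alive` mimics a failure-detector verdict."""
    address: str
    alive: bool


class DeadNeighborError(Exception):
    """Raised when a replica involved in the repair is down."""


def prepare_repair(endpoints: List[Endpoint]) -> List[str]:
    """Return the addresses that would receive a prepare message.

    Fails fast if any endpoint is dead, instead of sending prepare
    messages to every node and blocking on responses that never arrive.
    """
    dead = [e.address for e in endpoints if not e.alive]
    if dead:
        raise DeadNeighborError(
            "Cannot proceed on repair because neighbors are dead: " + ", ".join(dead)
        )
    return [e.address for e in endpoints]


if __name__ == "__main__":
    # Same shape as the reported case: 3 replicas, one of them down.
    replicas = [
        Endpoint("127.0.0.1", True),
        Endpoint("127.0.0.2", False),
        Endpoint("127.0.0.3", True),
    ]
    try:
        prepare_repair(replicas)
    except DeadNeighborError as err:
        print(err)
```

With this check in place, the 3-node, RF=3, one-node-down case reports an error right away rather than hanging.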

> Inconsistent behaviours on repair when a node in RF is missing
> --------------------------------------------------------------
>
>                 Key: CASSANDRA-10288
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10288
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Alan Boudreault
>            Assignee: Yuki Morishita
>         Attachments: repait_test.sh
>
>
> With a 3-node cluster and RF=3 for my keyspace, I tried to repair my 
> data with a single node down. I got 3 different behaviours with different C* 
> versions:
> cassandra-2.1: it fails saying a node is down. (acceptable)
> cassandra-2.2: it hangs forever (???)
> cassandra-3.0: it completes successfully
> What is the correct behaviour of this repair use case? Obviously, 
> cassandra-2.2 has to be fixed, too.
> Here are the result logs when testing:
> cassandra-2.1
> {code}
> ccmlib.node.NodetoolError: Nodetool command 
> '/home/aboudreault/git/cstar/cassandra/bin/nodetool -h localhost -p 7100 
> repair test test' failed; exit status: 2; stdout: [2015-09-08 16:32:24,488] 
> Starting repair command #3, repairing 3 ranges for keyspace test 
> (parallelism=SEQUENTIAL, full=true)
> [2015-09-08 16:32:24,492] Repair session b69b5990-5668-11e5-b4ae-b3ffbc47f04c 
> for range (3074457345618258602,-9223372036854775808] failed with error 
> java.io.IOException: Cannot proceed on repair because a neighbor (/127.0.0.2) 
> is dead: session failed
> [2015-09-08 16:32:24,493] Repair session b69b80a0-5668-11e5-b4ae-b3ffbc47f04c 
> for range (-9223372036854775808,-3074457345618258603] failed with error 
> java.io.IOException: Cannot proceed on repair because a neighbor (/127.0.0.2) 
> is dead: session failed
> [2015-09-08 16:32:24,494] Repair session b69ba7b0-5668-11e5-b4ae-b3ffbc47f04c 
> for range (-3074457345618258603,3074457345618258602] failed with error 
> java.io.IOException: Cannot proceed on repair because a neighbor (/127.0.0.2) 
> is dead: session failed
> [2015-09-08 16:32:24,494] Repair command #3 finished
> ; stderr: error: nodetool failed, check server logs
> -- StackTrace --
> java.lang.RuntimeException: nodetool failed, check server logs
>         at 
> org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:291)
>         at org.apache.cassandra.tools.NodeTool.main(NodeTool.java:203)
> {code}
> cassandra-2.2:
> {code}
> just hangs .... waited more than 10 minutes.
> {code}
> cassandra-3.0:
> {code}
> $ ccm node1 nodetool repair test test
> [2015-09-08 16:39:40,139] Starting repair command #1, repairing keyspace test 
> with repair options (parallelism: parallel, primary range: false, 
> incremental: true, job threads: 1, ColumnFamilies: [test], dataCenters: [], 
> hosts: [], # of ranges: 2)
> [2015-09-08 16:39:40,241] Repair session ba4a1440-5669-11e5-bc8e-b3ffbc47f04c 
> for range [(3074457345618258602,-9223372036854775808], 
> (-9223372036854775808,3074457345618258602]] finished (progress: 80%)
> [2015-09-08 16:39:40,267] Repair completed successfully
> [2015-09-08 16:39:40,270] Repair command #1 finished in 0 seconds
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
