If node join fails process should recover or terminate
------------------------------------------------------

                 Key: CASSANDRA-1149
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1149
             Project: Cassandra
          Issue Type: Improvement
    Affects Versions: 0.6.1
            Reporter: Edward Capriolo


Being proactive is great, but at times a node has to be joined while a Cassandra
cluster is overtaxed. A variety of (bad) things happen in this situation.

Scenario 1: NodeB joins the cluster and attempts to get a TokenRange from NodeA.
NodeA actually fails, or high load causes NodeB's gossip to detect NodeA as
failed. NodeB will stay in bootstrap mode permanently.

Scenario 2: NodeB joins the cluster and attempts to get a TokenRange from NodeA.
Neither node noticeably fails, but a stream stalls. NodeB will stay in bootstrap
mode permanently.

Suggested features:

1. NodeB should give up and shut down if streams fail. Currently a user starts a
streaming process and returns hours later, because no one is going to sit and
watch it. If I come back a day later and NodeB is down, I know it failed and I
can try again. Currently I have to check the CPU and the streams on both nodes,
determine whether the source node is compacting, wait a while, and check the
streams again. If there is no progress, I restart.

2. The source node does not have the same (relevant) stream list as NodeB. In
this case NodeA has probably restarted, and NodeB should restart the bootstrap
or terminate.

3. No progress on streams. If streams are not progressing and NodeA is not
compacting or anti-compacting, NodeB should give up.

4. A possible solution would be to give each transfer a UUID; if NodeA dies,
NodeB restarts that session when NodeA has not heard of the UUID.
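Taken together, suggestions 1 through 4 describe a small decision policy for the
joining node. The following is a minimal sketch of that policy in Python, not
Cassandra code: `StreamSession`, `decide`, `Action`, the `stall_timeout`
parameter and the inputs `source_alive`, `source_known_sessions` and
`source_compacting` are hypothetical names. It assumes the joining node can learn
whether the source is alive, which session UUIDs the source currently knows, and
whether the source is compacting.

```python
import time
import uuid
from dataclasses import dataclass, field
from enum import Enum
from typing import AbstractSet


class Action(Enum):
    CONTINUE = "continue"
    RESTART_SESSION = "restart_session"
    GIVE_UP = "give_up"


@dataclass
class StreamSession:
    """One token-range transfer from a source node (hypothetical model)."""
    session_id: uuid.UUID = field(default_factory=uuid.uuid4)
    bytes_received: int = 0
    last_progress: float = field(default_factory=time.monotonic)

    def record_progress(self, total_bytes: int, now: float) -> None:
        # Only advance the progress timestamp when the byte count actually grows.
        if total_bytes > self.bytes_received:
            self.bytes_received = total_bytes
            self.last_progress = now


def decide(session: StreamSession,
           source_alive: bool,
           source_known_sessions: AbstractSet[uuid.UUID],
           source_compacting: bool,
           now: float,
           stall_timeout: float) -> Action:
    # Suggestion 1: the source failed (or was convicted by the failure detector).
    if not source_alive:
        return Action.GIVE_UP
    # Suggestions 2 and 4: the source is up but does not know this session's
    # UUID, so it probably restarted; start the session over instead of waiting.
    if session.session_id not in source_known_sessions:
        return Action.RESTART_SESSION
    # Suggestion 3: no progress for too long and the source is not compacting.
    stalled = now - session.last_progress > stall_timeout
    if stalled and not source_compacting:
        return Action.GIVE_UP
    return Action.CONTINUE
```

Keying progress on a per-session UUID lets the joining node distinguish "the
source is slow" from "the source restarted and forgot this transfer", two cases
that today both look like a stream stuck forever. A `GIVE_UP` result would then
map to suggestion 1: shut NodeB down so the operator sees a clear failure.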

It would be great if long-running multi-step processes like a move could restart
automatically after failures, without returning to the beginning of the
operation.
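One common way to let a multi-step operation such as a move resume after a
failure is to checkpoint each completed step. The following is a minimal sketch
under that assumption; `run_resumable`, the step names and the JSON checkpoint
file are hypothetical illustrations, not part of Cassandra. It also assumes each
step is safe to re-run, since a crash between finishing a step and saving the
checkpoint would repeat that step.

```python
import json
from pathlib import Path
from typing import Callable, List, Tuple

# A named step of a long-running operation such as a move (hypothetical model).
Step = Tuple[str, Callable[[], None]]


def run_resumable(steps: List[Step], checkpoint: Path) -> List[str]:
    """Run steps in order, skipping those the checkpoint file records as done.

    Returns the names of the steps executed by this call. If a step raises,
    the exception propagates, but every step completed before it stays
    recorded, so the next call resumes at the failed step.
    """
    done: List[str] = json.loads(checkpoint.read_text()) if checkpoint.exists() else []
    ran: List[str] = []
    for name, action in steps:
        if name in done:
            continue  # finished before an earlier failure; do not redo it
        action()
        done.append(name)
        # Write to a temporary file, then rename it over the checkpoint, so a
        # crash mid-write does not leave a truncated checkpoint behind.
        tmp = checkpoint.with_name(checkpoint.name + ".tmp")
        tmp.write_text(json.dumps(done))
        tmp.replace(checkpoint)
        ran.append(name)
    return ran
```

If the streaming step of a move fails, calling `run_resumable` again with the
same checkpoint path skips the steps that already finished and retries only the
failed step and those after it.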
