[ 
https://issues.apache.org/jira/browse/CASSANDRA-515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-515:
-------------------------------------

    Description: 
Easy way to reproduce:

Start node A.
Start node B, with autobootstrap=false.
Kill B, wipe data dir, and restart (still w/ autobootstrap=false).

A will show B as down, with its old token.  (B will see both nodes correctly.)

This appears to be because wiping the data dir makes the generation restart at 
1.  (This is not just operator error; besides testing, it could also arise if a 
node dies completely and has to be replaced.)  A then ignores B's gossip state 
until B's new heartbeat is larger than the one previously reached.

It appears that initializing the generation to seconds-since-epoch would fix 
this.
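
A minimal sketch of the suspected failure and the proposed fix, in Python 
rather than Cassandra's Java.  Everything named here (HeartBeat, PeerView, 
initial_generation) is hypothetical and is not the Gossiper's real API; it 
assumes a peer only accepts state whose (generation, version) pair is strictly 
newer than what it already holds.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True, order=True)
class HeartBeat:
    # Compared as a tuple: generation first, then version (assumed ordering).
    generation: int
    version: int


class PeerView:
    """Hypothetical model of what node A remembers about node B."""

    def __init__(self) -> None:
        self.known: Optional[HeartBeat] = None

    def receive(self, hb: HeartBeat) -> bool:
        """Accept the update only if it is strictly newer than the stored one."""
        if self.known is None or hb > self.known:
            self.known = hb
            return True
        return False


def initial_generation(use_epoch: bool, now: Optional[float] = None) -> int:
    """Generation for a node starting with an empty data dir.

    use_epoch=False models the reported behavior (restart at 1);
    use_epoch=True models the proposed seconds-since-epoch fix.
    """
    if not use_epoch:
        return 1
    return int(time.time() if now is None else now)


# Reported behavior: B reached version 500, was wiped, and restarts at (1, 1).
a = PeerView()
a.receive(HeartBeat(1, 500))
stale = a.receive(HeartBeat(initial_generation(False), 1))   # False: ignored

# Proposed fix: a later start time gives a larger generation, so A accepts it.
a_fixed = PeerView()
a_fixed.receive(HeartBeat(initial_generation(True, now=1000), 500))
fresh = a_fixed.receive(HeartBeat(initial_generation(True, now=2000), 1))  # True
```

The fixed timestamps (1000, 2000) are placeholders for two start times; the 
sketch relies only on the second being later than the first.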

  was:
Easy way to reproduce:

Start node A.
Start node B, with autobootstrap=false.
Kill B, wipe data dir, and restart (still w/ autobootstrap=false).

A will show B as down, with its old token.  (B will see both nodes correctly.)

This appears to be because when you wipe data dir, generation restarts at 1.  
(This is not just operator error; besides during testing, this could arise if a 
node dies completely and has to be replaced.)  Then gossip state is ignored 
until the new heartbeat is larger than the old one reached.

It appears that initializing the generation to seconds-since-epoch would fix 
this.


> Gossiper misses first updates when restarting a node
> ----------------------------------------------------
>
>                 Key: CASSANDRA-515
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-515
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>            Priority: Minor
>             Fix For: 0.5
>
>         Attachments: 515.patch
>
>
> Easy way to reproduce:
> Start node A.
> Start node B, with autobootstrap=false.
> Kill B, wipe data dir, and restart (still w/ autobootstrap=false).
> A will show B as down, with its old token.  (B will see both nodes correctly.)
> This appears to be because wiping the data dir makes the generation restart 
> at 1.  (This is not just operator error; besides testing, it could also arise 
> if a node dies completely and has to be replaced.)  A then ignores B's gossip 
> state until B's new heartbeat is larger than the one previously reached.
> It appears that initializing the generation to seconds-since-epoch would fix 
> this.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
