[ https://issues.apache.org/jira/browse/CASSANDRA-572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jaakko Laine updated CASSANDRA-572:
-----------------------------------
Attachment: use-same-APstate-for-all-node-state-gossip.patch
OK, here's a patch that uses the same state name (NODE_STATE) to gossip all
movement information. The format is (BOOTSTRAPPING|NORMAL|LEAVING|LEFT)|token.
The main changes this modification required in the state machine are:
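A minimal Java sketch of building and parsing such a NODE_STATE value. The
class name NodeState, the DELIMITER constant and the use of BigInteger for
tokens are assumptions for illustration, not the actual Cassandra code:

```java
import java.math.BigInteger;

// Hypothetical value type for the single NODE_STATE application state.
// Wire format (from the patch description): STATUS|token, e.g. "LEAVING|1234".
// Assumption: tokens are modeled as BigInteger; other partitioners differ.
final class NodeState {
    enum Status { BOOTSTRAPPING, NORMAL, LEAVING, LEFT }

    static final String DELIMITER = "|";

    final Status status;
    final BigInteger token;

    NodeState(Status status, BigInteger token) {
        this.status = status;
        this.token = token;
    }

    // Build the string gossiped under NODE_STATE.
    String encode() {
        return status.name() + DELIMITER + token;
    }

    // Parse a gossiped value; anything not of the form STATUS|token is rejected
    // (valueOf and the BigInteger constructor throw IllegalArgumentException subtypes).
    static NodeState decode(String value) {
        int sep = value.indexOf(DELIMITER);
        if (sep < 0) {
            throw new IllegalArgumentException("malformed NODE_STATE: " + value);
        }
        Status status = Status.valueOf(value.substring(0, sep));
        BigInteger token = new BigInteger(value.substring(sep + 1));
        return new NodeState(status, token);
    }
}
```

Because all movement states share one key, each new value replaces the
previous one in the endpoint state instead of leaving an old LEAVING entry
next to a newer state.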
(1) When a node is bootstrapping, we should clear pending ranges for this
endpoint, as well as remove it from token metadata. These checks are not
strictly necessary (I think), but they help the transition from LEAVING ->
BOOTSTRAPPING in case we missed LEFT due to a network partition.
(2) In handleStateLeaving and handleStateLeft, remove pending ranges for this
endpoint before doing anything else. If we missed NORMAL, there might be
obsolete pending ranges from BOOTSTRAPPING. A distant possibility, but a
possibility nonetheless.
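The cleanup order in (1) and (2) can be sketched as follows. RingView and its
fields are hypothetical, heavily simplified stand-ins for token metadata and
pending ranges; only the order of the steps follows the description above:

```java
import java.math.BigInteger;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified view of the ring as seen by one node. Endpoints are
// plain strings, and "pending ranges" are reduced to the set of endpoints that
// currently have some; real range calculation is omitted.
final class RingView {
    final Map<String, BigInteger> normalTokens = new HashMap<>();   // token metadata
    final Map<String, BigInteger> bootstrapTokens = new HashMap<>();
    final Set<String> leavingEndpoints = new HashSet<>();
    final Set<String> pendingRangeEndpoints = new HashSet<>();

    // Forget any pending (bootstrap or leave) state recorded for the endpoint.
    private void clearPendingRanges(String endpoint) {
        pendingRangeEndpoints.remove(endpoint);
        bootstrapTokens.remove(endpoint);
        leavingEndpoints.remove(endpoint);
    }

    // (1) We may have missed LEFT (e.g. during a partition), so clear pending
    // ranges and remove the endpoint from token metadata before bootstrapping it.
    void handleStateBootstrap(String endpoint, BigInteger token) {
        clearPendingRanges(endpoint);
        normalTokens.remove(endpoint);
        bootstrapTokens.put(endpoint, token);
        pendingRangeEndpoints.add(endpoint);
    }

    void handleStateNormal(String endpoint, BigInteger token) {
        clearPendingRanges(endpoint);
        normalTokens.put(endpoint, token);
    }

    // (2) If we missed NORMAL, stale bootstrap pending ranges may remain,
    // so remove them before doing anything else.
    void handleStateLeaving(String endpoint, BigInteger token) {
        clearPendingRanges(endpoint);
        // Assumption: record the token being left even if NORMAL was never seen.
        normalTokens.put(endpoint, token);
        leavingEndpoints.add(endpoint);
        pendingRangeEndpoints.add(endpoint);
    }

    // (2) Same cleanup first, then drop the endpoint from the ring entirely.
    // The token parameter is unused here; it keeps the handler signatures uniform.
    void handleStateLeft(String endpoint, BigInteger token) {
        clearPendingRanges(endpoint);
        normalTokens.remove(endpoint);
    }
}
```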
The following additional check is not directly related to the gossip format
change; the situation it guards against could arise even with the current
model. It is very unlikely, but in a large (say, 200+ nodes) multi-DC cluster
with lots of node movement it could well happen even during a relatively short
DC-to-DC network outage:
(1) Added a check to handleStateLeaving and handleStateLeft for the case where
a node has gone through a NORMAL -> LEAVING -> LEFT -> BOOTSTRAPPING -> NORMAL
-> LEAVING [-> LEFT] cycle without us seeing the intermediate stages. In this
case we have information about the old token, but the node is now leaving a
_new_ token. We cannot simply assert that the tokens match, since this can
legitimately happen.
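A sketch of how such a check could replace the assertion, assuming a
hypothetical bidirectional token map (TokenRing is an illustrative name, not a
Cassandra class): on a mismatch, the stale token is dropped and the endpoint is
moved to the token it is now leaving.

```java
import java.math.BigInteger;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical bidirectional token map; a simplified stand-in for token metadata.
final class TokenRing {
    final TreeMap<BigInteger, String> tokenToEndpoint = new TreeMap<>();
    final Map<String, BigInteger> endpointToToken = new HashMap<>();

    // Move an endpoint to a token, keeping both directions of the map consistent.
    void update(String endpoint, BigInteger token) {
        BigInteger old = endpointToToken.put(endpoint, token);
        if (old != null) {
            tokenToEndpoint.remove(old);
        }
        String previousOwner = tokenToEndpoint.put(token, endpoint);
        if (previousOwner != null && !previousOwner.equals(endpoint)) {
            endpointToToken.remove(previousOwner);
        }
    }

    // Called while handling LEAVING/LEFT. Instead of asserting that the gossiped
    // token equals the known one, treat a mismatch as a missed
    // LEFT -> BOOTSTRAPPING -> NORMAL cycle and adopt the new token.
    // Returns true if a stale token was replaced.
    boolean reconcile(String endpoint, BigInteger gossipedToken) {
        BigInteger known = endpointToToken.get(endpoint);
        if (gossipedToken.equals(known)) {
            return false;
        }
        update(endpoint, gossipedToken);
        return known != null;
    }
}
```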
Of course, this already touches on the question of which conditions we must
handle ourselves and which should be left to operators. Some of these checks
(like removing all references to the endpoint before continuing to handle
bootstrapping) are questionable and might relax safety precautions, but if we
do not make them, a modest 30 s network outage might cause us to miss
STATE_LEFT, and we would end up with strange pending ranges.
I don't expect this patch to be included as it is, but let's see what people
think of this gossip change and then discuss what checks should be made :)
> handle old gossip properly
> --------------------------
>
> Key: CASSANDRA-572
> URL: https://issues.apache.org/jira/browse/CASSANDRA-572
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Affects Versions: 0.5
> Reporter: Jaakko Laine
> Fix For: 0.5
>
> Attachments: 572-handle-old-gossip.patch,
> use-same-APstate-for-all-node-state-gossip.patch
>
>
> (1) If a node has moved in the ring, later bootstraps by other nodes
> will cause errors, as they handle STATE_LEAVING gossip without having
> that member in their token metadata.
> (2) When a node bootstraps, it handles all ep states in the order they happen
> to arrive. If the first one to arrive has moved in the past (that is, it has
> STATE_LEAVING in its ep state), getNaturalEndpoint will throw an
> ArrayIndexOutOfBounds exception, as sortedTokens.size() == 0.
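A sketch of a ring walk that guards against the empty-token-list case in (2).
ReplicaLookup, naturalEndpoints, its parallel-list parameters and the
replication-factor handling are assumptions for illustration, not the real
getNaturalEndpoint:

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical replica lookup: sortedTokens and endpoints are parallel lists,
// and replicas are the owners of the first tokens >= keyToken, wrapping around.
final class ReplicaLookup {
    static List<String> naturalEndpoints(BigInteger keyToken,
                                         List<BigInteger> sortedTokens,
                                         List<String> endpoints,
                                         int replicationFactor) {
        if (sortedTokens.isEmpty()) {
            // An empty ring has no owner for any token; without this guard the
            // wrap-around index arithmetic below cannot work.
            return Collections.emptyList();
        }
        int idx = Collections.binarySearch(sortedTokens, keyToken);
        if (idx < 0) {
            idx = -idx - 1; // insertion point: index of the first token > keyToken
        }
        int n = sortedTokens.size();
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(replicationFactor, n); i++) {
            result.add(endpoints.get((idx + i) % n)); // wraps past the last token
        }
        return result;
    }
}
```

Returning an empty list is one choice; throwing an exception with a clear
message would surface the empty ring more loudly.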
--
This message is automatically generated by JIRA.