[
https://issues.apache.org/jira/browse/CASSANDRA-150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781937#action_12781937
]
Jaakko Laine commented on CASSANDRA-150:
----------------------------------------
A network partition may happen if (1) the cluster has at least four nodes, (2)
all nodes are seeds and (3) at least two nodes boot "simultaneously".
The gossip cycle works as follows:
(i) gossip to a random live node
(ii) gossip to a random unreachable node
(iii) if the node gossiped to at (i) was not a seed, gossip to a random seed
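As a minimal sketch of one such round (Python rather than the actual Java Gossiper; the `send` callback and the bare name sets are stand-ins for the real endpoint-state digest exchange, and I am assuming, per step (1) below, that a node with no live peers also falls through to the seed gossip):

```python
import random

def gossip_round(live, unreachable, seeds, send, rng=random):
    """One gossip round following the three rules above (a sketch)."""
    target = None
    # (i) gossip to a random live node
    if live:
        target = rng.choice(sorted(live))
        send(target)
    # (ii) gossip to a random unreachable node
    if unreachable:
        send(rng.choice(sorted(unreachable)))
    # (iii) if the node gossiped to at (i) was not a seed -- or there was
    # no live node at all -- gossip to a random seed
    if target is None or target not in seeds:
        send(rng.choice(sorted(seeds)))
```

Note that when the live target chosen at (i) is itself a seed, step (iii) is skipped entirely; that is the hinge of the partition scenario below.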
Suppose there are four nodes in the cluster: nodeA, nodeB, nodeC and nodeD, all
of them seeds, and suppose they are all brought online at the same time. The
following event sequence leads to a partition:
(1) nodeA comes online. It sees no live nodes (and no unreachable ones either,
of course), so it gossips to a random seed. Suppose nodeA chooses nodeB and
sends it gossip.
(2) nodeB gets nodeA's gossip and marks it live. It sends its own gossip, and
since it has a live node (nodeA), it gossips according to rule (i). nodeA is a
seed, so no gossip is sent to a random seed at (iii).
(3) nodeC comes online. It has not seen any live nodes yet, so it gossips to a
random seed. Suppose it chooses nodeD.
(4) nodeD comes online and sees nodeC's gossip. Since it now has a live node,
it sends nodeC gossip according to rule (i). nodeC is a seed, so again no
gossip is sent to a random seed.
(There are other sequences as well, but the basic idea is the same.)
Now every node knows of exactly one live node, so each will always gossip
according to rule (i). Since that live node is a seed, rule (iii) never sends
gossip to a random seed, which prevents the nodes from ever finding the rest
of the cluster. A single non-seed node breaks this loop, as gossip sent to it
triggers a gossip to a random seed.
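The four-node trace above can be reproduced in a small simulation (a sketch, not Cassandra code; the `Node` class and its set-of-names state model are my simplification of the real endpoint-state digests):

```python
import random

class Node:
    def __init__(self, name, seeds):
        self.name = name
        self.seeds = seeds    # every node is a seed in this scenario
        self.live = set()     # peers this node believes are live

    def receive(self, sender):
        # receiving gossip marks the sender live and merges its view
        self.live.add(sender.name)
        self.live |= sender.live - {self.name}

    def gossip_round(self, cluster, rng):
        target = None
        if self.live:                       # (i) random live node
            target = rng.choice(sorted(self.live))
            cluster[target].receive(self)
        # (ii) no unreachable nodes arise in this scenario
        if target is None or target not in self.seeds:
            # (iii) live target was not a seed (or there was none)
            seed = rng.choice(sorted(self.seeds - {self.name}))
            cluster[seed].receive(self)

names = ["nodeA", "nodeB", "nodeC", "nodeD"]
cluster = {n: Node(n, set(names)) for n in names}
rng = random.Random(0)

# Bootstrap exactly as in steps (1)-(4): nodeA happens to pick nodeB,
# nodeC happens to pick nodeD.
cluster["nodeB"].receive(cluster["nodeA"])  # (1): nodeA gossips to seed nodeB
cluster["nodeA"].receive(cluster["nodeB"])  # (2): nodeB gossips back to nodeA
cluster["nodeD"].receive(cluster["nodeC"])  # (3): nodeC gossips to seed nodeD
cluster["nodeC"].receive(cluster["nodeD"])  # (4): nodeD gossips back to nodeC

# From here on, every node's only live peer is a seed, so rule (iii)
# never fires and the {nodeA, nodeB} / {nodeC, nodeD} split persists.
for _ in range(1000):
    for n in names:
        cluster[n].gossip_round(cluster, rng)

for n in names:
    print(n, sorted(cluster[n].live))
# nodeA ['nodeB']
# nodeB ['nodeA']
# nodeC ['nodeD']
# nodeD ['nodeC']
```

Changing any one node so that it is not in `seeds` makes rule (iii) fire on the round that gossips to it, which is exactly the non-seed escape hatch described above.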
While investigating this, I noticed we might have harmed the scalability of
the gossip mechanism when we added two new application states for node
movement. I'll fix this bug tomorrow and check whether that is a problem.
> multiple seeds (only when seed count = node count?) can cause cluster partition
> -------------------------------------------------------------------------------
>
> Key: CASSANDRA-150
> URL: https://issues.apache.org/jira/browse/CASSANDRA-150
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Reporter: Jonathan Ellis
> Priority: Minor
>
> happens fairly frequently on my test cluster of 5 nodes. (i normally restart
> all nodes at once when updating the code. haven't tested w/ restarting one
> machine at a time.)