[
https://issues.apache.org/jira/browse/CASSANDRA-150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12782713#action_12782713
]
Jaakko Laine commented on CASSANDRA-150:
----------------------------------------
This kind of partition cannot happen if there are fewer than four seeds or if there
is even a single non-seed node. There must be enough seeds to form at least two
separate network closures of at least two seeds each. If there has been a
problem with a 3/4 cluster, it must be a different issue, as these two
preconditions are not met.
Gossip rule #1 sends gossip to a live node, and rule #3 sends to a random seed
if the node chosen in #1 was not a seed. If there is even a single non-seed node,
gossip sent to it will trigger gossip to a random seed every time, and eventually
this will break the network closures. What the patch basically does is search
aggressively for seeds until the node has found at least as many nodes as there
are seeds. It does not matter if this set does not include all seeds: in that
case there are non-seeds in liveEndpoints, and every gossip sent to one of them
triggers a gossip to a random seed. So this is basically just to help the
Gossiper get started, not to find all seeds. Whether it finds all seeds or at
least one non-seed does not matter; it can continue from there.
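To make the rule interaction concrete, here is a rough, self-contained sketch of
one gossip round with the extra seed check. This is only my illustration of the
idea described above, not the actual Gossiper code or the patch itself; the field
names mirror the real ones, but doGossipRound, pickRandom and sendGossipTo are
invented for the example.

{code}
// Illustrative sketch only -- not the actual Gossiper code or the patch.
import java.net.InetAddress;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

public class GossipRoundSketch
{
    private final Set<InetAddress> seeds = new HashSet<InetAddress>();
    private final Set<InetAddress> liveEndpoints = new HashSet<InetAddress>();
    private final Set<InetAddress> unreachableEndpoints = new HashSet<InetAddress>();
    private final Random random = new Random();

    public void doGossipRound()
    {
        // Rule #1: gossip to a random live endpoint.
        InetAddress liveTarget = pickRandom(liveEndpoints);
        boolean gossipedToSeed = false;
        if (liveTarget != null)
        {
            sendGossipTo(liveTarget);
            gossipedToSeed = seeds.contains(liveTarget);
        }

        // (Rule #2, occasionally gossiping to an unreachable endpoint, is omitted here.)

        // Rule #3: if the node chosen in #1 was not a seed, gossip to a random seed.
        // The patch additionally keeps gossiping to a seed while the node has seen
        // fewer nodes than there are seeds, so a freshly started seed keeps searching
        // instead of settling into a closed group of seeds.
        // (Counting details, e.g. whether the node counts itself, are glossed over.)
        int seen = liveEndpoints.size() + unreachableEndpoints.size();
        if (!gossipedToSeed || seen < seeds.size())
        {
            InetAddress seed = pickRandom(seeds);
            if (seed != null)
                sendGossipTo(seed);
        }
    }

    private InetAddress pickRandom(Set<InetAddress> endpoints)
    {
        if (endpoints.isEmpty())
            return null;
        return new ArrayList<InetAddress>(endpoints).get(random.nextInt(endpoints.size()));
    }

    private void sendGossipTo(InetAddress target)
    {
        // Placeholder: the real Gossiper sends a GossipDigestSyn message here.
        System.out.println("gossiping to " + target);
    }
}
{code}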
Now of course the "correct" checks for this condition would be to check on each
gossip round (1) whether liveEndpoints and unreachableEndpoints together include
all seeds, or (2) whether liveEndpoints includes at least one non-seed. However,
putting these checks on the normal execution path only for the sake of one
special case does not appeal to me, so I decided to add this simple check instead.
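For comparison, those per-round checks would look roughly like this, using the
same fields as the sketch above (again purely illustrative; the method name
shouldForceGossipToSeed is invented):

{code}
// Illustrative only: the two "correct" per-round checks described above.
// Returns true while neither condition holds, i.e. while the node should
// still force a gossip to a random seed.
private boolean shouldForceGossipToSeed()
{
    // (1) Do liveEndpoints and unreachableEndpoints together already cover all seeds?
    Set<InetAddress> seen = new HashSet<InetAddress>(liveEndpoints);
    seen.addAll(unreachableEndpoints);
    boolean allSeedsSeen = seen.containsAll(seeds);

    // (2) Is there at least one live non-seed? If so, rule #3 will eventually
    // route gossip to a random seed anyway.
    boolean hasLiveNonSeed = false;
    for (InetAddress ep : liveEndpoints)
    {
        if (!seeds.contains(ep))
        {
            hasLiveNonSeed = true;
            break;
        }
    }

    return !allSeedsSeen && !hasLiveNonSeed;
}
{code}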
Now that I think of it, there is one extremely special case that could still
cause a partition: a cluster of 4 seeds and 2 non-seeds. First 2 seeds and 2
non-seeds come online -> everybody is happy, as the cluster size equals the
number of seeds. Now both seeds go down, and then the other two seeds come up.
Again everybody is happy. Now suppose the two non-seeds go down, and after that
the two original seeds come up simultaneously and happen to choose each other
from the list of random seeds. In this case both seeds will send gossip only to
the other seed, as they have 2 nodes in unreachableEndpoints, which makes the
total number of seen nodes equal to the number of seeds. To avoid this, we might
relax the condition a bit and send gossip to a seed if the number of
liveEndpoints is less than the number of seeds (that is, ignore
unreachableEndpoints). This modification would take care of the scenario above,
but I don't know if it is worth the trouble: if either of the non-seeds recovers
(or one of the seeds goes down), the deadlock is broken anyway.
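In terms of the sketch above, the relaxed condition would just drop
unreachableEndpoints from the count (again only an illustration, not the
proposed patch):

{code}
// Relaxed variant (sketch): ignore unreachableEndpoints, so a seed keeps
// searching for other seeds whenever it sees fewer live nodes than there are seeds.
if (!gossipedToSeed || liveEndpoints.size() < seeds.size())
{
    InetAddress seed = pickRandom(seeds);
    if (seed != null)
        sendGossipTo(seed);
}
{code}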
> multiple seeds (only when seed count = node count?) can cause cluster partition
> -------------------------------------------------------------------------------
>
> Key: CASSANDRA-150
> URL: https://issues.apache.org/jira/browse/CASSANDRA-150
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Reporter: Jonathan Ellis
> Priority: Minor
> Attachments: 150.patch
>
>
> happens fairly frequently on my test cluster of 5 nodes. (i normally restart
> all nodes at once when updating the code. haven't tested w/ restarting one
> machine at a time.)