[
https://issues.apache.org/jira/browse/CASSANDRA-150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12782713#action_12782713
]
Jaakko Laine commented on CASSANDRA-150:
----------------------------------------
This kind of partition cannot happen if there are fewer than four seeds or if there
is even a single non-seed node. There must be enough seeds to form at least two
separate network closures of at least two seeds each. If there has been a
problem with a 3/4 cluster, it must be a different issue, as these two
preconditions are not met.
Gossip rule #1 sends gossip to a live node, and rule #3 sends to a random seed
if the node chosen in #1 was not a seed. If there is even a single non-seed node,
gossip sent to it will trigger gossip to a random seed every time, and eventually
this will break the network closures. What the patch basically does is search
aggressively for seeds until the node has found at least as many nodes as there
are seeds. It does not matter if this set does not include all seeds: in that
case there are non-seeds in liveEndpoints, and every gossip sent to one of them
triggers a gossip to a random seed. So this is basically just to help the
Gossiper get started, not to find all seeds. Whether it finds all seeds or at
least one non-seed does not matter; it can continue from there.
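To make the rule interaction concrete, here is a rough, self-contained sketch of
one gossip round with the extra seed check. This is only my illustration of the
idea described above, not the actual Gossiper code or the patch itself; the field
names mirror the real ones, but doGossipRound, pickRandom and sendGossipTo are
invented for the example.

{code}
// Illustrative sketch only -- not the actual Gossiper code or the patch.
import java.net.InetAddress;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

public class GossipRoundSketch
{
    private final Set<InetAddress> seeds = new HashSet<InetAddress>();
    private final Set<InetAddress> liveEndpoints = new HashSet<InetAddress>();
    private final Set<InetAddress> unreachableEndpoints = new HashSet<InetAddress>();
    private final Random random = new Random();

    public void doGossipRound()
    {
        // Rule #1: gossip to a random live endpoint.
        InetAddress liveTarget = pickRandom(liveEndpoints);
        boolean gossipedToSeed = false;
        if (liveTarget != null)
        {
            sendGossipTo(liveTarget);
            gossipedToSeed = seeds.contains(liveTarget);
        }

        // (Rule #2, occasionally gossiping to an unreachable endpoint, is omitted here.)

        // Rule #3: if the node chosen in #1 was not a seed, gossip to a random seed.
        // The patch additionally keeps gossiping to a seed while the node has seen
        // fewer nodes than there are seeds, so a freshly started seed keeps searching
        // instead of settling into a closed group of seeds.
        // (Counting details, e.g. whether the node counts itself, are glossed over.)
        int seen = liveEndpoints.size() + unreachableEndpoints.size();
        if (!gossipedToSeed || seen < seeds.size())
        {
            InetAddress seed = pickRandom(seeds);
            if (seed != null)
                sendGossipTo(seed);
        }
    }

    private InetAddress pickRandom(Set<InetAddress> endpoints)
    {
        if (endpoints.isEmpty())
            return null;
        return new ArrayList<InetAddress>(endpoints).get(random.nextInt(endpoints.size()));
    }

    private void sendGossipTo(InetAddress target)
    {
        // Placeholder: the real Gossiper sends a GossipDigestSyn message here.
        System.out.println("gossiping to " + target);
    }
}
{code}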
Now of course the "correct" checks for this condition would be to check on each
gossip round (1) whether liveEndpoints and unreachableEndpoints together include
all seeds, or (2) whether liveEndpoints includes at least one non-seed. However,
putting these checks on the normal execution path only for the sake of one
special case does not appeal to me, so I decided to add this simple check instead.
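For comparison, those per-round checks would look roughly like this, using the
same fields as the sketch above (again purely illustrative; the method name
shouldForceGossipToSeed is invented):

{code}
// Illustrative only: the two "correct" per-round checks described above.
// Returns true while neither condition holds, i.e. while the node should
// still force a gossip to a random seed.
private boolean shouldForceGossipToSeed()
{
    // (1) Do liveEndpoints and unreachableEndpoints together already cover all seeds?
    Set<InetAddress> seen = new HashSet<InetAddress>(liveEndpoints);
    seen.addAll(unreachableEndpoints);
    boolean allSeedsSeen = seen.containsAll(seeds);

    // (2) Is there at least one live non-seed? If so, rule #3 will eventually
    // route gossip to a random seed anyway.
    boolean hasLiveNonSeed = false;
    for (InetAddress ep : liveEndpoints)
    {
        if (!seeds.contains(ep))
        {
            hasLiveNonSeed = true;
            break;
        }
    }

    return !allSeedsSeen && !hasLiveNonSeed;
}
{code}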
Now that I think of it, there is one extremely special case that could still
cause a partition: a cluster of 4 seeds and 2 non-seeds. First 2 seeds and 2
non-seeds come online -> everybody is happy, as the cluster size equals the
number of seeds. Now both seeds go down, and then the other two seeds come up.
Again everybody is happy. Now suppose the two non-seeds go down, and after that
the two original seeds come up simultaneously and happen to choose each other
from the list of random seeds. In this case both seeds will send gossip only to
the other seed, as they have 2 nodes in unreachableEndpoints, which makes the
total number of seen nodes equal to the number of seeds. To avoid this, we might
relax the condition a bit and send gossip to a seed if the number of
liveEndpoints is less than the number of seeds (that is, ignore
unreachableEndpoints). This modification would take care of the scenario above,
but I don't know if it is worth the trouble: if either of the non-seeds recovers
(or one of the seeds goes down), the deadlock is broken anyway.
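In terms of the sketch above, the relaxed condition would just drop
unreachableEndpoints from the count (again only an illustration, not the
proposed patch):

{code}
// Relaxed variant (sketch): ignore unreachableEndpoints, so a seed keeps
// searching for other seeds whenever it sees fewer live nodes than there are seeds.
if (!gossipedToSeed || liveEndpoints.size() < seeds.size())
{
    InetAddress seed = pickRandom(seeds);
    if (seed != null)
        sendGossipTo(seed);
}
{code}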
> multiple seeds (only when seed count = node count?) can cause cluster partition
> -------------------------------------------------------------------------------
>
> Key: CASSANDRA-150
> URL: https://issues.apache.org/jira/browse/CASSANDRA-150
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Reporter: Jonathan Ellis
> Priority: Minor
> Attachments: 150.patch
>
>
> happens fairly frequently on my test cluster of 5 nodes. (i normally restart
> all nodes at once when updating the code. haven't tested w/ restarting one
> machine at a time.)