[ 
https://issues.apache.org/jira/browse/CASSANDRA-20011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17891425#comment-17891425
 ] 

Stefan Miklosovic edited comment on CASSANDRA-20011 at 10/21/24 7:21 AM:
-------------------------------------------------------------------------

[~mck] [~curlylrt] this is most probably a duplicate / variation of 
CASSANDRA-18845

I welcome [~curlylrt] to create a formal patch for that on GitHub and further 
verify / improve on that work there.


was (Author: smiklosovic):
[~mck] [~curlylrt] this is most probably a duplicate / variation of 
CASSANDRA-18845

> Gossip settled too early for new joining node leading to data loss
> ------------------------------------------------------------------
>
>                 Key: CASSANDRA-20011
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-20011
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Runtian Liu
>            Priority: Normal
>
> Recently we found an issue where gossip settles too early, leading to data 
> loss on the joining node.
> The new node crashed a few times before it successfully joined the ring. 
> When a new node joins the ring, it waits for gossip to settle. The issue we 
> saw is that gossip settled before the node had recognized the entire 
> cluster. This led the new node to request ranges from the wrong nodes, and 
> the streaming phase ended without receiving any data because the target 
> nodes were not the real owners of the requested token ranges.
> After checking the gossip settle code, I found that gossip may settle in 5 + 
> 3 = 8 seconds if the size of the new node's local gossip state map is not 
> changing. This may happen if the new node is busy with other gossip tasks 
> and cannot populate all nodes into its local gossip state map.
> Proposed fix: add an environment variable for the bootstrapping node so that 
> the minimum number of nodes is also checked before the node considers 
> gossip settled. A PR will come later.
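
The settle check described above, and the proposed minimum-node guard, can be sketched roughly as follows. This is only an illustration under stated assumptions, not Cassandra's actual Gossiper code: the class name, method name and minEndpoints parameter are hypothetical, the "3" is read from the "5 + 3" figure in the description as three consecutive stable polls after an initial wait, and the array of observed sizes stands in for real polling of the local gossip state map.

```java
// Hypothetical sketch; none of these names are Cassandra APIs.
final class GossipSettleSketch {
    // Assumption: gossip counts as settled after this many consecutive
    // polls in which the state map size did not change.
    static final int POLL_SUCCESSES_REQUIRED = 3;

    /**
     * observedSizes[0] is the state map size read after the initial wait;
     * each later element is the size seen at one subsequent poll.
     * Returns the poll number at which gossip is considered settled,
     * or -1 if it never settles within the observations.
     * minEndpoints models the proposed guard; 0 disables it.
     */
    static int pollsUntilSettled(int[] observedSizes, int minEndpoints) {
        if (observedSizes.length == 0) {
            return -1;
        }
        int previous = observedSizes[0];
        int stablePolls = 0;
        for (int poll = 1; poll < observedSizes.length; poll++) {
            int current = observedSizes[poll];
            boolean unchanged = current == previous;
            boolean enoughNodes = current >= minEndpoints;
            // A poll only counts when the size is stable AND large enough.
            stablePolls = (unchanged && enoughNodes) ? stablePolls + 1 : 0;
            previous = current;
            if (stablePolls >= POLL_SUCCESSES_REQUIRED) {
                return poll;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // A busy node sees only itself for a while, then learns of 6 nodes.
        int[] sizes = {1, 1, 1, 1, 3, 6, 6, 6, 6};
        System.out.println(pollsUntilSettled(sizes, 0)); // 3: settles too early
        System.out.println(pollsUntilSettled(sizes, 6)); // 8: waits for 6 nodes
    }
}
```

Without the guard, the node settles while its state map still holds a single entry. With a minimum of 6, the stable-poll counter only starts once the whole cluster is visible.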



