[
https://issues.apache.org/jira/browse/CASSANDRA-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17847459#comment-17847459
]
Jon Haddad commented on CASSANDRA-19644:
----------------------------------------
Ah. I didn't see CASSANDRA-16364. My preferred solution is different than
what's in there, I'll drop my comment on that one and close this out.
> deterministic token allocation combined with slow gossip propagation can lead
> to data loss
> ------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-19644
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19644
> Project: Cassandra
> Issue Type: Bug
> Reporter: Jon Haddad
> Priority: Normal
>
> I've seen several cases now where starting nodes within a somewhat short time
> window (about a minute) when using the default allocation tokens for RF leads
> to token conflicts. Unfortunately this can easily go undetected with medium
> to large clusters.
> When this happens, different nodes in the cluster will have different
> understandings of the topology of the cluster. I've seen this go unnoticed
> in a production environment for several months, leading to data loss, data
> resurrection, and other odd behavior.
> We should apply some randomness to the tokens to ensure that even in the case
> of multiple nodes starting at once, it's still unlikely that they will ever
> have a conflict. Applying a random() value between -2^8 and 2^8 to the token
> value makes it statistically very unlikely that we'll ever have a collision
> while also preserving the balance of token distribution in the ring. In the
> case of 2 nodes starting at the same time, the operator will have a weird
> token distribution instead of data loss.
>
> {noformat}
> INFO [GossipStage:1] 2024-05-17 22:16:12,333 StorageService.java:3006 -
> Nodes /10.0.2.134:7000 and cassandra1/10.0.1.61:7000 have the same token
> -1938510198161598815. /10.0.2.134:7000 is the new owner
> INFO [GossipStage:1] 2024-05-17 22:16:12,333 StorageService.java:3006 -
> Nodes /10.0.2.134:7000 and cassandra1/10.0.1.61:7000 have the same token
> -3478858378222500629. /10.0.2.134:7000 is the new owner
> INFO [GossipStage:1] 2024-05-17 22:16:12,333 StorageService.java:3006 -
> Nodes /10.0.2.134:7000 and cassandra1/10.0.1.61:7000 have the same token
> 3562748272064835315. /10.0.2.134:7000 is the new owner
> INFO [GossipStage:1] 2024-05-17 22:16:12,333 StorageService.java:3006 -
> Nodes /10.0.2.134:7000 and cassandra1/10.0.1.61:7000 have the same token
> 8085185010613503278. /10.0.2.134:7000 is the new owner{noformat}
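
The jitter proposed in the description could be sketched roughly as below. This is only an illustration of the idea, not Cassandra code: the class name TokenJitter, the jitter() helper, and the choice to saturate at the ends of the long range are all assumptions made for the example. It draws a uniform offset in [-2^8, 2^8], so two nodes that computed the same base token end up with the same final token only if they also draw the same offset (1 in 513 per token, assuming independent uniform draws).

```java
import java.util.Random;

// Hypothetical sketch; none of these names exist in Cassandra.
class TokenJitter {
    // Jitter bound from the issue text: offsets are drawn from [-2^8, 2^8].
    static final long MAX_JITTER = 1L << 8;

    // Shifts a token by a uniform random offset in [-MAX_JITTER, MAX_JITTER].
    // Saturating at the ends of the long range (rather than wrapping) is an
    // assumption of this sketch.
    static long jitter(long token, Random rng) {
        long offset = rng.nextInt((int) (2 * MAX_JITTER + 1)) - MAX_JITTER;
        if (offset > 0 && token > Long.MAX_VALUE - offset) return Long.MAX_VALUE;
        if (offset < 0 && token < Long.MIN_VALUE - offset) return Long.MIN_VALUE;
        return token + offset;
    }

    public static void main(String[] args) {
        // Base tokens copied from the log excerpt in the issue description.
        long[] allocated = {
            -1938510198161598815L, -3478858378222500629L,
             3562748272064835315L,  8085185010613503278L
        };
        // Two independently seeded generators stand in for two nodes that
        // were started at about the same time and computed the same tokens.
        Random nodeA = new Random();
        Random nodeB = new Random();
        for (long t : allocated) {
            long a = jitter(t, nodeA);
            long b = jitter(t, nodeB);
            System.out.println(t + " -> " + a + " / " + b
                    + (a == b ? " (collision)" : ""));
        }
    }
}
```

Because the offset is tiny relative to the 2^64 token space, the jittered tokens stay close to the positions the allocator picked, which is why the description expects ring balance to be preserved.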