Jon Haddad created CASSANDRA-19644:
--------------------------------------
Summary: deterministic token allocation combined with slow gossip
propagation can lead to data loss
Key: CASSANDRA-19644
URL: https://issues.apache.org/jira/browse/CASSANDRA-19644
Project: Cassandra
Issue Type: Bug
Reporter: Jon Haddad
I've now seen several cases where starting nodes within a fairly short time
window (about a minute) while using the default token allocation for RF leads
to token conflicts. Unfortunately, this can easily go undetected in medium to
large clusters.
When this happens, different nodes in the cluster will have different
understandings of the topology of the cluster. I've seen this go unnoticed in
a production environment for several months, leading to data loss, data
resurrection, and other odd behavior.
We should apply some randomness to the tokens so that even when several nodes
start at once, they are still unlikely to end up with conflicting tokens.
Adding a random() offset between -2^8 and 2^8 to each token value makes a
collision statistically very unlikely while also preserving the balance of
token distribution in the ring. If 2 nodes do start at the same time, the
operator will see a weird token distribution instead of data loss.
{noformat}
INFO [GossipStage:1] 2024-05-17 22:16:12,333 StorageService.java:3006 - Nodes
/10.0.2.134:7000 and cassandra1/10.0.1.61:7000 have the same token
-1938510198161598815. /10.0.2.134:7000 is the new owner
INFO [GossipStage:1] 2024-05-17 22:16:12,333 StorageService.java:3006 - Nodes
/10.0.2.134:7000 and cassandra1/10.0.1.61:7000 have the same token
-3478858378222500629. /10.0.2.134:7000 is the new owner
INFO [GossipStage:1] 2024-05-17 22:16:12,333 StorageService.java:3006 - Nodes
/10.0.2.134:7000 and cassandra1/10.0.1.61:7000 have the same token
3562748272064835315. /10.0.2.134:7000 is the new owner
INFO [GossipStage:1] 2024-05-17 22:16:12,333 StorageService.java:3006 - Nodes
/10.0.2.134:7000 and cassandra1/10.0.1.61:7000 have the same token
8085185010613503278. /10.0.2.134:7000 is the new owner{noformat}
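
The proposed jitter could be sketched roughly as follows. This is only an illustration of the idea, not Cassandra's allocator: the names jitter_token and jitter_tokens are hypothetical, the signed 64-bit token range is an assumption (Murmur3Partitioner-style tokens), and the sample tokens are the four from the log above.

```python
import random

# Assumption: Murmur3Partitioner-style signed 64-bit token range.
MIN_TOKEN = -(2**63)
MAX_TOKEN = 2**63 - 1
MAX_OFFSET = 2**8  # jitter bound proposed in this report


def jitter_token(token, rng, max_offset=MAX_OFFSET):
    """Shift token by a uniform random offset in [-max_offset, max_offset]."""
    shifted = token + rng.randint(-max_offset, max_offset)
    # Clamp so the result is still a valid token at the edges of the range.
    return max(MIN_TOKEN, min(MAX_TOKEN, shifted))


def jitter_tokens(tokens, rng, max_offset=MAX_OFFSET):
    """Jitter every token; rng is passed in so each node uses its own entropy."""
    return [jitter_token(t, rng, max_offset) for t in tokens]


if __name__ == "__main__":
    # Both nodes deterministically computed the same four tokens (from the log).
    allocated = [-1938510198161598815, -3478858378222500629,
                 3562748272064835315, 8085185010613503278]
    node_a = jitter_tokens(allocated, random.SystemRandom())
    node_b = jitter_tokens(allocated, random.SystemRandom())
    print("node A:", node_a)
    print("node B:", node_b)
    print("conflicts:", set(node_a) & set(node_b))
```

In this sketch, two nodes that compute the same base token only conflict if they also draw the same offset, which for a uniform draw over the 513 values in [-2^8, 2^8] happens with probability 1/513 per token. The offset is tiny relative to the 2^64 token space, so ring balance is essentially unchanged.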