[ https://issues.apache.org/jira/browse/CASSANDRA-5525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13645445#comment-13645445 ]
Richard Low commented on CASSANDRA-5525:
----------------------------------------
Could you attach the output of 'nodetool ring' to list all the tokens? Also
what is your replication factor?
There is a balancing problem when adding new nodes without running shuffle (or
decommissioning and bootstrapping each node). When Cassandra increases the
number of tokens from 1 to N (256 in your case), it splits each node's original
range into N consecutive ranges. This doesn't change where the data lives, but
it does increase the number of tokens.
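In code terms the split looks roughly like this (a minimal sketch of the idea
only, not Cassandra's implementation; the names here are made up):
{noformat}
# Sketch only, not Cassandra's code: cutting one node's single token range
# into N consecutive sub-ranges when num_tokens is raised from 1 to N.
RING = 2**64  # simplified token space: 0 .. RING-1

def split_range(prev_token, token, n=256):
    """Return n consecutive tokens that cut the range (prev_token, token]
    into equal sub-ranges. Their union is the original range, so the node
    now owns n tokens but none of its data has to move."""
    width = (token - prev_token) % RING
    return [(prev_token + (i * width) // n) % RING for i in range(1, n + 1)]
{noformat}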
Cassandra knows that adjacent tokens are on the same node, so it doesn't try to
store replicas there. Instead it looks for the next range on another node, just
like multi-DC replication ensures replicas end up in different data centers.
Now when a new node is added, it doesn't choose adjacent tokens; its tokens are
spread randomly around the ring. A single one of these small ranges could hold
replicas for lots of data, because the new node becomes the next distinct node
in the ring for many of the old ranges. For a high enough replication factor
and certain (quite likely) choices of tokens, a new node could end up storing
100% of the data. This could explain what you are seeing; I'll need to see the
token list and RF to confirm.
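To make that concrete, here is a rough simulation of the scenario (a sketch
only, not Cassandra's code; the 12 old nodes, 256 vnodes, RF=3 and the random
seed are assumptions for illustration). It applies the "next distinct node"
rule above to an upgraded ring plus one new node with randomly spread tokens,
and reports how much of the ring the new node ends up replicating:
{noformat}
# Rough simulation of the imbalance described above. Not Cassandra's code;
# node count, vnode count, RF and the seed are arbitrary assumptions.
import random

RING = 2**64                  # simplified token space: 0 .. RING-1
OLD_NODES, VNODES, RF = 12, 256, 3

# 12 old nodes that originally had one evenly spaced token each; the upgrade
# split each node's range into 256 consecutive sub-ranges, so every old
# node's tokens remain contiguous on the ring.
ring = []
for n in range(OLD_NODES):
    start, end = n * RING // OLD_NODES, (n + 1) * RING // OLD_NODES
    ring += [((start + (i * (end - start)) // VNODES) % RING, "old%d" % n)
             for i in range(1, VNODES + 1)]

# One new node bootstraps with 256 tokens spread randomly around the ring.
random.seed(0)
ring += [(random.randrange(RING), "new") for _ in range(VNODES)]
ring.sort()
tokens = [t for t, _ in ring]

def replicas(idx, rf=RF):
    """First rf distinct nodes found walking clockwise from ring position
    idx; consecutive tokens on the same node are skipped, mirroring the
    'next range on another node' rule described above."""
    out = []
    for j in range(len(ring)):
        node = ring[(idx + j) % len(ring)][1]
        if node not in out:
            out.append(node)
        if len(out) == rf:
            break
    return out

# Fraction of the token space for which the new node now holds a replica.
owned = sum((tokens[i] - tokens[i - 1]) % RING
            for i in range(len(tokens)) if "new" in replicas(i))
print("new node is a replica for %.0f%% of the ring" % (100.0 * owned / RING))
{noformat}
With those made-up numbers the new node comes out as a replica for essentially
the whole ring, which is exactly the kind of imbalance described above.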
> Adding nodes to 1.2 cluster w/ vnodes streamed more data than average node load
> --------------------------------------------------------------------------------
>
> Key: CASSANDRA-5525
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5525
> Project: Cassandra
> Issue Type: Bug
> Reporter: John Watson
> Attachments: Screen Shot 2013-04-25 at 12.35.24 PM.png
>
>
> 12 node cluster upgraded from 1.1.9 to 1.2.3, enabled 'num_tokens: 256',
> restarted and ran upgradesstables and cleanup.
> Tried to join 2 additional nodes into the ring.
> However, 1 of the new nodes ran out of disk space. This started causing 'no
> host id' alerts in the live cluster when attempting to store hints for that
> node.
> {noformat}
> ERROR 10:12:02,408 Exception in thread Thread[MutationStage:190,5,main]
> java.lang.AssertionError: Missing host ID
> {noformat}
> I killed the other node to stop it from continuing to join, since the live
> cluster was now in a broken state, dropping mutation messages on 3 nodes.
> This was fixed by restarting them; however, 1 node never stopped dropping
> mutations, so I had to decommission it (leaving the original cluster at 11
> nodes).
> Ring pre-join:
> {noformat}
> Load       Tokens  Owns (effective)  Host ID
> 147.55 GB  256     16.7%             754f9f4c-4ba7-4495-97e7-1f5b6755cb27
> 124.99 GB  256     16.7%             93f4400a-09d9-4ca0-b6a6-9bcca2427450
> 136.63 GB  256     16.7%             ff821e8e-b2ca-48a9-ac3f-8234b16329ce
> 141.78 GB  253     100.0%            339c474f-cf19-4ada-9a47-8b10912d5eb3
> 137.74 GB  256     16.7%             6d726cbf-147d-426e-a735-e14928c95e45
> 135.9 GB   256     16.7%             e59a02b3-8b91-4abd-990e-b3cb2a494950
> 165.96 GB  256     16.7%             83ca527c-60c5-4ea0-89a8-de53b92b99c8
> 135.41 GB  256     16.7%             c3ea4026-551b-4a14-a346-480e8c1fe283
> 143.38 GB  256     16.7%             df7ba879-74ad-400b-b371-91b45dcbed37
> 178.05 GB  256     25.0%             78192d73-be0b-4d49-a129-9bec0770efed
> 194.92 GB  256     25.0%             361d7e31-b155-4ce1-8890-451b3ddf46cf
> 150.5 GB   256     16.7%             9889280a-1433-439e-bb84-6b7e7f44d761
> {noformat}
> Ring after decomm bad node:
> {noformat}
> Load       Tokens  Owns (effective)  Host ID
> 80.95 GB   256     16.7%             754f9f4c-4ba7-4495-97e7-1f5b6755cb27
> 87.15 GB   256     16.7%             93f4400a-09d9-4ca0-b6a6-9bcca2427450
> 98.16 GB   256     16.7%             ff821e8e-b2ca-48a9-ac3f-8234b16329ce
> 142.6 GB   253     100.0%            339c474f-cf19-4ada-9a47-8b10912d5eb3
> 77.64 GB   256     16.7%             e59a02b3-8b91-4abd-990e-b3cb2a494950
> 194.31 GB  256     25.0%             6d726cbf-147d-426e-a735-e14928c95e45
> 221.94 GB  256     33.3%             83ca527c-60c5-4ea0-89a8-de53b92b99c8
> 87.61 GB   256     16.7%             c3ea4026-551b-4a14-a346-480e8c1fe283
> 101.02 GB  256     16.7%             df7ba879-74ad-400b-b371-91b45dcbed37
> 172.44 GB  256     25.0%             78192d73-be0b-4d49-a129-9bec0770efed
> 108.5 GB   256     16.7%             9889280a-1433-439e-bb84-6b7e7f44d761
> {noformat}