[ https://issues.apache.org/jira/browse/CASSANDRA-5525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13645445#comment-13645445 ]
Richard Low commented on CASSANDRA-5525:
----------------------------------------
Could you attach the output of 'nodetool ring' to list all the tokens? Also
what is your replication factor?
There is a balancing problem when adding new nodes without running shuffle (or
decommissioning and bootstrapping each node). When Cassandra increases the
number of tokens from 1 to N (256 in your case), it splits each node's original
range into N consecutive ranges. This doesn't change where the data lives, but
it does increase the number of tokens.
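In code terms the split looks roughly like this (a minimal sketch of the idea
only, not Cassandra's implementation; the names here are made up):
{noformat}
# Sketch only, not Cassandra's code: cutting one node's single token range
# into N consecutive sub-ranges when num_tokens is raised from 1 to N.
RING = 2**64  # simplified token space: 0 .. RING-1

def split_range(prev_token, token, n=256):
    """Return n consecutive tokens that cut the range (prev_token, token]
    into equal sub-ranges. Their union is the original range, so the node
    now owns n tokens but none of its data has to move."""
    width = (token - prev_token) % RING
    return [(prev_token + (i * width) // n) % RING for i in range(1, n + 1)]
{noformat}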
Cassandra knows that adjacent tokens are on the same node, so it doesn't try to
store replicas there. Instead it looks for the next range on another node, just
like multi-DC replication ensures replicas end up in different data centers.
Now when a new node is added, it doesn't choose adjacent tokens; its tokens are
spread randomly around the ring. A single one of these small ranges could hold
replicas for lots of data, because the new node becomes the next distinct node
in the ring for many of the old ranges. For a high enough replication factor
and certain (quite likely) choices of tokens, a new node could end up storing
100% of the data. This could explain what you are seeing; I'll need to see the
token list and RF to confirm.
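To make that concrete, here is a rough simulation of the scenario (a sketch
only, not Cassandra's code; the 12 old nodes, 256 vnodes, RF=3 and the random
seed are assumptions for illustration). It applies the "next distinct node"
rule above to an upgraded ring plus one new node with randomly spread tokens,
and reports how much of the ring the new node ends up replicating:
{noformat}
# Rough simulation of the imbalance described above. Not Cassandra's code;
# node count, vnode count, RF and the seed are arbitrary assumptions.
import random

RING = 2**64                  # simplified token space: 0 .. RING-1
OLD_NODES, VNODES, RF = 12, 256, 3

# 12 old nodes that originally had one evenly spaced token each; the upgrade
# split each node's range into 256 consecutive sub-ranges, so every old
# node's tokens remain contiguous on the ring.
ring = []
for n in range(OLD_NODES):
    start, end = n * RING // OLD_NODES, (n + 1) * RING // OLD_NODES
    ring += [((start + (i * (end - start)) // VNODES) % RING, "old%d" % n)
             for i in range(1, VNODES + 1)]

# One new node bootstraps with 256 tokens spread randomly around the ring.
random.seed(0)
ring += [(random.randrange(RING), "new") for _ in range(VNODES)]
ring.sort()
tokens = [t for t, _ in ring]

def replicas(idx, rf=RF):
    """First rf distinct nodes found walking clockwise from ring position
    idx; consecutive tokens on the same node are skipped, mirroring the
    'next range on another node' rule described above."""
    out = []
    for j in range(len(ring)):
        node = ring[(idx + j) % len(ring)][1]
        if node not in out:
            out.append(node)
        if len(out) == rf:
            break
    return out

# Fraction of the token space for which the new node now holds a replica.
owned = sum((tokens[i] - tokens[i - 1]) % RING
            for i in range(len(tokens)) if "new" in replicas(i))
print("new node is a replica for %.0f%% of the ring" % (100.0 * owned / RING))
{noformat}
With those made-up numbers the new node comes out as a replica for essentially
the whole ring, which is exactly the kind of imbalance described above.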
> Adding nodes to 1.2 cluster w/ vnodes streamed more data than average node load
> --------------------------------------------------------------------------------
>
> Key: CASSANDRA-5525
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5525
> Project: Cassandra
> Issue Type: Bug
> Reporter: John Watson
> Attachments: Screen Shot 2013-04-25 at 12.35.24 PM.png
>
>
> 12 node cluster upgraded from 1.1.9 to 1.2.3, enabled 'num_tokens: 256',
> restarted and ran upgradesstables and cleanup.
> Tried to join 2 additional nodes into the ring.
> However, 1 of the new nodes ran out of disk space. This started causing 'no
> host id' alerts in the live cluster when attempting to store hints for that
> node.
> {noformat}
> ERROR 10:12:02,408 Exception in thread Thread[MutationStage:190,5,main]
> java.lang.AssertionError: Missing host ID
> {noformat}
> I killed the other node to stop it from continuing to join, since the live
> cluster was now in a broken state, dropping mutation messages on 3 nodes.
> This was fixed by restarting them; however, 1 node never stopped dropping
> mutations, so I had to decommission it (leaving the original cluster at 11
> nodes).
> Ring pre-join:
> {noformat}
> Load       Tokens  Owns (effective)  Host ID
> 147.55 GB  256     16.7%             754f9f4c-4ba7-4495-97e7-1f5b6755cb27
> 124.99 GB  256     16.7%             93f4400a-09d9-4ca0-b6a6-9bcca2427450
> 136.63 GB  256     16.7%             ff821e8e-b2ca-48a9-ac3f-8234b16329ce
> 141.78 GB  253     100.0%            339c474f-cf19-4ada-9a47-8b10912d5eb3
> 137.74 GB  256     16.7%             6d726cbf-147d-426e-a735-e14928c95e45
> 135.9 GB   256     16.7%             e59a02b3-8b91-4abd-990e-b3cb2a494950
> 165.96 GB  256     16.7%             83ca527c-60c5-4ea0-89a8-de53b92b99c8
> 135.41 GB  256     16.7%             c3ea4026-551b-4a14-a346-480e8c1fe283
> 143.38 GB  256     16.7%             df7ba879-74ad-400b-b371-91b45dcbed37
> 178.05 GB  256     25.0%             78192d73-be0b-4d49-a129-9bec0770efed
> 194.92 GB  256     25.0%             361d7e31-b155-4ce1-8890-451b3ddf46cf
> 150.5 GB   256     16.7%             9889280a-1433-439e-bb84-6b7e7f44d761
> {noformat}
> Ring after decomm bad node:
> {noformat}
> Load       Tokens  Owns (effective)  Host ID
> 80.95 GB   256     16.7%             754f9f4c-4ba7-4495-97e7-1f5b6755cb27
> 87.15 GB   256     16.7%             93f4400a-09d9-4ca0-b6a6-9bcca2427450
> 98.16 GB   256     16.7%             ff821e8e-b2ca-48a9-ac3f-8234b16329ce
> 142.6 GB   253     100.0%            339c474f-cf19-4ada-9a47-8b10912d5eb3
> 77.64 GB   256     16.7%             e59a02b3-8b91-4abd-990e-b3cb2a494950
> 194.31 GB  256     25.0%             6d726cbf-147d-426e-a735-e14928c95e45
> 221.94 GB  256     33.3%             83ca527c-60c5-4ea0-89a8-de53b92b99c8
> 87.61 GB   256     16.7%             c3ea4026-551b-4a14-a346-480e8c1fe283
> 101.02 GB  256     16.7%             df7ba879-74ad-400b-b371-91b45dcbed37
> 172.44 GB  256     25.0%             78192d73-be0b-4d49-a129-9bec0770efed
> 108.5 GB   256     16.7%             9889280a-1433-439e-bb84-6b7e7f44d761
> {noformat}