Thanks Ben. That's what I was afraid I had to do. I can see how it's a lot easier if you simply double the cluster when adding capacity.
Jon

On Jun 9, 2011, at 4:44 PM, Benjamin Coverston wrote:

> Because you were able to successfully run repair, you can follow up with a
> nodetool cleanup, which will get rid of some of the extraneous data on that
> (bigger) node. You're also assured after you run repair that entropy between
> the nodes is minimal.
>
> Assuming you're using the random partitioner: to balance your ring I would
> start by calculating the new token locations, then moving each of your nodes
> backwards along its owned range to its new location.
>
> From the script on http://wiki.apache.org/cassandra/Operations your new
> balanced tokens would be:
>
> 0
> 21267647932558653966460912964485513216
> 42535295865117307932921825928971026432
> 63802943797675961899382738893456539648
> 85070591730234615865843651857942052864
> 106338239662793269832304564822427566080
> 127605887595351923798765477786913079296
> 148873535527910577765226390751398592512
>
> From this you can see that 10.46.108.{100, 101} are already in the right
> place, so you don't have to do anything with those nodes. Proceed with
> moving 10.46.108.104 to its new token; the safest way to do this would be to
> use nodetool move. Another way would be to run a removetoken followed by
> re-adding the node to the ring at its new location. The risk here is that if
> you do not at least run repair after re-joining the ring (and before you
> move the next node in the ring), some of the data on that node would be
> ignored, as it would now fall outside the owned range. So it's good practice
> to immediately run repair on a node that you do a removetoken / re-join on.
>
> The rest of your balancing should be an iteration of the above steps, moving
> through the range.
>
> On 6/9/11 6:21 AM, Jonathan Colby wrote:
>> I got myself into a situation where one node (10.47.108.100) has a lot more
>> data than the other nodes. In fact, the 1 TB disk on this node is almost
>> full.
>> I added 3 new nodes and let Cassandra automatically calculate new tokens by
>> splitting the ranges of the highest-loaded nodes. Unfortunately there is
>> still a big token range this node is responsible for (5113... - 85070...).
>> Yes, I know that one option would be to rebalance the entire cluster with
>> move, but this is an extremely time-consuming and error-prone process
>> because of the amount of data involved.
>>
>> Our RF = 3 and we read/write at quorum. The nodes have been repaired, so I
>> think the data should be in good shape.
>>
>> Question: Can I get myself out of this mess without installing new nodes?
>> I was thinking of either decommission or removetoken to have the cluster
>> "rebalance itself", then re-bootstrap this node to a new token.
>>
>> Address        Status  State   Load       Owns    Token
>>                                                   127605887595351923798765477786913079296
>> 10.46.108.100  Up      Normal  218.52 GB  25.00%  0
>> 10.46.108.101  Up      Normal  260.04 GB  12.50%  21267647932558653966460912964485513216
>> 10.46.108.104  Up      Normal  286.79 GB  17.56%  51138582157040063602728874106478613120
>> 10.47.108.100  Up      Normal  874.91 GB  19.94%  85070591730234615865843651857942052863
>> 10.47.108.102  Up      Normal  302.79 GB  4.16%   92156241323118845370666296304459139297
>> 10.47.108.103  Up      Normal  242.02 GB  4.16%   99241191538897700272878550821956884116
>> 10.47.108.101  Up      Normal  439.9 GB   8.34%   113427455640312821154458202477256070484
>> 10.46.108.103  Up      Normal  304 GB     8.33%   127605887595351923798765477786913079296
>
> --
> Ben Coverston
> Director of Operations
> DataStax -- The Apache Cassandra Company
> http://www.datastax.com/
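[Editor's note: the balanced tokens Ben lists come from evenly dividing the random partitioner's 2^127 token space across the eight nodes, i.e. token_i = i * 2^127 / N, which is what the script on the wiki page computes. A minimal sketch of that calculation:]

```python
# Evenly spaced initial tokens for Cassandra's RandomPartitioner:
# token_i = i * 2**127 // N, for an N-node cluster.
NUM_NODES = 8
RING_SIZE = 2 ** 127  # size of the RandomPartitioner token space

tokens = [i * RING_SIZE // NUM_NODES for i in range(NUM_NODES)]
for t in tokens:
    print(t)
```

Running this reproduces the eight tokens in Ben's list, starting at 0 and ending at 148873535527910577765226390751398592512.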
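[Editor's note: for anyone checking the Owns column in the ring output above: with the random partitioner, each node owns the arc of the 2^127 ring from its predecessor's token (exclusive) up to its own token, wrapping around at zero. A quick sketch that recomputes those percentages from the tokens as transcribed above:]

```python
RING_SIZE = 2 ** 127  # size of the RandomPartitioner token space

# (address, token) pairs transcribed from the nodetool ring output above
ring = [
    ("10.46.108.100", 0),
    ("10.46.108.101", 21267647932558653966460912964485513216),
    ("10.46.108.104", 51138582157040063602728874106478613120),
    ("10.47.108.100", 85070591730234615865843651857942052863),
    ("10.47.108.102", 92156241323118845370666296304459139297),
    ("10.47.108.103", 99241191538897700272878550821956884116),
    ("10.47.108.101", 113427455640312821154458202477256070484),
    ("10.46.108.103", 127605887595351923798765477786913079296),
]

owns = {}
for i, (addr, token) in enumerate(ring):
    prev = ring[i - 1][1]  # wraps: the node at token 0 owns the arc after the last token
    owns[addr] = (token - prev) % RING_SIZE / RING_SIZE
    print(f"{addr}  {owns[addr]:.2%}")
```

The output matches the Owns column, and makes the imbalance obvious: the node at token 0 owns a quarter of the ring while three nodes own roughly 4-8% each.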