Re: [Gluster-users] gluster rebalance taking multiple days

Craig Carl Tue, 07 Dec 2010 22:30:36 -0800

All -

It is possible to calculate in advance the number of files that will be moved by a re-balance. By testing performance in advance with some small rsyncs and applying the formula below, you should be able to get an accurate estimate of the time it will take. Starting in Gluster 3.1 it is possible to stop a re-balance, then restart it where it left off, see -


volume rebalance <VOLNAME> start - start rebalance of volume <VOLNAME>
volume rebalance <VOLNAME> stop - stop rebalance of volume <VOLNAME>
volume rebalance <VOLNAME> status - rebalance status of volume <VOLNAME>

/Basic Assumptions: Distribute spreads all the files equally across all the nodes :O

Existing nodes in the cluster are a set of "N" nodes
New nodes being added to cluster are a set of "M" nodes.
N+M will be the total number of nodes in new volume configuration.
Total files in the cluster before rebalance "X"
Number of files on each existing node is "J" = (X / N)
Number of files on each node after rebalance/scaling is "K" = (X / (N+M))
K * M = Z (Total Number of Files on set of M nodes after rebalance/scaling)
J * N = X (Total files in the cluster before rebalance/scaling)

Z / N = Y (Total number of files moved from each existing node after rebalance/scaling)
( Y / J ) * 100 = Percentage of files moved from each of the 'N' nodes after rebalance/scaling
( J - Y ) / J * 100 = Percentage of files remaining on each of the 'N' nodes after rebalance/scaling

NOTE: "N" is not just the number of nodes but the total number of sub-volumes for the "distribute" translator. "M" is the number of additional sub-volumes added before starting rebalance and scaling.
So for multiple exports from a single server, we need to calculate the total value moved from the server by multiplying by the number of exports./
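
The formulas above can be sketched as a small Python helper. This is only an illustration, not part of Gluster: the function names, the example figures (12 million files, 4 existing and 2 new sub-volumes) and the rate of 500 files per second are all hypothetical, and the time estimate assumes files are moved one after another at the rate you measured with your test rsyncs.

```python
def rebalance_estimate(total_files, existing_subvols, new_subvols):
    """Apply the formulas above. X, N, M, J, K, Z, Y follow the text."""
    x, n, m = total_files, existing_subvols, new_subvols
    if n <= 0 or m < 0:
        raise ValueError("need at least one existing sub-volume and M >= 0")
    j = x / n          # files on each existing sub-volume before rebalance
    k = x / (n + m)    # files on each sub-volume after rebalance
    z = k * m          # total files that end up on the new sub-volumes
    y = z / n          # files moved off each existing sub-volume
    return {
        "J": j,
        "K": k,
        "Z": z,
        "Y": y,
        "pct_moved": y / j * 100,
        "pct_remaining": (j - y) / j * 100,
    }


def estimated_seconds(files_to_move, files_per_second):
    """Assumption: files move serially at the measured rsync rate."""
    return files_to_move / files_per_second


# Hypothetical example: 12 million files, 4 existing sub-volumes, 2 added.
est = rebalance_estimate(12_000_000, 4, 2)
print(est)
# Hypothetical measured rate of 500 files/second.
print(estimated_seconds(est["Z"], 500) / 3600, "hours")
```

Note that the percentage moved reduces to M / (N+M) * 100, so it depends only on how many sub-volumes are added relative to the new total, not on the file count.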



Thanks,

Craig

--
Craig Carl
Senior Systems Engineer
Gluster


On 12/06/2010 04:50 PM, Michael Robbert wrote:

How long should a rebalance take? I know that it depends, so let's take this 
example: 4 servers, 1 brick per server. Here is the df -i output from the 
servers:

[r...@ra5 ~]# pdsh -g rack7 "df -i|grep brick"
iosrv-7-1:                      366288896 2720139 363568757    1% /mnt/brick1
iosrv-7-4:                      366288896 3240868 363048028    1% /mnt/brick4
iosrv-7-2:                      366288896 2594165 363694731    1% /mnt/brick2
iosrv-7-3:                      366288896 3267152 363021744    1% /mnt/brick3

So, it looks like there are roughly 10 million files. I have had a rebalance 
running on one of the servers since last Friday, and this is what the status 
looks like right now:

[r...@iosrv-7-2 ~]# gluster volume rebalance gluster-test status
rebalance step 1: layout fix in progress: fixed layout 149531740

As a side note I started this rebalance when I noticed that about half of my 
clients are missing a certain set of files. Upon further investigation I found 
that a different set of clients are missing different data. This problem 
happened after many problems getting an upgrade to 3.1.1 working. Unfortunately 
I don't remember which version was running when I was last able to write to 
this volume.

Any thoughts?

Mike

_______________________________________________
Gluster-users mailing list
[email protected]
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
