I just spent some time evaluating rebalancing options.
The method that I found most useful was to walk through my data a directory at a time, incrementing the replication count on that directory's contents, waiting a minute, and then dropping it back down. That rebalanced the storage set surprisingly quickly.

In your case, you are guaranteed to be able to increase the replication count on at least a reasonable fraction of your data by a substantial amount. Supposing that you are using 2-way replication, you have 5 x 200GB of actual data. Since you have added 2 x 200GB of new capacity, 40% of your data could be bumped to 3-way replication at a time. For faster redistribution, you could instead increase up to 20% of your data to 4-way replication. Obviously, these numbers will vary a bit depending on your mix of replication factors, but the fact that you have added two new datanodes guarantees that up to 200GB of actual files can have their replication increased by 2. My recommendation is that you do just that, on a rolling 5% of your data at a time.

It turned out in my case that each round of increased replication took almost exactly the same amount of time to stabilize. In any case, if you don't wait quite long enough, the only penalty is slightly worse balancing. I chose a fixed delay partly because the output of [hadoop fsck] is somewhat inconvenient to use in a script (much of it goes to standard error). A rough sketch of the kind of loop I mean appears after the quoted message below.

On 9/27/07 7:27 PM, "Nathan Wang" <[EMAIL PROTECTED]> wrote:

> I saw a similar post
> (http://www.mail-archive.com/[email protected]/msg01112.html)
> but the answer was not very satisfactory.
>
> Imagine I used Hadoop as fault-tolerant storage. I had 10 nodes, each loaded
> with 200GB.
> I found the nodes were overloaded and decided to add 2 new boxes with bigger
> disks.
> How do I redistribute the existing data? I don't want to bump up the
> replication factor, since the old nodes are already overloaded. It'd be very
> helpful if this function could be implemented at the system level.
>
> Thanks.
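For concreteness, here is a minimal shell sketch of that rolling setrep loop. It is an untested illustration rather than the exact script I ran: the directory list is a hypothetical placeholder, and it assumes a normal replication factor of 2 that is bumped temporarily to 3. Adjust the paths, factors, and delay to your own layout.

#!/bin/sh
# Roll through directories covering roughly 5% of the data each
# (these paths are hypothetical placeholders).
DIRS="/user/data/part01 /user/data/part02 /user/data/part03"

for dir in $DIRS; do
    hadoop dfs -setrep -R 3 "$dir"   # bump replication from 2 to 3; the new
                                     # replicas tend to land on the emptier nodes
    sleep 60                         # wait for the extra replicas to be created
    hadoop dfs -setrep -R 2 "$dir"   # drop back down; surplus replicas are
                                     # deleted, leaving the data spread more evenly
done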
