Rebalancing data across partitions on a datanode.

Doug Balog Wed, 25 Aug 2010 08:14:59 -0700

We've just added a couple of new drives to our datanodes.
Each new drive has a single filesystem, which we added to dfs.data.dir and
mapred.{local,tmp}.dir.
Now I want to rebalance the data across the new filesystems so that they are
equally utilized.
My plan is to write a script that does the following:

- Calculate how much data each filesystem should have.
- While the filesystems are not balanced:
        - Randomly pick a file and its .meta file from an over-utilized
          filesystem.
        - Copy them to a tmp name on an under-utilized filesystem.
        - Rename the files from tmp to their proper location on the
          under-utilized filesystem.
        - Remove the files from the over-utilized filesystem.
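
To make the steps concrete, here is a minimal sketch of the first step and of
one pass through the loop body. It is an illustration only: the function
names, the ".tmp" suffix, and the even-share definition of "balanced" are my
own assumptions, and it says nothing about whether the datanode tolerates
files moving underneath it.

```python
import os
import shutil


def target_bytes(used_by_fs):
    """Return the even share of bytes each filesystem should hold.

    used_by_fs maps a filesystem path to the bytes of block data on it
    (hypothetical input; how it is measured is up to the caller).
    """
    return sum(used_by_fs.values()) // len(used_by_fs)


def move_pair(paths, dest_dir):
    """Move a file and its .meta file into dest_dir via tmp names."""
    staged = []
    for src in paths:
        final = os.path.join(dest_dir, os.path.basename(src))
        tmp = final + ".tmp"  # hypothetical tmp suffix, an assumption
        shutil.copy2(src, tmp)  # copy contents and timestamps
        staged.append((src, tmp, final))
    for _, tmp, final in staged:
        # tmp and final are in the same directory, so this is a plain rename
        os.rename(tmp, final)
    for src, _, _ in staged:
        # remove the originals only after both copies are in place
        os.remove(src)
```

The caller passes both the file and its .meta file in one call, so neither
original is removed until both copies have been renamed into place.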

I think this will work because I believe the datanode tries to open the file
on each of the filesystems until it succeeds, so it doesn't keep track in
memory of which filesystem a block lives on.

Will this work?
What are the gotchas that I have to watch out for?

Thanks,

Doug
