Eli Collins wrote:
Hey Mag,

You can bring down the datanode daemon, add the extra dfs.data.dir and
then restart. Since blocks are round-robined across directories, the
new directory will have lower utilization (once the other directories
are full it will start catching up). If that's not OK you can
re-balance the directories by hand with cp while the datanode is down
(before you restart it). If this takes you longer than 10 minutes the
blocks on that datanode will start getting re-replicated, but when you
bring the datanode back up the namenode will notice the
over-replicated blocks and remove them.
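The by-hand rebalance could be sketched roughly like this. In a real cluster you'd stop the datanode first, operate on actual dfs.data.dir entries (e.g. /data/1/dfs/dn), and restart within the 10-minute window; here the paths are throwaway temp dirs so the cp step can be tried safely, with stand-in files mimicking the blk_NNN / blk_NNN_N.meta layout under current/:

```shell
# Stand-ins for two dfs.data.dir entries (real ones might be
# /data/1/dfs/dn and /data/3/dfs/dn); created fresh so this is safe to run
old_dir=$(mktemp -d)/dn1
new_dir=$(mktemp -d)/dn2
mkdir -p "$old_dir/current/subdir0" "$new_dir/current"

# Stand-ins for a block file and its checksum metadata
touch "$old_dir/current/subdir0/blk_1001" \
      "$old_dir/current/subdir0/blk_1001_1.meta"

# Copy a whole block subdir to the emptier disk, then remove the
# original; -a preserves permissions and timestamps
cp -a "$old_dir/current/subdir0" "$new_dir/current/"
rm -r "$old_dir/current/subdir0"

ls "$new_dir/current/subdir0"   # lists blk_1001 and blk_1001_1.meta
```

The key point is to move block files together with their .meta files and keep the subdir layout intact, so the datanode finds everything on restart.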

That brings up a couple of issues I've been thinking about now that workers can go to 6+ HDDs/node:

* A way to measure the distribution across disks, rather than just nodes. DfsClient doesn't provide enough info here yet.
* A way to trigger some rebalancing on a single node, to say "position stuff more fairly". You don't need to worry about network traffic, just local disk load and CPU time, so it should be simpler.
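Until something like that exists, a rough stopgap for the first point is just to poll the filesystem behind each configured directory from cron (the directory list below is illustrative; substitute your actual dfs.data.dir entries):

```shell
# Print percent-used of the filesystem backing each data directory.
# /tmp and /var/tmp stand in for real dfs.data.dir entries here.
for d in /tmp /var/tmp; do
  df -P "$d" | awk -v dir="$d" 'NR==2 {print dir, $5}'
done
```

This only sees whole filesystems, not HDFS's own per-directory accounting, but it's enough to spot a badly skewed disk.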
