Re: dfs.data.dir

Steve Loughran Mon, 26 Apr 2010 09:46:51 -0700

Allen Wittenauer wrote:

On Apr 22, 2010, at 5:41 AM, Steve Loughran wrote:

That brings up a couple of issues I've been thinking about now that workers can 
go to 6+ HDDs/node:


* a way to measure the distribution across disks, rather than just nodes. 
DfsClient doesn't provide enough info here yet.


What should probably happen is that, instead of throwing you to the file browser, clicking 
on a host from the live nodes page puts you on a "stats about this 
node" page.

I don't want to do any of this by hand. I want machine-readable content that something can aggregate over time.
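For illustration, a minimal sketch of the kind of machine-readable per-disk output meant here. It assumes each entry in a DataNode's dfs.data.dir sits on its own disk, and it walks those local directories directly rather than going through DFSClient or any other Hadoop API. The function names dir_bytes and disk_report and the JSON field names are hypothetical.

```python
import json
import os
import shutil


def dir_bytes(path: str) -> int:
    """Sum the sizes of regular files under path, skipping symlinks."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.isfile(full) and not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def disk_report(data_dirs: list) -> dict:
    """Per-directory stats; assumes one data dir per physical disk (hypothetical layout)."""
    report = {}
    for d in data_dirs:
        usage = shutil.disk_usage(d)  # filesystem totals for the disk holding d
        report[d] = {
            "dir_bytes": dir_bytes(d),  # bytes stored under this data dir
            "fs_total": usage.total,
            "fs_free": usage.free,
        }
    return report


if __name__ == "__main__":
    import sys

    # Usage: python disk_report.py /disk1/dfs/data /disk2/dfs/data ...
    print(json.dumps(disk_report(sys.argv[1:]), indent=2, sort_keys=True))
```

Emitting JSON keeps the output easy to collect from every node on a schedule and aggregate over time, which a per-node web page does not.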

* a way to trigger some rebalancing on a single node, to say "position stuff more 
fairly". You don't need to worry about network traffic, just local disk load and CPU 
time, so it should be simpler.
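As a rough illustration of why the single-node case is simpler, here is a minimal planning sketch that only decides which blocks to move between local disks; it does not move anything. It assumes all disks have equal capacity (so balancing raw bytes is enough) and that a "balanced" node is one where the fullest and emptiest disks differ by at most a given threshold. The greedy strategy and the name plan_moves are hypothetical, not anything HDFS provides.

```python
def plan_moves(disks: dict, threshold: int) -> list:
    """Greedy local rebalance plan.

    disks maps a data dir to the list of block sizes it holds.
    Returns (src, dst, size) moves; assumes equal-capacity disks.
    """
    blocks = {d: list(sizes) for d, sizes in disks.items()}
    used = {d: sum(sizes) for d, sizes in blocks.items()}
    moves = []
    while True:
        src = max(used, key=used.get)  # fullest disk
        dst = min(used, key=used.get)  # emptiest disk
        gap = used[src] - used[dst]
        if gap <= threshold:
            break
        # Only a block with 0 < size < gap narrows the gap; this strictly
        # lowers the sum of squared usages, so the loop always terminates.
        candidates = [b for b in blocks[src] if 0 < b < gap]
        if not candidates:
            break
        size = max(candidates)
        blocks[src].remove(size)
        blocks[dst].append(size)
        used[src] -= size
        used[dst] += size
        moves.append((src, dst, size))
    return moves


# Usage: four 64 MB blocks piled on one disk, one on another, none on the third.
print(plan_moves({"/d1": [64, 64, 64, 64], "/d2": [], "/d3": [64]}, threshold=64))
```

Because nothing crosses the network, the only costs to weigh are local disk I/O and CPU, so a simple greedy plan like this can be throttled by capping how many moves run at once.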



Yup.  Working with 8 drives per node, it is interesting to see how unbalanced 
the data gets after a while.  [Luckily, we have MR tmp space segregated off so 
I'm sure it would be a lot worse if we didn't!]

Someone should file a jira. :)


Especially if someone else offers to fix it.
