On Apr 22, 2010, at 5:41 AM, Steve Loughran wrote:
> that brings up a couple of issues I've been thinking about now that workers 
> can go to 6+ HDDs/node
> 
> * a way to measure the distribution across disks, rather than just nodes. 
> DfsClient doesn't provide enough info here yet.

Instead of throwing you to the file browser, clicking on a host from the 
live nodes page should probably put you on a "stats about this node" page.

> * a way to trigger some rebalancing on a single node, to say "position stuff 
> more fairly". You don't need to worry about network traffic, just local disk 
> load and CPU time, so it should be simpler.


Yup.  Working with 8 drives per node, it is interesting to see how unbalanced 
the data gets after a while.  [Luckily, we have MR tmp space segregated off so 
I'm sure it would be a lot worse if we didn't!]
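
In the meantime, a rough way to eyeball the per-disk imbalance on one node is 
a small script like the sketch below. It is not anything Hadoop ships: the 
function names are made up for illustration, it assumes each data directory 
sits on its own filesystem, and you have to supply the list of directories 
yourself.

```python
import shutil


def used_fraction(path):
    """Fraction of the filesystem holding `path` that is currently in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def spread(fractions):
    """Gap between the fullest and the emptiest disk, as a fraction."""
    values = list(fractions.values())
    return max(values) - min(values)


def report(data_dirs):
    """Print usage per data directory (hypothetical paths supplied by the caller)."""
    fractions = {d: used_fraction(d) for d in data_dirs}
    for d, f in sorted(fractions.items(), key=lambda kv: kv[1]):
        print(f"{d}: {f:.1%} used")
    print(f"spread: {spread(fractions):.1%}")
    return fractions
```

Calling report() with the node's data directories prints one line per disk 
plus the spread between the fullest and emptiest. It only measures whole 
filesystems, so anything else sharing a disk is counted too.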

Someone should file a jira. :)
