Allen Wittenauer wrote:
On Apr 22, 2010, at 5:41 AM, Steve Loughran wrote:
That brings up a couple of issues I've been thinking about, now that workers can
go to 6+ HDDs/node:
* a way to measure the distribution across disks, rather than just nodes.
DfsClient doesn't provide enough info here yet.
Instead of throwing you to the file browser, clicking on a host in the live nodes
page should probably take you to a "stats about this node" page.
I don't want to do any of this by hand. I want machine-readable content that
something can aggregate over time.
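A minimal sketch of the kind of machine-readable, per-disk report being asked for, in Python. The data directory paths, the JSON field names and the "spread" metric (fullest disk's used fraction minus the emptiest disk's) are assumptions for illustration only; none of this is something DfsClient or the DataNode exposes. The only real API used is the standard library's `shutil.disk_usage`.

```python
import json
import shutil
import time


def volume_stats(volumes):
    """Summarise per-disk usage.

    `volumes` maps a mount point to a (used_bytes, total_bytes) pair.
    Returns each disk's used fraction plus the spread between the
    fullest and emptiest disk, stamped with the current Unix time.
    """
    per_disk = {
        path: round(used / total, 4)
        for path, (used, total) in sorted(volumes.items())
    }
    fractions = list(per_disk.values())
    return {
        "timestamp": int(time.time()),
        "used_fraction": per_disk,
        "spread": round(max(fractions) - min(fractions), 4),
    }


def read_local_volumes(paths):
    """Read usage for each path's filesystem via shutil.disk_usage.

    Note: this counts everything on the filesystem, not only HDFS blocks.
    """
    usage = {}
    for path in paths:
        du = shutil.disk_usage(path)
        usage[path] = (du.used, du.total)
    return usage


if __name__ == "__main__":
    # Hypothetical sample numbers; on a real node you would pass
    # read_local_volumes([...the node's data directories...]) instead.
    sample = {"/data/1": (800, 1000), "/data/2": (200, 1000)}
    print(json.dumps(volume_stats(sample)))
```

Emitting one JSON line per run (e.g. from cron) and appending it to a log gives a time series that another tool can aggregate, rather than something you have to eyeball per node.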
* a way to trigger some rebalancing on a single node, to say "position stuff more
fairly". You don't need to worry about network traffic, just local disk load and CPU
time, so it should be simpler.
Yup. Working with 8 drives per node, it is interesting to see how unbalanced
the data gets after a while. [Luckily, we have MR tmp space segregated off; I'm
sure it would be a lot worse if we didn't!]
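A rough sketch of the single-node rebalancing idea: a greedy planner that, as suggested, only looks at local disk fill levels and ignores network cost. The block lists, capacities, the `threshold` parameter and the "pick the block that best closes the gap" heuristic are all hypothetical; it only plans moves and does not cover actually relocating block files under a running DataNode.

```python
def plan_moves(capacity, blocks, threshold=0.05):
    """Greedy sketch of intra-node rebalancing (hypothetical, planning only).

    capacity: disk -> capacity in bytes
    blocks:   disk -> list of (block_id, size_bytes) stored on that disk
    Returns a list of (block_id, src_disk, dst_disk) moves.
    """
    used = {d: sum(size for _, size in blocks.get(d, [])) for d in capacity}
    pending = {d: list(blocks.get(d, [])) for d in capacity}
    moves = []
    while True:
        frac = {d: used[d] / capacity[d] for d in capacity}
        src = max(frac, key=frac.get)  # fullest disk
        dst = min(frac, key=frac.get)  # emptiest disk
        gap = frac[src] - frac[dst]
        if gap <= threshold or not pending[src]:
            break

        def gap_after(size):
            # Fill-level difference between src and dst if `size` bytes move.
            return abs((used[src] - size) / capacity[src]
                       - (used[dst] + size) / capacity[dst])

        # Choose the block on the fullest disk that best closes the gap.
        block = min(pending[src], key=lambda b: gap_after(b[1]))
        if gap_after(block[1]) >= gap:
            break  # no single move improves the balance
        # Each accepted move strictly lowers sum(used**2 / capacity),
        # so the loop cannot cycle and always terminates.
        pending[src].remove(block)
        pending[dst].append(block)
        used[src] -= block[1]
        used[dst] += block[1]
        moves.append((block[0], src, dst))
    return moves
```

For example, with two 100-byte disks where one holds blocks of 30, 30 and 20 bytes and the other is empty, it proposes moving a single 30-byte block, leaving the disks at 50% and 30%; no further single move would narrow that gap.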
Someone should file a jira. :)
Especially if someone else offers to fix it.