On Apr 22, 2010, at 5:41 AM, Steve Loughran wrote:
> that brings up a couple of issues I've been thinking about now that workers
> can go to 6+ HDDs/node
>
> * a way to measure the distribution across disks, rather than just nodes.
> DfsClient doesn't provide enough info here yet.

What should happen is that, instead of throwing you to the file browser, clicking on a host from the live nodes page should probably put you on a "stats about this node" page.

> * a way to trigger some rebalancing on a single node, to say "position stuff
> more fairly". You don't need to worry about network traffic, just local disk
> load and CPU time, so it should be simpler.

Yup. Working with 8 drives per node, it is interesting to see how unbalanced the data gets after a while. [Luckily, we have MR tmp space segregated off so I'm sure it would be a lot worse if we didn't!]

Someone should file a jira. :)
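
To make the "measure the distribution across disks" idea concrete, here is a minimal sketch of what a per-node check might look like. It is not a Hadoop API: it only uses the Python standard library to read filesystem usage for a list of data directories (for example, the ones a datanode is configured with in dfs.data.dir). The function names and the example paths are hypothetical.

```python
import shutil
from statistics import pstdev


def disk_usage_report(data_dirs):
    """Return {directory: fraction of its filesystem in use}.

    data_dirs is assumed to hold one directory per physical disk.
    """
    report = {}
    for d in data_dirs:
        usage = shutil.disk_usage(d)  # named tuple: total, used, free (bytes)
        report[d] = usage.used / usage.total
    return report


def imbalance(fractions):
    """Return (spread, population std deviation) of the used fractions.

    spread is the gap between the fullest and the emptiest disk.
    """
    values = list(fractions)
    return max(values) - min(values), pstdev(values)


if __name__ == "__main__":
    # Hypothetical mount points; substitute the node's real data directories.
    dirs = ["/"]
    report = disk_usage_report(dirs)
    for path, frac in sorted(report.items()):
        print(f"{path}: {frac:.1%} used")
    spread, dev = imbalance(report.values())
    print(f"spread={spread:.1%} stddev={dev:.1%}")
```

A large spread on a node with 8 drives would be the signal that a local rebalance is worth triggering. A "stats about this node" page could show the same numbers.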
