Stuart,

Can you disable the topology (rack awareness) on HDFS? That way, all 17 nodes should receive roughly equal amounts of data (assuming you have enough tasks to run on all the nodes).

Koji

On 9/29/09 10:19 AM, "Stuart White" <stuart.whi...@gmail.com> wrote:

> I have a hadoop cluster across 2 racks. One rack contains 12 nodes,
> the other rack contains 5 nodes.
>
> When I run a really large job, the disks on the 5 nodes fill up much
> sooner than the disks on the 12 nodes, and I believe it's because the
> 12 nodes are sending their replicated blocks to the 5-node rack. In
> fact, my job won't finish successfully, due to full disks on the 5
> nodes, even though the overall usage of the cluster is ~75%.
>
> Is there a way I can tell hadoop not to enforce the "send replicated
> blocks outside the current rack" rule?
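
To put a rough number on the imbalance described above, here is a back-of-the-envelope sketch. It rests on assumptions not stated in the thread: a replication factor of 3, writers spread evenly over all 17 nodes, and a rack-aware placement rule that keeps the first replica in the writer's rack and sends the remaining replicas to the other rack. The function name and rack labels are made up for illustration.

```python
from fractions import Fraction


def replicas_per_node(rack_sizes, replication=3):
    """Expected replicas stored per node, per block written, for each rack.

    Assumptions (not from the thread): exactly two racks, writers spread
    evenly over all nodes, first replica stays in the writer's rack and
    the remaining (replication - 1) replicas go to the other rack.
    """
    total_nodes = sum(rack_sizes.values())
    per_node = {}
    for rack, size in rack_sizes.items():
        # Probability that a block's writer sits in this rack.
        p_local = Fraction(size, total_nodes)
        # This rack holds 1 replica of a locally written block and
        # (replication - 1) replicas of a block written in the other rack.
        expected = p_local * 1 + (1 - p_local) * (replication - 1)
        per_node[rack] = expected / size
    return per_node


load = replicas_per_node({"rack12": 12, "rack5": 5})
ratio = load["rack5"] / load["rack12"]
print(f"5-node rack stores about {float(ratio):.2f}x as much per node")
```

Under these assumptions each node in the 5-node rack ends up holding roughly three times as much data as a node in the 12-node rack, which matches the direction of the symptom described. Real numbers will differ with the actual replication factor, task distribution, and any data that was not written by the job.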