I have a Hadoop cluster spanning two racks: one rack contains 12 nodes,
the other contains 5 nodes.

When I run a really large job, the disks on the 5-node rack fill up much
sooner than the disks on the 12-node rack, and I believe it's because the
12 nodes are sending their replicated blocks to the 5-node rack: under
the default placement policy, the off-rack copies of every block written
on the larger rack all land on the smaller one. In fact, my job won't
finish successfully, due to full disks on the 5 nodes, even though the
overall usage of the cluster is only ~75%.
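
To quantify the imbalance, here is a rough back-of-envelope sketch (my assumptions: replication factor 3, writes spread evenly across all 17 datanodes, and the default placement of first replica on the writer's node with the other two on the remote rack):

```python
# Sketch of per-node replica load for a 12-node + 5-node rack split,
# assuming replication factor 3 and default HDFS placement:
# replica 1 on the writer's node, replicas 2 and 3 on the other rack.

big, small = 12, 5
total = big + small

# Fraction of blocks whose writer sits on each rack (writes spread evenly).
written_big = big / total
written_small = small / total

# Replicas landing on each rack: one local copy per block written there,
# plus two copies for every block written on the opposite rack.
replicas_small = written_small * 1 + written_big * 2
replicas_big = written_big * 1 + written_small * 2

per_node_small = replicas_small / small
per_node_big = replicas_big / big

print(f"per-node load, small rack: {per_node_small:.3f}")
print(f"per-node load, big rack:   {per_node_big:.3f}")
print(f"ratio: {per_node_small / per_node_big:.1f}x")
```

Under those assumptions each node in the 5-node rack stores roughly three times as many replicas as a node in the 12-node rack, which matches what I'm seeing.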

Is there a way I can tell Hadoop not to enforce the "place replicated
blocks outside the writer's rack" rule?
