Hey all,
Let's say I have 3 boxes: A, B, and C. Initially, map tasks are
running on all 3. After most of the mapping is done, C is 32% done
with its reduce (so it is still copying map output to its local disk)
and A is stuck on a particularly long map task (it got an ill-behaved
record from its input split). Does A's intermediate map output go
directly to C's local disk, or is it still written to HDFS and
therefore distributed among all the machines? Also, will A's disk be
a favored target for A's output bytes, or is the target volume
independent of the corresponding mapper?
Thanks! The answer to this question should clear a lot of things up
for me.
- question on HDFS Mori Bellamy