Mori,
On Jul 22, 2008, at 12:22 PM, Mori Bellamy wrote:
> hey all,
> let us say that i have 3 boxes, A B and C. initially, map tasks are
> running on all 3. after most of the mapping is done, C is 32% done
> with reduce (so still copying stuff to its local disk) and A is
> stuck on a particularly long map-task (it got an ill-behaved record
> from the input splits). does A's intermediate map output data go
> directly to C's local disk, or is it still written to HDFS and
> therefore distributed amongst all the machines? also, will A's disk
> be a favored target for A's output bytes, or is the target volume
> independent of the corresponding mapper?
Intermediate outputs (i.e. map outputs) are written to the local disk
of the node that ran the map task, not to HDFS, so A's map output
stays on A's disk. Each reduce then fetches the partitions it needs
from that node via HTTP.
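
To make the data flow concrete, here is a toy Python model of it. This is
only a sketch and not Hadoop code: the function names, the part-N.out
file names and the byte-sum partitioner are all made up for illustration.
It writes one partition file per reduce into a local directory (standing
in for box A's disk), serves that directory over HTTP, and has each
"reduce" pull its own partition.

```python
import functools
import http.server
import os
import tempfile
import threading
import urllib.request


class QuietHandler(http.server.SimpleHTTPRequestHandler):
    """Static file handler that does not log each request to stderr."""

    def log_message(self, format, *args):
        pass


def run_map_task(local_dir, records, num_reducers):
    """Map side: write one partition file per reducer to the local disk."""
    partitions = {r: [] for r in range(num_reducers)}
    for key, value in records:
        # Hypothetical partitioner: sum of the key's bytes modulo reducer count.
        partitions[sum(key.encode()) % num_reducers].append(f"{key}\t{value}")
    for r, lines in partitions.items():
        with open(os.path.join(local_dir, f"part-{r}.out"), "w") as f:
            f.write("\n".join(lines))


def serve_local_dir(local_dir):
    """Serve the mapper's local directory over HTTP on a free local port."""
    handler = functools.partial(QuietHandler, directory=local_dir)
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_partition(server, reducer_id):
    """Reduce side: pull this reducer's partition from the mapper via HTTP."""
    host, port = server.server_address
    url = f"http://{host}:{port}/part-{reducer_id}.out"
    # Bypass any system proxy settings, since the server is on localhost.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url) as resp:
        return resp.read().decode().splitlines()


def demo():
    """Run one map task on 'box A' and let two reducers fetch from it."""
    with tempfile.TemporaryDirectory() as box_a_disk:
        run_map_task(box_a_disk, [("a", 1), ("b", 2), ("c", 3)], num_reducers=2)
        server = serve_local_dir(box_a_disk)
        try:
            return [fetch_partition(server, r) for r in range(2)]
        finally:
            server.shutdown()
            server.server_close()


if __name__ == "__main__":
    print(demo())
```

Nothing in the sketch touches a shared filesystem: the map output only
ever lives in the mapper's own directory, and the reduce side only sees
it through the HTTP fetch.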
hth,
Arun
Thanks! The answer to this question should clear a lot of things up
for me.