That's interesting. Why does the reducer fetch the mapper's local data over HTTP rather than SSH?
----- Original Message ----
From: Arun C Murthy <[EMAIL PROTECTED]>
To: [email protected]
Sent: Tuesday, July 22, 2008 2:19:36 PM
Subject: Re: question on HDFS

Mori,

On Jul 22, 2008, at 12:22 PM, Mori Bellamy wrote:

> hey all,
> let us say that i have 3 boxes, A B and C. initially, map tasks are
> running on all 3. after most of the mapping is done, C is 32% done
> with reduce (so still copying stuff to its local disk) and A is
> stuck on a particularly long map-task (it got an ill-behaved record
> from the input splits). does A's intermediate map output data go
> directly to C's local disk, or is it still written to HDFS and
> therefore distributed amongst all the machines? also, will A's disk
> be a favored target for A's output bytes, or is the target volume
> independent of the corresponding mapper?
>

Intermediate outputs (i.e. map outputs) are written to the local disk and not to HDFS. The reduce fetches the intermediate outputs via HTTP.

hth,
Arun

> Thanks! The answer to this question should clear a lot of things up
> for me.
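
For anyone following along, here is a toy sketch of the pattern described above: the map side writes its output to a local directory, and the reduce side pulls it over HTTP. This is not Hadoop's implementation. It uses only the Python standard library, and the file name map_0.out and the helpers serve_directory, fetch_map_output and demo are made up for illustration.

```python
import functools
import http.server
import tempfile
import threading
import urllib.request
from pathlib import Path


def serve_directory(directory):
    """Serve `directory` over HTTP on an ephemeral loopback port (hypothetical helper)."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=str(directory)
    )
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_map_output(port, name):
    """Reduce side: pull one map output file over HTTP (hypothetical helper)."""
    # Bypass any proxy configured in the environment; this is a loopback fetch.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(f"http://127.0.0.1:{port}/{name}") as resp:
        return resp.read()


def demo():
    with tempfile.TemporaryDirectory() as local_dir:
        # Map side: the intermediate output goes to local disk, not to HDFS.
        Path(local_dir, "map_0.out").write_bytes(b"key1\t1\nkey2\t1\n")
        server = serve_directory(local_dir)
        try:
            port = server.server_address[1]
            return fetch_map_output(port, "map_0.out")
        finally:
            server.shutdown()
            server.server_close()


if __name__ == "__main__":
    print(demo())
```

In this sketch the "mapper" and "reducer" live in one process; in a real cluster the fetch would cross machines, which is where the choice of transport matters.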
