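1) The scheduler first tries to run the map on a node that already holds the
block.  If it can't, the map reads the block from another node on the same
rack if one has it, or from another rack if not.  In that case the block is
streamed over the network as the map reads it, I believe, rather than copied
in full before the map can begin.  This preference is known as rack
locality.  You can see counters associated with it in the jobs you run
(data-local map tasks, rack-local map tasks, etc.).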
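
To make the locality categories concrete, here is a minimal sketch in Python.
It is not Hadoop's scheduler code. The node and rack names, the RACK_OF
table, and the functions locality() and pick_source() are all hypothetical
and exist only for illustration. It just shows the order of preference: same
node, then same rack, then anywhere else.

```python
# Hypothetical cluster layout for illustration; not read from a real cluster.
RACK_OF = {"nodeA": "rack1", "nodeB": "rack1", "nodeC": "rack2", "nodeD": "rack2"}


def locality(task_node, replica_nodes, rack_of=RACK_OF):
    """Classify where a map task's input block sits relative to the task."""
    if task_node in replica_nodes:
        return "data-local"      # a replica is on the task's own node
    task_rack = rack_of[task_node]
    if any(rack_of[n] == task_rack for n in replica_nodes):
        return "rack-local"      # a replica is on another node in the same rack
    return "off-rack"            # every replica is on a different rack


def pick_source(task_node, replica_nodes, rack_of=RACK_OF):
    """Pick the replica to read from: same node, then same rack, then any."""
    if task_node in replica_nodes:
        return task_node
    same_rack = [n for n in replica_nodes if rack_of[n] == rack_of[task_node]]
    return (same_rack or list(replica_nodes))[0]
```

With this layout, a map on nodeA whose block lives only on nodeB and nodeC
would count as rack-local and would read from nodeB.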

2) The reducer has a phase called copy, which fetches _all_ the map outputs
it needs to act on (the first 33% of reduce progress).  Only then is the
sort phase initiated (the next 33%).  Only after copy and sort does the
reduce itself begin (on to 100%).  So such an issue won't occur, as all map
outputs are fetched before the reduce logic runs.
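
The three-thirds progress split can be sketched like this in Python. This is
a toy model under the assumption that each phase contributes exactly one
third and that a phase counts for nothing until the previous one has
finished. reduce_progress() and its parameters are hypothetical names, not
Hadoop APIs.

```python
def reduce_progress(maps_copied, total_maps, sort_done=0.0, reduce_done=0.0):
    """Overall reduce-task progress in [0, 1], with one third per phase.

    maps_copied / total_maps drives the copy phase; sort_done and
    reduce_done are fractions in [0, 1] for the later phases.
    """
    copy_done = maps_copied / total_maps
    if copy_done < 1.0:
        # Sort and reduce cannot start until every map output is fetched.
        sort_done = reduce_done = 0.0
    elif sort_done < 1.0:
        # Reduce cannot start until the sort has finished.
        reduce_done = 0.0
    return (copy_done + sort_done + reduce_done) / 3.0
```

For example, a reducer that has fetched 5 of 10 map outputs is still inside
the first third no matter what values are passed for the later phases.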

On Oct 13, 2010 5:42 PM, "Matthew John" <[email protected]> wrote:

Hi all,

Had some doubts:

1) What happens when a mapper running on node A needs data from a block it
doesn't have? (The block might be present on some other node in the
cluster.)

2) The Sort/Shuffle phase is just a logical representation of all map
outputs sorted together, right? And again, what happens when a reduce on
node C needs access to some map outputs not in its memory?

Matthew .