Hi,
I have some questions specifically about the MapReduce phases:

[1] In the reduce phase, the TaskTracker on a reduce node polls the JobTracker
to learn which maps have completed. When the JobTracker reports completed maps,
the reduce side pulls the data from the nodes where those maps ran. This is a
"pull" model, as opposed to a "push" model in which the map directly sends each
region of its output to the appropriate reduce node. Is the pull model the
default in 0.20, 0.23, etc.?
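To make the pull model described above concrete, here is a minimal, self-contained simulation. It is only a sketch of the control flow: `CompletionTracker` and `pollAndFetch` are hypothetical names standing in for the JobTracker and a reducer's fetch loop, and none of this is Hadoop API. Each map's output is assumed to be pre-split into one region per reducer, and the reducer copies only its own region from maps it has been told are complete.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class PullShuffleSketch {

    // Hypothetical stand-in for the JobTracker: it only records completed map tasks, in order.
    static class CompletionTracker {
        private final List<Integer> completedMaps = new ArrayList<>();

        synchronized void markComplete(int mapId) {
            completedMaps.add(mapId);
        }

        // Completion events the caller has not seen yet, starting at fromIndex.
        synchronized List<Integer> eventsSince(int fromIndex) {
            return new ArrayList<>(completedMaps.subList(fromIndex, completedMaps.size()));
        }
    }

    // One polling round for a single reducer. mapOutputs stands in for the map nodes'
    // local output: mapId -> (partition -> records). The reducer pulls only its own
    // partition from each newly completed map and returns the updated event cursor.
    static int pollAndFetch(CompletionTracker tracker,
                            Map<Integer, Map<Integer, List<String>>> mapOutputs,
                            int partition, int cursor, List<String> fetched) {
        List<Integer> newlyDone = tracker.eventsSince(cursor);
        for (int mapId : newlyDone) {
            fetched.addAll(mapOutputs.get(mapId).getOrDefault(partition, List.of()));
        }
        return cursor + newlyDone.size();
    }

    public static void main(String[] args) {
        Map<Integer, Map<Integer, List<String>>> outputs = Map.of(
                0, Map.of(0, List.of("a"), 1, List.of("b")),
                1, Map.of(0, List.of("c"), 1, List.of("d")));
        CompletionTracker tracker = new CompletionTracker();
        List<String> reducer0 = new ArrayList<>();

        tracker.markComplete(0);
        int cursor = pollAndFetch(tracker, outputs, 0, 0, reducer0);      // pulls "a" from map 0
        tracker.markComplete(1);
        cursor = pollAndFetch(tracker, outputs, 0, cursor, reducer0); // pulls "c" from map 1
        System.out.println(reducer0 + " after " + cursor + " events"); // prints [a, c] after 2 events
    }
}
```

The key point of the pull design is that a map never needs to know where the reducers are; it only has to keep its partitioned output available until the reducers come and fetch it.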

In the pull model, how does the reduce node know it is responsible for a
particular region of the map output? Is this determined up front, and where
does it get this information?
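For reference on the "region" part of the question: Hadoop's default `HashPartitioner` picks a partition for each map output key with the formula below, using the job's number of reduce tasks, so the split is a deterministic function of the key and the reduce count. The sketch re-implements only that formula in plain Java (the class name `HashPartitionSketch` is made up); it does not show how partition numbers are matched to reduce tasks at fetch time, which is the part being asked about.

```java
public class HashPartitionSketch {

    // Same formula as Hadoop's default HashPartitioner: mask off the sign bit so the
    // result is non-negative, then take the remainder modulo the number of reduce tasks.
    static int getPartition(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int numReduceTasks = 4;
        for (String key : new String[] {"apple", "banana", "cherry", "apple"}) {
            // Equal keys always land in the same partition, whichever map emits them.
            System.out.println(key + " -> partition " + getPartition(key, numReduceTasks));
        }
    }
}
```

Masking with `Integer.MAX_VALUE` rather than calling `Math.abs` avoids the `Integer.MIN_VALUE` corner case, where `Math.abs` would still return a negative number.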

[2] There can be multiple reduce tasks per reduce node. The number of reduce
tasks is configurable; what about the number of reduce nodes? How is that
determined?

[3] Before 0.23, the map/reduce task slots on a node are allocated statically.
Is this based only on configuration?
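To frame questions [2] and [3] together, here is a deliberately simplified model, not Hadoop's actual scheduler. It assumes every node gets the same fixed number of reduce slots from a per-node setting (in 0.20 the TaskTracker property for this is `mapred.tasktracker.reduce.tasks.maximum`) and that reduce tasks are handed out round-robin to nodes with a free slot. The class and method names are hypothetical. Under these assumptions, the number of nodes that end up running reduces is not set directly; it falls out of the reduce-task count and the per-node slot limit.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ReduceSlotSketch {

    // Hypothetical model: each node has slotsPerNode reduce slots (fixed by configuration),
    // and reduce tasks are assigned round-robin to nodes with a free slot, as if each node
    // heartbeated in turn. Tasks beyond total capacity are left unassigned (they would wait).
    static Map<String, Integer> assignReduceTasks(String[] nodes, int slotsPerNode, int numReduceTasks) {
        Map<String, Integer> running = new LinkedHashMap<>();
        int remaining = numReduceTasks;
        boolean progress = true;
        while (remaining > 0 && progress) {
            progress = false;
            for (String node : nodes) {
                int used = running.getOrDefault(node, 0);
                if (remaining > 0 && used < slotsPerNode) {
                    running.put(node, used + 1);
                    remaining--;
                    progress = true;
                }
            }
        }
        return running; // only nodes that received at least one reduce task
    }

    public static void main(String[] args) {
        String[] nodes = {"node1", "node2", "node3", "node4"};
        // 3 reduce tasks, 2 slots per node: 3 nodes run one reduce each, node4 runs none
        System.out.println(assignReduceTasks(nodes, 2, 3));
    }
}
```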

Thanks in advance!
