Your latter statement is correct:

> if the output of the Map1 phase (or Reduce phase) is immediately
> inserted to Map2 phase (or Map3 Phase) within the same node, without any
> distribution.

ChainMapper and ChainReducer are just convenience classes that let you reuse
mapper code, whether it runs as part of a chain or standalone. They do not
force the system to do any additional distribution, grouping, sorting, etc.

-Rahul

2011/4/29 Panayotis Antonopoulos <[email protected]>

> Hello,
> Let's say we have a MR job that uses ChainMapper and ChainReducer like in
> the following diagram:
> Input->Map1->Map2->Reduce->Map3->Output
>
> The input is split and distributed to the nodes of the cluster before being
> processed by Map1 phase.
> Also, before the Reduce phase the key/value pairs are also distributed to
> the Reducers according to the Partitions made by the Partitioner.
>
> I expected that the same thing (distribution of the keys) would happen
> before Map2 and Map3 phases but after reading "Pro Hadoop" Book I strongly
> doubt it.
>
> I would like to ask you if the key/value pairs emitted by the Map1 phase
> (or those emitted by the Reduce phase) are distributed to the nodes of the
> cluster before being processed by the next Map phase,
> or if the output of the Map1 phase (or Reduce phase) is immediately
> inserted to Map2 phase (or Map3 Phase) within the same node, without any
> distribution.
>
> Thank you in advance!
> Panagiotis Antonopoulos
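
To make the data flow of Input->Map1->Map2->Reduce->Map3->Output concrete, here is a minimal plain-Java sketch. It is only a conceptual model and does not use the Hadoop API: the class ChainSketch, the stages MAP1 and MAP2, and the word-count logic are all hypothetical stand-ins. The point it tries to show is that Map1 output is handed to Map2 by a direct call in the same task, that partitioning happens only once (before Reduce), and that Map3 runs on the Reduce output in the same reducer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

public class ChainSketch {

    // Hypothetical stand-ins for Map1 and Map2: one record in, zero or more out.
    static final Function<String, List<String>> MAP1 =
            line -> List.of(line.trim().split("\\s+"));   // split a line into words
    static final Function<String, List<String>> MAP2 =
            word -> List.of(word.toLowerCase());          // normalize each word

    // Map1 -> Map2 for one input record, entirely in-process:
    // each stage's output is passed straight to the next stage, with no
    // partitioning, sorting, or movement between nodes in between.
    static List<String> runMapChain(String record) {
        List<String> current = List.of(record);
        for (Function<String, List<String>> stage : List.of(MAP1, MAP2)) {
            List<String> next = new ArrayList<>();
            for (String r : current) {
                next.addAll(stage.apply(r));
            }
            current = next;
        }
        return current;
    }

    static List<String> run(List<String> inputLines, int numReducers) {
        // The single redistribution step: output of the map chain is
        // partitioned by key and grouped, modeling the shuffle before Reduce.
        List<Map<String, List<Integer>>> partitions = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            partitions.add(new TreeMap<>());
        }
        for (String line : inputLines) {
            for (String key : runMapChain(line)) {
                int p = Math.floorMod(key.hashCode(), numReducers);
                partitions.get(p).computeIfAbsent(key, k -> new ArrayList<>()).add(1);
            }
        }

        // Reduce -> Map3, again in-process within each modeled reducer.
        List<String> output = new ArrayList<>();
        for (Map<String, List<Integer>> partition : partitions) {
            for (Map.Entry<String, List<Integer>> e : partition.entrySet()) {
                int sum = 0;                              // Reduce: sum the grouped values
                for (int v : e.getValue()) {
                    sum += v;
                }
                output.add(e.getKey() + "=" + sum);       // Map3: format the reduce output
            }
        }
        return output;
    }

    public static void main(String[] args) {
        for (String record : run(List.of("Hadoop hadoop", "chain"), 2)) {
            System.out.println(record);
        }
    }
}
```

In a real job the stages would be registered through ChainMapper and ChainReducer rather than wired by hand as above; the sketch only mirrors where redistribution does and does not happen.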
