Hi, I'm designing a 'Hadoop MapReduce Poster', putting all the pieces together so that people can easily visualize the full M/R flow.
Concerning combiners, I have a few points I'd like clarified.

If I'm not mistaken, the output of the Mapper is passed to the Partitioner, which dispatches the <K,V> pairs into R partitions. The pairs in each partition are then sorted using the configured SortComparatorClass. If there is a combiner, the sorted output is grouped using the SortComparatorClass (and not the GroupingComparatorClass, as is the case in the Reducer) and passed to the combiner before being written to the partition file.

My question is: what happens if the combiner outputs keys different from the ones it was fed? The output of the combiner would then suffer from two flaws:

1. It won't necessarily be sorted.
2. It might end up in the wrong partition.

Since a Combiner is simply a Reducer with no other constraints, nothing seems to prevent these two problems.

Is my understanding correct?

Mathias.
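
P.S. To make the concern concrete, here is a minimal sketch in plain Java that simulates one partition going through sort, group and combine. It does not use any Hadoop classes: CombinerKeyChangeSketch, rewriteKey and combinePartition are names I made up for illustration, and the key-reversing combiner is purely hypothetical. The partition formula is the one used by the default HashPartitioner, but applied to String.hashCode() rather than Text.hashCode(), so the partition numbers are only illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Plain-Java simulation of the flow described above; nothing here is a Hadoop API.
class CombinerKeyChangeSketch {

    // Same formula as Hadoop's default HashPartitioner, applied to a String key.
    static int partitionOf(String key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Hypothetical combiner behaviour: emit each group under a rewritten key.
    static String rewriteKey(String key) {
        return new StringBuilder(key).reverse().toString();
    }

    // Simulates one partition: a TreeSet stands in for "sort, then group by key",
    // and the returned list holds the combiner's output keys in emission order.
    // Values are left out because only the keys matter for this question.
    static List<String> combinePartition(List<String> keysInPartition) {
        List<String> emitted = new ArrayList<>();
        for (String groupKey : new TreeSet<>(keysInPartition)) {
            emitted.add(rewriteKey(groupKey));
        }
        return emitted;
    }

    // True if the keys are in non-decreasing natural order.
    static boolean isSorted(List<String> keys) {
        for (int i = 1; i < keys.size(); i++) {
            if (keys.get(i - 1).compareTo(keys.get(i)) > 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int numPartitions = 4;
        // Map output keys that all hash to the same partition.
        List<String> mapOutputKeys = Arrays.asList("bc", "az", "az");
        int partition = partitionOf(mapOutputKeys.get(0), numPartitions);

        List<String> combined = combinePartition(mapOutputKeys);
        System.out.println("partition file " + partition + " holds " + combined);
        System.out.println("still sorted? " + isSorted(combined));
        for (String key : combined) {
            System.out.println(key + " would be partitioned to "
                    + partitionOf(key, numPartitions));
        }
    }
}
```

In this simulation the combiner's output for the partition is neither sorted nor made of keys that hash back to that partition, which are exactly the two flaws listed above. Whether the real framework re-sorts or re-partitions at this point is what I am asking.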
