Combiners

Mathias Herberts Sat, 29 Oct 2011 03:53:19 -0700

Hi,

I'm designing a 'Hadoop MapReduce Poster' that puts all the pieces together
so people can easily visualize the full M/R flow.


Concerning the combiners, I have a few points I'd like to have clarified.

If I'm not mistaken, the output of the Mapper is passed to the
Partitioner, which dispatches the <K,V> pairs into R partitions.
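To make the dispatch step concrete, here is a minimal plain-Java sketch (not the Hadoop API). It applies the same formula as Hadoop's default HashPartitioner, (key.hashCode() & Integer.MAX_VALUE) % R, but to plain String keys rather than Writable keys. The class and method names (PartitionSketch, partition, dispatch) are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class PartitionSketch {
    // Same formula as Hadoop's default HashPartitioner, applied to String keys.
    // Masking with Integer.MAX_VALUE keeps the result non-negative.
    static int partition(String key, int r) {
        return (key.hashCode() & Integer.MAX_VALUE) % r;
    }

    // Dispatch every <K,V> pair of the map output into one of r buckets.
    static List<List<Map.Entry<String, Integer>>> dispatch(
            List<Map.Entry<String, Integer>> mapOutput, int r) {
        List<List<Map.Entry<String, Integer>>> partitions = new ArrayList<>();
        for (int i = 0; i < r; i++) {
            partitions.add(new ArrayList<>());
        }
        for (Map.Entry<String, Integer> kv : mapOutput) {
            partitions.get(partition(kv.getKey(), r)).add(kv);
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> mapOutput = List.of(
            Map.entry("apple", 1), Map.entry("banana", 1), Map.entry("apple", 1));
        List<List<Map.Entry<String, Integer>>> parts = dispatch(mapOutput, 3);
        for (int i = 0; i < parts.size(); i++) {
            System.out.println("partition " + i + ": " + parts.get(i));
        }
    }
}
```

The property that matters for the rest of the flow is that the partition depends only on the key, so every pair with the same key ends up in the same partition.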

The <K,V> pairs of each partition are then sorted using the
configured SortComparatorClass.
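A rough sketch of the per-partition sort follows. In Hadoop the sort comparator is a RawComparator set on the job. Here, as an assumption for illustration, a plain java.util.Comparator with natural String ordering stands in for it. SortSketch and sortPartition are hypothetical names. Only keys are compared. The sketch uses List.sort, which is stable, so equal keys keep their input order. That is a property of this sketch, not a guarantee about Hadoop's sort.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

public class SortSketch {
    // Stand-in for the job's sort comparator: natural ordering of String keys.
    static final Comparator<String> SORT_COMPARATOR = Comparator.naturalOrder();

    // Sort one partition's <K,V> pairs by key only; values are never compared.
    static List<Map.Entry<String, Integer>> sortPartition(
            List<Map.Entry<String, Integer>> partition) {
        List<Map.Entry<String, Integer>> sorted = new ArrayList<>(partition);
        sorted.sort(Map.Entry.comparingByKey(SORT_COMPARATOR));
        return sorted;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> partition = List.of(
            Map.entry("pear", 1), Map.entry("apple", 2),
            Map.entry("fig", 3), Map.entry("apple", 4));
        // prints [apple=2, apple=4, fig=3, pear=1]
        System.out.println(sortPartition(partition));
    }
}
```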

If there is a combiner, the sorted output is grouped using the
SortComparatorClass (and not the GroupingComparatorClass, as is the
case in the Reducer) and passed to the combiner before being written
to the partition file.
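The grouping step as described above can be sketched like this. The sketch walks an already-sorted partition and cuts it into runs of keys that the sort comparator considers equal. It then hands each run to the combiner. This follows the description in this message rather than Hadoop's actual spill code. The word-count-style sumCombiner and the names CombineSketch and combine are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

public class CombineSketch {
    // Hypothetical combiner: sums the values of one key group.
    static List<Map.Entry<String, Integer>> sumCombiner(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) sum += v;
        return List.of(Map.entry(key, sum));
    }

    // Group consecutive records whose keys compare equal under the SORT
    // comparator (not a separate grouping comparator) and feed each group
    // to the combiner; the first key of each run represents the group.
    static List<Map.Entry<String, Integer>> combine(
            List<Map.Entry<String, Integer>> sorted,
            Comparator<String> sortComparator,
            BiFunction<String, List<Integer>, List<Map.Entry<String, Integer>>> combiner) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        int i = 0;
        while (i < sorted.size()) {
            String groupKey = sorted.get(i).getKey();
            List<Integer> values = new ArrayList<>();
            while (i < sorted.size()
                    && sortComparator.compare(groupKey, sorted.get(i).getKey()) == 0) {
                values.add(sorted.get(i).getValue());
                i++;
            }
            out.addAll(combiner.apply(groupKey, values));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> sorted = List.of(
            Map.entry("apple", 2), Map.entry("apple", 4), Map.entry("fig", 3));
        // prints [apple=6, fig=3]
        System.out.println(combine(sorted, Comparator.naturalOrder(), CombineSketch::sumCombiner));
    }
}
```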

My question is: what happens if the combiner outputs different keys
than the ones it is fed? Its output would then suffer from two
flaws:

1. It won't be sorted
2. It might end up in the wrong partition

Since a Combiner is simply a Reducer with no additional constraints,
nothing seems to prevent these two problems.
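Both flaws can be reproduced with a deliberately misbehaving, hypothetical combiner. It sums the values but emits its input key minus the first character. With R = 2, "apple" and "fig" both hash to partition 0 under the HashPartitioner formula. After combining, the records written to partition file 0 are no longer sorted ("pple" > "ig"). In addition, "pple" hashes to partition 1. This is plain Java, not the Hadoop API. All names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class KeyChangingCombinerSketch {
    // Same formula as Hadoop's default HashPartitioner, on plain String keys.
    static int partition(String key, int r) {
        return (key.hashCode() & Integer.MAX_VALUE) % r;
    }

    // Hypothetical bad combiner: sums values but emits a DIFFERENT key.
    static Map.Entry<String, Integer> badCombiner(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) sum += v;
        return Map.entry(key.substring(1), sum);
    }

    // Flaw 1: is the combiner output still sorted by key?
    static boolean isSorted(List<Map.Entry<String, Integer>> records) {
        for (int i = 1; i < records.size(); i++) {
            if (records.get(i - 1).getKey().compareTo(records.get(i).getKey()) > 0) {
                return false;
            }
        }
        return true;
    }

    // Flaw 2: keys in partition file `partitionId` that belong elsewhere.
    static List<String> misrouted(List<Map.Entry<String, Integer>> records,
                                  int partitionId, int r) {
        List<String> wrong = new ArrayList<>();
        for (Map.Entry<String, Integer> kv : records) {
            if (partition(kv.getKey(), r) != partitionId) wrong.add(kv.getKey());
        }
        return wrong;
    }

    public static void main(String[] args) {
        int r = 2;
        // Both input keys land in partition 0 when r = 2.
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        out.add(badCombiner("apple", List.of(2, 4)));
        out.add(badCombiner("fig", List.of(3)));
        System.out.println(out);                  // [pple=6, ig=3]
        System.out.println(isSorted(out));        // false
        System.out.println(misrouted(out, 0, r)); // [pple]
    }
}
```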

Is my understanding correct?

Mathias.
