On Nov 16, 2008, at 6:18 PM, Owen O'Malley wrote:
On Sun, Nov 16, 2008 at 2:18 PM, Saptarshi Guha <[EMAIL PROTECTED]> wrote:
Hello,
If my understanding is correct, the combiner will read in values for
a given key, process them, output the result, and then **all** values
for a key are given to the reducer.
Not quite. The flow looks like
RecordReader -> Mapper -> Combiner* -> Reducer -> OutputFormat.
Yes, I glossed over that bit. Thanks for the correction.
The Combiner may be called 0, 1, or many times on each key between the
mapper and reducer. Combiners are just an application-specific
optimization that compresses the intermediate output. They should not
have side effects or transform the types. Unfortunately, since there
isn't a separate interface for Combiners, there isn't a great place to
document this requirement.
I've just filed HADOOP-4668 to improve the documentation.
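To make the "0, 1, or many times" point concrete, here is a minimal sketch in plain Python. It is not the Hadoop API: mapper, sum_values, and run_job are hypothetical names made up for this illustration, and the whole "job" runs in memory in one process. It also simplifies by combining over all intermediate output at once, whereas a real job would combine separately on each map task's output. The point is only that a combiner which keeps the same key and value types and has no side effects leaves the final reducer output unchanged no matter how often it runs.

```python
from collections import defaultdict

def mapper(line):
    """Emit (word, 1) for every word in a line (word-count style)."""
    for word in line.split():
        yield word, 1

def sum_values(key, values):
    """Serves as both combiner and reducer.

    Input and output have the same types (str key, int value) and there
    are no side effects, so applying it any number of times is safe.
    """
    yield key, sum(values)

def group_by_key(pairs):
    """Collect all values that share a key, like the shuffle/sort step."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def apply_once(func, pairs):
    """Group pairs by key and run func once per key."""
    out = []
    for key, values in group_by_key(pairs).items():
        out.extend(func(key, values))
    return out

def run_job(lines, combiner_passes):
    """Mapper -> Combiner (combiner_passes times) -> Reducer."""
    intermediate = [pair for line in lines for pair in mapper(line)]
    for _ in range(combiner_passes):
        intermediate = apply_once(sum_values, intermediate)
    return dict(apply_once(sum_values, intermediate))

lines = ["a b a", "b a c"]
# Run the same job with the combiner applied 0, 1, and 3 times.
results = [run_job(lines, passes) for passes in (0, 1, 3)]
```

All three entries in results are the same dictionary. A combiner that changed the value type or, say, wrote to an external counter would break that equality, which is the requirement described above.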
Hmm, I had no idea that the combiner could be called 0 times. Thanks
for the heads up.
Thank you
Saptarshi