On Sun, Nov 16, 2008 at 2:18 PM, Saptarshi Guha <[EMAIL PROTECTED]>wrote:

> Hello,
>        If my understanding is correct, the combiner will read in values for
> a given key, process it, output it and then **all** values for a key are
> given to the reducer.


Not quite. The flow looks like RecordReader -> Mapper -> Combiner* ->
Reducer -> OutputFormat.

The Combiner may be called 0, 1, or many times on each key between the
mapper and reducer. Combiners are just an application-specific optimization
that compresses the intermediate output. They should not have side effects or
transform the types. Unfortunately, since there isn't a separate interface
for Combiners, there isn't a great place to document this requirement.
I've just filed HADOOP-4668 to improve the documentation.
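To see why the call count must not matter, here is a minimal plain-Java sketch
(not the Hadoop API; the class and method names are made up for illustration).
A sum combiner is associative, so the final result is the same whether the
framework calls it zero times or once per partial spill of map output:

```java
import java.util.Arrays;
import java.util.List;

public class CombinerSketch {

    // Combine and reduce share the same value type (Integer -> Integer),
    // so the framework is free to insert combine() anywhere, any number
    // of times, without changing the final answer.
    static Integer combine(List<Integer> values) {
        int sum = 0;
        for (int v : values) sum += v;
        return sum;
    }

    static Integer reduce(List<Integer> values) {
        return combine(values); // for a sum, the reduce is the same fold
    }

    public static void main(String[] args) {
        List<Integer> mapOutputs = Arrays.asList(1, 2, 3, 4);

        // Path A: combiner called 0 times; reducer sees raw map output.
        int direct = reduce(mapOutputs);

        // Path B: combiner called once per partial spill of map output.
        int partial1 = combine(mapOutputs.subList(0, 2)); // 1 + 2 = 3
        int partial2 = combine(mapOutputs.subList(2, 4)); // 3 + 4 = 7
        int combined = reduce(Arrays.asList(partial1, partial2));

        System.out.println(direct + " " + combined); // both paths agree: 10 10
    }
}
```

A combiner with side effects, or one whose output type differed from its
input type, would break this equivalence.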


>   Then it ought to be possible for the combiner to be of the form
>      ... Reducer<IntWritable, Text, IntWritable, BytesWritable>
>    and the reducer:
>      ...Reducer<IntWritable, BytesWritable, IntWritable, Text>


Since the combiner may be called an arbitrary number of times, it must have
the same input and output types. So the parts generically look like:

input: InputFormat<K1,V1>
mapper: Mapper<K1,V1,K2,V2>
combiner: Reducer<K2,V2,K2,V2>
reducer: Reducer<K2,V2,K3,V3>
output: RecordWriter<K3,V3>

so you probably need to move the code that was changing the type into the
last step of the mapper.
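The type flow above can be sketched in plain Java (again, a hypothetical
illustration, not the Hadoop API): here <K2,V2> = <String, Integer> and
<K3,V3> = <String, String>, so the combiner is Integer in, Integer out, and
the change to the output value type happens only in the reducer:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TypeFlowSketch {

    // mapper stand-in: lines -> grouped <K2,V2> = <String, Integer> pairs
    static Map<String, List<Integer>> map(List<String> lines) {
        Map<String, List<Integer>> out = new TreeMap<>();
        for (String line : lines)
            for (String word : line.split("\\s+"))
                out.computeIfAbsent(word, k -> new ArrayList<>()).add(1);
        return out;
    }

    // combiner: Reducer<K2,V2,K2,V2> -- value type is Integer both ways
    static Integer combine(List<Integer> counts) {
        int sum = 0;
        for (int c : counts) sum += c;
        return sum;
    }

    // reducer: Reducer<K2,V2,K3,V3> -- only here does Integer become String
    static String reduce(String key, List<Integer> counts) {
        return key + "=" + combine(counts);
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> shuffled = map(Arrays.asList("a b a", "b a"));
        for (Map.Entry<String, List<Integer>> e : shuffled.entrySet())
            System.out.println(reduce(e.getKey(), e.getValue()));
        // prints: a=3 then b=2
    }
}
```

If the type conversion were in the combiner instead, a second combiner pass
over already-converted values would fail to type-check, which is exactly why
the combiner must map <K2,V2> to <K2,V2>.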

-- Owen
