On Mon, May 23, 2011 at 11:32 AM, Mike Spreitzer <[email protected]> wrote:
> What happens if one invocation of a combiner outputs more than one value?
>
> What happens if an output key is different from the input key?

The combiner is responsible for maintaining the sort order (and partitioning) established before that step. So given a record (k,v), a combiner can emit any number of records, provided their keys compare equal to k under the user-defined comparator (they need not be identical to k). Note that the grouping comparator affects this constraint.

The partition of a record is not reevaluated after a map emits it, so a combiner that emits records belonging to a different partition will break grouping: the same key could appear in two different reducers. Similarly, emitting records out of sorted order will have undefined effects.

It's possible to work around these constraints, or even to write applications that depend on them, but then your application is written around implementation details of Apache Hadoop and may break in subsequent releases.

-C
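
To make the partitioning point concrete, here is a rough sketch in plain Python. It does not use any Hadoop API: partition(), run_map_side(), keys_in_multiple_partitions() and both combiners are hypothetical names, and the byte-sum partitioner is only a deterministic stand-in for a real hash partitioner. It models just one thing from the explanation above, namely that a record's partition is fixed when the map emits it and is not recomputed after the combiner runs.

```python
from collections import defaultdict
from itertools import groupby

NUM_REDUCERS = 2  # assumption: two reducers, chosen only for illustration


def partition(key, num_reducers=NUM_REDUCERS):
    # Hypothetical stand-in for a hash partitioner: deterministic byte sum.
    return sum(key.encode("utf-8")) % num_reducers


def run_map_side(records, combiner):
    """Assign each map output to a partition once, then sort and combine
    within each partition. Combiner output is never re-partitioned."""
    partitions = defaultdict(list)
    for key, value in records:
        partitions[partition(key)].append((key, value))

    combined = {}
    for p, recs in partitions.items():
        recs.sort(key=lambda kv: kv[0])
        out = []
        for key, group in groupby(recs, key=lambda kv: kv[0]):
            out.extend(combiner(key, [v for _, v in group]))
        combined[p] = out
    return combined


def sum_combiner(key, values):
    # Well behaved: emits under the key it was given.
    yield key, sum(values)


def rekeying_combiner(key, values):
    # Breaks the contract: folds key "b" into key "a", which belongs
    # to a different partition under partition() above.
    yield ("a" if key == "b" else key), sum(values)


def keys_in_multiple_partitions(combined):
    # Keys that would reach more than one reducer.
    seen = defaultdict(set)
    for p, recs in combined.items():
        for key, _ in recs:
            seen[key].add(p)
    return sorted(k for k, ps in seen.items() if len(ps) > 1)


if __name__ == "__main__":
    records = [("a", 1), ("b", 1), ("a", 1)]
    print(keys_in_multiple_partitions(run_map_side(records, sum_combiner)))
    print(keys_in_multiple_partitions(run_map_side(records, rekeying_combiner)))
```

With the well-behaved combiner no key crosses partitions. With the rekeying combiner, key "a" ends up in both partitions, so in this toy model two different reducers would each see part of its values.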
