If you are on 0.18, you can request that the combiner be invoked only once per partition per spill. Call job.setCombineOnlyOnce(true), or set "mapred.combine.once" to true in your conf.
On 9/24/08 2:28 PM, "Palleti, Pallavi" <[EMAIL PROTECTED]> wrote:

> Is it possible to ensure that a combiner runs only once?
>
> Thanks
> Pallavi
>
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Owen O'Malley
> Sent: Wednesday, September 24, 2008 6:42 AM
> To: [email protected]
> Subject: Re: setting a different input/output class for combiner function than map and reduce functions
>
> On Tue, Sep 23, 2008 at 5:40 PM, Sandy <[EMAIL PROTECTED]> wrote:
>
>> I just wrote a combiner class to try and speed things up. However, now I
>> want to do something like the following:
>> ==map phase==
>> input: key = LongWritable, value = Text
>> output: key = Text, value = LongWritable
>>
>> ==combiner==
>> input: key = Text, value = iterator<LongWritable>
>> output: key = Text, value = Text
>
> The input and output types for the combiner *must* be the same. The combiner
> may be applied 0, 1, or many times between the map and the reduce. So,
> combiners must:
> * not depend on being run exactly once
> * not have side effects
>
> InputFormat -> Map -> Combiner* -> Reduce -> OutputFormat
>
> Since the Combiner may run more than once, it can't do type transformations.
>
> -- Owen
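
To make Owen's point concrete, here is a minimal plain-Java sketch of why a combiner has to tolerate being applied 0, 1, or many times. It does not use any Hadoop classes; CombinerSketch, combine, reduce, and runJob are hypothetical names made up for this illustration, and a simple sum stands in for whatever the real job computes. Because combine takes Long values and emits Long values, its output can be fed straight back into it, and the reduce result comes out the same no matter how many passes ran.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: models Map -> Combiner* -> Reduce for a single key.
public class CombinerSketch {

    // Combiner: folds partial sums into one partial sum.
    // Input and output value types are both Long, so the output is valid input.
    static List<Long> combine(List<Long> values) {
        long sum = 0;
        for (long v : values) {
            sum += v;
        }
        List<Long> out = new ArrayList<>();
        out.add(sum);
        return out;
    }

    // Reducer: produces the final total from whatever the combiner left behind.
    static long reduce(List<Long> values) {
        long sum = 0;
        for (long v : values) {
            sum += v;
        }
        return sum;
    }

    // Applies the combiner a given number of times (possibly zero), then reduces.
    static long runJob(List<Long> mapOutput, int combinerPasses) {
        List<Long> current = mapOutput;
        for (int i = 0; i < combinerPasses; i++) {
            current = combine(current);
        }
        return reduce(current);
    }

    public static void main(String[] args) {
        // Map output for one key, e.g. four occurrences each counted as 1.
        List<Long> mapOutput = Arrays.asList(1L, 1L, 1L, 1L);
        for (int passes = 0; passes <= 3; passes++) {
            System.out.println(passes + " combiner pass(es) -> " + runJob(mapOutput, passes));
        }
    }
}
```

If combine instead emitted Text (as in Sandy's original plan), its output could no longer be passed back through combine, and the reducer would see Text or LongWritable depending on how many passes happened to run.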
