On 31 October 2011 14:13, Mathias Herberts <[email protected]> wrote:

> Not one I can think of at the moment, but intuitively that's the kind of
> flexibility I would like to see should a grouping comparator become
> configurable for Combiners.
>

I don't think there's a good example because, contractually, IF the
combiner uses a different grouping comparator, then either:

a) It is a sub-comparator of the reduce grouping comparator (i.e. it
compares more fields and generates smaller groups), in which case the
reduction is nonoptimal. There are algorithmic arguments for doing this,
but they can all be resolved just as easily using the type system as by
messing with the structure of the dataflow.

b) It's a super-comparator (i.e. it compares fewer fields and generates
larger groups) of the reduce grouping comparator, in which case it has
visible effects on the reducer, since groups which should have been present
without the "optimization" have now vanished.

One case is suboptimal, the other case is incorrect.
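
To make the two cases concrete, here is a minimal sketch in plain Java (no
Hadoop classes). It assumes a three-field composite key and a reducer that
groups on the first two fields; the names GroupingSketch, firstFields,
countGroups and sampleKeys are hypothetical and exist only for this
illustration. It counts how many groups each candidate comparator would form
over the same sorted keys.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class GroupingSketch {
    // Hypothetical helper: a comparator that looks only at the first n fields
    // of a composite key (assumes every key has at least n fields).
    static Comparator<String[]> firstFields(final int n) {
        return new Comparator<String[]>() {
            @Override
            public int compare(String[] a, String[] b) {
                for (int i = 0; i < n; i++) {
                    int c = a[i].compareTo(b[i]);
                    if (c != 0) {
                        return c;
                    }
                }
                return 0;
            }
        };
    }

    // Counts runs of adjacent keys that the grouping comparator treats as
    // equal, i.e. the number of groups over an already sorted key sequence.
    static int countGroups(List<String[]> sortedKeys, Comparator<String[]> grouping) {
        int groups = 0;
        String[] previous = null;
        for (String[] key : sortedKeys) {
            if (previous == null || grouping.compare(previous, key) != 0) {
                groups++;
            }
            previous = key;
        }
        return groups;
    }

    // Illustrative keys, already sorted on all three fields.
    static List<String[]> sampleKeys() {
        return Arrays.asList(
            new String[] {"x", "1", "p"},
            new String[] {"x", "1", "q"},
            new String[] {"x", "2", "p"},
            new String[] {"y", "1", "p"});
    }

    public static void main(String[] args) {
        List<String[]> keys = sampleKeys();
        // Assumed reduce grouping: the first two fields.
        System.out.println(countGroups(keys, firstFields(2))); // prints 3
        // Case (a), sub-comparator: more groups, so less gets combined.
        System.out.println(countGroups(keys, firstFields(3))); // prints 4
        // Case (b), super-comparator: (x,1) and (x,2) collapse into one group.
        System.out.println(countGroups(keys, firstFields(1))); // prints 2
    }
}
```

With the coarser comparator, a combiner that emits one record per group can
pass on at most two groups, although the reducer was entitled to see three.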


> By pushing your reasoning further, why specify a combiner class at all
> instead of applying the Reducer on the map side.
>

I'll give a concrete example to explain why one doesn't just use a reducer
on the map side: The type contract is different.

Average, as a PartialAggregate: Int -> Partial -> Double

Mapper converts Int to a Partial { value : 1 }

Combiner converts a set of Partials (and possibly, though unlikely, Ints)
to a Partial { sum(values) : sum(counts) }

Reducer converts a set of Partials to a Double: sum(values) / sum(counts)

The combiner is idempotent in types (its output type is its input type);
the reducer is not. But the output from the combiner is not the desired
result.
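
The type contract above can be sketched in plain Java as follows. This is
only an illustration, not Hadoop API code: the class PartialAverageSketch
and its method names are hypothetical, and Partial is modelled as a simple
{ sum : count } pair.

```java
import java.util.Arrays;
import java.util.List;

class PartialAverageSketch {
    // Partial { sum : count }, the intermediate type.
    static final class Partial {
        final long sum;
        final long count;

        Partial(long sum, long count) {
            this.sum = sum;
            this.count = count;
        }
    }

    // Mapper: Int -> Partial { value : 1 }
    static Partial map(int value) {
        return new Partial(value, 1);
    }

    // Combiner: Partials -> Partial. Input and output types match, so it can
    // run zero, one, or many times without changing the contract.
    static Partial combine(List<Partial> partials) {
        long sum = 0;
        long count = 0;
        for (Partial p : partials) {
            sum += p.sum;
            count += p.count;
        }
        return new Partial(sum, count);
    }

    // Reducer: Partials -> Double. The output type differs from the input
    // type, so this step cannot be re-applied to its own output.
    static double reduce(List<Partial> partials) {
        Partial total = combine(partials);
        return (double) total.sum / total.count;
    }

    public static void main(String[] args) {
        Partial a = combine(Arrays.asList(map(1), map(2), map(3))); // { 6 : 3 }
        Partial b = combine(Arrays.asList(map(10)));                // { 10 : 1 }
        System.out.println(reduce(Arrays.asList(a, b)));            // prints 4.0
    }
}
```

Had the reducer itself been applied on the map side, the two map outputs
would have become the Doubles 2.0 and 10.0, and averaging those gives 6.0
rather than the correct 4.0.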

Obviously you can build a reducer out of the combiner plus some extra code,
but I think that fact is a distraction from the question. Alternatively,
you could do it the other way around and say, "Run the combiner on the
reduce side, and force the reducer to accept a single Partial as input, not
a list of Partials." But again, I think one can fake up an algorithm where
the reducer needs multiple inputs which cannot easily be merged into a
single type by the combiner. (Well, I think that, given a complex enough
combiner type, this could be wrong... I'll think about it.)

S.


> On Oct 31, 2011 10:02 PM, "Shevek" <[email protected]> wrote:
>
> > On 31 October 2011 13:37, Mathias Herberts <[email protected]> wrote:
> >
> > > I don't know if it's a bug, but I'd rather have the ability to set a
> > > Combiner-specific grouping comparator than have the Combiner use the
> > > grouping comparator set for the Reducer.
> > > On Oct 31, 2011 9:21 PM, "Harsh J" <[email protected]> wrote:
> > >
> >
> > Now I'm curious. Can you argue that there's a case where it makes a
> > difference? Preferably one where it can't be trivially curried into the
> > combiner?
> >
> > S.
> >
> >
> > > > Shevek,
> > > >
> > > > The problem Mathias indicates here is that the Combiners do not
> > > > utilize the Grouping Comparators. They only use the Sort Comparators.
> > > > I wonder whether that is a bug.
> > > >
> > > > On 31-Oct-2011, at 11:14 PM, Shevek wrote:
> > > >
> > > > > I like the ability to reuse a Java component for both sorting and
> > > > > grouping, and to be honest, since the cases where one can do a
> > > > > comparison without deserializing the raw bytes are relatively few
> > > > > and far between, I tend to use Java's Comparator interface, and wrap
> > > > > it in some infrastructure-specific adapter. I have a vague feeling
> > > > > that Hadoop sometimes calls the byte interface and sometimes the
> > > > > object interface anyway? ICBW, the way I've been writing code makes
> > > > > it irrelevant.
> > > > >
> > > > > Alternatively, I've misunderstood the (simpler) question, and the
> > > > > answer is to use the setGroupingComparatorClass() API.
> > > > >
> > > > > S.
> > > > >
> > > > > On 29 October 2011 04:35, Mathias Herberts <[email protected]> wrote:
> > > > >
> > > > >> Another point concerning the Combiners,
> > > > >>
> > > > >> the grouping is currently done using the RawComparator used for
> > > > >> sorting the Mapper's output. Wouldn't it be useful to be able to
> > > > >> set a custom CombinerGroupingComparatorClass?
> > > > >>
> > > > >> Mathias.
> > > > >>
> > > >
> > > >
> > >
> >
>
