Here's a consequence I see of having values that are much larger than the keys: there's not much point in adding a combiner.
My mapper emits pairs of the form <Key, Value>, where the size of a Value is much greater than the size of a Key. The reducer then processes input of the form <Key, Iterator<Value>>: it looks at the set of values corresponding to a key and sorts each value into one of two bins. I don't think this is particularly CPU-intensive; however, the reducer needs access to the entire set of values. The set can't be boiled down into a smaller sufficient statistic the way, say, a word-count program can combine the counts for a word from different documents into a single number.

As a result, the only combiner strategy I can see is to have the mapper emit each Value as a single-item list, <Key, [Value]>, have a combiner concatenate the lists into <Key, [Value, Value...]>, and then have the reducer work on lists of lists: <Key, Iterator<[Value, Value...]>>. This would save on redundant Key I/O, but since Values are so much bigger than Keys, I don't think the saving would matter.
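To make the list-based scheme concrete, here is a minimal sketch in plain Java, with no Hadoop dependencies. `ListCombinerSketch` and its String key/value types are illustrative assumptions, not part of any real job; the three methods just model the map, combine, and reduce steps described above.

```java
import java.util.*;

public class ListCombinerSketch {

    // Mapper side: wrap each value as a singleton list, i.e. emit <Key, [Value]>.
    static List<Map.Entry<String, List<String>>> mapPhase(List<String[]> records) {
        List<Map.Entry<String, List<String>>> out = new ArrayList<>();
        for (String[] kv : records) {
            out.add(Map.entry(kv[0], List.of(kv[1])));
        }
        return out;
    }

    // Combiner: concatenate the lists emitted for the same key on one node,
    // producing <Key, [Value, Value...]>. Only the repeated keys are collapsed;
    // every value still travels to the reducer.
    static Map<String, List<String>> combine(List<Map.Entry<String, List<String>>> pairs) {
        Map<String, List<String>> merged = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : pairs) {
            merged.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).addAll(e.getValue());
        }
        return merged;
    }

    // Reducer: flatten the lists of lists arriving from all combiners,
    // recovering the full set of values per key.
    static Map<String, List<String>> reduce(List<Map<String, List<String>>> combinerOutputs) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Map<String, List<String>> partial : combinerOutputs) {
            for (Map.Entry<String, List<String>> e : partial.entrySet()) {
                result.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).addAll(e.getValue());
            }
        }
        return result;
    }
}
```

Note what the combiner does and doesn't buy here: it removes the duplicate copies of each key in the shuffle, but the payload shipped to the reducer is still every single value. When values dwarf keys, the bytes saved are a rounding error, which is the point of the argument above.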
