Re: correct pattern for using setOutputValueGroupingComparator?

Meng Mao Mon, 05 Jan 2009 20:17:40 -0800

Unfortunately, my team is on 0.15 :(. We are looking to upgrade to 0.18 as
soon as we upgrade our hardware (long story).
From comparing the 0.15 and 0.19 MapReduce tutorials, and looking at the
HADOOP-4545 patch, I don't see anything majorly different about the
MapReduce API. Am I missing something?
- There's a Partitioner that's used, but that seems optional?
- I see that 0.19 still provides setOutputValueGroupingComparator; is the
setGroupingComparatorClass used in the patch part of the 0.20 API?
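To check my understanding of how the pieces fit in the old API, here is a plain-Java simulation of what the partitioner, the output key comparator and the output value grouping comparator do together (roughly what JobConf's setPartitionerClass, setOutputKeyComparatorClass and setOutputValueGroupingComparator configure). It is only a sketch under assumptions: Key, partition(), SORT and GROUP are hypothetical stand-ins rather than Hadoop classes, and the real framework merge-sorts spills instead of calling List.sort.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SecondarySortSketch {

    // Hypothetical composite key: natural key (id) plus a secondary field (tag).
    static final class Key {
        final String id;
        final String tag;
        Key(String id, String tag) { this.id = id; this.tag = tag; }
        @Override public String toString() { return id + "," + tag; }
    }

    // Stand-in for a partitioner: hash only the natural key, so every
    // record with the same id is sent to the same reducer.
    static int partition(Key k, int numReducers) {
        return (k.id.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Stand-in for the output key comparator: full sort on (id, tag).
    static final Comparator<Key> SORT =
        Comparator.comparing((Key k) -> k.id).thenComparing(k -> k.tag);

    // Stand-in for the output value grouping comparator: compare ids only,
    // so keys that differ only in tag end up in a single reduce() call.
    static final Comparator<Key> GROUP = Comparator.comparing((Key k) -> k.id);

    // Simulates one reducer: sort its keys, then start a new group whenever
    // the grouping comparator says two consecutive keys differ.
    static List<List<String>> reduceGroups(List<Key> keys) {
        List<Key> sorted = new ArrayList<>(keys);
        sorted.sort(SORT);
        List<List<String>> groups = new ArrayList<>();
        Key prev = null;
        for (Key k : sorted) {
            if (prev == null || GROUP.compare(prev, k) != 0) {
                groups.add(new ArrayList<>());
            }
            groups.get(groups.size() - 1).add(k.toString());
            prev = k;
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Key> keys = List.of(
            new Key("id_1", "-"), new Key("id_2", "-"),
            new Key("id_1", "A"), new Key("id_2", "B"), new Key("id_1", "C"));
        int numReducers = 2;
        for (int r = 0; r < numReducers; r++) {
            List<Key> mine = new ArrayList<>();
            for (Key k : keys) {
                if (partition(k, numReducers) == r) mine.add(k);
            }
            System.out.println("reducer " + r + ": " + reduceGroups(mine));
        }
    }
}
```

If this is right, the partitioner isn't really optional for secondary sort: if it hashed the whole (id, tag) key, records with the same id could land on different reducers and the grouping comparator could never bring them together.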

I have an associated question -- is it possible to use this
GroupingComparator technique to perform essentially a one-to-many mapping?
Let's say I have records like so:
id_1  -   metadata
id_2  -   metadata
id_1  A  numbers
id_2  B  numbers
id_1  C  numbers

Would it be possible for a key/value pair <"id_1, -", metadata> to map
to both the groups for the keys "id_1, A" and "id_1, C"? Writing the
comparator seems easy, but I don't see a way to send multiple copies of a
record to multiple groups. I know it's a bit unusual, but this kind of
wildcard behavior would be useful for us.
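One way I can imagine getting a similar effect without copying records, offered as an assumption rather than a known recipe: group on the id alone, let the sort put the "-" metadata record first in each group, and have the reducer cache it and pair it with every record that follows. The Tagged type and joinGroup() below are hypothetical; the sketch assumes "-" sorts before every real tag (true for ASCII letters and digits) and at most one metadata record per id.

```java
import java.util.ArrayList;
import java.util.List;

public class MetadataJoinSketch {

    // Hypothetical value as a reducer would see it: the tag taken from the
    // composite key plus the record payload.
    static final class Tagged {
        final String tag;
        final String payload;
        Tagged(String tag, String payload) { this.tag = tag; this.payload = payload; }
    }

    // Body of a hypothetical reduce() for one id. Assumes values arrive sorted
    // by tag, so the metadata record (tag "-") comes first in the group.
    static List<String> joinGroup(String id, List<Tagged> sortedValues) {
        List<String> out = new ArrayList<>();
        String metadata = null; // stays null if the id has no metadata record
        for (Tagged v : sortedValues) {
            if (v.tag.equals("-")) {
                metadata = v.payload; // cache it for the rest of the group
            } else {
                // Each tagged record is emitted with the cached metadata, which
                // gives the one-to-many effect without duplicating any record.
                out.add(id + "\t" + v.tag + "\t" + metadata + "\t" + v.payload);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Tagged> id1 = List.of(
            new Tagged("-", "meta1"),
            new Tagged("A", "numsA"),
            new Tagged("C", "numsC"));
        joinGroup("id_1", id1).forEach(System.out::println);
    }
}
```

The trade-off is that the "A" and "C" records share one reduce() call instead of getting separate groups, so any per-tag logic has to happen inside the loop.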

Meng

On Mon, Jan 5, 2009 at 6:58 PM, Owen O'Malley <[email protected]> wrote:

> This is exactly what the setOutputValueGroupingComparator is for. Take a
> look at HADOOP-4545 for an example using the secondary sort. If you are
> using trunk or 0.20, look at
> src/examples/org/apache/hadoop/examples/SecondarySort.java. The checked in
> example uses the new map/reduce api that was introduced in 0.20.
>
> -- Owen
>
