Re: I want to group "similar" keys in the reducer.

Sonal Goyal Mon, 15 Mar 2010 19:13:25 -0700

Hi Raymond,

A custom partitioner is probably what you need.
An alternative approach is to emit keys based on your pattern. Say you are
currently emitting <KEY1, Val1>, <KEY2, Val2>, <K1, Val3>, <K4, Val4>.


You can instead emit

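<KEY, <KEY1, Val1>>, <KEY, <KEY2, Val2>>, <K, <K1, Val3>>, <K, <K4, Val4>>

so that all values for KEY1 and KEY2 reach the same reduce call under KEY, with
the original key still available inside the value.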
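The re-keying idea above can be sketched in plain Java. This is a local simulation of the map output and shuffle, not Hadoop API code, and the rule that decides which keys are "similar" (stripping trailing digits, so KEY1 and KEY2 both become KEY) is a hypothetical stand-in for your own pattern.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class RekeyByPattern {

    // Hypothetical "similarity" rule: strip trailing digits, so KEY1 and KEY2 both become KEY.
    static String groupKey(String originalKey) {
        return originalKey.replaceAll("\\d+$", "");
    }

    // Simulates map output plus shuffle: each <key, value> record is re-emitted
    // under its group key, with the original pair carried inside the value.
    // TreeMap mimics the framework handing keys to the reducer in sorted order.
    static Map<String, List<Map.Entry<String, String>>> shuffle(
            List<Map.Entry<String, String>> mapOutput) {
        Map<String, List<Map.Entry<String, String>>> grouped = new TreeMap<>();
        for (Map.Entry<String, String> record : mapOutput) {
            grouped.computeIfAbsent(groupKey(record.getKey()), k -> new ArrayList<>())
                   .add(record);
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> mapOutput = List.of(
            Map.entry("KEY1", "Val1"),
            Map.entry("KEY2", "Val2"),
            Map.entry("K1", "Val3"),
            Map.entry("K4", "Val4"));
        // Each entry of the shuffled map corresponds to the input of one reduce() call.
        shuffle(mapOutput).forEach((key, pairs) -> System.out.println(key + " -> " + pairs));
    }
}
```

Running it prints two groups: K holding the K1 and K4 records, and KEY holding the KEY1 and KEY2 records. In a real job the map output value would have to be a Writable that carries both the original key and its value (for example a custom pair Writable); that type is assumed and not shown here.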

Thanks and Regards,
Sonal


2010/3/16 Jim Twensky <jim.twen...@gmail.com>

> Hi Raymond,
>
> Take a look at
> http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapreduce/Job.html#setGroupingComparatorClass(java.lang.Class)
> I think this is what you want. Also make sure to implement a custom
> partitioner that only takes into account the first part of the key,
> namely the KEY part. You can search for "Secondary Sort" and "Hadoop"
> to see some tutorials on this topic.
>
> Cheers,
> Jim
>
> 2010/3/15 Gang Luo <lgpub...@yahoo.com.cn>:
> > You need to define a pattern and implement your own partitioner so that
> > all the similar keys you want to group go to the same reducer. On the
> > reduce side, you will probably also need to implement secondary sorting so
> > that the keys you want to group are adjacent in the sorted input to the
> > reducer. Because the reduce method processes one key at a time, you also
> > need to maintain a window that buffers all the keys being grouped.
> >
> > -Gang
> >
> > ----- Original Message ----
> > From: Raymond Jennings III <raymondj...@yahoo.com>
> > To: common-user@hadoop.apache.org
> > Sent: Mon, 2010/3/15 1:26:09 PM
> > Subject: I want to group "similar" keys in the reducer.
> >
> > Is it possible to override a method in the reducer so that similar keys
> > will be grouped together? For example, I want all keys with the value
> > "KEY1" and "KEY2" to be merged together. (My reducer has a key of type
> > Text.) Thanks.
> >
>
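Jim's suggestion (a partitioner that looks only at the KEY part, a sort on the full composite key, and a grouping comparator that compares only the KEY part) can be simulated in plain Java to show how the three roles interact. In a real job these roles would be plugged in through Job.setPartitionerClass, Job.setSortComparatorClass and the Job.setGroupingComparatorClass method linked above; the CompositeKey and Pair classes and the reducer count here are hypothetical, and this is a sketch of the secondary sort pattern rather than Hadoop code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class SecondarySortSimulation {

    // Hypothetical composite key: the "natural" group part plus the original key.
    static final class CompositeKey {
        final String group;
        final String original;
        CompositeKey(String group, String original) {
            this.group = group;
            this.original = original;
        }
    }

    static final class Pair {
        final CompositeKey key;
        final String value;
        Pair(CompositeKey key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    // Partitioner role: hash only the group part so similar keys share a reducer.
    // The sign-bit mask follows the usual hash-partitioning idiom.
    static int partition(CompositeKey key, int numPartitions) {
        return (key.group.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Sort comparator role: order by group, then by the original key.
    static final Comparator<CompositeKey> SORT =
        Comparator.comparing((CompositeKey k) -> k.group).thenComparing(k -> k.original);

    // Grouping comparator role: keys with the same group part compare as equal.
    static final Comparator<CompositeKey> GROUPING =
        Comparator.comparing((CompositeKey k) -> k.group);

    // One reducer's view: sort its partition, then start a new reduce() call
    // whenever the grouping comparator reports a different group.
    static List<List<Pair>> reduceCalls(List<Pair> partitionInput) {
        List<Pair> sorted = new ArrayList<>(partitionInput);
        sorted.sort((a, b) -> SORT.compare(a.key, b.key));
        List<List<Pair>> calls = new ArrayList<>();
        List<Pair> current = null;
        for (Pair p : sorted) {
            if (current == null || GROUPING.compare(current.get(0).key, p.key) != 0) {
                current = new ArrayList<>();
                calls.add(current);
            }
            current.add(p);
        }
        return calls;
    }

    public static void main(String[] args) {
        List<Pair> mapOutput = List.of(
            new Pair(new CompositeKey("KEY", "KEY2"), "Val2"),
            new Pair(new CompositeKey("KEY", "KEY1"), "Val1"),
            new Pair(new CompositeKey("K", "K4"), "Val4"),
            new Pair(new CompositeKey("K", "K1"), "Val3"));
        int numReducers = 2;
        for (int r = 0; r < numReducers; r++) {
            // Collect the records the partitioner routes to reducer r.
            List<Pair> partitionInput = new ArrayList<>();
            for (Pair p : mapOutput) {
                if (partition(p.key, numReducers) == r) {
                    partitionInput.add(p);
                }
            }
            for (List<Pair> call : reduceCalls(partitionInput)) {
                StringBuilder line =
                    new StringBuilder("reducer " + r + ", group " + call.get(0).key.group + ":");
                for (Pair p : call) {
                    line.append(' ').append(p.key.original).append('=').append(p.value);
                }
                System.out.println(line);
            }
        }
    }
}
```

Each printed line corresponds to a single reduce call, so KEY1 and KEY2 arrive together (already sorted by the original key) even though they are distinct keys. Partitioning on the full composite key instead would allow KEY1 and KEY2 to land on different reducers, which is why the partitioner must ignore the second part.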

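Gang's alternative keeps the default grouping, so reduce is still called once per distinct key, and instead carries a buffer across calls, flushing it when the group changes. The plain-Java sketch below simulates that; it assumes keys already arrive sorted so that similar keys are adjacent (the job of the custom partitioner and secondary sort Gang describes), and the trailing-digits grouping rule and class name are hypothetical. In a Hadoop reducer the same state would live in instance fields, with the final flush done in cleanup().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class WindowedReducer {

    // Maps a key to its "similarity" group; supplied by the caller.
    private final Function<String, String> groupOf;
    // Completed groups, in the order they were flushed.
    private final List<Map.Entry<String, List<String>>> output = new ArrayList<>();
    // State carried across reduce() calls: the open window and its group.
    private String currentGroup = null;
    private List<String> window = new ArrayList<>();

    WindowedReducer(Function<String, String> groupOf) {
        this.groupOf = groupOf;
    }

    // Analogue of reduce(): called once per distinct key, keys arriving in sorted order.
    void reduce(String key, Iterable<String> values) {
        String group = groupOf.apply(key);
        if (currentGroup != null && !currentGroup.equals(group)) {
            flush(); // a new group started, so the previous window is complete
        }
        currentGroup = group;
        for (String v : values) {
            window.add(v); // in Hadoop, copy here: the framework reuses Writable instances
        }
    }

    // Analogue of cleanup(): nothing follows the last key, so flush explicitly.
    void cleanup() {
        if (currentGroup != null) {
            flush();
        }
    }

    private void flush() {
        output.add(Map.entry(currentGroup, window));
        window = new ArrayList<>();
    }

    List<Map.Entry<String, List<String>>> output() {
        return output;
    }

    public static void main(String[] args) {
        // Hypothetical rule: strip trailing digits so KEY1 and KEY2 share group KEY.
        WindowedReducer reducer = new WindowedReducer(k -> k.replaceAll("\\d+$", ""));
        reducer.reduce("K1", List.of("Val3"));
        reducer.reduce("K4", List.of("Val4"));
        reducer.reduce("KEY1", List.of("Val1"));
        reducer.reduce("KEY2", List.of("Val2"));
        reducer.cleanup();
        for (Map.Entry<String, List<String>> group : reducer.output()) {
            System.out.println(group.getKey() + " -> " + group.getValue());
        }
    }
}
```

The trade-off is that the whole window is held in memory until the group ends, whereas the grouping-comparator approach lets the framework stream each group's values through a single reduce call.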