[ http://issues.apache.org/jira/browse/HADOOP-485?page=all ]

Sameer Paranjpye updated HADOOP-485:
------------------------------------

    Component/s: mapred

> allow a different comparator for grouping keys in calls to reduce
> -----------------------------------------------------------------
>
>                 Key: HADOOP-485
>                 URL: http://issues.apache.org/jira/browse/HADOOP-485
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: mapred
>    Affects Versions: 0.5.0
>            Reporter: Owen O'Malley
>         Assigned To: Owen O'Malley
>
> Some algorithms require that the values to the reduce be sorted in a 
> particular order, but extending the key with the additional fields causes  
> them to be handled by different calls to reduce. (The user then collects the 
> values until they detect a "real" key change and then processes them.)
> It would be much easier if the framework let you define a second comparator 
> that did the grouping of values for reduces. So your reduce inputs look like:
> A1, V1
> A2, V2
> A3, V3
> B1, V4
> B2, V5
> instead of getting calls to reduce that look like:
> reduce(A1, {V1}); reduce(A2, {V2}); reduce(A3, {V3}); reduce(B1, {V4}); 
> reduce(B2, {V5});
> you could define the grouping comparator to just compare the letters and end 
> up with:
> reduce(A1, {V1,V2,V3}); reduce(B1, {V4,V5});
> which is the desired outcome. Note that this assumes that the "extra" part of 
> the key is just for sorting because the reduce will only see the first 
> representative of each equivalence class.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to