allow a different comparator for grouping keys in calls to reduce
-----------------------------------------------------------------

                 Key: HADOOP-485
                 URL: http://issues.apache.org/jira/browse/HADOOP-485
             Project: Hadoop
          Issue Type: New Feature
    Affects Versions: 0.5.0
            Reporter: Owen O'Malley
         Assigned To: Owen O'Malley


Some algorithms require that the values passed to reduce be sorted in a particular 
order, but extending the key with additional sort fields causes those values to be 
handled by different calls to reduce. (The user then has to collect the values until 
they detect a "real" key change, and only then process them.)

It would be much easier if the framework let you define a second comparator 
that controls how values are grouped into calls to reduce. So if your reduce 
inputs look like:

A1, V1
A2, V2
A3, V3
B1, V4
B2, V5

instead of getting calls to reduce that look like:

reduce(A1, {V1}); reduce(A2, {V2}); reduce(A3, {V3}); reduce(B1, {V4}); 
reduce(B2, {V5});

you could define the grouping comparator to just compare the letters and end up 
with:

reduce(A1, {V1,V2,V3}); reduce(B1, {V4,V5});

which is the desired outcome. Note that this assumes that the "extra" part of 
the key is used only for sorting, because reduce will see only the key of the 
first representative of each equivalence class.
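
The grouping behaviour described above can be sketched as a small stand-alone 
simulation. This is only an illustration of the idea, not the Hadoop API: the 
names GroupingSketch, Key, Pair, SORT, GROUP, reduceCalls and sampleInput are 
all hypothetical, and the "reduce call" is just a formatted string.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-alone simulation; none of these names are Hadoop APIs.
class GroupingSketch {

    // Composite key: the "real" key (group) plus an extra field used only for sorting.
    static final class Key {
        final String group;
        final int order;
        Key(String group, int order) { this.group = group; this.order = order; }
        @Override public String toString() { return group + order; }
    }

    static final class Pair {
        final Key key;
        final String value;
        Pair(Key key, String value) { this.key = key; this.value = value; }
    }

    // Sort comparator: orders by the full key (group first, then the extra field).
    static final Comparator<Key> SORT =
        Comparator.comparing((Key k) -> k.group).thenComparingInt(k -> k.order);

    // Grouping comparator: compares only the "real" part of the key.
    static final Comparator<Key> GROUP = Comparator.comparing((Key k) -> k.group);

    // Sorts the input with SORT, then emits one simulated reduce call per run
    // of adjacent keys that the grouping comparator considers equal.
    static List<String> reduceCalls(List<Pair> input, Comparator<Key> grouping) {
        List<Pair> sorted = new ArrayList<>(input);
        sorted.sort((a, b) -> SORT.compare(a.key, b.key));

        List<String> calls = new ArrayList<>();
        int i = 0;
        while (i < sorted.size()) {
            Key first = sorted.get(i).key; // only this key is passed to reduce
            List<String> values = new ArrayList<>();
            while (i < sorted.size()
                    && grouping.compare(first, sorted.get(i).key) == 0) {
                values.add(sorted.get(i).value);
                i++;
            }
            calls.add("reduce(" + first + ", {" + String.join(",", values) + "})");
        }
        return calls;
    }

    // The five records from the example, deliberately out of order.
    static List<Pair> sampleInput() {
        return Arrays.asList(
            new Pair(new Key("B", 2), "V5"),
            new Pair(new Key("A", 3), "V3"),
            new Pair(new Key("A", 1), "V1"),
            new Pair(new Key("B", 1), "V4"),
            new Pair(new Key("A", 2), "V2"));
    }

    public static void main(String[] args) {
        // Grouping by the full key gives one call per record.
        reduceCalls(sampleInput(), SORT).forEach(System.out::println);
        // Grouping by the letter only gives one call per letter.
        reduceCalls(sampleInput(), GROUP).forEach(System.out::println);
    }
}
```

Passing SORT as the grouping comparator reproduces the five single-value calls, 
while passing GROUP yields the two grouped calls, with the values still arriving 
in the order imposed by the full key.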
