[ http://issues.apache.org/jira/browse/HADOOP-485?page=all ]
Sameer Paranjpye updated HADOOP-485:
------------------------------------

    Component/s: mapred

> allow a different comparator for grouping keys in calls to reduce
> -----------------------------------------------------------------
>
>                 Key: HADOOP-485
>                 URL: http://issues.apache.org/jira/browse/HADOOP-485
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: mapred
>    Affects Versions: 0.5.0
>            Reporter: Owen O'Malley
>         Assigned To: Owen O'Malley
>
> Some algorithms require that the values passed to reduce be sorted in a
> particular order, but extending the key with the additional fields causes
> them to be handled by different calls to reduce. (The user then collects the
> values until they detect a "real" key change and then processes them.)
> It would be much easier if the framework let you define a second comparator
> that did the grouping of values for reduces. So if your reduce inputs look like:
> A1, V1
> A2, V2
> A3, V3
> B1, V4
> B2, V5
> then instead of getting calls to reduce that look like:
> reduce(A1, {V1}); reduce(A2, {V2}); reduce(A3, {V3}); reduce(B1, {V4});
> reduce(B2, {V5});
> you could define the grouping comparator to just compare the letters and end
> up with:
> reduce(A1, {V1,V2,V3}); reduce(B1, {V4,V5});
> which is the desired outcome. Note that this assumes that the "extra" part of
> the key is just for sorting, because the reduce will only see the first
> representative of each equivalence class.
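
The proposed behavior can be illustrated with a small standalone simulation. This is only a sketch in plain Python, not Hadoop code: the names `sort_cmp`, `group_cmp`, and `run_reduce` are hypothetical and stand in for the framework's sort comparator, the proposed grouping comparator, and the loop that issues calls to reduce. It assumes keys are strings whose first character is the "real" key and whose remainder exists only for ordering.

```python
from functools import cmp_to_key

def sort_cmp(a, b):
    """Sort comparator (hypothetical): orders by the full composite key."""
    return (a > b) - (a < b)

def group_cmp(a, b):
    """Grouping comparator (hypothetical): compares only the leading letter."""
    return (a[0] > b[0]) - (a[0] < b[0])

def run_reduce(records, sort_cmp, group_cmp):
    """Sort records by key, then merge adjacent keys that group_cmp treats as equal.

    Returns a list of (key, values) pairs, one per simulated call to reduce.
    The key of each pair is the first key seen in its group, so the rest of
    the group's keys are not visible to the reducer.
    """
    key_fn = cmp_to_key(sort_cmp)
    ordered = sorted(records, key=lambda kv: key_fn(kv[0]))
    calls = []
    for key, value in ordered:
        if calls and group_cmp(calls[-1][0], key) == 0:
            # Same equivalence class as the current group: append the value.
            calls[-1][1].append(value)
        else:
            # Grouping comparator reports a change: start a new reduce call.
            calls.append((key, [value]))
    return calls

# The inputs from the description, deliberately shuffled.
records = [("B2", "V5"), ("A2", "V2"), ("A1", "V1"), ("B1", "V4"), ("A3", "V3")]

# Grouping with the sort comparator itself: one call per distinct key.
per_key = run_reduce(records, sort_cmp, sort_cmp)

# Grouping by letter only: values arrive sorted, one call per letter.
per_letter = run_reduce(records, sort_cmp, group_cmp)
print(per_letter)
```

Passing `sort_cmp` as both arguments reproduces the five separate calls described above, while passing `group_cmp` yields the two grouped calls. For the result to make sense, the grouping comparator has to be consistent with the sort order, so that keys it treats as equal end up adjacent after sorting.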