[ https://issues.apache.org/jira/browse/HADOOP-485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12493163 ]
Hadoop QA commented on HADOOP-485:
----------------------------------

+1

http://issues.apache.org/jira/secure/attachment/12356648/485.patch applied and successfully tested against trunk revision r534234.

Test results:   http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/103/testReport/
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/103/console

> allow a different comparator for grouping keys in calls to reduce
> -----------------------------------------------------------------
>
>                 Key: HADOOP-485
>                 URL: https://issues.apache.org/jira/browse/HADOOP-485
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: mapred
>    Affects Versions: 0.5.0
>            Reporter: Owen O'Malley
>         Assigned To: Tahir Hashmi
>         Attachments: 485.patch, 485.patch, 485.patch, 485.patch,
>                      Hadoop-485-pre.patch, TestUserValueGrouping.java.patch
>
>
> Some algorithms require that the values passed to the reduce be sorted in a
> particular order, but extending the key with the additional fields causes
> them to be handled by different calls to reduce. (The user then has to collect
> the values until they detect a "real" key change, and only then process them.)
> It would be much easier if the framework let you define a second comparator
> that did the grouping of values for reduces. So if your reduce inputs look like:
>
> A1, V1
> A2, V2
> A3, V3
> B1, V4
> B2, V5
>
> then instead of getting calls to reduce that look like:
>
> reduce(A1, {V1}); reduce(A2, {V2}); reduce(A3, {V3}); reduce(B1, {V4});
> reduce(B2, {V5});
>
> you could define the grouping comparator to just compare the letters and end
> up with:
>
> reduce(A1, {V1,V2,V3}); reduce(B1, {V4,V5});
>
> which is the desired outcome. Note that this assumes that the "extra" part of
> the key is just for sorting, because the reduce will only see the first
> representative of each equivalence class.
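
The grouping behaviour described in the issue can be sketched in plain Java. This is a standalone simulation and not the Hadoop API or the code in 485.patch: the class GroupingSketch, the Pair type, and the reduceCalls method are hypothetical names made up for illustration. It sorts records by the full key, then starts a new reduce call only when the grouping comparator reports a change, passing the first key of each group as the representative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Standalone simulation of the proposal; NOT the Hadoop API.
class GroupingSketch {

    // Hypothetical record: a composite key such as "A1" paired with its value.
    static final class Pair {
        final String key;
        final String value;
        Pair(String key, String value) { this.key = key; this.value = value; }
    }

    // Sorts by the full key, then opens a new group whenever the grouping
    // comparator says the key differs from the current group's first key.
    // Returns one printable "reduce(...)" string per simulated reduce call.
    static List<String> reduceCalls(List<Pair> input,
                                    Comparator<String> sortOrder,
                                    Comparator<String> grouping) {
        List<Pair> sorted = new ArrayList<>(input);
        sorted.sort((a, b) -> sortOrder.compare(a.key, b.key));

        List<String> representatives = new ArrayList<>();
        List<List<String>> groups = new ArrayList<>();
        for (Pair p : sorted) {
            int last = representatives.size() - 1;
            if (last < 0 || grouping.compare(representatives.get(last), p.key) != 0) {
                representatives.add(p.key);      // first key seen stands for the group
                groups.add(new ArrayList<>());
            }
            groups.get(groups.size() - 1).add(p.value);
        }

        List<String> calls = new ArrayList<>();
        for (int i = 0; i < representatives.size(); i++) {
            calls.add("reduce(" + representatives.get(i)
                    + ", {" + String.join(",", groups.get(i)) + "})");
        }
        return calls;
    }

    public static void main(String[] args) {
        List<Pair> input = Arrays.asList(
                new Pair("B2", "V5"), new Pair("A1", "V1"), new Pair("A3", "V3"),
                new Pair("B1", "V4"), new Pair("A2", "V2"));

        Comparator<String> fullKey = Comparator.naturalOrder();
        // Grouping comparator that looks only at the leading letter.
        Comparator<String> letterOnly =
                (a, b) -> Character.compare(a.charAt(0), b.charAt(0));

        // Grouping by the full key: one reduce call per key.
        System.out.println(reduceCalls(input, fullKey, fullKey));
        // Grouping by letter only: values arrive sorted, one call per letter.
        System.out.println(reduceCalls(input, fullKey, letterOnly));
    }
}
```

With the letter-only grouping comparator the sketch yields reduce(A1, {V1,V2,V3}) and reduce(B1, {V4,V5}), matching the outcome the issue asks for. Only the first key of each group (A1, B1) is visible to the call, which is why the extra part of the key should be used for sorting alone.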