Re: [jira] Updated: (HADOOP-485) allow a different comparator for grouping keys in calls to reduce

Nigel Daley Thu, 26 Apr 2007 22:07:19 -0700

FWIW, HADOOP-742 requests better javadoc for JobConf

On Apr 26, 2007, at 1:38 PM, Doug Cutting (JIRA) wrote:

[ https://issues.apache.org/jira/browse/HADOOP-485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Doug Cutting updated HADOOP-485:
--------------------------------

    Status: Open  (was: Patch Available)

The new JobConf methods need javadoc. (Yes, the rest of JobConf is woefully undocumented, but that's a bug that we should work to fix.)

allow a different comparator for grouping keys in calls to reduce
-----------------------------------------------------------------

                Key: HADOOP-485
                URL: https://issues.apache.org/jira/browse/HADOOP-485
            Project: Hadoop
         Issue Type: New Feature
         Components: mapred
   Affects Versions: 0.5.0
           Reporter: Owen O'Malley
        Assigned To: Tahir Hashmi
        Attachments: 485.patch, 485.patch, Hadoop-485-pre.patch, TestUserValueGrouping.java.patch

Some algorithms require that the values to the reduce be sorted in a particular order, but extending the key with the additional fields causes them to be handled by different calls to reduce. (The user then collects the values until they detect a "real" key change and then processes them.) It would be much easier if the framework let you define a second comparator that did the grouping of values for reduces. So if your reduce inputs look like:
A1, V1
A2, V2
A3, V3
B1, V4
B2, V5
instead of getting calls to reduce that look like:
reduce(A1, {V1}); reduce(A2, {V2}); reduce(A3, {V3}); reduce(B1, {V4}); reduce(B2, {V5});
you could define the grouping comparator to just compare the letters and end up with:
reduce(A1, {V1,V2,V3}); reduce(B1, {V4,V5});
which is the desired outcome. Note that this assumes that the "extra" part of the key is just for sorting, because the reduce will only see the first representative of each equivalence class.
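
To make the behaviour described above concrete, here is a minimal standalone sketch in plain Java. It does not use the Hadoop API or the methods added by the patch; the class GroupingSketch, the method reduceCalls, and the two comparators are hypothetical names invented for this illustration. It sorts the pairs with one comparator, then starts a new reduce call only when a second (grouping) comparator reports a key change.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only: simulates sort-then-group, not Hadoop itself.
public class GroupingSketch {

    /** A key/value pair as it would arrive on the reduce side. */
    static final class Pair {
        final String key;
        final String value;
        Pair(String key, String value) { this.key = key; this.value = value; }
    }

    /** Sort order: compares the full key, e.g. "A1" before "A2" before "B1". */
    static final Comparator<String> FULL_KEY = Comparator.naturalOrder();

    /** Grouping order: compares only the leading letter of the key. */
    static final Comparator<String> LETTER_ONLY =
        (a, b) -> Character.compare(a.charAt(0), b.charAt(0));

    /** The example input from the issue description, deliberately shuffled. */
    static List<Pair> sampleInput() {
        return Arrays.asList(
            new Pair("B2", "V5"), new Pair("A1", "V1"), new Pair("A3", "V3"),
            new Pair("B1", "V4"), new Pair("A2", "V2"));
    }

    /**
     * Sorts the input by sortOrder, then emits one textual "reduce" call per
     * run of keys that the grouping comparator considers equal. Only the
     * first key of each run is passed along, as the issue text notes.
     */
    static List<String> reduceCalls(List<Pair> input,
                                    Comparator<String> sortOrder,
                                    Comparator<String> grouping) {
        List<Pair> sorted = new ArrayList<>(input);
        sorted.sort((a, b) -> sortOrder.compare(a.key, b.key));

        List<String> calls = new ArrayList<>();
        int i = 0;
        while (i < sorted.size()) {
            String firstKey = sorted.get(i).key;
            List<String> values = new ArrayList<>();
            // Collect values until the grouping comparator sees a key change.
            while (i < sorted.size()
                    && grouping.compare(firstKey, sorted.get(i).key) == 0) {
                values.add(sorted.get(i).value);
                i++;
            }
            calls.add("reduce(" + firstKey + ", {" + String.join(",", values) + "})");
        }
        return calls;
    }

    public static void main(String[] args) {
        // Grouping by the full key: one call per key.
        System.out.println(reduceCalls(sampleInput(), FULL_KEY, FULL_KEY));
        // Grouping by letter only: values arrive sorted within each group.
        System.out.println(reduceCalls(sampleInput(), FULL_KEY, LETTER_ONLY));
    }
}
```

With the letter-only grouping comparator, the sketch yields reduce(A1, {V1,V2,V3}) and reduce(B1, {V4,V5}), matching the desired outcome above. The values inside each group stay in full-key order because the sort comparator still sees the whole key.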
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
