Hi Vivek,

Can you include a unit test for this fix?

On Jun 28, 2007, at 2:40 AM, Vivek Ratan (JIRA) wrote:


[ https://issues.apache.org/jira/browse/HADOOP-1535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vivek Ratan updated HADOOP-1535:
--------------------------------

    Attachment: 1535_01.patch

With this patch, we use the comparator returned by JobConf.getOutputKeyComparator() for the sort and merge phases of both Map and Reduce, and the comparator returned by JobConf.getOutputValueGroupingComparator() only for the iterator over the values for a given key. See 1535_01.patch.
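A minimal, self-contained Java sketch of this division of labour (plain Java, not Hadoop code). KEY_COMPARATOR stands in for JobConf.getOutputKeyComparator() and is the only comparator used to merge the sorted Map outputs. GROUPING_COMPARATOR stands in for JobConf.getOutputValueGroupingComparator() and only decides where one reduce() group ends. The Key class, merge() and group() are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class ComparatorRoles {

    // Hypothetical composite key: "primary" decides grouping, "secondary" only ordering.
    static final class Key {
        final String primary;
        final int secondary;

        Key(String primary, int secondary) {
            this.primary = primary;
            this.secondary = secondary;
        }

        @Override
        public String toString() {
            return primary + ":" + secondary;
        }
    }

    // Stands in for JobConf.getOutputKeyComparator(): the full ordering used to sort
    // each Map output file and to merge those files in the Reduce phase.
    static final Comparator<Key> KEY_COMPARATOR =
            Comparator.comparing((Key k) -> k.primary).thenComparingInt(k -> k.secondary);

    // Stands in for JobConf.getOutputValueGroupingComparator(): a coarser ordering used
    // only to decide which consecutive keys share one reduce() call.
    static final Comparator<Key> GROUPING_COMPARATOR = Comparator.comparing((Key k) -> k.primary);

    // k-way merge of sorted runs (one per Map output file), ordered by KEY_COMPARATOR.
    // Ties are broken by run index so the merge is stable.
    static List<Key> merge(List<List<Key>> runs) {
        // Each heap entry is {runIndex, positionInRun}.
        PriorityQueue<int[]> heap = new PriorityQueue<>((a, b) -> {
            int c = KEY_COMPARATOR.compare(runs.get(a[0]).get(a[1]), runs.get(b[0]).get(b[1]));
            return c != 0 ? c : Integer.compare(a[0], b[0]);
        });
        for (int r = 0; r < runs.size(); r++) {
            if (!runs.get(r).isEmpty()) heap.add(new int[] {r, 0});
        }
        List<Key> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            List<Key> run = runs.get(top[0]);
            merged.add(run.get(top[1]));
            if (top[1] + 1 < run.size()) heap.add(new int[] {top[0], top[1] + 1});
        }
        return merged;
    }

    // Splits the merged stream into groups of consecutive keys that GROUPING_COMPARATOR
    // treats as equal; each group is what one reduce() call would iterate over.
    static List<List<Key>> group(List<Key> merged) {
        List<List<Key>> groups = new ArrayList<>();
        List<Key> current = null;
        for (Key k : merged) {
            if (current == null || GROUPING_COMPARATOR.compare(current.get(0), k) != 0) {
                current = new ArrayList<>();
                groups.add(current);
            }
            current.add(k);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Two Map output files, each already sorted by KEY_COMPARATOR.
        List<List<Key>> runs = List.of(
                List.of(new Key("a", 1), new Key("a", 3), new Key("b", 2)),
                List.of(new Key("a", 2), new Key("b", 1)));
        System.out.println(group(merge(runs))); // prints [[a:1, a:2, a:3], [b:1, b:2]]
    }
}
```

Keeping the two comparators in separate stages means the grouping comparator never influences the order of the merged stream; it only draws boundaries inside a stream that is already fully sorted.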

Wrong comparator used to merge files in Reduce phase
----------------------------------------------------

                Key: HADOOP-1535
                URL: https://issues.apache.org/jira/browse/HADOOP-1535
            Project: Hadoop
         Issue Type: Bug
         Components: mapred
   Affects Versions: 0.12.3, 0.13.0
           Reporter: Vivek Ratan
           Assignee: Vivek Ratan
            Fix For: 0.14.0

        Attachments: 1535_01.patch


As per the fix for HADOOP-485, we allow users to optionally provide a different comparator for grouping values when calling the user's Reduce function. Devaraj and I were looking at the code yesterday and found that ReduceTask.java uses this user-supplied grouping comparator to merge the output files from the Map tasks (it is passed in when creating a new SequenceFile.Sorter object). This is incorrect: the comparator used to merge the Map output files should be the same as the one used to sort those files when they were created in the Map phase. The user-supplied grouping comparator should be used only in the iterator passed to the user's Reduce function (which the code already does correctly).
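Along the lines of the unit test requested above, here is a hedged sketch of the failure mode in plain Java (not the Hadoop test harness). Two Map output files sorted by the full key comparator are merged once with that comparator and once with a coarser grouping comparator, and the result is checked for key order. The Key class, merge() and isSorted() are hypothetical, and the tie-breaking in merge() (the left file wins on ties) is an assumption about how a file merge resolves equal keys.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class MergeComparatorCheck {

    // Hypothetical composite key; grouping looks only at "primary".
    static final class Key {
        final String primary;
        final int secondary;

        Key(String primary, int secondary) {
            this.primary = primary;
            this.secondary = secondary;
        }

        @Override
        public String toString() {
            return primary + ":" + secondary;
        }
    }

    // Full ordering used to sort each Map output file (cf. getOutputKeyComparator()).
    static final Comparator<Key> KEY_COMPARATOR =
            Comparator.comparing((Key k) -> k.primary).thenComparingInt(k -> k.secondary);

    // Coarser ordering meant only for grouping values (cf. getOutputValueGroupingComparator()).
    static final Comparator<Key> GROUPING_COMPARATOR = Comparator.comparing((Key k) -> k.primary);

    // Stable two-way merge of two sorted runs: on ties, the element from `left` is taken first.
    static List<Key> merge(List<Key> left, List<Key> right, Comparator<Key> cmp) {
        List<Key> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < left.size() && j < right.size()) {
            if (cmp.compare(left.get(i), right.get(j)) <= 0) out.add(left.get(i++));
            else out.add(right.get(j++));
        }
        while (i < left.size()) out.add(left.get(i++));
        while (j < right.size()) out.add(right.get(j++));
        return out;
    }

    // True if every adjacent pair is in non-decreasing order under cmp.
    static boolean isSorted(List<Key> keys, Comparator<Key> cmp) {
        for (int i = 1; i < keys.size(); i++) {
            if (cmp.compare(keys.get(i - 1), keys.get(i)) > 0) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // Two Map output files, each sorted by KEY_COMPARATOR.
        List<Key> file1 = List.of(new Key("a", 2), new Key("a", 3));
        List<Key> file2 = List.of(new Key("a", 1), new Key("b", 1));

        List<Key> correct = merge(file1, file2, KEY_COMPARATOR);
        List<Key> buggy = merge(file1, file2, GROUPING_COMPARATOR);

        System.out.println(correct + " sorted=" + isSorted(correct, KEY_COMPARATOR)); // prints [a:1, a:2, a:3, b:1] sorted=true
        System.out.println(buggy + " sorted=" + isSorted(buggy, KEY_COMPARATOR)); // prints [a:2, a:3, a:1, b:1] sorted=false
    }
}
```

With the grouping comparator all the "a" keys still end up adjacent, so each reduce() call sees the right set of values, but their secondary order is lost; a job that relies on a secondary sort would see values in the wrong order.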

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

