[ 
https://issues.apache.org/jira/browse/HADOOP-1535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508522
 ] 

Vivek Ratan commented on HADOOP-1535:
-------------------------------------

A related issue: during the Map phase, we sort key-value pairs with the 
comparator returned by JobConf.getOutputKeyComparator() (in 
BasicTypeSorterBase::configure()). When we merge files (in 
MapOutputBuffer::mergeParts()), we instead use the comparator returned by 'new 
WritableComparator(keyClass)' (in SequenceFile::Sorter::Sorter()). This is not 
right: the exact same comparator should be used for the sort and for the merge 
in the Map phase, as well as for the merge in the Reduce phase. 
JobConf.getOutputKeyComparator() and 'new WritableComparator(keyClass)' can 
return different comparators, for example when the user has set a custom key 
comparator with JobConf.setOutputKeyComparatorClass(). 
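
To illustrate why the sort and the merge must agree on the comparator, here is 
a minimal, self-contained sketch in plain Java. It does not use any Hadoop 
classes: the class name ComparatorMismatchSketch, the merge() and isSorted() 
helpers, and the two comparators are all hypothetical stand-ins, with 
Comparator.reverseOrder() playing the role of a user-configured key comparator 
and Comparator.naturalOrder() playing the role of a default one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class ComparatorMismatchSketch {

    // Standard two-way merge. The result is ordered by cmp only if both
    // input runs were already ordered by that same cmp.
    static <T> List<T> merge(List<T> a, List<T> b, Comparator<? super T> cmp) {
        List<T> out = new ArrayList<>(a.size() + b.size());
        int i = 0, j = 0;
        while (i < a.size() && j < b.size()) {
            out.add(cmp.compare(a.get(i), b.get(j)) <= 0 ? a.get(i++) : b.get(j++));
        }
        while (i < a.size()) out.add(a.get(i++));
        while (j < b.size()) out.add(b.get(j++));
        return out;
    }

    // True if every adjacent pair is in non-decreasing order under cmp.
    static <T> boolean isSorted(List<T> xs, Comparator<? super T> cmp) {
        for (int k = 1; k < xs.size(); k++) {
            if (cmp.compare(xs.get(k - 1), xs.get(k)) > 0) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // Assumption: stands in for the comparator used during the sort.
        Comparator<Integer> sortCmp = Comparator.reverseOrder();
        // Assumption: stands in for a different comparator used during the merge.
        Comparator<Integer> otherCmp = Comparator.naturalOrder();

        List<Integer> run1 = new ArrayList<>(Arrays.asList(5, 1, 9));
        List<Integer> run2 = new ArrayList<>(Arrays.asList(4, 8, 2));
        run1.sort(sortCmp); // [9, 5, 1]
        run2.sort(sortCmp); // [8, 4, 2]

        List<Integer> good = merge(run1, run2, sortCmp);  // [9, 8, 5, 4, 2, 1]
        List<Integer> bad = merge(run1, run2, otherCmp);  // [8, 4, 2, 9, 5, 1]

        System.out.println("same comparator:      " + good + " sorted=" + isSorted(good, sortCmp));
        System.out.println("different comparator: " + bad + " sorted=" + isSorted(bad, sortCmp));
    }
}
```

With mismatched comparators the merged output is ordered under neither 
comparator, so anything downstream that assumes sorted input would see keys 
out of order.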

> Wrong comparator used to merge files in Reduce phase
> ----------------------------------------------------
>
>                 Key: HADOOP-1535
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1535
>             Project: Hadoop
>          Issue Type: Bug
>    Affects Versions: 0.12.3, 0.13.0
>            Reporter: Vivek Ratan
>            Assignee: Vivek Ratan
>             Fix For: 0.14.0
>
>
> As per the fix for HADOOP-485, we allow users to optionally provide a 
> different comparator to group values when calling the user's Reduce function. 
> Devaraj and I were looking at the code yesterday and we found that in 
> ReduceTask.java, we use the user-supplied comparator to merge the output 
> files from the Map tasks (we use the user-supplied comparator when creating a 
> new SequenceFile.Sorter object). This is incorrect as the comparator used to 
> merge Map output files should be the same as that used to create those files 
> in the Map phase. The user-supplied comparator for grouping values should be 
> used only in the iterator passed to the user's Reduce function (which is done 
> correctly in the code). 
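
The separation described in the issue above (one comparator for sorting and 
merging, another only for grouping values in the Reduce iterator) can be 
sketched in plain Java as follows. This is an illustration under stated 
assumptions, not the ReduceTask code: the GroupingSketch class, the group() 
helper, the "prefix#suffix" key layout, and the GROUP_BY_PREFIX comparator are 
all hypothetical. The input list is assumed to have already been merged with 
the sort comparator (natural String order here).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class GroupingSketch {

    // Assumption: a grouping comparator that looks only at the part of the
    // key before '#', so "a#1" and "a#2" fall into the same group.
    static final Comparator<String> GROUP_BY_PREFIX =
            Comparator.comparing(k -> k.substring(0, k.indexOf('#')));

    // Walks an already-sorted key stream and starts a new group whenever the
    // grouping comparator says the current key differs from the previous one.
    static <K> List<List<K>> group(List<K> sortedKeys, Comparator<? super K> groupCmp) {
        List<List<K>> groups = new ArrayList<>();
        List<K> current = null;
        K prev = null;
        for (K key : sortedKeys) {
            if (current == null || groupCmp.compare(prev, key) != 0) {
                current = new ArrayList<>();
                groups.add(current);
            }
            current.add(key);
            prev = key;
        }
        return groups;
    }

    public static void main(String[] args) {
        // Keys as they would come out of a merge that used the sort comparator.
        List<String> merged = Arrays.asList("a#1", "a#2", "b#1", "b#3", "c#2");

        // The grouping comparator is applied only here, while iterating.
        System.out.println(group(merged, GROUP_BY_PREFIX));
        // prints [[a#1, a#2], [b#1, b#3], [c#2]]
    }
}
```

The grouping comparator never influences the order of the merged stream; it 
only decides where one group of values ends and the next begins.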

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
