[jira] Updated: (HADOOP-1535) Wrong comparator used to merge files in Reduce phase

Devaraj Das (JIRA) Tue, 26 Jun 2007 22:43:51 -0700

     [ 
https://issues.apache.org/jira/browse/HADOOP-1535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Devaraj Das updated HADOOP-1535:
--------------------------------

        Fix Version/s: 0.14.0
    Affects Version/s: 0.12.3
                       0.13.0

bq. Devaraj and I were looking at the code yesterday and we found that in 
ReduceTask.java, we use the user-supplied comparator to merge the output files 
from the Map tasks (we use the user-supplied comparator when creating a new 
SequenceFile.Sorter object). This is incorrect as the comparator used to merge 
Map output files should be the same as that used to create those files in the 
Map phase.

A small clarification: we use the *map output key comparator* to sort map 
outputs, and the same comparator must be used to merge them on the reducer 
side. We should also keep using the map output key comparator when iterating 
through the key/value records from the (possibly merged) map outputs. The 
value-grouping comparator should be used only to decide whether the current 
key is "equal" to the previous key while the user's reduce method is 
iterating through the values for a key.
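
To make the two roles concrete, here is a minimal standalone sketch in Java. 
It is not the ReduceTask code and uses no Hadoop classes: the names 
ComparatorRoles, SORT, GROUP, merge and group, and the "name#n" key format, 
are all made up for illustration. It assumes each run was sorted with SORT 
(standing in for the map output key comparator) and that GROUP (standing in 
for the value-grouping comparator) compares only the part of the key before 
the '#'.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Hadoop.
class ComparatorRoles {

    // Stand-in for the map output key comparator: orders by the full key.
    static final Comparator<String> SORT = Comparator.naturalOrder();

    // Stand-in for the value-grouping comparator: looks only at the prefix before '#'.
    static final Comparator<String> GROUP =
        Comparator.comparing((String k) -> k.substring(0, k.indexOf('#')));

    /** Merges two runs that are each already sorted by cmp. */
    static List<String> merge(List<String> a, List<String> b, Comparator<String> cmp) {
        List<String> out = new ArrayList<>(a.size() + b.size());
        int i = 0, j = 0;
        while (i < a.size() && j < b.size()) {
            // The result is sorted only if cmp is the ordering that produced the runs.
            out.add(cmp.compare(a.get(i), b.get(j)) <= 0 ? a.get(i++) : b.get(j++));
        }
        while (i < a.size()) out.add(a.get(i++));
        while (j < b.size()) out.add(b.get(j++));
        return out;
    }

    /** Splits a merged stream into runs of adjacent keys that groupCmp treats as equal. */
    static List<List<String>> group(List<String> merged, Comparator<String> groupCmp) {
        List<List<String>> groups = new ArrayList<>();
        for (int i = 0; i < merged.size(); i++) {
            // Start a new group whenever the current key differs from the previous one.
            if (i == 0 || groupCmp.compare(merged.get(i - 1), merged.get(i)) != 0) {
                groups.add(new ArrayList<>());
            }
            groups.get(groups.size() - 1).add(merged.get(i));
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> run1 = Arrays.asList("a#1", "b#2");
        List<String> run2 = Arrays.asList("a#3", "b#1");

        // Merge with the sort comparator, then group with the grouping comparator.
        List<String> merged = merge(run1, run2, SORT);
        System.out.println(merged);               // [a#1, a#3, b#1, b#2]
        System.out.println(group(merged, GROUP)); // [[a#1, a#3], [b#1, b#2]]

        // Merging with the grouping comparator loses the full-key order.
        System.out.println(merge(run1, run2, GROUP)); // [a#1, a#3, b#2, b#1]
    }
}
```

In the last call the grouping comparator sees "b#2" and "b#1" as equal, so 
the merge emits them in arrival order and the output is no longer sorted by 
the full key. That is the kind of mis-ordering this issue describes.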

> Wrong comparator used to merge files in Reduce phase
> ----------------------------------------------------
>
>                 Key: HADOOP-1535
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1535
>             Project: Hadoop
>          Issue Type: Bug
>    Affects Versions: 0.12.3, 0.13.0
>            Reporter: Vivek Ratan
>             Fix For: 0.14.0
>
>
> As per the fix for HADOOP-485, we allow users to optionally provide a 
> different comparator to group values when calling the user's Reduce function. 
> Devaraj and I were looking at the code yesterday and we found that in 
> ReduceTask.java, we use the user-supplied comparator to merge the output 
> files from the Map tasks (we use the user-supplied comparator when creating a 
> new SequenceFile.Sorter object). This is incorrect as the comparator used to 
> merge Map output files should be the same as that used to create those files 
> in the Map phase. The user-supplied comparator for grouping values should be 
> used only in the iterator passed to the user's Reduce function (which is done 
> correctly in the code). 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
