[ https://issues.apache.org/jira/browse/MAPREDUCE-5211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13652508#comment-13652508 ]

Arun C Murthy commented on MAPREDUCE-5211:
------------------------------------------

+1 for the branch-0.23 patch.

Looks like this bug has existed in trunk forever and was accidentally fixed by
MAPREDUCE-2264, which introduced CompressAwarePath, a subclass of Path with a
correct toString implementation. That is how the bug in MapOutput.toString got
fixed in trunk, by accident.

However, because the fix in trunk is purely accidental, we'll get paths in which
the ".merge" suffix is repeated once for each level of merge.
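
To illustrate the repeated-suffix concern, here is a minimal sketch. It assumes, hypothetically, that each merge pass names its output by appending ".merge" to the name of one of its inputs. The class and method names are illustrative only and are not Hadoop's API.

```java
// Hypothetical illustration of suffix accumulation; NOT Hadoop's merge code.
public class MergeSuffixDemo {
    // Assumption: each merge pass derives its output name from one input's
    // name by appending ".merge", so the suffix grows by one per pass.
    static String nameAfterPasses(String firstInput, int passes) {
        String name = firstInput;
        for (int level = 0; level < passes; level++) {
            name = name + ".merge";
        }
        return name;
    }

    public static void main(String[] args) {
        // Three merge levels yield three stacked suffixes.
        System.out.println(nameAfterPasses("map_0.out", 3));
    }
}
```

Under that assumption, a three-level merge yields "map_0.out.merge.merge.merge", which is why relying on the accidental trunk fix is undesirable.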

So, we should apply a similar fix to trunk too.


                
> Reducer intermediate files can collide during merge
> ---------------------------------------------------
>
>                 Key: MAPREDUCE-5211
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5211
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.7
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>            Priority: Critical
>         Attachments: MAPREDUCE-5211.branch-0.23.patch
>
>
> The OnDiskMerger.merge method constructs an output path that is not unique to
> a reduce attempt, so its output file can collide with files written by other
> reducers from the same app that are running on the same node. In addition,
> the name of the output file is based on MapOutput.toString, which may not be
> unique in multi-pass on-disk merges: the mapId will be null, so the basename
> ends up as "MapOutput(null, DISK)".
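
The collision can be modeled with a short sketch. This is a hypothetical model, not the branch-0.23 patch: `FakeMapOutput` is an illustrative stand-in for MapOutput, and `uniqueMergeName` shows one possible naming scheme (an assumption) that combines the reduce attempt ID with a counter so that names differ across attempts and across merges within one process.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of the naming problem; NOT the actual Hadoop code or patch.
public class MergeNameDemo {
    // Stand-in for MapOutput: in a multi-pass on-disk merge the mapId is null.
    static final class FakeMapOutput {
        final String mapId;
        final String type;

        FakeMapOutput(String mapId, String type) {
            this.mapId = mapId;
            this.type = type;
        }

        @Override
        public String toString() {
            // Concatenating a null String yields "null".
            return "MapOutput(" + mapId + ", " + type + ")";
        }
    }

    // Hypothetical counter so successive merges in one process get distinct names.
    private static final AtomicInteger mergeSeq = new AtomicInteger();

    // Hypothetical helper: the attempt ID makes the name unique per reduce attempt.
    static String uniqueMergeName(String reduceAttemptId) {
        return reduceAttemptId + ".merged." + mergeSeq.getAndIncrement();
    }

    public static void main(String[] args) {
        // Two reducers deriving names from toString() get identical basenames.
        String a = new FakeMapOutput(null, "DISK").toString();
        String b = new FakeMapOutput(null, "DISK").toString();
        System.out.println(a + " equals " + b + ": " + a.equals(b));

        // Names that include the attempt ID do not collide across reducers.
        System.out.println(uniqueMergeName("attempt_1_0001_r_000000_0"));
        System.out.println(uniqueMergeName("attempt_1_0001_r_000001_0"));
    }
}
```

The design point is simply that the attempt ID separates reducers sharing a node, while the counter separates successive merge passes within one attempt.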

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira