[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13545336#comment-13545336
 ] 

anty.rao commented on MAPREDUCE-3685:
-------------------------------------

{quote}
Ravi Prakash The patch looks good. One question: why pass along uncompressed 
size for the new MapOutput ctor, shouldn't we be using compressed size so we 
get the smallest on-disk files first?
{quote}
I agree with this.
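
As a rough illustration of the ordering being discussed, here is a minimal sketch that sorts on-disk outputs by compressed size so the smallest on-disk files come first. The DiskOutput class and the smallestOnDiskFirst method are hypothetical stand-ins invented for this example; they are not the Hadoop MapOutput or MergeManager API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class DiskOutputOrdering {
    /** Hypothetical stand-in for an on-disk map output; not the Hadoop class. */
    static final class DiskOutput {
        final String path;
        final long rawSize;        // uncompressed bytes
        final long compressedSize; // bytes the file actually occupies on disk

        DiskOutput(String path, long rawSize, long compressedSize) {
            this.path = path;
            this.rawSize = rawSize;
            this.compressedSize = compressedSize;
        }
    }

    /** Returns a new list ordered by compressed (on-disk) size, smallest first. */
    static List<DiskOutput> smallestOnDiskFirst(List<DiskOutput> outputs) {
        List<DiskOutput> sorted = new ArrayList<>(outputs);
        sorted.sort(Comparator.comparingLong((DiskOutput o) -> o.compressedSize));
        return sorted;
    }
}
```

Ordering by rawSize instead could put a highly compressible, large-when-expanded output behind a file that is actually bigger on disk, which is the concern raised in the quoted question.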

{quote}
One nit: we should use MergeManager.getDiskMapOutputs in OnDiskMerger.merge 
too... maybe MergeManager.getDiskMapOutputs should just return Path[] and then 
we can fix MergeManager.finalMerge to use Path[] rather than List<Path>. 
Thoughts?
{quote}
MergeManager.finalMerge is better off taking List<Path>, because finalMerge may 
need to change the contents of the list; with Path[], you would have to create a 
new Path[] to make the modification.
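
The difference can be sketched in plain Java. This is only an illustration of the general List-versus-array trade-off, using String in place of Hadoop's Path; the class and method names are made up for the example.

```java
import java.util.Arrays;
import java.util.List;

public class ListVersusArray {
    /** Appending to a List changes the caller's collection in place. */
    static void addToList(List<String> paths, String extra) {
        paths.add(extra);
    }

    /**
     * An array has a fixed length, so "adding" means allocating a larger
     * copy and returning it; the caller's original array is unchanged.
     */
    static String[] addToArray(String[] paths, String extra) {
        String[] grown = Arrays.copyOf(paths, paths.length + 1);
        grown[paths.length] = extra;
        return grown;
    }
}
```

With the array version, every caller has to remember to use the returned array, which is the extra bookkeeping the comment above wants to avoid.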

OnDiskMerger.merge can't use MergeManager.getDiskMapOutputs, because 
OnDiskMerger.merge makes changes to MergeManager#onDiskMapOutputs according 
to its merge policy (e.g. mergeFactor).
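
To make the point concrete, here is a hedged sketch of a merge step that mutates a shared collection according to a merge factor. It is an assumption-laden simplification, not the actual OnDiskMerger code: files are represented only by their sizes, and the policy shown (merge the mergeFactor smallest files into one) is assumed for illustration.

```java
import java.util.Collections;
import java.util.List;

public class MergePolicySketch {
    /**
     * Hypothetical single merge pass: takes up to mergeFactor of the smallest
     * entries out of the shared list and puts one merged entry back.
     * Returns the merged size, or 0 if fewer than two entries could be merged.
     */
    static long mergeOnce(List<Long> onDiskSizes, int mergeFactor) {
        int n = Math.min(mergeFactor, onDiskSizes.size());
        if (n < 2) {
            return 0L; // nothing worth merging; leave the list untouched
        }
        Collections.sort(onDiskSizes); // smallest files first
        List<Long> batch = onDiskSizes.subList(0, n);
        long merged = 0L;
        for (long size : batch) {
            merged += size;
        }
        batch.clear();           // removes the merged inputs from the shared list
        onDiskSizes.add(merged); // the merge result becomes a new on-disk file
        return merged;
    }
}
```

Because the pass both removes and adds entries, a read-only snapshot such as the one a getter would return is not enough for it.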

I know this code is ugly, but I can't think of a better way to fix it. Maybe 
we should always use List<Path>, but a lot of existing code already uses Path[].


                
> There are some bugs in implementation of MergeManager
> -----------------------------------------------------
>
>                 Key: MAPREDUCE-3685
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3685
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.1
>            Reporter: anty.rao
>            Assignee: anty
>            Priority: Critical
>         Attachments: MAPREDUCE-3685-branch-0.23.1.patch, 
> MAPREDUCE-3685-branch-0.23.1.patch, MAPREDUCE-3685-branch-0.23.1.patch, 
> MAPREDUCE-3685.branch-0.23.patch, MAPREDUCE-3685.branch-0.23.patch, 
> MAPREDUCE-3685.patch, MAPREDUCE-3685.patch, MAPREDUCE-3685.patch, 
> MAPREDUCE-3685.patch
>
>

