[
https://issues.apache.org/jira/browse/MAPREDUCE-3685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13545336#comment-13545336
]
anty.rao commented on MAPREDUCE-3685:
-------------------------------------
{quote}
Ravi Prakash The patch looks good. One question: why pass along uncompressed
size for the new MapOutput ctor, shouldn't we be using compressed size so we
get the smallest on-disk files first?
{quote}
I agree with this.
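To illustrate the point about compressed size, here is a minimal sketch that does not use the real Hadoop classes. DiskOutput, mergeOrder and the file names are hypothetical stand-ins, not part of the patch. It only shows that ordering by compressed size can give a different merge order than ordering by uncompressed size.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SmallestOnDiskFirst {
    // Hypothetical stand-in for a map output that has been written to disk.
    static final class DiskOutput {
        final String path;
        final long rawSize;        // uncompressed size in bytes
        final long compressedSize; // bytes actually occupied on disk

        DiskOutput(String path, long rawSize, long compressedSize) {
            this.path = path;
            this.rawSize = rawSize;
            this.compressedSize = compressedSize;
        }
    }

    // Returns the paths ordered so that the smallest on-disk file comes first.
    static List<String> mergeOrder(List<DiskOutput> outputs) {
        List<DiskOutput> sorted = new ArrayList<>(outputs);
        sorted.sort(Comparator.comparingLong((DiskOutput o) -> o.compressedSize));
        List<String> paths = new ArrayList<>();
        for (DiskOutput o : sorted) {
            paths.add(o.path);
        }
        return paths;
    }

    public static void main(String[] args) {
        List<DiskOutput> outputs = new ArrayList<>();
        outputs.add(new DiskOutput("map_0.out", 900, 300));
        outputs.add(new DiskOutput("map_1.out", 500, 450));
        outputs.add(new DiskOutput("map_2.out", 700, 100));
        // Sorting by rawSize would put map_1.out first; sorting by
        // compressedSize puts map_2.out first.
        System.out.println(mergeOrder(outputs));
    }
}
```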
{quote}
One nit: we should use MergeManager.getDiskMapOutputs in OnDiskMerger.merge
too... maybe MergeManager.getDiskMapOutputs should just return Path[] and then
we can fix MergeManager.finalMerge to use Path[] rather than List<Path>.
Thoughts?
{quote}
MergeManager.finalMerge is better off using List<Path>, because finalMerge may
need to change the contents of the list; with Path[], you would have to create
a new Path[] to make the modification.
OnDiskMerger.merge can't use MergeManager.getDiskMapOutputs, because
OnDiskMerger.merge modifies MergeManager#onDiskMapOutputs according
to its merge policy (e.g. mergeFactor).
I know this code is ugly, but I can't think of a better way to fix it. Maybe
we should use List<Path> everywhere, but a lot of existing code already uses
Path[].
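A rough sketch of the List<Path> versus Path[] trade-off, using String in place of Path and hypothetical helper names (mergeInPlace, mergeCopy) that do not exist in MergeManager. It assumes a merge step that replaces the first mergeFactor entries with a single merged output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ListVersusArray {
    // With a List, the merged inputs are removed and the result appended in place.
    static void mergeInPlace(List<String> onDisk, int mergeFactor, String merged) {
        int n = Math.min(mergeFactor, onDisk.size());
        onDisk.subList(0, n).clear(); // drop the inputs that were merged
        onDisk.add(merged);           // record the merged output
    }

    // With an array, the same change requires allocating a new array.
    static String[] mergeCopy(String[] onDisk, int mergeFactor, String merged) {
        int n = Math.min(mergeFactor, onDisk.length);
        String[] result = new String[onDisk.length - n + 1];
        System.arraycopy(onDisk, n, result, 0, onDisk.length - n);
        result[result.length - 1] = merged;
        return result;
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<>(Arrays.asList("a", "b", "c", "d"));
        mergeInPlace(list, 3, "merged");
        System.out.println(list);

        String[] array = {"a", "b", "c", "d"};
        System.out.println(Arrays.toString(mergeCopy(array, 3, "merged")));
    }
}
```

The caller of mergeCopy has to reassign its reference to the returned array, which is the extra step the List variant avoids.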
> There are some bugs in implementation of MergeManager
> -----------------------------------------------------
>
> Key: MAPREDUCE-3685
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3685
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mrv2
> Affects Versions: 0.23.1
> Reporter: anty.rao
> Assignee: anty
> Priority: Critical
> Attachments: MAPREDUCE-3685-branch-0.23.1.patch,
> MAPREDUCE-3685-branch-0.23.1.patch, MAPREDUCE-3685-branch-0.23.1.patch,
> MAPREDUCE-3685.branch-0.23.patch, MAPREDUCE-3685.branch-0.23.patch,
> MAPREDUCE-3685.patch, MAPREDUCE-3685.patch, MAPREDUCE-3685.patch,
> MAPREDUCE-3685.patch
>
>