[
https://issues.apache.org/jira/browse/MAPREDUCE-3685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13597951#comment-13597951
]
Mariappan Asokan commented on MAPREDUCE-3685:
---------------------------------------------
Hi Ravi,
I looked at the {{Merger}} class a little deeper. I think the
optimization (for more parallelism) I suggested is a bit aggressive in some
cases. For example, if you end up having only 101 files to merge (instead of
198), {{Merger}} will merge just 2 files in the first pass and then merge 100
files for the final merge. Now, if there were a genie that could tell us how
many disk files we will create during the course of shuffle/merge, we could
either opt to wait or kick off the merge as soon as the disk file count exceeds
{{io.sort.factor}}. This is something that can be explored later. For
example, if we know that the number of mappers is huge compared to
{{io.sort.factor}} and we do not have enough memory for large in-memory merges,
we can opt for the optimization I suggested.
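
To make the 101-file case concrete, here is a minimal sketch of how a
first-pass size could be chosen so that the final merge receives exactly
{{io.sort.factor}} inputs. It assumes {{io.sort.factor}} is 100. The class and
method names are hypothetical, and the sizing rule is my reading of the
behavior described above, not code copied from {{Merger}}.

```java
public class PassFactorSketch {
    // Hypothetical helper (assumed rule, not the actual Merger source):
    // pick how many files the first pass merges so that every later pass
    // can merge exactly `factor` inputs.
    public static int firstPassFactor(int factor, int numFiles) {
        if (numFiles <= factor || factor == 1) {
            // Everything fits in one pass, or the factor is degenerate.
            return factor;
        }
        // Each full pass replaces `factor` files with 1, i.e. removes
        // (factor - 1) files. The first pass absorbs the remainder.
        int mod = (numFiles - 1) % (factor - 1);
        return mod == 0 ? factor : mod + 1;
    }

    public static void main(String[] args) {
        int factor = 100; // assumed io.sort.factor value
        for (int n : new int[] {101, 198}) {
            int first = firstPassFactor(factor, n);
            int remaining = n - first + 1; // merged output counts as one file
            System.out.println(n + " files: first pass merges " + first
                + ", leaving " + remaining + " for the final merge");
        }
    }
}
```

Under this rule, 101 files give a first pass of only 2 files, while 198 files
give a first pass of 99. In both cases 100 files remain for the final merge,
which is why kicking off the merge early pays off much less at 101 files.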
-- Asokan
> There are some bugs in implementation of MergeManager
> -----------------------------------------------------
>
> Key: MAPREDUCE-3685
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3685
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mrv2
> Affects Versions: 0.23.1
> Reporter: anty.rao
> Assignee: anty
> Priority: Critical
> Fix For: 0.23.7, 2.0.5-beta
>
> Attachments: MAPREDUCE-3685-branch-0.23.1.patch,
> MAPREDUCE-3685-branch-0.23.1.patch, MAPREDUCE-3685-branch-0.23.1.patch,
> MAPREDUCE-3685.branch-0.23.patch, MAPREDUCE-3685.branch-0.23.patch,
> MAPREDUCE-3685.branch-0.23.patch, MAPREDUCE-3685.branch-0.23.patch,
> MAPREDUCE-3685.branch-0.23.patch, MAPREDUCE-3685.patch, MAPREDUCE-3685.patch,
> MAPREDUCE-3685.patch, MAPREDUCE-3685.patch, MAPREDUCE-3685.patch,
> MAPREDUCE-3685.patch, MAPREDUCE-3685.patch
>
>