[ https://issues.apache.org/jira/browse/HADOOP-3473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12601290#action_12601290 ]
Doug Cutting commented on HADOOP-3473:
--------------------------------------
Changing this has memory implications, no? A buffer is allocated for each
stream being merged, and each buffer should be large enough that transfer time
dominates seek time: at 10ms/seek and 100MB/s transfer, seek and transfer cost
the same at a 1MB buffer. So for a 100-way merge not to be seek-bound, the
total buffer size needs to be substantially larger than 100MB, which is
currently the default for io.sort.mb. I can see increasing this to 50 without
changing the default for io.sort.mb.
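To spell the arithmetic out, here is a rough back-of-the-envelope sketch; the 10ms seek and 100MB/s transfer figures are the assumptions above, not measured values:

    // Rough sketch of the numbers above (assumed figures, not measurements):
    // at 10ms/seek and 100MB/s transfer, seek time equals transfer time once
    // the per-stream buffer reaches ~1MB.
    public class MergeBufferMath {
      public static void main(String[] args) {
        double seekSec = 0.010;               // assumed average seek time
        double transferBytesPerSec = 100e6;   // assumed sequential transfer rate

        // Buffer size at which one seek costs as much as transferring the buffer.
        double breakEvenBytes = seekSec * transferBytesPerSec;   // ~1MB

        // For a 100-way merge to stay transfer-bound, each stream's buffer should
        // sit well above break-even, so total buffer memory >> 100 * 1MB = 100MB.
        int mergeFactor = 100;
        double minTotalBytes = mergeFactor * breakEvenBytes;

        System.out.printf("break-even buffer: %.1f MB%n", breakEvenBytes / 1e6);
        System.out.printf("minimum total for %d-way merge: %.0f MB%n",
                          mergeFactor, minTotalBytes / 1e6);
      }
    }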
BTW, you've proposed a solution in the description rather than a problem. The
problem, I assume, is that the sort-factor is non-optimal. Perhaps a better
solution to this problem is to not specify the sort factor at all, but rather
to have the sort code determine it automatically based on io.sort.mb? So if
you increase io.sort.mb, you'd get a larger sort factor. Of course, then we'd
have to make some assumptions about disk performance...
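For what such an auto-derivation could look like, here is a purely hypothetical sketch, not existing Hadoop code; the 1MB-per-stream floor comes from the seek/transfer assumption above, and the clamp values are arbitrary illustrations:

    // Hypothetical sketch of deriving the merge factor from io.sort.mb instead
    // of configuring it directly. Not existing Hadoop code; the 1MB-per-stream
    // floor reflects the assumed 10ms-seek / 100MB/s-transfer disk profile.
    public class AutoSortFactor {
      // Minimum per-stream buffer so transfer dominates seek (assumed disk profile).
      private static final long MIN_BUFFER_PER_STREAM = 1L << 20;   // 1MB

      /** Largest merge factor that keeps each stream's buffer above the floor. */
      public static int deriveSortFactor(long ioSortBytes) {
        long factor = ioSortBytes / MIN_BUFFER_PER_STREAM;
        // Clamp to a sane range: at least a 2-way merge, at most an arbitrary cap.
        return (int) Math.max(2, Math.min(factor, 1000));
      }

      public static void main(String[] args) {
        long ioSortMb = 100;                                    // default io.sort.mb
        System.out.println(deriveSortFactor(ioSortMb << 20));   // -> 100
      }
    }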
> io.sort.factor should default to 100 instead of 10
> --------------------------------------------------
>
> Key: HADOOP-3473
> URL: https://issues.apache.org/jira/browse/HADOOP-3473
> Project: Hadoop Core
> Issue Type: Bug
> Components: conf
> Reporter: Owen O'Malley
> Assignee: Owen O'Malley
> Fix For: 0.18.0
>
>
> 10 is *really* conservative and can make merges much much more expensive.