[ https://issues.apache.org/jira/browse/HADOOP-3473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12601290#action_12601290 ]
Doug Cutting commented on HADOOP-3473:
--------------------------------------
Changing this has memory implications, no? A buffer is allocated for each
stream being merged, and each buffer should be large enough that transfer time
dominates seek time: at 10ms/seek and 100MB/s transfer, seek and transfer cost
the same at a 1MB buffer. So for a 100-way merge not to be seek-bound, the
total buffer size needs to be substantially larger than 100MB, which is
currently the default for io.sort.mb. I can see increasing this to 50 without
changing the default for io.sort.mb.
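To spell the arithmetic out, here is a rough back-of-the-envelope sketch; the 10ms seek and 100MB/s transfer figures are the assumptions above, not measured values:

    // Rough sketch of the numbers above (assumed figures, not measurements):
    // at 10ms/seek and 100MB/s transfer, seek time equals transfer time once
    // the per-stream buffer reaches ~1MB.
    public class MergeBufferMath {
      public static void main(String[] args) {
        double seekSec = 0.010;               // assumed average seek time
        double transferBytesPerSec = 100e6;   // assumed sequential transfer rate

        // Buffer size at which one seek costs as much as transferring the buffer.
        double breakEvenBytes = seekSec * transferBytesPerSec;   // ~1MB

        // For a 100-way merge to stay transfer-bound, each stream's buffer should
        // sit well above break-even, so total buffer memory >> 100 * 1MB = 100MB.
        int mergeFactor = 100;
        double minTotalBytes = mergeFactor * breakEvenBytes;

        System.out.printf("break-even buffer: %.1f MB%n", breakEvenBytes / 1e6);
        System.out.printf("minimum total for %d-way merge: %.0f MB%n",
                          mergeFactor, minTotalBytes / 1e6);
      }
    }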
BTW, you've proposed a solution in the description rather than a problem. The
problem, I assume, is that the sort-factor is non-optimal. Perhaps a better
solution to this problem is to not specify the sort factor at all, but rather
to have the sort code determine it automatically based on io.sort.mb? So if
you increase io.sort.mb, you'd get a larger sort factor. Of course, then we'd
have to make some assumptions about disk performance...
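For what such an auto-derivation could look like, here is a purely hypothetical sketch, not existing Hadoop code; the 1MB-per-stream floor comes from the seek/transfer assumption above, and the clamp values are arbitrary illustrations:

    // Hypothetical sketch of deriving the merge factor from io.sort.mb instead
    // of configuring it directly. Not existing Hadoop code; the 1MB-per-stream
    // floor reflects the assumed 10ms-seek / 100MB/s-transfer disk profile.
    public class AutoSortFactor {
      // Minimum per-stream buffer so transfer dominates seek (assumed disk profile).
      private static final long MIN_BUFFER_PER_STREAM = 1L << 20;   // 1MB

      /** Largest merge factor that keeps each stream's buffer above the floor. */
      public static int deriveSortFactor(long ioSortBytes) {
        long factor = ioSortBytes / MIN_BUFFER_PER_STREAM;
        // Clamp to a sane range: at least a 2-way merge, at most an arbitrary cap.
        return (int) Math.max(2, Math.min(factor, 1000));
      }

      public static void main(String[] args) {
        long ioSortMb = 100;                                    // default io.sort.mb
        System.out.println(deriveSortFactor(ioSortMb << 20));   // -> 100
      }
    }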
> io.sort.factor should default to 100 instead of 10
> --------------------------------------------------
>
> Key: HADOOP-3473
> URL: https://issues.apache.org/jira/browse/HADOOP-3473
> Project: Hadoop Core
> Issue Type: Bug
> Components: conf
> Reporter: Owen O'Malley
> Assignee: Owen O'Malley
> Fix For: 0.18.0
>
>
> 10 is *really* conservative and can make merges much much more expensive.