[ 
https://issues.apache.org/jira/browse/SPARK-741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-741:
----------------------------------

    Reporter: Patrick Wendell  (was: Patrick Cogan)

> DiskStore should use > 8kB buffer when doing writes
> ---------------------------------------------------
>
>                 Key: SPARK-741
>                 URL: https://issues.apache.org/jira/browse/SPARK-741
>             Project: Apache Spark
>          Issue Type: Improvement
>            Reporter: Patrick Wendell
>            Assignee: Reynold Xin
>             Fix For: 0.8.0
>
>
> Right now the DiskStore uses a buffered output stream with the default buffer 
> size of 8kB. This can hurt disk throughput by a substantial amount when there 
> are several shuffle files being output at once (either due to a large # of 
> concurrent tasks or a large # of output splits).
> We should avoid increasing this buffer arbitrarily because it is instantiated 
> (# tasks * # splits) times currently, which could be large. The best approach 
> is probably to do something like this:
> - By default, give each task 10 MB of total buffer space, divided evenly 
> among its output partitions.
> - If this means each split buffer is < 8kB, bump it up to at least 8kB (we'd 
> rather OOM than have terrible disk throughput, so at least people can figure 
> out what's wrong).
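
A minimal sketch of the sizing rule proposed above, written against plain java.io rather than Spark's actual DiskStore code. The class name, method names, and constants are hypothetical, and "10 MB" is assumed to mean 10 * 1024 * 1024 bytes. It only illustrates the arithmetic: split a per-task budget evenly across output splits, then apply the 8 kB floor.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical helper, not part of Spark: illustrates the proposed buffer sizing.
final class ShuffleBufferSizing {
    // Assumed per-task budget: 10 MB, taken as 10 * 1024 * 1024 bytes.
    static final long TASK_BUFFER_BUDGET_BYTES = 10L * 1024 * 1024;
    // Floor per split: 8 kB, the default size of java.io.BufferedOutputStream.
    static final int MIN_SPLIT_BUFFER_BYTES = 8 * 1024;

    // Returns the buffer size for one split: an even share of the task budget,
    // never smaller than the 8 kB floor.
    static int perSplitBufferBytes(int numSplits) {
        if (numSplits <= 0) {
            throw new IllegalArgumentException("numSplits must be positive: " + numSplits);
        }
        long share = TASK_BUFFER_BUDGET_BYTES / numSplits;
        // share is at most 10 MB, so the cast to int cannot overflow.
        return (int) Math.max(share, MIN_SPLIT_BUFFER_BYTES);
    }

    // Wraps a raw stream for one split using the computed buffer size.
    static OutputStream wrap(OutputStream raw, int numSplits) {
        return new BufferedOutputStream(raw, perSplitBufferBytes(numSplits));
    }

    public static void main(String[] args) throws IOException {
        for (int splits : new int[] {1, 10, 1280, 5000}) {
            System.out.println(splits + " splits -> "
                    + perSplitBufferBytes(splits) + " bytes per split");
        }
        // Usage: write through a buffer sized for a task with 10 output splits.
        try (OutputStream out = wrap(new ByteArrayOutputStream(), 10)) {
            out.write(new byte[] {1, 2, 3});
        }
    }
}
```

Once the floor applies, the task's total buffer space can exceed the budget: with 5000 splits at 8 kB each, the total is roughly 39 MB. That is the trade-off the description accepts in preferring an OOM over poor disk throughput.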



--
This message was sent by Atlassian JIRA
(v6.2#6252)
