[ https://issues.apache.org/jira/browse/SPARK-741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Patrick Wendell updated SPARK-741:
----------------------------------
Reporter: Patrick Wendell (was: Patrick Cogan)
> DiskStore should use > 8kB buffer when doing writes
> ---------------------------------------------------
>
> Key: SPARK-741
> URL: https://issues.apache.org/jira/browse/SPARK-741
> Project: Apache Spark
> Issue Type: Improvement
> Reporter: Patrick Wendell
> Assignee: Reynold Xin
> Fix For: 0.8.0
>
>
> Right now the DiskStore uses a buffered output stream with the default buffer
> size of 8kB. This can hurt disk throughput by a substantial amount when there
> are several shuffle files being output at once (either due to a large # of
> concurrent tasks or a large # of output splits).
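A minimal Java sketch of why buffer size matters for throughput. This is not Spark's DiskStore code: `BufferSizeDemo`, `CountingSink`, and `underlyingWrites` are hypothetical names. The sketch passes an explicit size to `BufferedOutputStream(OutputStream, int)` instead of relying on the default constructor. It counts how many write calls reach the underlying stream, as a rough stand-in for I/O requests hitting the disk.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Sketch only: measures how many bulk writes reach the underlying stream
// for a given BufferedOutputStream buffer size.
final class BufferSizeDemo {

    // Hypothetical sink standing in for a shuffle file on disk;
    // it records every write call it receives.
    static final class CountingSink extends OutputStream {
        int writeCalls = 0;
        long bytes = 0;

        @Override
        public void write(int b) {
            writeCalls++;
            bytes++;
        }

        @Override
        public void write(byte[] b, int off, int len) {
            writeCalls++;
            bytes += len;
        }
    }

    // Writes totalBytes one byte at a time through a buffer of bufferSize
    // bytes and returns how many writes the sink saw (including the final flush).
    static int underlyingWrites(int bufferSize, int totalBytes) {
        CountingSink sink = new CountingSink();
        try (OutputStream out = new BufferedOutputStream(sink, bufferSize)) {
            for (int i = 0; i < totalBytes; i++) {
                out.write(i & 0xFF);
            }
        } catch (IOException e) { // close() flushes any partially filled buffer
            throw new UncheckedIOException(e);
        }
        return sink.writeCalls;
    }

    public static void main(String[] args) {
        int total = 1_000_000;
        System.out.println("8 KB buffer:  " + underlyingWrites(8 * 1024, total));
        System.out.println("64 KB buffer: " + underlyingWrites(64 * 1024, total));
    }
}
```

With 1,000,000 bytes, the 8 KB buffer issues 123 writes to the sink and the 64 KB buffer issues 16. How that translates into disk throughput depends on the OS page cache and filesystem, so treat the counts as an illustration only.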
> We should avoid increasing this buffer arbitrarily because it is instantiated
> (# tasks * # splits) times currently, which could be large. The best approach
> is probably to do something like this:
> - By default, give each task 10 MB of total buffer space, divided up amongst
> its output partitions.
> - If this means each split buffer is < 8kB, bump it up to at least 8kB (we'd
> rather OOM than have terrible disk throughput, so at least people can figure
> out what's wrong).
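The sizing rule proposed above can be sketched as follows. This is a hedged illustration, not the change that shipped for 0.8.0. `ShuffleBufferPolicy`, `bufferSizePerSplit`, and `totalBufferBytes` are hypothetical names, and "10 MB" is assumed to mean 10 MiB (10 × 1024 × 1024 bytes).

```java
// Sketch of the proposed per-task buffer budget; names and constants are
// assumptions for illustration, not Spark APIs.
final class ShuffleBufferPolicy {
    // Assumed: "10 MB" interpreted as 10 MiB of buffer space per task.
    static final long TOTAL_BUFFER_PER_TASK = 10L * 1024 * 1024;
    // Floor from the proposal: never go below 8 KB per split.
    static final int MIN_BUFFER_PER_SPLIT = 8 * 1024;

    // Divides the per-task budget among the output splits, then applies the floor.
    static int bufferSizePerSplit(int numSplits) {
        if (numSplits <= 0) {
            throw new IllegalArgumentException("numSplits must be positive");
        }
        long share = TOTAL_BUFFER_PER_TASK / numSplits;
        return (int) Math.max(share, MIN_BUFFER_PER_SPLIT);
    }

    // Total buffer memory when each of concurrentTasks tasks holds one buffer per split.
    static long totalBufferBytes(int concurrentTasks, int numSplits) {
        return (long) concurrentTasks * numSplits * bufferSizePerSplit(numSplits);
    }

    public static void main(String[] args) {
        for (int splits : new int[] {10, 1000, 2000}) {
            System.out.println(splits + " splits -> " + bufferSizePerSplit(splits)
                    + " bytes per split, " + totalBufferBytes(8, splits)
                    + " bytes for 8 concurrent tasks");
        }
    }
}
```

The 8 KB floor takes over above 1,280 splits (10 MiB / 8 KiB). Past that point, a task's total buffer grows beyond 10 MB in proportion to the split count, which is the OOM risk the proposal accepts in exchange for reasonable disk throughput.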