[
https://issues.apache.org/jira/browse/SPARK-2469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Reynold Xin resolved SPARK-2469.
--------------------------------
Resolution: Fixed
Fix Version/s: 1.1.0
Assignee: Reynold Xin
> Lower shuffle compression buffer memory usage (replace LZF with Snappy for
> default compression codec)
> -----------------------------------------------------------------------------------------------------
>
> Key: SPARK-2469
> URL: https://issues.apache.org/jira/browse/SPARK-2469
> Project: Spark
> Issue Type: Improvement
> Components: Shuffle, Spark Core
> Reporter: Reynold Xin
> Assignee: Reynold Xin
> Fix For: 1.1.0
>
>
> I was looking into the memory usage of shuffle, and one annoying thing about
> the default compression codec (LZF) is that the implementation we use
> allocates buffers pretty generously. I did a simple experiment and found that
> creating 1000 LZFOutputStream instances allocated 198976424 bytes (~190MB).
> If we have a shuffle task that uses 10k reducers and 32 threads running
> concurrently, the memory used by the LZF streams alone would be ~60GB.
> In comparison, Snappy only allocates ~65MB for every 1000 SnappyOutputStream
> instances. However, Snappy's compression ratio is slightly lower than LZF's;
> in my experience, it leads to a 10-20% increase in output size. Compression
> ratio does matter here because we are sending data across the network.
--
This message was sent by Atlassian JIRA
(v6.2#6252)