[
https://issues.apache.org/jira/browse/SPARK-1912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Patrick Wendell updated SPARK-1912:
-----------------------------------
Fix Version/s: 0.9.2
> Compression memory issue during reduce
> --------------------------------------
>
> Key: SPARK-1912
> URL: https://issues.apache.org/jira/browse/SPARK-1912
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Reporter: Wenchen Fan
> Assignee: Wenchen Fan
> Fix For: 0.9.2, 1.0.1, 1.1.0
>
>
> When we need to read a compressed block, we first create a compression
> stream instance (LZF or Snappy) and use it to wrap that block.
> Say a reducer task needs to read 1000 local shuffle blocks: it will
> first prepare to read all 1000 blocks, which means creating 1000
> compression stream instances to wrap them. But initializing a compression
> instance allocates some memory, so having many compression instances
> alive at the same time is a problem.
> Since the reducer actually reads the shuffle blocks one by one, why
> create all the compression instances up front? We could do it lazily:
> create the compression instance for a block only when that block is
> first read.
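>
> A minimal sketch of the lazy approach (illustrative only, not the
> actual patch; LazyCompressedStream and the open() thunk are
> hypothetical names). The compression stream is created on the first
> read of the block rather than when the block is prepared:
> {code:scala}
> import java.io.InputStream
>
> // Defer creating the memory-hungry compression stream until the
> // block is actually read, instead of eagerly wrapping all blocks.
> class LazyCompressedStream(open: () => InputStream) extends InputStream {
>   // Only materialized on first access.
>   private lazy val underlying: InputStream = open()
>
>   override def read(): Int = underlying.read()
>   override def read(b: Array[Byte], off: Int, len: Int): Int =
>     underlying.read(b, off, len)
>   override def close(): Unit = underlying.close()
> }
> {code}
> With this, preparing 1000 blocks allocates 1000 cheap wrappers, and
> only the block currently being read holds a live compression buffer.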
--
This message was sent by Atlassian JIRA
(v6.2#6252)