[
https://issues.apache.org/jira/browse/SPARK-1912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14113524#comment-14113524
]
Apache Spark commented on SPARK-1912:
-------------------------------------
User 'rxin' has created a pull request for this issue:
https://github.com/apache/spark/pull/2179
> Compression memory issue during reduce
> --------------------------------------
>
> Key: SPARK-1912
> URL: https://issues.apache.org/jira/browse/SPARK-1912
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Reporter: Wenchen Fan
> Assignee: Wenchen Fan
> Fix For: 0.9.2, 1.0.1, 1.1.0
>
>
> When we need to read a compressed block, we first create a compression
> stream instance (LZF or Snappy) and use it to wrap that block.
> Say a reducer task needs to read 1000 local shuffle blocks: it prepares
> all 1000 blocks up front, which means creating 1000 compression stream
> instances to wrap them. Initializing a compression instance allocates
> some buffer memory, so having many instances alive at the same time is
> a problem.
> Since the reducer actually reads the shuffle blocks one by one, why
> create all the compression instances up front? We can do it lazily:
> create the compression instance for a block only when that block is
> first read, as in the sketch below.
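>
> A minimal sketch of eager vs. lazy wrapping in Scala (the
> compressedInputStream method mirrors Spark's CompressionCodec trait;
> the trait stand-in and helper names here are illustrative, not the
> actual fix):
>
>     import java.io.InputStream
>
>     // Illustrative stand-in for Spark's CompressionCodec trait.
>     trait CompressionCodec {
>       def compressedInputStream(s: InputStream): InputStream
>     }
>
>     // Eager: Seq.map wraps every block immediately, so every
>     // wrapper's internal decompression buffer is allocated at once.
>     def eagerWrap(blocks: Seq[InputStream], codec: CompressionCodec): Seq[InputStream] =
>       blocks.map(codec.compressedInputStream)
>
>     // Lazy: Iterator.map defers wrapping, so a wrapper (and its
>     // buffer) is created only when the reducer reaches that block.
>     def lazyWrap(blocks: Seq[InputStream], codec: CompressionCodec): Iterator[InputStream] =
>       blocks.iterator.map(codec.compressedInputStream)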
--
This message was sent by Atlassian JIRA
(v6.2#6252)