[
https://issues.apache.org/jira/browse/SPARK-1912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14113524#comment-14113524
]
Apache Spark commented on SPARK-1912:
-------------------------------------
User 'rxin' has created a pull request for this issue:
https://github.com/apache/spark/pull/2179
> Compression memory issue during reduce
> --------------------------------------
>
> Key: SPARK-1912
> URL: https://issues.apache.org/jira/browse/SPARK-1912
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Reporter: Wenchen Fan
> Assignee: Wenchen Fan
> Fix For: 0.9.2, 1.0.1, 1.1.0
>
>
> When we need to read a compressed block, we first create a compression
> stream instance (LZF or Snappy) and use it to wrap that block.
> Say a reducer task needs to read 1000 local shuffle blocks: it prepares
> all 1000 blocks up front, which means creating 1000 compression stream
> instances to wrap them. Initializing a compression instance allocates
> some buffer memory, so having many instances alive at the same time is
> a problem.
> Since the reducer actually reads the shuffle blocks one by one, why
> create all the compression instances up front? We can do it lazily:
> create the compression instance for a block only when that block is
> first read, as in the sketch below.
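>
> A minimal sketch of eager vs. lazy wrapping in Scala (the
> compressedInputStream method mirrors Spark's CompressionCodec trait;
> the trait stand-in and helper names here are illustrative, not the
> actual fix):
>
>     import java.io.InputStream
>
>     // Illustrative stand-in for Spark's CompressionCodec trait.
>     trait CompressionCodec {
>       def compressedInputStream(s: InputStream): InputStream
>     }
>
>     // Eager: Seq.map wraps every block immediately, so every
>     // wrapper's internal decompression buffer is allocated at once.
>     def eagerWrap(blocks: Seq[InputStream], codec: CompressionCodec): Seq[InputStream] =
>       blocks.map(codec.compressedInputStream)
>
>     // Lazy: Iterator.map defers wrapping, so a wrapper (and its
>     // buffer) is created only when the reducer reaches that block.
>     def lazyWrap(blocks: Seq[InputStream], codec: CompressionCodec): Iterator[InputStream] =
>       blocks.iterator.map(codec.compressedInputStream)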
--
This message was sent by Atlassian JIRA
(v6.2#6252)