[ 
https://issues.apache.org/jira/browse/SPARK-2530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matei Zaharia resolved SPARK-2530.
----------------------------------
       Resolution: Fixed
    Fix Version/s: 1.1.0

This was fixed by SPARK-2711.

> Relax incorrect assumption of one ExternalAppendOnlyMap per thread
> ------------------------------------------------------------------
>
>                 Key: SPARK-2530
>                 URL: https://issues.apache.org/jira/browse/SPARK-2530
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.0.1
>            Reporter: Andrew Or
>             Fix For: 1.1.0
>
>
> Originally reported by Matei.
> Our current implementation of ExternalAppendOnlyMap (EAOM) assumes that only one map 
> is created per task. This does not hold in the following case, however:
> {code}
> rdd1.join(rdd2).reduceByKey(...)
> {code}
> This is because reduceByKey performs a map-side combine, which creates an EAOM 
> that streams from the EAOM previously created by the same thread to aggregate 
> values from the join.
> More concerning is the following: we currently maintain a global shuffle memory 
> map (thread ID -> memory used by that thread for shuffles). If we create two 
> EAOMs in the same thread, the entry recording the memory occupied by the first 
> map is clobbered by that of the second. This has very adverse consequences when 
> the first map is huge but the second is just starting out, in which case we end 
> up believing that we use much less memory than we actually do.
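The clobbering described in the quoted report can be sketched as follows. This is a simplified model, not Spark's actual code: the map, its key type, and the helper name are illustrative only.

```scala
import scala.collection.mutable

// Simplified stand-in for the global shuffle memory map: thread ID -> bytes used.
val shuffleMemoryMap = mutable.HashMap[Long, Long]()

// Each new EAOM records its own usage under the thread's key,
// overwriting (rather than adding to) any previous entry.
def registerMemory(threadId: Long, bytes: Long): Unit =
  shuffleMemoryMap(threadId) = bytes

val threadId = 1L
registerMemory(threadId, 1000000L) // first EAOM (from the join): ~1 MB
registerMemory(threadId, 64L)      // second EAOM (map-side combine): just starting out

// The tracker now reports only 64 bytes for this thread,
// even though the first map still holds ~1 MB.
println(shuffleMemoryMap(threadId))
```

Under this model, the fix in SPARK-2711 amounts to accounting for all maps owned by a thread instead of letting the latest one overwrite the entry.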



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
