[ https://issues.apache.org/jira/browse/SPARK-2530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matei Zaharia resolved SPARK-2530.
----------------------------------
    Resolution: Fixed
 Fix Version/s: 1.1.0

This was fixed by SPARK-2711.

> Relax incorrect assumption of one ExternalAppendOnlyMap per thread
> ------------------------------------------------------------------
>
>                 Key: SPARK-2530
>                 URL: https://issues.apache.org/jira/browse/SPARK-2530
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.0.1
>            Reporter: Andrew Or
>             Fix For: 1.1.0
>
>
> Originally reported by Matei.
> Our current implementation of EAOM assumes that only one map is created per task.
> This is not true in the following case, however:
> {code}
> rdd1.join(rdd2).reduceByKey(...)
> {code}
> This is because reduceByKey does a map-side combine, which creates an EAOM
> that streams from an EAOM previously created by the same thread to aggregate
> values from the join.
> More concerning is the following: we currently maintain a global shuffle
> memory map (thread ID -> memory used by that thread to shuffle). If we
> create two EAOMs in the same thread, the memory occupied by the first map
> may be clobbered by that occupied by the second. This has very adverse
> consequences if the first map is huge but the second is just starting out:
> we end up believing that we use much less memory than we actually do.
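The clobbering described in the issue can be sketched as follows. This is a minimal, hypothetical simulation of the flawed bookkeeping, not Spark's actual code: the map name, function name, and byte counts are illustrative only. The point is that a table keyed solely by thread ID cannot account for two ExternalAppendOnlyMaps on the same thread.

```python
# Hypothetical simulation of the bug: shuffle memory bookkeeping keyed
# by thread ID alone, so a second ExternalAppendOnlyMap created on the
# same thread overwrites the first map's reservation instead of adding
# to it. Names and sizes are illustrative, not Spark internals.

shuffle_memory_map = {}  # thread ID -> bytes believed used for shuffle


def register_map(thread_id, bytes_used):
    # Buggy assumption: at most one EAOM per thread, so the entry is
    # overwritten rather than accumulated.
    shuffle_memory_map[thread_id] = bytes_used


# Thread 42 builds a huge EAOM while aggregating the join output...
register_map(42, 512 * 1024 * 1024)  # 512 MB tracked

# ...then reduceByKey's map-side combine creates a second EAOM on the
# same thread, which clobbers the first reservation:
register_map(42, 1 * 1024 * 1024)  # only 1 MB tracked now

# The 512 MB actually held by the first map has vanished from the
# books, so spill decisions are made against a wildly low total.
print(sum(shuffle_memory_map.values()))  # 1048576
```

Under this sketch, the fix direction is to key reservations by something finer than the thread (e.g. per map instance) or to accumulate rather than overwrite, which is what the follow-up work in SPARK-2711 addressed.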