[
https://issues.apache.org/jira/browse/SPARK-9419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Apache Spark reassigned SPARK-9419:
-----------------------------------
Assignee: Apache Spark (was: Josh Rosen)
> ShuffleMemoryManager and MemoryStore should track memory on a per-task, not
> per-thread, basis
> ---------------------------------------------------------------------------------------------
>
> Key: SPARK-9419
> URL: https://issues.apache.org/jira/browse/SPARK-9419
> Project: Spark
> Issue Type: Bug
> Components: Block Manager, Spark Core
> Reporter: Josh Rosen
> Assignee: Apache Spark
> Priority: Critical
>
> Spark's ShuffleMemoryManager and MemoryStore track memory on a per-thread
> basis, which causes problems in the handful of cases where we have tasks that
> use multiple threads. In PythonRDD, RRDD, ScriptTransformation, and PipedRDD,
> we consume the input iterator in a separate thread in order to write it to an
> external process. As a result, these RDDs' input iterators are consumed in a
> different thread than the one that created them, which confuses our memory
> allocation tracking: if allocations are performed in one thread but the
> matching deallocations are performed in another, memory may be leaked, or we
> may get errors complaining that more memory was allocated than was freed.
> I think the right way to fix this is to change our accounting to be performed
> on a per-task rather than per-thread basis. Note that the current per-thread
> tracking has caused problems in the past; SPARK-3731 (#2668) fixed a memory
> leak in PythonRDD that was caused by this issue (that fix is no longer
> necessary as of this patch).
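For illustration, here is a minimal, hypothetical Java sketch (not Spark's actual code; the class names `PerThreadTracker`, `PerTaskTracker`, and the id `taskAttemptId` are invented for this example) contrasting the two accounting schemes the description refers to: a per-thread ledger that loses track of memory when the acquiring and releasing threads differ, and a per-task ledger keyed by an explicit task attempt id that stays balanced no matter which thread calls in.

```java
import java.util.concurrent.ConcurrentHashMap;

// Per-thread ledger (the buggy scheme): allocations are booked under
// whichever thread happens to make the call.
class PerThreadTracker {
    final ConcurrentHashMap<Long, Long> bytesByThread = new ConcurrentHashMap<>();

    void acquire(long bytes) {
        bytesByThread.merge(Thread.currentThread().getId(), bytes, Long::sum);
    }

    void release(long bytes) {
        // Credits the *calling* thread's account -- the wrong account if a
        // different thread performed the matching acquire().
        bytesByThread.merge(Thread.currentThread().getId(), -bytes, Long::sum);
    }
}

// Per-task ledger (the proposed scheme): callers pass an explicit task
// attempt id, so every thread working for a task charges the same entry.
class PerTaskTracker {
    final ConcurrentHashMap<Long, Long> bytesByTask = new ConcurrentHashMap<>();

    void acquire(long taskAttemptId, long bytes) {
        bytesByTask.merge(taskAttemptId, bytes, Long::sum);
    }

    void release(long taskAttemptId, long bytes) {
        bytesByTask.merge(taskAttemptId, -bytes, Long::sum);
    }

    // At task completion, drop the entry and report any leaked bytes.
    long cleanUpTask(long taskAttemptId) {
        Long remaining = bytesByTask.remove(taskAttemptId);
        return remaining == null ? 0L : remaining;
    }
}

public class MemoryAccountingDemo {
    public static void main(String[] args) throws InterruptedException {
        // Simulate an iterator allocated on the task's main thread but
        // consumed (and freed) on a separate writer thread, as in PythonRDD.
        PerThreadTracker perThread = new PerThreadTracker();
        perThread.acquire(100);
        Thread writer1 = new Thread(() -> perThread.release(100));
        writer1.start();
        writer1.join();
        // The main thread's entry still reads 100 and the writer's reads
        // -100: the books never balance, so 100 bytes appear forever in use.
        System.out.println(perThread.bytesByThread.containsValue(100L)); // true

        PerTaskTracker perTask = new PerTaskTracker();
        long taskAttemptId = 7L; // illustrative id
        perTask.acquire(taskAttemptId, 100);
        Thread writer2 = new Thread(() -> perTask.release(taskAttemptId, 100));
        writer2.start();
        writer2.join();
        System.out.println(perTask.cleanUpTask(taskAttemptId)); // 0: no leak
    }
}
```

With the per-task scheme, a `cleanUpTask` call at task completion can also free any bytes a misbehaving task forgot to release, which a per-thread ledger cannot do safely.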
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]