[
https://issues.apache.org/jira/browse/SPARK-26713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sean Owen resolved SPARK-26713.
-------------------------------
Resolution: Fixed
Fix Version/s: 3.0.0
Issue resolved by pull request 23638
[https://github.com/apache/spark/pull/23638]
> PipedRDD may hold stdin writer and stdout reader threads even if the task is
> finished
> ------------------------------------------------------------------------------------
>
> Key: SPARK-26713
> URL: https://issues.apache.org/jira/browse/SPARK-26713
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.1.3, 2.2.3, 2.3.0, 2.3.1, 2.3.2, 2.4.0
> Reporter: Xianjin YE
> Assignee: Xianjin YE
> Priority: Major
> Fix For: 3.0.0
>
>
> During an investigation of an OOM in one internal production job, I found that
> PipedRDD leaks memory. After some digging, the problem comes down to the fact
> that PipedRDD doesn't release its stdin writer and stdout reader threads even
> after the task has finished.
>
> PipedRDD creates two threads: a stdin writer and a stdout reader. If the task
> finishes normally, both threads exit normally. If the subprocess (the pipe
> command) fails, the task is marked failed, but the stdin writer keeps running
> until it has consumed its parent RDD's iterator. There is also a race condition
> with ShuffledRDD + PipedRDD: the ShuffleBlockFetcherIterator is cleaned up at
> task completion, which hangs the stdin writer thread and leaks memory.
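>
> To make the thread lifecycle concrete, here is a minimal sketch of the
> two-thread pattern and one possible cleanup hook via a task completion
> listener. This is not the actual PipedRDD source or the change made in the
> pull request; {{pipe}}, {{command}}, and {{parentIterator}} are hypothetical
> stand-ins used only for illustration.
>
> {code:scala}
> import org.apache.spark.TaskContext
> import scala.io.Source
>
> // Hypothetical helper mirroring the pattern described above: one thread feeds
> // the parent RDD's records to the subprocess, the task thread reads its stdout.
> def pipe(command: Seq[String], parentIterator: Iterator[String]): Iterator[String] = {
>   val proc = new ProcessBuilder(command: _*).start()
>
>   // stdin writer: drains the parent iterator into the subprocess's stdin.
>   val stdinWriter = new Thread("stdin writer") {
>     override def run(): Unit = {
>       val out = new java.io.PrintWriter(proc.getOutputStream)
>       try parentIterator.foreach(out.println) finally out.close()
>     }
>   }
>   stdinWriter.setDaemon(true)
>   stdinWriter.start()
>
>   // Without a hook like this, the writer keeps consuming parentIterator after
>   // the task fails, which is the leak described above. One way to bound its
>   // lifetime is to tear it down when the task completes.
>   Option(TaskContext.get()).foreach { ctx =>
>     ctx.addTaskCompletionListener[Unit] { _ =>
>       proc.destroy()           // unblock the reader side
>       stdinWriter.interrupt()  // stop feeding records from the parent iterator
>       stdinWriter.join(10000)
>     }
>   }
>
>   // stdout reader: the task consumes the subprocess output as an iterator.
>   Source.fromInputStream(proc.getInputStream).getLines()
> }
> {code}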
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]