[https://issues.apache.org/jira/browse/SPARK-6839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14490636#comment-14490636]
Imran Rashid commented on SPARK-6839:
-------------------------------------
[~ilganeli] sorry, I'm already on it! I should have marked it as in progress.
I'm just about to submit a patch -- maybe you can help review?
And yes, the patch unfortunately has to touch a huge amount of code, since it
needs to pass something in to clean up the resources in all the possible
places ... :/
> BlockManager.dataDeserialize leaks resources on user exceptions
> ---------------------------------------------------------------
>
> Key: SPARK-6839
> URL: https://issues.apache.org/jira/browse/SPARK-6839
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Reporter: Imran Rashid
>
> From a discussion with [~vanzin] on {{ByteBufferInputStream}}, we realized
> that
> [{{BlockManager.dataDeserialize}}|https://github.com/apache/spark/blob/b5c51c8df480f1a82a82e4d597d8eea631bffb4e/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L1202]
> doesn't guarantee the underlying InputStream is properly closed. In
> particular, {{BlockManager.dispose(byteBuffer)}} will not get called any time
> there is an exception in user code.
> The problem is that right now, we convert the input streams to iterators, and
> only close the input stream once the end of the iterator is reached. But we
> might never reach the end of the iterator -- the obvious case is a bug in the
> user code, so the task fails partway through the iterator.
> I think the solution is to give {{BlockManager.dataDeserialize}} a
> {{TaskContext}} so it can call {{context.addTaskCompletionListener}} to do
> the cleanup (as is done in {{ShuffleBlockFetcherIterator}}).
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)