[ 
https://issues.apache.org/jira/browse/SPARK-6839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14490576#comment-14490576
 ] 

Ilya Ganelin edited comment on SPARK-6839 at 4/11/15 12:09 AM:
---------------------------------------------------------------

The obvious solution won't work. 

Adding a {{TaskContext}} parameter to {{dataDeserialize()}} won't work because it's 
called from within both {{MemoryStore}} and {{TachyonStore}}, which are instantiated 
within the {{BlockManager}} constructor. A {{TaskContext}} also can't be created 
within the {{BlockManager}} constructor, since {{BlockManager}} is created within 
the {{SparkEnv}} constructor, which has no tasks associated with it.

The only workable solution I can see is to assign a {{TaskContext}} to the 
{{BlockManager}} at run time, but that sounds very sketchy to me since the block 
manager is a singleton and we may have multiple tasks running at once. Any 
thoughts on this conundrum?
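For context, one way around the singleton problem (purely a sketch of the idea, not anything in Spark today; {{TaskContextHolder}} and the plain {{String}} standing in for the context are made-up names) would be to keep the current task's context in a thread-local rather than on the {{BlockManager}}, assuming each task runs on its own thread:

```scala
// Sketch of a thread-local context registry, assuming one task per thread.
// This sidesteps storing a TaskContext on the singleton BlockManager:
// each task thread registers its own context before running, and
// dataDeserialize-style code looks it up at call time.
object TaskContextHolder {
  private val current = new ThreadLocal[Option[String]] {
    override def initialValue(): Option[String] = None
  }
  def set(ctx: String): Unit = current.set(Some(ctx))
  def get(): Option[String] = current.get()
  def clear(): Unit = current.remove()
}
```

Callers inside the block manager could then resolve the context at call time instead of receiving it through a constructor, so concurrent tasks never see each other's context.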



> BlockManager.dataDeserialize leaks resources on user exceptions
> ---------------------------------------------------------------
>
>                 Key: SPARK-6839
>                 URL: https://issues.apache.org/jira/browse/SPARK-6839
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>            Reporter: Imran Rashid
>
> From a discussion with [~vanzin] on {{ByteBufferInputStream}}, we realized 
> that 
> [{{BlockManager.dataDeserialize}}|https://github.com/apache/spark/blob/b5c51c8df480f1a82a82e4d597d8eea631bffb4e/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L1202]
>  doesn't guarantee the underlying InputStream is properly closed. In 
> particular, {{BlockManager.dispose(byteBuffer)}} will not get called any time 
> there is an exception in user code.
> The problem is that right now, we convert the input streams to iterators, and 
> only close the input stream when the end of the iterator is reached.  But we 
> might never reach the end of the iterator -- the obvious case is an exception 
> in user code, which makes the task fail part of the way through the iterator.
> I think the solution is to give {{BlockManager.dataDeserialize}} a 
> {{TaskContext}} so it can call {{context.addTaskCompletionListener}} to do 
> the cleanup (as is done in {{ShuffleBlockFetcherIterator}}).
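The completion-listener pattern proposed above can be illustrated in plain Scala; {{FakeTaskContext}} and {{deserializeWithCleanup}} below are hypothetical stand-ins for the sake of the sketch, not Spark's actual classes:

```scala
import java.io.InputStream
import scala.collection.mutable.ArrayBuffer

// Minimal stand-in for Spark's TaskContext: it just collects cleanup
// callbacks and runs them when the task finishes, success or failure.
class FakeTaskContext {
  private val listeners = ArrayBuffer.empty[() => Unit]
  def addTaskCompletionListener(f: () => Unit): Unit = listeners += f
  def markTaskCompleted(): Unit = listeners.foreach(_.apply())
}

// A dataDeserialize-style helper: it exposes the stream as an iterator,
// but also registers a completion listener so the stream is closed even
// if the iterator is abandoned part-way through (e.g. on a user exception).
def deserializeWithCleanup(context: FakeTaskContext, in: InputStream): Iterator[Int] = {
  context.addTaskCompletionListener(() => in.close())
  Iterator.continually(in.read()).takeWhile(_ != -1)
}
```

The key point is that cleanup is tied to task completion rather than to reaching the end of the iterator, which is exactly why the helper needs access to the task's context in the first place.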



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
