[ https://issues.apache.org/jira/browse/SPARK-4927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14359294#comment-14359294 ]

Ilya Ganelin commented on SPARK-4927:
-------------------------------------

Are you running on YARN? My theory is that the memory usage has to do with 
data movement between nodes.



> Spark does not clean up properly during long jobs. 
> ---------------------------------------------------
>
>                 Key: SPARK-4927
>                 URL: https://issues.apache.org/jira/browse/SPARK-4927
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.1.0
>            Reporter: Ilya Ganelin
>
> On a long-running Spark job, Spark will eventually run out of memory on the 
> driver node due to metadata overhead from the shuffle operation. Spark will 
> continue to operate, but with drastically decreased performance (since 
> swapping now occurs with every operation).
> The spark.cleaner.ttl parameter allows a user to configure when cleanup 
> happens, but the cleanup is not done safely: for example, if it clears a 
> cached RDD or an active task in the middle of processing a stage, the next 
> stage ultimately hits a KeyNotFoundException when it attempts to reference 
> the cleared RDD or task.
> There should be a sustainable mechanism for cleaning up stale metadata that 
> allows the program to continue running. 
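
For context, a minimal sketch of how the TTL-based cleaner is typically enabled 
in a Spark 1.x driver program (the application name and TTL value below are 
illustrative only, and this cleanup has the unsafe behavior described above):

    // Sketch of a Spark 1.x driver that enables the TTL-based metadata cleaner.
    // Anything older than the TTL is dropped, even if a running stage still needs it.
    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("long-running-job")      // illustrative application name
      .set("spark.cleaner.ttl", "3600")    // drop metadata older than 3600 seconds

    val sc = new SparkContext(conf)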


