[
https://issues.apache.org/jira/browse/SPARK-4927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sean Owen resolved SPARK-4927.
------------------------------
Resolution: Cannot Reproduce
At the moment I've tried to reproduce this a few ways and wasn't able to. It
may have been fixed in the interim. It can be reopened if there is a
reproduction against 1.3+.
> Spark does not clean up properly during long jobs.
> ---------------------------------------------------
>
> Key: SPARK-4927
> URL: https://issues.apache.org/jira/browse/SPARK-4927
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 1.1.0
> Reporter: Ilya Ganelin
>
> On a long running Spark job, Spark will eventually run out of memory on the
> driver node due to metadata overhead from the shuffle operation. Spark will
> continue to operate, however with drastically decreased performance (since
> swapping now occurs with every operation).
> The spark.cleaner.ttl parameter allows a user to configure when cleanup
> happens, but the issue is that this cleanup is not done safely: if it
> clears a cached RDD or an active task in the middle of processing a stage,
> the next stage ultimately fails with a KeyNotFoundException when it
> attempts to reference the cleared RDD or task.
> There should be a sustainable mechanism for cleaning up stale metadata that
> allows the program to continue running.
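[Editor's note: for context, the TTL-based cleanup discussed above is driven by a single configuration value. A minimal sketch of enabling it, assuming a Spark 1.x deployment where `spark.cleaner.ttl` is still supported (it was later deprecated), might look like:]

```
# spark-defaults.conf (or pass via --conf to spark-submit)
#
# spark.cleaner.ttl: interval in seconds after which Spark forgets shuffle
# and RDD metadata. This caps driver memory growth on long-running jobs,
# but, as this issue describes, it is not done safely: metadata for a
# still-cached RDD or in-flight stage can be dropped while in use.
spark.cleaner.ttl  3600
```

Later Spark releases rely instead on the reference-based ContextCleaner, which removes metadata only once the corresponding RDDs and shuffles are no longer reachable, which is likely why the problem no longer reproduces on 1.3+.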
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]