[
https://issues.apache.org/jira/browse/SPARK-5836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14365897#comment-14365897
]
Apache Spark commented on SPARK-5836:
-------------------------------------
User 'ilganeli' has created a pull request for this issue:
https://github.com/apache/spark/pull/5074
> Highlight in Spark documentation that by default Spark does not delete its
> temporary files
> ------------------------------------------------------------------------------------------
>
> Key: SPARK-5836
> URL: https://issues.apache.org/jira/browse/SPARK-5836
> Project: Spark
> Issue Type: Improvement
> Components: Documentation
> Reporter: Tomasz Dudziak
>
> We recently learnt the hard way (in a prod system) that Spark by default does
> not delete its temporary files until it is stopped. Within a relatively short
> span of heavy Spark use, the disk of our prod machine filled up completely
> with accumulated shuffle files. The documentation should state clearly that a
> finished job leaves its temporary files behind, so that this does not come as
> a surprise. A good place to highlight this would be the documentation of the
> {{spark.local.dir}} property, which controls where Spark writes its temporary
> files.
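As a hedged sketch of the mitigations this would document (option names from Spark's standalone-mode configuration; paths here are illustrative):

```shell
# Point Spark's scratch space at a disk with enough headroom
# (a comma-separated list of directories is also accepted):
spark-submit --conf spark.local.dir=/mnt/bigdisk/spark-tmp ...

# For standalone workers, periodic cleanup of finished applications'
# work directories can be enabled in spark-defaults.conf:
#   spark.worker.cleanup.enabled=true
#   spark.worker.cleanup.interval=1800      # seconds between sweeps
#   spark.worker.cleanup.appDataTtl=604800  # retain app data for 7 days
```

Calling {{SparkContext.stop()}} at the end of an application also lets Spark clean up the shuffle files it wrote for that context.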
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)