Tomasz Dudziak created SPARK-5836:
-------------------------------------
Summary: Highlight in Spark documentation that by default it does
not delete its temporary files
Key: SPARK-5836
URL: https://issues.apache.org/jira/browse/SPARK-5836
Project: Spark
Issue Type: Improvement
Components: Documentation
Reporter: Tomasz Dudziak
We recently learnt the hard way (in a prod system) that Spark by default does
not delete its temporary files until it is stopped. Within a relatively short
period of heavy Spark use, the disk of our prod machine filled up completely
because of the many shuffle files written to it. We think the documentation
should state more clearly that a finished job leaves a lot of rubbish behind,
so that this does not come as a surprise.
A good place to highlight this would probably be the documentation of the
{{spark.local.dir}} property, which controls where Spark writes its temporary
files.
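Until the documentation says this, one way to notice the build-up early is to watch how much space sits under {{spark.local.dir}} (default /tmp). The sketch below is only an illustration, not part of Spark: the function name spark_scratch_usage is hypothetical, and it assumes Spark's scratch directories are named with the prefixes "spark-" and "blockmgr-". Check that assumption against what your Spark version actually creates.

```python
import os
from pathlib import Path

# ASSUMPTION: top-level name prefixes of Spark scratch directories under
# spark.local.dir. Verify against your Spark version before relying on this.
SPARK_PREFIXES = ("spark-", "blockmgr-")


def spark_scratch_usage(local_dir="/tmp", prefixes=SPARK_PREFIXES):
    """Hypothetical helper: total bytes of files inside Spark-looking
    subdirectories of local_dir (files outside those dirs are ignored)."""
    root = Path(local_dir)
    if not root.is_dir():
        return 0
    total = 0
    for entry in root.iterdir():
        # Only descend into top-level directories whose names match a prefix.
        if not (entry.is_dir() and entry.name.startswith(prefixes)):
            continue
        for dirpath, _dirnames, filenames in os.walk(entry):
            for name in filenames:
                try:
                    total += os.path.getsize(os.path.join(dirpath, name))
                except OSError:
                    # The file may have been removed between listing and stat.
                    pass
    return total


if __name__ == "__main__":
    used = spark_scratch_usage()
    print(f"Spark scratch usage under /tmp: {used / 1024**2:.1f} MiB")
```

Running it periodically (e.g. from cron) and alerting on growth would have flagged the disk filling up before it ran out.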
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)