Tomasz Dudziak created SPARK-5836:
-------------------------------------

             Summary: Highlight in Spark documentation that by default it does 
not delete its temporary files
                 Key: SPARK-5836
                 URL: https://issues.apache.org/jira/browse/SPARK-5836
             Project: Spark
          Issue Type: Improvement
          Components: Documentation
            Reporter: Tomasz Dudziak


We recently learnt the hard way (in a prod system) that Spark by default does 
not delete its temporary files until it is stopped. Within a relatively short 
span of heavy Spark use, the disk of our prod machine filled up completely 
because of the many shuffle files written to it. We think the documentation 
should state more clearly that a finished job leaves a lot of rubbish behind, 
so that this does not come as a surprise.

A good place to highlight this would probably be the documentation of the 
{{spark.local.dir}} property, which controls where Spark's temporary files are 
written.
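
Until the documentation covers this, one way to notice the build-up early is 
to measure how much space the directory behind {{spark.local.dir}} is using. 
The following is only a rough sketch and not part of Spark: the helper name 
dir_size_bytes is made up for illustration, and the path passed in is assumed 
to be whatever {{spark.local.dir}} points at on the machine (/tmp unless it 
has been overridden).

```python
import os


def dir_size_bytes(root):
    """Return the total size in bytes of all regular files under root.

    Hypothetical helper: 'root' is assumed to be the directory that
    spark.local.dir points at (for example /tmp).
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks so that linked files are not counted.
            if os.path.islink(path):
                continue
            try:
                total += os.path.getsize(path)
            except OSError:
                # The file disappeared between listing and stat; ignore it.
                pass
    return total
```

Calling dir_size_bytes("/tmp") from a periodic check and alerting once the 
result crosses a threshold would have flagged the problem before the disk 
filled up. A path that does not exist simply yields 0.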



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
