[ https://issues.apache.org/jira/browse/SPARK-5836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14337128#comment-14337128 ]
Sean Owen commented on SPARK-5836:
----------------------------------
I'd like to take this up, since versions of this complaint have come up
frequently lately. A first step is indeed improving the documentation. I want to
confirm or refute a few things I only partly know about how temp files are
treated:
- Temp files/dirs created by executors may live as long as the executors, but
should they be deleted along with the executors?
- Can shuffle files, however, live longer than that?
- Is {{spark.cleaner.ttl}} relevant to this or not?
If we believe that temp files are deleted when they should be (er, well,
[~vanzin] is fixing a few things around temp dirs right now), then is the
surprising thing here the lifetime of shuffle files?
In that case, maybe [~ilganeli] can cover this when writing up some basics
about how the shuffle works?
But I want to establish definitively what we should say about the current
behavior, even if that behavior should or could change in the future.
CC [~sandyr]
> Highlight in Spark documentation that by default Spark does not delete its temporary files
> ------------------------------------------------------------------------------------------
>
> Key: SPARK-5836
> URL: https://issues.apache.org/jira/browse/SPARK-5836
> Project: Spark
> Issue Type: Improvement
> Components: Documentation
> Reporter: Tomasz Dudziak
>
> We recently learnt the hard way (in a prod system) that, by default, Spark does
> not delete its temporary files until it is stopped. Within a relatively short
> period of heavy Spark use, the disk of our prod machine filled up completely
> with the many shuffle files written to it. We think the documentation should
> make it clearer that a finished job leaves a lot of rubbish behind, so that
> this does not come as a surprise.
> A good place to highlight this would probably be the documentation of the
> {{spark.local.dir}} property, which controls where Spark writes its temporary
> files.
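Since the failure described above is a disk filling up under {{spark.local.dir}}, a quick way to see where the space is going can help while the documentation catches up. The following is a minimal Python sketch and is not part of Spark: the helper name {{local_dir_usage}} and the script itself are hypothetical. It only sums file sizes under each top-level subdirectory of a given path (Spark's default {{spark.local.dir}} is /tmp) and deliberately makes no assumptions about how Spark names its internal directories or files.

```python
import os
import sys
from collections import defaultdict


def local_dir_usage(local_dir):
    """Hypothetical helper: total bytes under each top-level entry of local_dir.

    Files sitting directly in local_dir are counted under the key ".".
    Nothing here depends on Spark's internal naming of temp/shuffle files.
    """
    usage = defaultdict(int)
    for root, _dirs, files in os.walk(local_dir):
        # First path component relative to local_dir; "." for local_dir itself.
        top = os.path.relpath(root, local_dir).split(os.sep)[0]
        for name in files:
            try:
                usage[top] += os.path.getsize(os.path.join(root, name))
            except OSError:
                # The file vanished between listing and stat (e.g. a cleanup ran).
                pass
    return dict(usage)


if __name__ == "__main__":
    # Point this at whatever spark.local.dir is set to; /tmp is Spark's default.
    target = sys.argv[1] if len(sys.argv) > 1 else "/tmp"
    for top, size in sorted(local_dir_usage(target).items(), key=lambda kv: -kv[1]):
        print(f"{size:>15,d}  {top}")
```

Running it periodically against the configured local dir (for example from cron) would show whether space is accumulating in directories left behind by finished jobs.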
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)