There's also a second, newer mechanism that uses a TTL for cleanup of shuffle files. Can you share more about your use case?
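In case it helps, here is a minimal sketch of enabling it, assuming the mechanism in question is the dynamic-allocation shuffle-tracking timeout (Spark 3.0+). If it's a different TTL config, the shape is the same:

    import org.apache.spark.sql.SparkSession

    // Hedged sketch: with shuffle tracking enabled, executors holding
    // shuffle data become eligible for removal once the timeout (a TTL)
    // expires, and their shuffle files are cleaned up with them.
    val spark = SparkSession.builder()
      .appName("ttl-shuffle-cleanup-sketch")
      .config("spark.dynamicAllocation.enabled", "true")
      .config("spark.dynamicAllocation.shuffleTracking.enabled", "true")
      .config("spark.dynamicAllocation.shuffleTracking.timeout", "1h")
      .getOrCreate()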
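And to make the scoping advice in Edward's message below concrete, here is a minimal sketch. I believe the experimental method he mentions is RDD.cleanShuffleDependencies (added around Spark 3.1; check your version), and the job itself is just an illustrative toy:

    import org.apache.spark.sql.SparkSession

    object ShuffleScopeSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("shuffle-scope-sketch")
          // Optional: force a periodic driver GC so the ContextCleaner
          // notices unreferenced RDDs and deletes their shuffle files
          // sooner (the default interval is 30min).
          .config("spark.cleaner.periodicGC.interval", "15min")
          .getOrCreate()
        val sc = spark.sparkContext

        // Keep the RDD's lifetime inside a method: once computeTotal()
        // returns, nothing references `byKey`, so it is eligible for GC
        // and the ContextCleaner can remove its shuffle files.
        def computeTotal(): Double = {
          val byKey = sc.parallelize(1 to 1000000)
            .map(i => (i % 100, i.toLong))
            .reduceByKey(_ + _) // this wide transform writes shuffle files
          val total = byKey.values.sum()
          // Experimental (Spark 3.1+, I believe): eagerly delete this
          // RDD's shuffle files instead of waiting for a GC cycle.
          byKey.cleanShuffleDependencies(blocking = true)
          total
        }

        println(computeTotal())
        spark.stop()
      }
    }

Note the periodic-GC interval only makes the driver notice dead references sooner; the structural fix is still the scoping.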
On Mon, Sep 14, 2020 at 1:33 PM Edward Mitchell <edee...@gmail.com> wrote:

> We've also had some similar disk-fill issues.
>
> For Java/Scala RDDs, shuffle file cleanup is done as part of JVM garbage
> collection. I've noticed that if RDDs maintain references in the code and
> cannot be garbage collected, then the intermediate shuffle files hang
> around.
>
> The best way to handle this is to organize your code so that when an RDD
> is finished, it falls out of scope and can therefore be garbage collected.
>
> There's also an experimental API, added in Spark 3 (I think), that gives
> you more granular control: you can call a method to clean up the shuffle
> files.
>
> On Mon, Sep 14, 2020 at 11:02 AM lsn248 <lekshmi.s...@gmail.com> wrote:
>
>> Hi,
>>
>> I have a long-running application, and Spark seems to fill up the disk
>> with shuffle files. Eventually the job fails from running out of disk
>> space. Is there a way for me to clean up the shuffle files?
>>
>> Thanks

--
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
YouTube Live Streams: https://www.youtube.com/user/holdenkarau