We've also hit similar disk-fill issues. For Java/Scala RDDs, shuffle file cleanup is tied to JVM garbage collection: if an RDD stays referenced in the code and can't be garbage collected, I've noticed its intermediate shuffle files hang around.
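To make the GC tie-in concrete, here's a minimal sketch; `sc`, the input path, and the pipeline are placeholders, not anything from your job:

import org.apache.spark.SparkContext

def runBatch(sc: SparkContext): Long = {
  // reduceByKey introduces a shuffle; Spark's ContextCleaner tracks the
  // shuffle dependency via a weak reference.
  val counts = sc.textFile("hdfs:///tmp/events")  // placeholder path
    .map(line => (line.split(",")(0), 1L))
    .reduceByKey(_ + _)
  counts.count()
  // When this method returns, `counts` becomes unreachable; once the JVM
  // garbage-collects it, the ContextCleaner can delete its shuffle files.
  // By contrast, stashing `counts` in a long-lived field keeps the shuffle
  // dependency reachable, so the files are never cleaned up.
}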
The best way to handle this is to organize your code so that when you're done with an RDD it falls out of scope and can be garbage collected. There's also an experimental API, added in Spark 3 (I think), that gives you more granular control: you can call a method on the RDD to clean up its shuffle files directly (rough sketch at the bottom of this mail).

On Mon, Sep 14, 2020 at 11:02 AM lsn248 <lekshmi.s...@gmail.com> wrote:
> Hi,
>
> I have a long-running application and Spark seems to fill up the disk with
> shuffle files. Eventually the job fails, running out of disk space. Is
> there a way for me to clean up the shuffle files?
>
> Thanks
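For reference, the experimental hook I had in mind is RDD.cleanShuffleDependencies (added around Spark 3.1, marked @Experimental, if I remember right); the rest of this is just a sketch with made-up names:

import org.apache.spark.SparkContext

def aggregateAndClean(sc: SparkContext): Unit = {
  val aggregated = sc.parallelize(1 to 1000000)
    .map(i => (i % 100, 1L))
    .reduceByKey(_ + _)

  aggregated.count()  // materialize the shuffle

  // Eagerly delete this RDD's shuffle files (and those of its non-persisted
  // ancestors) instead of waiting for GC; blocking = true waits for the
  // cleanup to finish before returning.
  aggregated.cleanShuffleDependencies(blocking = true)
}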