Right, I remember now: the only problematic case is when things go wrong
and the cleaner is not executed.
It can also be a problem when reusing the same SparkContext for many runs.
Guillaume
It cleans the work dir, and SPARK_LOCAL_DIRS should be cleaned
automatically. From the source code comments:
// SPARK_LOCAL_DIRS environment variable, and deleted by the Worker when the
// application finishes.
On 13.04.2015, at 11:26, Guillaume Pitel <guillaume.pi...@exensa.com> wrote:
Does it also clean up the Spark local dirs? I thought it was only
cleaning $SPARK_HOME/work/
Guillaume
I have set SPARK_WORKER_OPTS in spark-env.sh for that. For example:
export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true \
  -Dspark.worker.cleanup.appDataTtl=<seconds>"
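For illustration, a fuller spark-env.sh sketch might also set the check interval; the concrete values below (30-minute check, 1-day TTL) are examples, not defaults:

```shell
# Illustrative spark-env.sh settings (values are examples):
# check for stale application dirs every 30 minutes,
# and delete those older than 1 day (86400 seconds).
export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true \
  -Dspark.worker.cleanup.interval=1800 \
  -Dspark.worker.cleanup.appDataTtl=86400"
```

Note that this cleanup only applies to directories of applications that have finished.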
On 11.04.2015, at 00:01, Wang, Ningjun (LNG-NPV) <ningjun.w...@lexisnexis.com> wrote:
Does anybody have an answer for this?
Thanks
Ningjun
*From:* Wang, Ningjun (LNG-NPV)
*Sent:* Thursday, April 02, 2015 12:14 PM
*To:* user@spark.apache.org
*Subject:* Is the disk space in SPARK_LOCAL_DIRS cleaned up?
I set SPARK_LOCAL_DIRS to C:\temp\spark-temp. When RDDs are
shuffled, Spark writes to this folder. I found that the disk space
used by this folder keeps increasing quickly, and at a certain point I will
run out of disk space.
Does Spark clean up the disk space in this folder once the
shuffle operation is done? If not, I need to write a job to clean
it up myself. But how do I know which subfolders can be removed?
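As a rough sketch of such a manual cleanup job (assuming a POSIX shell and path for illustration; the actual directory above is C:\temp\spark-temp on Windows), one could remove per-application `spark-*` subdirectories that have not been modified for some time. This is only safe while no Spark application is currently running, since live directories are still in use:

```shell
# Hypothetical manual cleanup sketch; /tmp/spark-temp stands in for
# the configured SPARK_LOCAL_DIRS location.
LOCAL_DIR=/tmp/spark-temp
mkdir -p "$LOCAL_DIR"
# Delete top-level spark-* dirs untouched for more than 1 day.
# Only run this when no Spark application is active.
find "$LOCAL_DIR" -maxdepth 1 -type d -name 'spark-*' -mtime +1 -exec rm -rf {} +
```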
Ningjun
--
*Guillaume PITEL, Président*
+33(0)626 222 431
eXenSa S.A.S. <http://www.exensa.com/>
41, rue Périer - 92120 Montrouge - FRANCE
Tel +33(0)184 163 677 / Fax +33(0)972 283 705