Hi Deb,

If you don't have long-running Spark applications (those running longer than
spark.worker.cleanup.appDataTtl), then the TTL-based cleaner is a good
solution. If, however, you have a mix of long-running and short-running
applications, the TTL-based approach will fail: it will clean up data from
applications that are still running, which causes problems.

http://spark.apache.org/docs/latest/spark-standalone.html
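For reference, the TTL-based cleaner described in those docs is enabled per worker through SPARK_WORKER_OPTS in spark-env.sh. A minimal sketch (the interval and TTL values below are just the documented defaults, shown explicitly for illustration, not recommendations):

```shell
# conf/spark-env.sh on each standalone worker node.
# Enables periodic cleanup of per-application directories under
# SPARK_WORKER_DIR (shuffle files, jars, logs).
SPARK_WORKER_OPTS="$SPARK_WORKER_OPTS \
  -Dspark.worker.cleanup.enabled=true \
  -Dspark.worker.cleanup.interval=1800 \
  -Dspark.worker.cleanup.appDataTtl=$((7 * 24 * 3600))"
```

Note this only applies to standalone mode, and (as above) the TTL check does not distinguish running from terminated applications.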

To make this work properly, the worker directory cleanup needs to remove
only directories belonging to terminated applications and leave directories
of running applications untouched, regardless of their age. This is tracked
at https://issues.apache.org/jira/browse/SPARK-1860


On Wed, Aug 13, 2014 at 9:47 PM, Debasish Das <debasish.da...@gmail.com>
wrote:

> Hi,
>
> I have set up the SPARK_LOCAL_DIRS option in spark-env.sh so that Spark
> can use more shuffle space...
>
> Does Spark clean up all the shuffle files once the runs are done? It
> seems to me that the shuffle files are not cleaned...
>
> Do I need to set the spark.cleaner.ttl variable?
>
> Right now we are planning to use logrotate to clean up the shuffle
> files... is that a good practice?
>
> Thanks.
> Deb
>
>
