Are you saying that even with spark.cleaner.ttl set, your files are not
getting cleaned up?

TD

On Thu, Apr 2, 2015 at 8:23 AM, andrem <amesa...@gmail.com> wrote:

> Apparently Spark Streaming 1.3.0 is not cleaning up its internal files and
> the worker nodes eventually run out of inodes.
> We see tons of old shuffle_*.data and *.index files that are never deleted.
> How do we get Spark to remove these files?
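>
> For context, this is roughly how we see the problem on a worker node (the
> /tmp path below is just an illustration; the real files sit under whatever
> spark.local.dir points to):
>
>   # inode usage on the volume holding Spark's scratch space
>   df -i /tmp
>
>   # count the shuffle files that were never removed
>   find /tmp -name "shuffle_*.data" -o -name "shuffle_*.index" | wc -l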
>
> We have a simple standalone app with one RabbitMQ receiver and a two-node
> cluster (2 x r3.large AWS instances).
> The batch interval is 10 minutes, after which we process the data and write
> the results to the DB. No windowing or state management is used.
>
> I've pored over the documentation and tried setting the following
> properties, but they have not helped.
> As a workaround we're running a cron script that periodically cleans up old
> files, but this has a bad smell to it.
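>
> Roughly, the cron entry looks like this (the path, schedule, and age cutoff
> are illustrative):
>
>   # every 30 minutes, delete shuffle files older than 12 hours
>   */30 * * * * find /tmp -type f -name "shuffle_*" -mmin +720 -delete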
>
> SPARK_WORKER_OPTS in spark-env.sh on every worker node
>   spark.worker.cleanup.enabled true
>   spark.worker.cleanup.interval
>   spark.worker.cleanup.appDataTtl
>
> Also tried on the driver side:
>   spark.cleaner.ttl
>   spark.shuffle.consolidateFiles true
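>
> In case it matters, this is roughly how we set them (the exact values and
> the spark-submit invocation below are illustrative):
>
>   # spark-env.sh on each worker
>   export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true \
>     -Dspark.worker.cleanup.interval=1800 \
>     -Dspark.worker.cleanup.appDataTtl=86400"
>
>   # driver side, passed on the spark-submit command line
>   spark-submit --conf spark.cleaner.ttl=3600 \
>                --conf spark.shuffle.consolidateFiles=true \
>                ...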
>
