Are you saying that even with spark.cleaner.ttl set, your files are not getting cleaned up?
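For reference, a minimal sketch of how that TTL could be set on the driver side, assuming a standalone Scala streaming app (the app name and the values here are illustrative, not from this thread):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    // Illustrative driver-side config: spark.cleaner.ttl is a duration in
    // seconds after which Spark forgets old metadata, letting the cleaner
    // drop the associated shuffle data. The 3600s value is just an example.
    val conf = new SparkConf()
      .setAppName("StreamingCleanupExample") // hypothetical app name
      .set("spark.cleaner.ttl", "3600")

    // 10-minute batch interval, matching the setup described below.
    val ssc = new StreamingContext(conf, Seconds(600))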
TD

On Thu, Apr 2, 2015 at 8:23 AM, andrem <amesa...@gmail.com> wrote:
> Apparently Spark Streaming 1.3.0 is not cleaning up its internal files, and
> the worker nodes eventually run out of inodes.
> We see tons of old shuffle_*.data and *.index files that are never deleted.
> How do we get Spark to remove these files?
>
> We have a simple standalone app with one RabbitMQ receiver and a two-node
> cluster (2 x r3.large AWS instances).
> The batch interval is 10 minutes, after which we process data and write
> results to the DB. No windowing or state management is used.
>
> I've pored over the documentation and tried setting the following
> properties, but they have not helped.
> As a workaround we're using a cron script that periodically cleans up old
> files, but this has a bad smell to it.
>
> SPARK_WORKER_OPTS in spark-env.sh on every worker node:
> spark.worker.cleanup.enabled true
> spark.worker.cleanup.interval
> spark.worker.cleanup.appDataTtl
>
> Also tried on the driver side:
> spark.cleaner.ttl
> spark.shuffle.consolidateFiles true
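For completeness, a sketch of how those worker-side properties are typically passed via SPARK_WORKER_OPTS in spark-env.sh; the interval and TTL values shown (30 minutes and 7 days) are the documented defaults and purely illustrative:

    # spark-env.sh on each worker node; values are illustrative.
    # Note: standalone-mode worker cleanup only removes directories of
    # *stopped* applications, so it will not reclaim shuffle files held
    # by a long-running streaming app.
    SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true \
      -Dspark.worker.cleanup.interval=1800 \
      -Dspark.worker.cleanup.appDataTtl=604800"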