What was the state of your streaming application? Was it falling behind,
with a large and increasing scheduling delay?

TD

On Thu, Apr 23, 2015 at 11:31 AM, N B <nb.nos...@gmail.com> wrote:

> Thanks for the response, Conor. I tried those settings and for a while it
> seemed to be cleaning up the shuffle files after itself. However, after
> exactly 5 hours it started throwing exceptions and eventually stopped
> working again. A sample stack trace is below. What is curious about the
> 5 hours is that I had set the cleaner ttl to 5 hours, after changing the
> max window size to 1 hour (down from 6 hours, in order to test). It also
> stopped cleaning the shuffle files once this started happening.
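>
> For concreteness, a sketch of the relevant lines as they would appear in
> spark-defaults.conf (spark.cleaner.ttl is interpreted as seconds in Spark
> 1.x, so 5 hours = 18000):
>
> # cleaner ttl deliberately set above the 1-hour max window
> spark.cleaner.ttl          18000
> spark.streaming.unpersist  true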
>
> Any idea why this could be happening?
>
> 2015-04-22 17:39:52,040 ERROR Executor task launch worker-989 Executor.logError - Exception in task 0.0 in stage 215425.0 (TID 425147)
> java.lang.Exception: Could not compute split, block input-0-1429706099000 not found
>         at org.apache.spark.rdd.BlockRDD.compute(BlockRDD.scala:51)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
>         at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
>         at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:87)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>         at org.apache.spark.scheduler.Task.run(Task.scala:56)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:198)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
>
> Thanks
> NB
>
>
> On Tue, Apr 21, 2015 at 5:14 AM, Conor Fennell <conor.fenn...@altocloud.com> wrote:
>
>> Hi,
>>
>>
>> We set the spark.cleaner.ttl to some reasonable time and also
>> set spark.streaming.unpersist=true.
>>
>>
>> Those together cleaned up the shuffle files for us.
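>>
>> In case it helps, a minimal sketch of passing these at submit time (the
>> ttl value here is only an example; choose something comfortably larger
>> than your largest window plus processing delay):
>>
>> spark-submit \
>>   --conf spark.cleaner.ttl=43200 \
>>   --conf spark.streaming.unpersist=true \
>>   <your usual class/jar arguments>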
>>
>>
>> -Conor
>>
>> On Tue, Apr 21, 2015 at 8:18 AM, N B <nb.nos...@gmail.com> wrote:
>>
>>> We already have a cron job in place to clean up just the shuffle files.
>>> However, what I would really like to know is whether there is a "proper"
>>> way of telling Spark to clean up these files once it is done with them.
>>>
>>> Thanks
>>> NB
>>>
>>>
>>> On Mon, Apr 20, 2015 at 10:47 AM, Jeetendra Gangele <gangele...@gmail.com> wrote:
>>>
>>>> Write a cron job for this, like the one below (-cmin +1440 matches
>>>> entries last changed more than 24 hours ago):
>>>>
>>>> 12 * * * *  find $SPARK_HOME/work -cmin +1440 -prune -exec rm -rf {} \+
>>>> 32 * * * *  find /tmp -type d -cmin +1440 -name "spark-*-*-*" -prune -exec rm -rf {} \+
>>>> 52 * * * *  find $SPARK_LOCAL_DIR -mindepth 1 -maxdepth 1 -type d -cmin +1440 -name "spark-*-*-*" -prune -exec rm -rf {} \+
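>>>>
>>>> One caveat: cron runs with a minimal environment and does not read your
>>>> shell profile, so $SPARK_HOME and $SPARK_LOCAL_DIR must be defined inside
>>>> the crontab itself or replaced with absolute paths.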
>>>>
>>>>
>>>> On 20 April 2015 at 23:12, N B <nb.nos...@gmail.com> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I had posed this query as part of a different thread but did not get a
>>>>> response there, so I am starting a new thread in the hope of catching
>>>>> someone's attention.
>>>>>
>>>>> We are experiencing this issue of shuffle files being left behind and
>>>>> not being cleaned up by Spark. Since this is a Spark Streaming
>>>>> application, it is expected to stay up indefinitely, so shuffle files
>>>>> not being cleaned up is a big problem right now. Our max window size is
>>>>> 6 hours, so we have set up a cron job (sketched below) to clean up
>>>>> shuffle files older than 12 hours; otherwise they would eat up all our
>>>>> disk space.
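>>>>>
>>>>> Roughly, the cron entry looks like this (a sketch; the local directory
>>>>> path is specific to our deployment, and 720 minutes = 12 hours):
>>>>>
>>>>> # hourly: remove shuffle files untouched for more than 12 hours
>>>>> 0 * * * * find /mnt/spark-local -name "shuffle_*" -mmin +720 -exec rm -f {} \+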
>>>>>
>>>>> Please see the following; it seems the non-cleaning of shuffle files
>>>>> is simply being documented in 1.3.1:
>>>>>
>>>>> https://github.com/apache/spark/pull/5074/files
>>>>> https://issues.apache.org/jira/browse/SPARK-5836
>>>>>
>>>>>
>>>>> Also, for some reason, the following JIRAs, which were reported as
>>>>> functional issues, were closed as duplicates of the above documentation
>>>>> bug. Does this mean that this issue won't be tackled at all?
>>>>>
>>>>> https://issues.apache.org/jira/browse/SPARK-3563
>>>>> https://issues.apache.org/jira/browse/SPARK-4796
>>>>> https://issues.apache.org/jira/browse/SPARK-6011
>>>>>
>>>>> Any further insight into whether this is being looked into, and into
>>>>> how we should handle the shuffle files in the meantime, would be
>>>>> greatly appreciated.
>>>>>
>>>>> Thanks
>>>>> NB
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>
>
