The cleaner TTL was introduced as a "brute force" way to clean out all old
data and metadata in the system, so that the system can run 24/7. The
cleaner TTL should be set to a value large enough that no RDD older than it
is still in use. There are, however, cases where you may want to reuse an
RDD indefinitely, in which case no TTL value is good enough. The workaround
is that if you are going to use an RDD beyond the TTL, you have to recreate
it at some interval shorter than the TTL.
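
For example, a rough sketch of setting the TTL when building the streaming
context (the app name, batch interval, and TTL value here are just
placeholders, not recommendations):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    // spark.cleaner.ttl is in seconds; pick something comfortably larger
    // than the longest interval over which any RDD or broadcast is reused.
    val conf = new SparkConf()
      .setAppName("StreamingApp")
      .set("spark.cleaner.ttl", "3600")

    val ssc = new StreamingContext(conf, Seconds(10))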

However, the ideal solution is to actually identify the objects (RDDs,
broadcasts, etc.) that are ready to be GCed and then clear their associated
data. This is currently being implemented as part of this PR:
https://github.com/apache/spark/pull/126
This will make setting the cleaner TTL unnecessary and eliminate errors
caused by needed RDDs getting cleaned up.
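
On the storage level suggestion in the quoted message below, a rough sketch
of explicitly requesting MEMORY_AND_DISK on a Kafka input stream could look
like this (the ZooKeeper quorum, group id, and topic map are placeholders
for your own settings):

    import org.apache.spark.SparkConf
    import org.apache.spark.storage.StorageLevel
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    val conf = new SparkConf().setAppName("KafkaStreamingApp")
    val ssc  = new StreamingContext(conf, Seconds(10))

    // Placeholder Kafka settings -- substitute your own.
    val zkQuorum = "zk-host:2181"
    val groupId  = "my-consumer-group"
    val topics   = Map("my-topic" -> 1)

    // Passing MEMORY_AND_DISK explicitly means received blocks that no
    // longer fit in memory spill to disk instead of being dropped before
    // they are processed.
    val stream = KafkaUtils.createStream(ssc, zkQuorum, groupId, topics,
      StorageLevel.MEMORY_AND_DISK)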

On Thu, Mar 27, 2014 at 3:47 PM, Evgeny Shishkin <itparan...@gmail.com> wrote:

>
> On 28 Mar 2014, at 01:44, Tathagata Das <tathagata.das1...@gmail.com>
> wrote:
>
> The more I think about it, the problem is not about /tmp; it's more about
> the workers not having enough memory. Blocks of received data could be
> falling out of memory before they get processed.
> BTW, what is the storage level that you are using for your input stream?
> If you are using MEMORY_ONLY, then try MEMORY_AND_DISK. That is safer
> because it ensures that if received data falls out of memory, it will at
> least be saved to disk.
>
> TD
>
>
> And I saw such errors because of cleaner.ttl,
> which erases everything, even needed RDDs.
>
>
>
>
> On Thu, Mar 27, 2014 at 2:29 PM, Scott Clasen <scott.cla...@gmail.com> wrote:
>
>> Heh, sorry, that wasn't a clear question. I know 'how' to set it but don't
>> know what value to use in a Mesos cluster, since the processes are running
>> in lxc containers and won't be sharing a filesystem (or machine, for that
>> matter).
>>
>> I can't use an s3n:// URL for the local dir, can I?
>>
>>
>>
>
>
>
