It’s a good suggestion; however, I don’t think notebooks have a mechanism for TTLs, and most things in notebooks may not be safe to recompute, unlike shuffle files, which can be regenerated if we delete them.
--
Twitter: https://twitter.com/holdenkarau
Fight Health Insurance: https://www.fighthealthinsurance.com/
Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
YouTube Live Streams: https://www.youtube.com/user/holdenkarau
Pronouns: she/her

On Wed, Oct 16, 2024 at 11:55 AM Reynold Xin <r...@databricks.com> wrote:

> Thanks for bringing this up. Wouldn't it be better for the notebooks to
> control when these DFs/RDDs expire so they can do fine granular control?
>
> On Wed, Oct 16, 2024 at 7:51 AM Holden Karau <holden.ka...@gmail.com>
> wrote:
>
>> Hi Spark Devs,
>>
>> So back in Spark 1.X we had shuffle TTLs, but they did not take into
>> account last access times. With the increased use of notebooks where
>> dataframes & rdds are more likely to be defined at the global scope I was
>> thinking it could be a good time to try and re-introduce shuffle TTLs but
>> with a last accessed mechanism so I've filed
>> https://issues.apache.org/jira/browse/SPARK-49788 -- I'd love to get
>> folks feedback before I put in too much effort here.
>>
>> Cheers,
>>
>> Holden :)
>>
>> --
>> Twitter: https://twitter.com/holdenkarau
>> Fight Health Insurance: https://www.fighthealthinsurance.com/
>> Books (Learning Spark, High Performance Spark, etc.):
>> https://amzn.to/2MaRAG9
>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>> Pronouns: she/her
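
For readers following the thread, a minimal sketch of the status quo being discussed. The only configuration name taken from the source is the Spark 1.x setting spark.cleaner.ttl that the proposal refers to; everything else is illustrative and not part of SPARK-49788 itself:

    import org.apache.spark.sql.SparkSession

    // Spark 1.x offered a blanket TTL (spark.cleaner.ttl, in seconds) that
    // periodically dropped shuffle data and metadata after a fixed age,
    // regardless of when it was last accessed; the setting was later removed.
    val spark = SparkSession.builder()
      .appName("notebook-session")
      // .config("spark.cleaner.ttl", "3600")  // Spark 1.x only; no longer available
      .getOrCreate()

    // In a long-lived notebook session today, shuffle files behind a DataFrame
    // held in a global variable are only released once the object becomes
    // unreachable and the ContextCleaner garbage-collects it, which may never
    // happen for notebook globals.
    val df = spark.range(0, 1000000L).toDF("id").repartition(200)
    df.count()      // materializes shuffle output on executor disks
    df.unpersist()  // drops cached blocks, but not the shuffle files themselves

A last-accessed-aware TTL, as proposed in SPARK-49788, would let such shuffle files expire once they have gone unused for some period, while relying on lineage to recompute them if the notebook touches the DataFrame again.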