It’s a good suggestion; however, I don’t think notebooks have a mechanism for TTLs, and most things in notebooks may not be safe to recompute, unlike shuffle files, which can be regenerated if we delete them.
--
Twitter: https://twitter.com/holdenkarau
Fight Health Insurance: https://www.fighthealthinsurance.com/
Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
YouTube Live Streams: https://www.youtube.com/user/holdenkarau
Pronouns: she/her

On Wed, Oct 16, 2024 at 11:55 AM Reynold Xin <r...@databricks.com> wrote:

> Thanks for bringing this up. Wouldn't it be better for the notebooks to
> control when these DFs/RDDs expire so they can do fine granular control?
>
> On Wed, Oct 16, 2024 at 7:51 AM Holden Karau <holden.ka...@gmail.com>
> wrote:
>
>> Hi Spark Devs,
>>
>> So back in Spark 1.X we had shuffle TTLs, but they did not take into
>> account last access times. With the increased use of notebooks where
>> dataframes & rdds are more likely to be defined at the global scope I was
>> thinking it could be a good time to try and re-introduce shuffle TTLs but
>> with a last accessed mechanism so I've filed
>> https://issues.apache.org/jira/browse/SPARK-49788 -- I'd love to get
>> folks feedback before I put in too much effort here.
>>
>> Cheers,
>>
>> Holden :)
>>
>> --
>> Twitter: https://twitter.com/holdenkarau
>> Fight Health Insurance: https://www.fighthealthinsurance.com/
>> Books (Learning Spark, High Performance Spark, etc.):
>> https://amzn.to/2MaRAG9
>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>> Pronouns: she/her
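
For readers following the thread, a minimal sketch of the status quo being discussed. The only configuration name taken from the source is the Spark 1.x setting spark.cleaner.ttl that the proposal refers to; everything else is illustrative and not part of SPARK-49788 itself:

    import org.apache.spark.sql.SparkSession

    // Spark 1.x offered a blanket TTL (spark.cleaner.ttl, in seconds) that
    // periodically dropped shuffle data and metadata after a fixed age,
    // regardless of when it was last accessed; the setting was later removed.
    val spark = SparkSession.builder()
      .appName("notebook-session")
      // .config("spark.cleaner.ttl", "3600")  // Spark 1.x only; no longer available
      .getOrCreate()

    // In a long-lived notebook session today, shuffle files behind a DataFrame
    // held in a global variable are only released once the object becomes
    // unreachable and the ContextCleaner garbage-collects it, which may never
    // happen for notebook globals.
    val df = spark.range(0, 1000000L).toDF("id").repartition(200)
    df.count()      // materializes shuffle output on executor disks
    df.unpersist()  // drops cached blocks, but not the shuffle files themselves

A last-accessed-aware TTL, as proposed in SPARK-49788, would let such shuffle files expire once they have gone unused for some period, while relying on lineage to recompute them if the notebook touches the DataFrame again.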