Hello people, I'm working on a fix for SPARK-33000 <https://issues.apache.org/jira/browse/SPARK-33000>. Spark does not clean up checkpointed RDDs/DataFrames on shutdown, even if the appropriate configs (e.g. `spark.cleaner.referenceTracking.cleanCheckpoints`) are set.
In the course of developing a fix, another contributor pointed out <https://github.com/apache/spark/pull/31742#issuecomment-790987483> that checkpointed data may not be the only type of resource whose shutdown cleanup needs fixing. I'm looking for a committer who has an opinion on how Spark should clean up disk-based resources on shutdown.

The last people to contribute significantly to the ContextCleaner, where this cleanup happens, were @witgo <https://github.com/witgo> and @andrewor14 <https://github.com/andrewor14>. But that was ~6 years ago, and I don't think they are active on the project anymore.

Would anyone be willing to take a look and share their thoughts? The PR <https://github.com/apache/spark/pull/31742> is small: +39 / -2.

Nick