Checkpoint data is left behind after a normal shutdown, not just an
unexpected shutdown. The PR description includes a simple demonstration of
this.

If the current behavior is truly intended -- which I find difficult to
believe given how confusing it is (see
<https://stackoverflow.com/q/52630858/877069>,
<https://stackoverflow.com/q/60009856/877069>, and
<https://stackoverflow.com/q/61454740/877069>) -- then at the very least we
need to update the documentation for both `.checkpoint()` and
`cleanCheckpoints` to make that clear.

> This way even after an unexpected exit the next run of the same app
should be able to pick up the checkpointed data.

The use case you are describing potentially makes sense. But preserving
checkpoint data after an unexpected shutdown -- even when
`cleanCheckpoints` is set to true -- is a new guarantee that is not
currently expressed in the API or documentation, at least as far as I can
tell.

On Wed, Mar 10, 2021 at 3:10 PM Attila Zsolt Piros <
piros.attila.zs...@gmail.com> wrote:

> Hi Nick!
>
> I am not sure you are fixing a problem here. I think what you see as a
> problem is actually intended behaviour.
>
> Checkpoint data should outlive unexpected shutdowns. So there is a very
> important difference between a reference going out of scope during normal
> execution (in this case cleanup is expected, depending on the config you
> mentioned) and a reference going out of scope because of an unexpected
> error (in this case you should keep the checkpoint data).
>
> This way even after an unexpected exit the next run of the same app should
> be able to pick up the checkpointed data.
>
> Best Regards,
> Attila
>
> On Wed, Mar 10, 2021 at 8:10 PM Nicholas Chammas <
> nicholas.cham...@gmail.com> wrote:
>
>> Hello people,
>>
>> I'm working on a fix for SPARK-33000
>> <https://issues.apache.org/jira/browse/SPARK-33000>. Spark does not
>> clean up checkpointed RDDs/DataFrames on shutdown, even if the
>> appropriate configs are set.
>>
>> In the course of developing a fix, another contributor pointed out
>> <https://github.com/apache/spark/pull/31742#issuecomment-790987483> that
>> checkpointed data may not be the only type of resource that needs a fix for
>> shutdown cleanup.
>>
>> I'm looking for a committer who might have an opinion on how Spark should
>> clean up disk-based resources on shutdown. The last people who contributed
>> significantly to the ContextCleaner, where this cleanup happens, were
>> @witgo <https://github.com/witgo> and @andrewor14
>> <https://github.com/andrewor14>. But that was ~6 years ago, and I don't
>> think they are active on the project anymore.
>>
>> Anyone willing to take a look and share their thoughts? The PR is small
>> <https://github.com/apache/spark/pull/31742>: +39 / -2.
>>
>> Nick
>>
>>
