[ 
https://issues.apache.org/jira/browse/FLINK-26388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17504243#comment-17504243
 ] 

Dawid Wysakowicz commented on FLINK-26388:
------------------------------------------

I checked it again and it seems to work fine. One last question: which 
resources are subject to the repeatable cleanup? Could we list them in the docs 
added in FLINK-26296?

I am asking in the context of completed checkpoints that could not be 
discarded. I see we do not retry deleting them. It might be fine if we just 
list what will be retried.

> Release Testing: Repeatable Cleanup (FLINK-25433)
> -------------------------------------------------
>
>                 Key: FLINK-26388
>                 URL: https://issues.apache.org/jira/browse/FLINK-26388
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Coordination
>    Affects Versions: 1.15.0
>            Reporter: Matthias Pohl
>            Assignee: Dawid Wysakowicz
>            Priority: Blocker
>              Labels: release-testing
>             Fix For: 1.15.0
>
>
> Repeatable cleanup got introduced with 
> [FLIP-194|https://issues.apache.org/jira/projects/FLINK/issues/FLINK-26284?filter=allopenissues]
>  but should be considered as an independent feature of the {{JobResultStore}} 
> (JRS) from a user's point of view.
> Repeatable cleanup can be triggered by running into an error while cleaning 
> up. This can be achieved by disabling access to S3 after the job finished, 
> e.g.:
> * Set a reasonably short checkpointing interval (checkpointing should be 
> enabled to allow cleanup of S3)
> * Disable S3 (by removing permissions or shutting down the S3 server)
> * Stop the job with a savepoint
> Stopping the job should work, but the logs should show the cleanup failing 
> and being retried repeatedly. Enabling S3 again should fix the issue.
> Keep in mind that if testing this with HA, you should use a different 
> bucket for the file-based JRS artifacts and only change permissions for the 
> bucket that holds JRS-unrelated artifacts. Flink would fail fatally if the 
> JRS is not able to access its backend storage.
> Documentation and configuration are still in the process of being updated in 
> FLINK-26296 and FLINK-26331.
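
The behaviour the quoted test steps look for (cleanup fails while S3 is 
unavailable, is retried, and succeeds once S3 is back) can be illustrated with 
a small sketch. This is not Flink's implementation: `cleanup_with_retries`, 
`FlakyStore`, and all parameter names are hypothetical, and the exponential 
backoff is only an assumption about how retries might be spaced.

```python
import time
from typing import Callable


def cleanup_with_retries(
    cleanup: Callable[[], None],
    initial_backoff: float = 0.01,
    max_backoff: float = 0.1,
    max_attempts: int = 10,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call `cleanup` until it succeeds; return the number of attempts used.

    Hypothetical helper: re-raises the last error once `max_attempts` is hit.
    """
    backoff = initial_backoff
    for attempt in range(1, max_attempts + 1):
        try:
            cleanup()
            return attempt
        except OSError as err:
            if attempt == max_attempts:
                raise
            # This is the kind of repeated failure message a tester would
            # expect to see in the logs while storage is unreachable.
            print(f"cleanup attempt {attempt} failed ({err}); retrying in {backoff}s")
            sleep(backoff)
            backoff = min(backoff * 2, max_backoff)
    raise ValueError("max_attempts must be at least 1")


class FlakyStore:
    """Stand-in for S3: the first N delete calls fail, later calls succeed."""

    def __init__(self, failures_before_recovery: int) -> None:
        self.remaining_failures = failures_before_recovery
        self.deleted = False

    def delete_artifacts(self) -> None:
        if self.remaining_failures > 0:
            self.remaining_failures -= 1
            raise OSError("storage unavailable")
        self.deleted = True


if __name__ == "__main__":
    store = FlakyStore(failures_before_recovery=3)  # "S3 disabled" for 3 attempts
    used = cleanup_with_retries(store.delete_artifacts)
    print(f"cleanup succeeded after {used} attempts")
```

In this sketch the first three attempts fail and the fourth succeeds, which 
mirrors re-enabling S3 during the test.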



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
