[ 
https://issues.apache.org/jira/browse/FLINK-17571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17103518#comment-17103518
 ] 

Steven Zhen Wu edited comment on FLINK-17571 at 5/9/20, 10:59 PM:
------------------------------------------------------------------

[~pnowojski] what is the intended usage of the remove command?

Please correct my understanding of incremental checkpoints.
 * Flink removes S3 files when their reference count reaches zero. Normally, 
there shouldn't be orphaned checkpoint files lingering around. Maybe in some 
rare cases, reference-count-based cleanup didn't happen or didn't succeed, so 
there is a small chance of orphaned files here.
 * We don't always restore from an external checkpoint and continue the same 
checkpoint lineage (with incremental checkpoints and reference counting). E.g., 
we can restore from a savepoint or from an empty state. Those abandoned 
checkpoint lineages can then leave significant garbage behind.

Here is what I am thinking for the GC:
 # Trace from the roots of the retained external checkpoints to find all live 
files.
 # Find all files in the S3 bucket/prefix. I heard S3 can produce a daily 
inventory report, so we don't have to list objects ourselves.
 # Compute the diff and remove the non-live files (with some safety threshold, 
like only files older than 30 days).
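The mark-and-sweep steps above could be sketched like this. The function name and data shapes are hypothetical; the inventory would in practice come from an S3 inventory report, and the live set from tracing retained checkpoint metadata:

```python
# Hypothetical mark-and-sweep helper: given an inventory of all files (key ->
# last-modified time) and the set of live files traced from retained
# checkpoints, return the keys that are safe to delete. The age threshold is
# the safety margin from step 3.

from datetime import datetime, timedelta, timezone

def files_to_delete(inventory, live_files, min_age_days=30):
    cutoff = datetime.now(timezone.utc) - timedelta(days=min_age_days)
    return sorted(
        key
        for key, last_modified in inventory.items()
        if key not in live_files and last_modified < cutoff
    )

now = datetime.now(timezone.utc)
inventory = {
    "shared/a": now - timedelta(days=90),  # orphaned and old enough to delete
    "shared/b": now - timedelta(days=90),  # old, but still referenced
    "shared/c": now - timedelta(days=1),   # orphaned, but inside safety window
}
print(files_to_delete(inventory, live_files={"shared/b"}))  # -> ['shared/a']
```

The age threshold guards against deleting files from a checkpoint that is in progress (or whose metadata hasn't been traced yet) at the moment the inventory snapshot was taken.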



> A better way to show the files used in currently checkpoints
> ------------------------------------------------------------
>
>                 Key: FLINK-17571
>                 URL: https://issues.apache.org/jira/browse/FLINK-17571
>             Project: Flink
>          Issue Type: New Feature
>          Components: Runtime / Checkpointing
>            Reporter: Congxian Qiu(klion26)
>            Priority: Major
>
> Inspired by the 
> [userMail|http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Shared-Checkpoint-Cleanup-and-S3-Lifecycle-Policy-tt34965.html]
> Currently, there are [three types of 
> directory|https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/state/checkpoints.html#directory-structure]
>  for a checkpoint. The files in the TASKOWNED and EXCLUSIVE directories can 
> be deleted safely, but users can't safely delete the files in the SHARED 
> directory (those files may have been created a long time ago).
> I think it's better to give users a way to know which files are currently in 
> use (so that all the other files are not).
> Maybe a command-line command such as the one below would be enough to support 
> such a feature.
> {{./bin/flink checkpoint list $checkpointDir  # list all the files used in 
> checkpoint}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
