[ 
https://issues.apache.org/jira/browse/FLINK-13856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17324751#comment-17324751
 ] 

fanrui commented on FLINK-13856:
--------------------------------

[~sewen]  [~andrew_lin] 

Good feature. In HeapKeyedStateBackend or OperatorState scenarios, deleting 
the checkpoint directory in one call, instead of deleting its files one by 
one, reduces the number of RPCs from O(n) to O(1). I hope this Jira issue 
can keep moving forward.
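
To make the O(n) versus O(1) difference concrete, here is a minimal sketch. 
It does not use Flink's FileSystem API: CountingFileSystem and 
DiscardComparison are hypothetical names, and the model simply assumes that 
every delete call costs one RPC.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a remote file system; each delete call counts as one RPC. */
class CountingFileSystem {
    private final List<String> paths = new ArrayList<>();
    int deleteCalls = 0;

    void create(String path) {
        paths.add(path);
    }

    /** Deletes one file, or everything under the given directory if recursive is set. */
    boolean delete(String path, boolean recursive) {
        deleteCalls++;
        if (recursive) {
            return paths.removeIf(p -> p.equals(path) || p.startsWith(path + "/"));
        }
        return paths.remove(path);
    }

    int remaining() {
        return paths.size();
    }
}

class DiscardComparison {
    /** File-by-file cleanup: one call per state file plus one for the directory. */
    static int discardFileByFile(CountingFileSystem fs, String dir, List<String> files) {
        for (String file : files) {
            fs.delete(file, false);
        }
        fs.delete(dir, false);
        return fs.deleteCalls;
    }

    /** Directory cleanup: a single recursive call, independent of the file count. */
    static int discardDirectory(CountingFileSystem fs, String dir) {
        fs.delete(dir, true);
        return fs.deleteCalls;
    }
}
```

With n state files under the checkpoint directory, the first method issues 
n + 1 delete calls and the second issues exactly one.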

If S3 is not suitable for deleting directories, can we refactor 
CompletedCheckpoint#doDiscard so that S3 keeps the old cleanup strategy 
while other FileSystems use the new one?
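
A rough sketch of how such a split could look, assuming the strategy is 
chosen from the URI scheme of the checkpoint directory. This is not the 
existing Flink code: CleanupStrategy, CleanupStrategySelector, and the list 
of schemes are assumptions made only for illustration.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical names for the two cleanup strategies discussed in this issue. */
enum CleanupStrategy {
    FILE_BY_FILE,        // old: discard each state object, then the location
    RECURSIVE_DIRECTORY  // new: delete the checkpoint directory in one call
}

class CleanupStrategySelector {
    // Assumption: schemes treated as S3-like object stores that keep the old strategy.
    private static final Set<String> OBJECT_STORE_SCHEMES =
            new HashSet<>(Arrays.asList("s3", "s3a", "s3n", "s3p"));

    /** Picks a strategy from the scheme of the checkpoint directory URI. */
    static CleanupStrategy forLocation(URI checkpointDir) {
        String scheme = checkpointDir.getScheme();
        if (scheme != null
                && OBJECT_STORE_SCHEMES.contains(scheme.toLowerCase(Locale.ROOT))) {
            return CleanupStrategy.FILE_BY_FILE;
        }
        // Assumption: every other file system (and scheme-less paths) deletes recursively.
        return CleanupStrategy.RECURSIVE_DIRECTORY;
    }
}
```

In a real refactoring, the decision would more likely belong to the 
FileSystem implementation than to a hard-coded scheme list, but the sketch 
shows the intended branching.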

> Reduce the delete file api when the checkpoint is completed
> -----------------------------------------------------------
>
>                 Key: FLINK-13856
>                 URL: https://issues.apache.org/jira/browse/FLINK-13856
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Checkpointing, Runtime / State Backends
>    Affects Versions: 1.8.1, 1.9.0
>            Reporter: Andrew.D.lin
>            Assignee: Andrew.D.lin
>            Priority: Major
>              Labels: pull-request-available, stale-assigned
>         Attachments: after.png, before.png, 
> f6cc56b7-2c74-4f4b-bb6a-476d28a22096.png
>
>   Original Estimate: 48h
>          Time Spent: 10m
>  Remaining Estimate: 47h 50m
>
> When a new checkpoint completes, an old checkpoint is deleted by 
> calling CompletedCheckpoint.discardOnSubsume().
> Deleting an old checkpoint follows these steps:
> 1. drop the metadata
> 2. discard private state objects
> 3. discard the location as a whole
> In some cases, is it possible to delete the checkpoint folder recursively 
> with one call?
> As far as I know, for a full checkpoint it should be possible to 
> delete the folder directly.



