[
https://issues.apache.org/jira/browse/FLINK-13856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17324751#comment-17324751
]
fanrui commented on FLINK-13856:
--------------------------------
[~sewen] [~andrew_lin]
Good feature. In HeapKeyedStateBackend or OperatorState scenarios, deleting the
checkpoint directory instead of the individual files reduces the number of
delete RPCs from O(n) to O(1). I hope this issue can keep moving forward.
If S3 is not suited to directory deletion, could we refactor
CompletedCheckpoint#doDiscard so that S3 keeps the old cleanup strategy and
other FileSystems use the new one?
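
To make the proposal concrete, here is a minimal sketch of choosing the cleanup
strategy from the checkpoint location's URI scheme. It is not Flink code: the
class CleanupStrategySelector, the enum CleanupStrategy, and the list of schemes
treated as S3 are all assumptions made for illustration.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class CleanupStrategySelector {

    /** Hypothetical strategy names; they do not exist in Flink. */
    enum CleanupStrategy { PER_FILE, RECURSIVE_DIRECTORY }

    // Assumption: these schemes denote S3-style object stores that should
    // keep the old file-by-file cleanup.
    private static final Set<String> OBJECT_STORE_SCHEMES =
            new HashSet<>(Arrays.asList("s3", "s3a", "s3n", "s3p"));

    /** Picks a strategy from the scheme of the checkpoint location. */
    static CleanupStrategy select(URI checkpointLocation) {
        String scheme = checkpointLocation.getScheme();
        if (scheme != null
                && OBJECT_STORE_SCHEMES.contains(scheme.toLowerCase(Locale.ROOT))) {
            return CleanupStrategy.PER_FILE;
        }
        return CleanupStrategy.RECURSIVE_DIRECTORY;
    }

    public static void main(String[] args) {
        // The bucket, host, and paths below are made-up examples.
        System.out.println(select(URI.create("s3://bucket/checkpoints/chk-42")));
        System.out.println(select(URI.create("hdfs://namenode:8020/checkpoints/chk-42")));
    }
}
```

Keying on the scheme keeps the decision in one place, so doDiscard itself would
only need to branch on the returned strategy.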
> Reduce the delete file api when the checkpoint is completed
> -----------------------------------------------------------
>
> Key: FLINK-13856
> URL: https://issues.apache.org/jira/browse/FLINK-13856
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / Checkpointing, Runtime / State Backends
> Affects Versions: 1.8.1, 1.9.0
> Reporter: Andrew.D.lin
> Assignee: Andrew.D.lin
> Priority: Major
> Labels: pull-request-available, stale-assigned
> Attachments: after.png, before.png,
> f6cc56b7-2c74-4f4b-bb6a-476d28a22096.png
>
> Original Estimate: 48h
> Time Spent: 10m
> Remaining Estimate: 47h 50m
>
> When a new checkpoint completes, an old checkpoint is deleted by
> calling CompletedCheckpoint.discardOnSubsume().
> Deleting an old checkpoint follows these steps:
> 1. drop the metadata
> 2. discard private state objects
> 3. discard the location as a whole
> In some cases, is it possible to delete the checkpoint folder recursively
> with a single call?
> As far as I know, for a full checkpoint it should be possible to
> delete the folder directly.
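
The difference in call counts between the three steps above and a single
recursive delete can be sketched with a small simulation. Everything here is
hypothetical: CountingFs is an in-memory stand-in for a remote file system in
which each delete(path, recursive) call counts as one RPC, and the file names
are made up. It also assumes that every file under the folder belongs only to
that checkpoint.

```java
import java.util.TreeSet;

public class CheckpointDiscardSketch {

    /** Hypothetical in-memory file system that counts delete requests. */
    static class CountingFs {
        final TreeSet<String> paths = new TreeSet<>();
        int deleteCalls = 0;

        void create(String path) {
            paths.add(path);
        }

        /** Each invocation stands for one simulated RPC. */
        boolean delete(String path, boolean recursive) {
            deleteCalls++;
            String prefix = path + "/";
            boolean hasChildren = paths.stream().anyMatch(p -> p.startsWith(prefix));
            if (hasChildren && !recursive) {
                return false; // refuse to drop a non-empty directory
            }
            paths.removeIf(p -> p.equals(path) || p.startsWith(prefix));
            return true;
        }
    }

    /** Builds a checkpoint folder with a metadata file and n state files. */
    static CountingFs newCheckpoint(String dir, int stateFiles) {
        CountingFs fs = new CountingFs();
        fs.create(dir);
        fs.create(dir + "/_metadata");
        for (int i = 0; i < stateFiles; i++) {
            fs.create(dir + "/state-" + i);
        }
        return fs;
    }

    /** Mirrors the three steps: metadata, each state object, then the location. */
    static int discardFileByFile(CountingFs fs, String dir, int stateFiles) {
        fs.delete(dir + "/_metadata", false);
        for (int i = 0; i < stateFiles; i++) {
            fs.delete(dir + "/state-" + i, false);
        }
        fs.delete(dir, false);
        return fs.deleteCalls;
    }

    /** Removes the whole folder with a single recursive request. */
    static int discardRecursively(CountingFs fs, String dir) {
        fs.delete(dir, true);
        return fs.deleteCalls;
    }

    public static void main(String[] args) {
        String dir = "/checkpoints/chk-41";
        int n = 100;
        CountingFs a = newCheckpoint(dir, n);
        System.out.println("file by file: " + discardFileByFile(a, dir, n) + " calls");
        CountingFs b = newCheckpoint(dir, n);
        System.out.println("recursive:    " + discardRecursively(b, dir) + " calls");
    }
}
```

With n state files the file-by-file path issues n + 2 delete calls, while the
recursive path issues one, regardless of n.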
--
This message was sent by Atlassian Jira
(v8.3.4#803005)