[ 
https://issues.apache.org/jira/browse/FLINK-13856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16961953#comment-16961953
 ] 

Congxian Qiu(klion26) commented on FLINK-13856:
-----------------------------------------------

[~andrew_lin] I think reducing the number of RPC calls is really important. We 
have also run into RPC pressure (from both keyed state and operator state), and 
we solved it by letting different state handles share the same underlying 
file. We have filed an issue [1], and [~sewen] is helping review the PR. The PR 
currently covers only shared state; support for exclusive state will follow. 
In our production environment we implemented this for shared state earlier and 
have now implemented it for exclusive state as well. From what we observe in 
production, it greatly reduces the number of RPCs (create RPC, delete RPC, 
create block RPC and so on) and also shortens the request queue length 
considerably.

Once the state handles share the same file, discarding the states only requires 
deleting that file *_once_*, so the number of RPC calls drops considerably.
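
To illustrate the idea, here is a minimal sketch of reference-counted file 
sharing. It is not the code from the PR and does not use Flink's classes: 
SharedFile, Segment, discardAll and the deleteCalls counter are hypothetical 
names, a local temp file stands in for the DFS file, and the counter stands in 
for delete RPCs.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: many logical state handles backed by one physical file. */
final class SharedFileSketch {

    /** One physical file; it is deleted when the last segment is discarded. */
    static final class SharedFile {
        private final Path path;
        private final AtomicInteger refCount = new AtomicInteger();
        // Stands in for the number of delete RPCs sent to the file system.
        private final AtomicInteger deleteCalls = new AtomicInteger();

        SharedFile(Path path) {
            this.path = path;
        }

        /** Registers a new logical handle covering [offset, offset + length). */
        Segment register(long offset, long length) {
            refCount.incrementAndGet();
            return new Segment(this, offset, length);
        }

        private void release() {
            // Only the release that drops the count to zero touches the file system.
            if (refCount.decrementAndGet() == 0) {
                deleteCalls.incrementAndGet();
                try {
                    Files.deleteIfExists(path);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }
        }

        int deleteCalls() {
            return deleteCalls.get();
        }
    }

    /** Hypothetical state handle: a byte range inside a shared file. */
    static final class Segment {
        private final SharedFile file;
        final long offset;
        final long length;
        private boolean discarded;

        Segment(SharedFile file, long offset, long length) {
            this.file = file;
            this.offset = offset;
            this.length = length;
        }

        /** Idempotent: discarding the same segment twice releases only once. */
        synchronized void discard() {
            if (!discarded) {
                discarded = true;
                file.release();
            }
        }
    }

    /** Registers `handles` segments in one file, discards all, returns the delete count. */
    static int discardAll(int handles) {
        Path path;
        try {
            path = Files.createTempFile("shared-state", ".bin");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        SharedFile file = new SharedFile(path);
        List<Segment> segments = new ArrayList<>();
        for (int i = 0; i < handles; i++) {
            segments.add(file.register(i * 16L, 16L));
        }
        for (Segment segment : segments) {
            segment.discard();
        }
        return file.deleteCalls();
    }

    public static void main(String[] args) {
        // Prints "delete calls for 100 handles: 1"
        System.out.println("delete calls for 100 handles: " + discardAll(100));
    }
}
```

With one file per handle, discarding 100 handles would need 100 delete calls; 
in this sketch it needs one, issued by whichever handle is discarded last.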

What do you think about this?

[1] https://issues.apache.org/jira/browse/FLINK-11937

> Reduce the delete file api when the checkpoint is completed
> -----------------------------------------------------------
>
>                 Key: FLINK-13856
>                 URL: https://issues.apache.org/jira/browse/FLINK-13856
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Checkpointing, Runtime / State Backends
>    Affects Versions: 1.8.1, 1.9.0
>            Reporter: andrew.D.lin
>            Assignee: andrew.D.lin
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: after.png, before.png, 
> f6cc56b7-2c74-4f4b-bb6a-476d28a22096.png
>
>   Original Estimate: 48h
>          Time Spent: 10m
>  Remaining Estimate: 47h 50m
>
> When a new checkpoint is completed, an old checkpoint is deleted by 
> calling CompletedCheckpoint.discardOnSubsume().
> Deleting an old checkpoint follows these steps:
> 1. Drop the metadata
> 2. Discard private state objects
> 3. Discard the location as a whole
> In some cases, is it possible to delete the checkpoint folder recursively 
> with one call?
> As far as I know, for a full checkpoint it should be possible to delete 
> the folder directly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
