[jira] [Commented] (FLINK-17421) Backpressure new checkpoints if previous were not managed to be cleaned up yet

Piotr Nowojski (Jira) Mon, 27 Apr 2020 23:15:11 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-17421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17094182#comment-17094182
 ]


Piotr Nowojski commented on FLINK-17421:
----------------------------------------

[~stevenz3wu] please take a look at the linked issue and the user mailing list 
discussion there. From reading those (I haven't investigated the code), it seems 
that the issue can happen with any number of concurrent checkpoints (including 
1) if cleaning up (subsuming?) checkpoints is slower than the checkpoint 
interval (in the reference case, the checkpoint interval was 1 second). In that 
case, the number of completed checkpoints keeps growing, eventually leading to 
an OOM.
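
To make the growth concrete, here is a rough back-of-the-envelope model. It is 
only a sketch: the class name, the single cleanup thread, and the 1.5 second 
cleanup time are assumptions for illustration, not Flink code or measurements 
from the report. Only the 1 second checkpoint interval comes from the reference 
case.

```java
// Illustrative only: the class name and all numbers are assumptions, not Flink code.
final class CleanupBacklog {

    /**
     * Rough number of completed checkpoints still waiting for cleanup after
     * elapsedMs, assuming one checkpoint completes every intervalMs and a
     * single cleanup thread needs cleanupMs per checkpoint.
     */
    static long backlogAfter(long elapsedMs, long intervalMs, long cleanupMs) {
        long completed = elapsedMs / intervalMs;
        // Cleanup can never get ahead of the checkpoints that have completed.
        long cleaned = Math.min(completed, elapsedMs / cleanupMs);
        return completed - cleaned;
    }

    public static void main(String[] args) {
        // 1 s interval (reference case), 1.5 s cleanup per checkpoint (assumed).
        System.out.println(backlogAfter(60_000, 1_000, 1_500));    // after 1 minute
        System.out.println(backlogAfter(3_600_000, 1_000, 1_500)); // after 1 hour
    }
}
```

Under these assumed numbers the backlog is 20 checkpoints after one minute and 
1200 after one hour, and it never shrinks. If cleanup is at least as fast as 
the interval, the same model gives a backlog of 0.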

> Backpressure new checkpoints if previous were not managed to be cleaned up 
> yet 
> -------------------------------------------------------------------------------
>
>                 Key: FLINK-17421
>                 URL: https://issues.apache.org/jira/browse/FLINK-17421
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Checkpointing
>    Affects Versions: 1.7.2, 1.8.3, 1.9.3, 1.10.0
>            Reporter: Piotr Nowojski
>            Priority: Major
>
> As reported in FLINK-17073, the ioExecutor might not manage to clean up 
> checkpoints quickly enough, causing ever-growing memory consumption. A 
> proposed solution would be to backpressure new checkpoints in that scenario.
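
One possible shape for such backpressure, as a minimal sketch: count the 
cleanups that have been scheduled but not yet finished, and refuse to trigger 
a new checkpoint while that count is at the limit. The class 
`CheckpointCleanupGate`, its methods, and the `maxPendingCleanups` parameter 
are all hypothetical names introduced here; none of them is an existing Flink 
API, and the actual fix may look quite different.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: neither this class nor its methods exist in Flink.
final class CheckpointCleanupGate {

    private final int maxPendingCleanups;
    private final AtomicInteger pendingCleanups = new AtomicInteger();

    CheckpointCleanupGate(int maxPendingCleanups) {
        this.maxPendingCleanups = maxPendingCleanups;
    }

    /** Call when a checkpoint is handed to the executor for cleanup. */
    void cleanupScheduled() {
        pendingCleanups.incrementAndGet();
    }

    /** Call from the cleanup task once the checkpoint has been discarded. */
    void cleanupFinished() {
        pendingCleanups.decrementAndGet();
    }

    /** True while fewer than maxPendingCleanups cleanups are outstanding. */
    boolean mayTriggerCheckpoint() {
        return pendingCleanups.get() < maxPendingCleanups;
    }
}
```

A coordinator would check `mayTriggerCheckpoint()` before starting a 
checkpoint and skip or delay the trigger when it returns false, which bounds 
the number of completed checkpoints held in memory while waiting for cleanup.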



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
