[ https://issues.apache.org/jira/browse/FLINK-17421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17094182#comment-17094182 ]
Piotr Nowojski commented on FLINK-17421:
----------------------------------------
[~stevenz3wu], please take a look at the linked issue and the user mailing list
discussion there. From reading those (I haven't investigated the code), it seems
that the issue can happen with any number of concurrent checkpoints (including
1) if cleaning up (subsuming?) checkpoints is slower than the checkpoint
interval (in the reference case the checkpoint interval was 1 second). In such a
case the number of completed checkpoints keeps growing, eventually leading to an
OOM.
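The back-of-the-envelope model below makes the "ever growing" part concrete. It is an illustration only. It assumes a single-threaded cleanup executor, one subsumed checkpoint handed to it per checkpoint interval, and a fixed cleanup cost per checkpoint. `CleanupBacklog` and `backlog` are hypothetical names, not Flink code.

```java
/**
 * Hypothetical model, not Flink code: one checkpoint completes every intervalMs,
 * each completion hands one subsumed checkpoint to a single cleanup thread,
 * and each cleanup takes cleanupMs.
 */
public class CleanupBacklog {

    /** Cleanups still queued or running right after the n-th checkpoint completes. */
    public static int backlog(int n, long intervalMs, long cleanupMs) {
        long now = n * intervalMs;
        long previousFinish = 0; // when the single cleanup thread becomes idle
        int pending = 0;
        for (int k = 1; k <= n; k++) {
            long enqueuedAt = k * intervalMs;
            // A cleanup starts once it is enqueued and the thread is free.
            long start = Math.max(enqueuedAt, previousFinish);
            previousFinish = start + cleanupMs;
            if (previousFinish > now) {
                pending++;
            }
        }
        return pending;
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 20, 40}) {
            System.out.printf("after %d checkpoints: slow cleanup backlog=%d, fast cleanup backlog=%d%n",
                    n, backlog(n, 1000, 1500), backlog(n, 1000, 500));
        }
    }
}
```

With a 1 s interval, a 1.5 s cleanup leaves roughly a third of all checkpoints pending, so the backlog grows linearly with time. A 0.5 s cleanup keeps the backlog at one. A multi-threaded executor only shifts the threshold. The backlog still grows whenever the aggregate cleanup throughput falls below the checkpoint rate.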
> Backpressure new checkpoints if previous were not managed to be cleaned up
> yet
> -------------------------------------------------------------------------------
>
> Key: FLINK-17421
> URL: https://issues.apache.org/jira/browse/FLINK-17421
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / Checkpointing
> Affects Versions: 1.7.2, 1.8.3, 1.9.3, 1.10.0
> Reporter: Piotr Nowojski
> Priority: Major
>
> As reported in FLINK-17073, the ioExecutor might not manage to clean up
> checkpoints quickly enough, causing ever-growing memory consumption. A
> proposed solution would be to backpressure new checkpoints in that scenario.
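The following is a minimal sketch of what such backpressure could look like. It assumes a hypothetical `CleanupBackpressureGate`, which is not part of Flink or of its `CheckpointCoordinator`. The gate counts cleanups submitted to the I/O executor and lets the coordinator skip triggering a checkpoint while too many are still outstanding. The threshold `maxPendingCleanups` is also an assumption.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.Executor;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical helper, not a Flink class: tracks checkpoint cleanups that are
 * still queued or running on the I/O executor.
 */
public class CleanupBackpressureGate {
    private final int maxPendingCleanups;
    private final AtomicInteger pendingCleanups = new AtomicInteger();

    public CleanupBackpressureGate(int maxPendingCleanups) {
        if (maxPendingCleanups < 1) {
            throw new IllegalArgumentException("maxPendingCleanups must be >= 1");
        }
        this.maxPendingCleanups = maxPendingCleanups;
    }

    /** Consulted before triggering the next periodic checkpoint. */
    public boolean canTriggerCheckpoint() {
        return pendingCleanups.get() < maxPendingCleanups;
    }

    public int pendingCleanups() {
        return pendingCleanups.get();
    }

    /** Submits a cleanup (e.g. discarding a subsumed checkpoint) and counts it until it finishes. */
    public void submitCleanup(Executor ioExecutor, Runnable cleanup) {
        pendingCleanups.incrementAndGet();
        try {
            ioExecutor.execute(() -> {
                try {
                    cleanup.run();
                } finally {
                    pendingCleanups.decrementAndGet();
                }
            });
        } catch (RejectedExecutionException e) {
            pendingCleanups.decrementAndGet(); // the task never ran, so stop counting it
            throw e;
        }
    }

    public static void main(String[] args) {
        // A manually drained queue stands in for a slow ioExecutor, keeping the demo deterministic.
        Queue<Runnable> slowIoExecutor = new ArrayDeque<>();
        CleanupBackpressureGate gate = new CleanupBackpressureGate(2);

        gate.submitCleanup(slowIoExecutor::add, () -> {});
        gate.submitCleanup(slowIoExecutor::add, () -> {});
        System.out.println("pending=" + gate.pendingCleanups() + ", canTrigger=" + gate.canTriggerCheckpoint());

        slowIoExecutor.poll().run(); // the executor catches up with one cleanup
        System.out.println("pending=" + gate.pendingCleanups() + ", canTrigger=" + gate.canTriggerCheckpoint());
    }
}
```

Counting in-flight cleanups rather than inspecting the executor's queue keeps the check independent of the executor implementation. A real change would also need to decide whether a skipped trigger is retried at the next interval or delayed until the count drops.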
--
This message was sent by Atlassian Jira
(v8.3.4#803005)