[ https://issues.apache.org/jira/browse/FLINK-21351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Roman Khachatryan updated FLINK-21351:
--------------------------------------
Fix Version/s: (was: 1.11.4)
> Incremental checkpoint data would be lost once a non-stop savepoint completed
> -----------------------------------------------------------------------------
>
> Key: FLINK-21351
> URL: https://issues.apache.org/jira/browse/FLINK-21351
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Checkpointing
> Affects Versions: 1.11.3, 1.12.1, 1.13.0
> Reporter: Yun Tang
> Assignee: Roman Khachatryan
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 1.12.2, 1.13.0
>
>
> FLINK-10354 counts a savepoint as a retained checkpoint so that a job can
> fail over from the latest position. I think this is reasonable; however, the
> current implementation causes incremental checkpoint data to be lost
> immediately once a non-stop savepoint completes.
> The general flow of incremental checkpoints is currently as follows: once a
> newer checkpoint completes, it is added to the checkpoint store. If the
> number of completed checkpoints exceeds the max retained limit, the oldest
> one is subsumed. This decrements the reference count of its incremental data
> by one, and the data is deleted once the count reaches zero. Because we
> always register the newer checkpoint before unregistering the older one, this
> flow works as expected.
> However, if a non-stop savepoint (a manually triggered savepoint taken in
> between checkpoints) completes, it is also added to the checkpoint store and
> simply subsumes the previously added checkpoint (in the default case of
> retaining one checkpoint). This unregisters the older checkpoint without a
> newer checkpoint having been registered, leading to data loss.
> Thanks to [~banmoy] for reporting this problem first.
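
The register-then-unregister ordering described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Flink's actual code: the names RefCountSketch, Registry, Store, and simulate, as well as the file names sst-1 and sst-2, are hypothetical, and the model assumes that a savepoint registers no shared incremental files while still occupying a slot in the checkpoint store.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical toy model of shared-state reference counting; not Flink's classes. */
public class RefCountSketch {

    /** Counts how many retained checkpoints reference each shared file. */
    static final class Registry {
        private final Map<String, Integer> refCounts = new HashMap<>();
        private final Set<String> deleted = new TreeSet<>();

        void register(List<String> files) {
            for (String file : files) {
                refCounts.merge(file, 1, Integer::sum);
            }
        }

        void unregister(List<String> files) {
            for (String file : files) {
                int remaining = refCounts.getOrDefault(file, 0) - 1;
                if (remaining <= 0) {
                    // No retained checkpoint references the file any more: delete it.
                    refCounts.remove(file);
                    deleted.add(file);
                } else {
                    refCounts.put(file, remaining);
                }
            }
        }
    }

    /** Retains at most maxRetained entries and subsumes the oldest ones. */
    static final class Store {
        private final Deque<List<String>> retained = new ArrayDeque<>();
        private final Registry registry;
        private final int maxRetained;

        Store(Registry registry, int maxRetained) {
            this.registry = registry;
            this.maxRetained = maxRetained;
        }

        void addCheckpoint(List<String> sharedFiles) {
            registry.register(sharedFiles); // register the newer checkpoint first
            retained.addLast(sharedFiles);
            subsumeOldest();                // then unregister the older one
        }

        void addSavepoint() {
            // Assumption: a savepoint registers no shared incremental files,
            // yet it still counts against the retained limit.
            retained.addLast(Collections.<String>emptyList());
            subsumeOldest();
        }

        private void subsumeOldest() {
            while (retained.size() > maxRetained) {
                registry.unregister(retained.removeFirst());
            }
        }
    }

    /** Runs two incremental checkpoints, optionally followed by a savepoint. */
    static Set<String> simulate(boolean withSavepoint) {
        Registry registry = new Registry();
        Store store = new Store(registry, 1); // default: retain one checkpoint
        store.addCheckpoint(Arrays.asList("sst-1"));
        store.addCheckpoint(Arrays.asList("sst-1", "sst-2"));
        if (withSavepoint) {
            store.addSavepoint();
        }
        return registry.deleted;
    }

    public static void main(String[] args) {
        System.out.println("deleted without savepoint: " + simulate(false));
        System.out.println("deleted with savepoint:    " + simulate(true));
    }
}
```

In this model, simulate(false) deletes nothing, because sst-1 is re-registered by the second checkpoint before the first one is subsumed. simulate(true) deletes both sst-1 and sst-2, because the savepoint subsumes the second checkpoint without registering anything, so any later incremental checkpoint that still expects those files would find them gone.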