[jira] [Commented] (FLINK-20427) Remove CheckpointConfig.setPreferCheckpointForRecovery because it can lead to data loss

Nico Kruber (Jira) Tue, 01 Dec 2020 03:44:04 -0800


    [ 
https://issues.apache.org/jira/browse/FLINK-20427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17241465#comment-17241465
 ]


Nico Kruber commented on FLINK-20427:
-------------------------------------

I'm not sure we should touch this before the rework of the checkpoint/savepoint 
semantics, if that is still planned. I am talking about no longer having 
checkpoints vs. savepoints but rather snapshots with certain properties. In that 
case, we could have user-triggered snapshots in the state-backend-native format, 
and then I don't see any downside to using these for recovery, just like using 
only checkpoints now.
From what we learned, RocksDB recovery from a savepoint can actually take a 
considerable amount of time, and especially in at-least-once sink use cases 
with low latency requirements, this could be a problem!

> Remove CheckpointConfig.setPreferCheckpointForRecovery because it can lead to 
> data loss
> ---------------------------------------------------------------------------------------
>
>                 Key: FLINK-20427
>                 URL: https://issues.apache.org/jira/browse/FLINK-20427
>             Project: Flink
>          Issue Type: Bug
>          Components: API / DataStream, Runtime / Checkpointing
>    Affects Versions: 1.12.0
>            Reporter: Till Rohrmann
>            Priority: Critical
>             Fix For: 1.13.0
>
>
> The {{CheckpointConfig.setPreferCheckpointForRecovery}} setting configures 
> whether Flink prefers checkpoints for recovery if the 
> {{CompletedCheckpointStore}} contains both savepoints and checkpoints. This is 
> problematic because, due to this feature, Flink might prefer older checkpoints 
> over newer savepoints for recovery. Since some components expect that the 
> latest checkpoint/savepoint is always used (e.g. the 
> {{SourceCoordinator}}), it breaks assumptions and can lead to 
> {{SourceSplits}} which are not read. This effectively means that the system 
> loses data. Similarly, this behaviour can cause exactly-once sinks to 
> output results multiple times, which violates the processing guarantees. 
> Hence, I believe that we should remove this setting because it changes 
> Flink's behaviour in a very significant way, potentially without the user 
> noticing.
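
To make the failure mode in the description above concrete, here is a minimal, self-contained sketch of the selection behaviour it describes. It is not Flink's actual {{CompletedCheckpointStore}} code: the class RestoreSelectionSketch, the Snapshot type and the selectForRestore method are hypothetical names made up for illustration, and the sketch assumes that snapshot ids grow over time and that the store is ordered from oldest to newest.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch only; NOT Flink's real restore logic. */
final class RestoreSelectionSketch {

    /** A completed snapshot. Assumption: a larger id means a newer snapshot. */
    static final class Snapshot {
        final long id;
        final boolean savepoint;

        Snapshot(long id, boolean savepoint) {
            this.id = id;
            this.savepoint = savepoint;
        }
    }

    /**
     * Picks the snapshot to restore from.
     *
     * @param store completed snapshots, ordered oldest to newest (assumption)
     * @param preferCheckpoint stands in for the setting under discussion
     * @return the chosen snapshot, or null if the store is empty
     */
    static Snapshot selectForRestore(List<Snapshot> store, boolean preferCheckpoint) {
        if (store.isEmpty()) {
            return null;
        }
        Snapshot latest = store.get(store.size() - 1);
        if (!preferCheckpoint || !latest.savepoint) {
            // Either no preference is set or the newest snapshot is already a checkpoint.
            return latest;
        }
        // The newest snapshot is a savepoint: walk backwards to the newest checkpoint.
        for (int i = store.size() - 2; i >= 0; i--) {
            if (!store.get(i).savepoint) {
                return store.get(i); // older than the savepoint that was skipped
            }
        }
        return latest; // only savepoints exist, so fall back to the newest one
    }

    public static void main(String[] args) {
        // Checkpoint 1 completed first, then a user triggered savepoint 2.
        List<Snapshot> store = Arrays.asList(new Snapshot(1, false), new Snapshot(2, true));

        System.out.println("no preference -> restore from id "
                + selectForRestore(store, false).id);
        System.out.println("prefer checkpoint -> restore from id "
                + selectForRestore(store, true).id);
    }
}
```

With the preference enabled, the sketch restores from snapshot 1 even though snapshot 2 is newer. Any component that assumes the state of snapshot 2 (the description names the {{SourceCoordinator}}) would then be out of sync with the restored job state.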



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
