[ 
https://issues.apache.org/jira/browse/FLINK-20427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17243183#comment-17243183
 ] 

Stephan Ewen commented on FLINK-20427:
--------------------------------------

We need to do something about this, because the current state is just broken. 
Like Till mentioned, any regional failover is inconsistent. Exactly-once sinks 
are by definition broken with this.

I am in favor of deprecating this in 1.12. It would not break programs, but 
would make users aware that this is not a feature with well-defined semantics.

Then let's clearly define how savepoints, side effects, and durability 
notifications (checkpoint complete) should interact in all these scenarios. 
Then we can think about adding this option back, or maybe the option will not 
be necessary any more.

> Remove CheckpointConfig.setPreferCheckpointForRecovery because it can lead to 
> data loss
> ---------------------------------------------------------------------------------------
>
>                 Key: FLINK-20427
>                 URL: https://issues.apache.org/jira/browse/FLINK-20427
>             Project: Flink
>          Issue Type: Bug
>          Components: API / DataStream, Runtime / Checkpointing
>    Affects Versions: 1.12.0
>            Reporter: Till Rohrmann
>            Priority: Critical
>             Fix For: 1.13.0
>
>
> The {{CheckpointConfig.setPreferCheckpointForRecovery}} allows to configure 
> whether Flink prefers checkpoints for recovery if the 
> {{CompletedCheckpointStore}} contains savepoints and checkpoints. This is 
> problematic because due to this feature, Flink might prefer older checkpoints 
> over newer savepoints for recovery. Since some components expect that the 
> always the latest checkpoint/savepoint is used (e.g. the 
> {{SourceCoordinator}}), it breaks assumptions and can lead to 
> {{SourceSplits}} which are not read. This effectively means that the system 
> loses data. Similarly, this behaviour can cause that exactly once sinks might 
> output results multiple times which violates the processing guarantees. 
> Hence, I believe that we should remove this setting because it changes 
> Flink's behaviour in some very significant way potentially w/o the user 
> noticing.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to