Till Rohrmann created FLINK-20427:
-------------------------------------
Summary: Remove CheckpointConfig.setPreferCheckpointForRecovery
because it can lead to data loss
Key: FLINK-20427
URL: https://issues.apache.org/jira/browse/FLINK-20427
Project: Flink
Issue Type: Bug
Components: API / DataStream, Runtime / Checkpointing
Affects Versions: 1.12.0
Reporter: Till Rohrmann
Fix For: 1.13.0
{{CheckpointConfig.setPreferCheckpointForRecovery}} lets users configure
whether Flink prefers checkpoints for recovery if the
{{CompletedCheckpointStore}} contains savepoints and checkpoints. This is
problematic because, with this setting enabled, Flink might prefer older
checkpoints over newer savepoints for recovery. Since some components (e.g. the
{{SourceCoordinator}}) expect that the latest checkpoint/savepoint is always
used, this breaks their assumptions and can lead to {{SourceSplits}} that are
never read. This effectively means that the system loses data.
Similarly, this behaviour can cause exactly-once sinks to output results
multiple times, which violates the processing guarantees. Hence, I believe that
we should remove this setting because it changes Flink's behaviour
significantly, potentially without the user noticing.
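
To make the failure mode concrete, here is a minimal, self-contained sketch of
the selection rule described above. It is not Flink's actual implementation:
the {{Snapshot}} class and the {{selectForRecovery}} helper are hypothetical
names invented for illustration, and the store is modelled as a plain list
ordered from oldest to newest.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model, NOT Flink's real classes: it only illustrates how a
// "prefer checkpoint" flag can select an older snapshot over a newer one.
public class RestoreSelectionSketch {

    /** A completed snapshot; ids grow over time, so a larger id is newer. */
    static final class Snapshot {
        final long id;
        final boolean isSavepoint;

        Snapshot(long id, boolean isSavepoint) {
            this.id = id;
            this.isSavepoint = isSavepoint;
        }
    }

    /**
     * Picks the snapshot to restore from. The list is ordered oldest to newest.
     * With preferCheckpoint set, the newest non-savepoint entry wins even if
     * a newer savepoint exists; otherwise the newest entry is used.
     */
    static Snapshot selectForRecovery(List<Snapshot> completed, boolean preferCheckpoint) {
        if (completed.isEmpty()) {
            return null;
        }
        Snapshot latest = completed.get(completed.size() - 1);
        if (!preferCheckpoint) {
            return latest;
        }
        // Walk backwards to find the most recent regular checkpoint.
        for (int i = completed.size() - 1; i >= 0; i--) {
            if (!completed.get(i).isSavepoint) {
                return completed.get(i);
            }
        }
        // Only savepoints are available, so fall back to the newest one.
        return latest;
    }

    public static void main(String[] args) {
        // Checkpoint 41 completed first, then savepoint 42 was taken.
        List<Snapshot> store = Arrays.asList(
                new Snapshot(41, false),
                new Snapshot(42, true));

        System.out.println(selectForRecovery(store, false).id); // prints 42
        System.out.println(selectForRecovery(store, true).id);  // prints 41
    }
}
```

In this sketch, enabling the flag restores from snapshot 41 although snapshot
42 is newer. Any component that has already acted on the state captured in 42
(for example by assigning splits or committing output) would then be out of
sync with the restored job.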