[
https://issues.apache.org/jira/browse/FLINK-20427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17243032#comment-17243032
]
Nico Kruber commented on FLINK-20427:
-------------------------------------
Ahh...Flink regional failover. Makes sense now: for these, the sources have the
implicit assumption that they recover from the latest snapshot (while for a
global failover, they restore from the snapshotted state). It would be good to
add a safeguard these to ensure that the source is always recovering from the
latest snapshot for regional failovers then (ideally it would then fall back to
a global failover if this assumption is not true).
As for removing {{CheckpointConfig.setPreferCheckpointForRecovery}}: Let's
evaluate with the user ml whether anyone is relying on this and if not, let's
remove it and get rid of one more special case. The only use case I can think
of which may benefit from this feature here is a low-latency use case which
tolerates duplicates in the sinks but has very strong SLAs even in the failure
case: these could work around the slow savepoint-restore via retained
checkpoints. Retained checkpoints, however, don't really serve as a backup
which you can roll back to in case of bugs - user-triggered checkpoints could
be a better solution here, as mentioned, but they don't exist yet.
> Remove CheckpointConfig.setPreferCheckpointForRecovery because it can lead to
> data loss
> ---------------------------------------------------------------------------------------
>
> Key: FLINK-20427
> URL: https://issues.apache.org/jira/browse/FLINK-20427
> Project: Flink
> Issue Type: Bug
> Components: API / DataStream, Runtime / Checkpointing
> Affects Versions: 1.12.0
> Reporter: Till Rohrmann
> Priority: Critical
> Fix For: 1.13.0
>
>
> The {{CheckpointConfig.setPreferCheckpointForRecovery}} allows to configure
> whether Flink prefers checkpoints for recovery if the
> {{CompletedCheckpointStore}} contains savepoints and checkpoints. This is
> problematic because due to this feature, Flink might prefer older checkpoints
> over newer savepoints for recovery. Since some components expect that the
> always the latest checkpoint/savepoint is used (e.g. the
> {{SourceCoordinator}}), it breaks assumptions and can lead to
> {{SourceSplits}} which are not read. This effectively means that the system
> loses data. Similarly, this behaviour can cause that exactly once sinks might
> output results multiple times which violates the processing guarantees.
> Hence, I believe that we should remove this setting because it changes
> Flink's behaviour in some very significant way potentially w/o the user
> noticing.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)