[
https://issues.apache.org/jira/browse/FLINK-19741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tzu-Li (Gordon) Tai updated FLINK-19741:
----------------------------------------
Summary: InternalTimeServiceManager fails to restore due to corrupt reads
if there are other users of raw keyed state streams (was:
InternalTimeServiceManager fails to restore if there are other users of raw
keyed state streams)
> InternalTimeServiceManager fails to restore due to corrupt reads if there are
> other users of raw keyed state streams
> --------------------------------------------------------------------------------------------------------------------
>
> Key: FLINK-19741
> URL: https://issues.apache.org/jira/browse/FLINK-19741
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Checkpointing
> Affects Versions: 1.9.3, 1.10.2, 1.11.2
> Reporter: Tzu-Li (Gordon) Tai
> Priority: Blocker
>
> Currently, when restoring a {{InternalTimeServiceManager}}, we always attempt
> to read from the provided raw keyed state streams (using
> {{InternalTimerServiceSerializationProxy}}):
> https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/operators/InternalTimeServiceManagerImpl.java#L117
> This is incorrect, since we don't write with the
> {{InternalTimerServiceSerializationProxy}} if the timers do not require
> legacy synchronous snapshots:
> https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/operators/InternalTimeServiceManagerImpl.java#L192
> (we currently only require that when users use RocksDB backend + heap timers).
> Therefore, the {{InternalTimeServiceManager}} can fail to be created on
> restore due to corrupt reads in the case where:
> * a checkpoint was taken where {{useLegacySynchronousSnapshots}} is false
> (hence nothing was written, and the time service manager does not use the raw
> keyed stream)
> * the raw keyed stream is used elsewhere (e.g. in the Flink application's
> user code)
> * on restore from the checkpoint, {{InternalTimeServiceManagerImpl.create()}}
> attempts to read from the raw keyed stream with the
> {{InternalTimerServiceSerializationProxy}}.
> The fix would be to also respect the {{useLegacySynchronousSnapshots}} flag
> in:
> https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/operators/InternalTimeServiceManagerImpl.java#L231
--
This message was sent by Atlassian Jira
(v8.3.4#803005)