[
https://issues.apache.org/jira/browse/FLINK-8421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tzu-Li (Gordon) Tai updated FLINK-8421:
---------------------------------------
Description:
The {{HeapInternalTimerService}} still uses simple {{equals}} checks to compare
restored and newly provided serializers for compatibility. These should be
replaced with {{TypeSerializer::ensureCompatibility}} checks, so that new
serializers can be reconfigured.
This also requires writing the {{TypeSerializerConfigSnapshot}} of the key and
namespace serializers in the {{HeapInternalTimerService}} to the raw state.
For the Flink 1.4.0 release and current master, this is a critical bug because
the {{KryoSerializer}} has had different default base registrations since
FLINK-7420. That is, if the key of a window was serialized using the
{{KryoSerializer}} in 1.3.x, the restore would never succeed in 1.4.0.
For 1.3.x, this fix would be an improvement: the {{HeapInternalTimerService}}
restore would then make use of serializer reconfiguration.
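The difference between an {{equals}} check and a reconfiguring compatibility check can be illustrated with a toy model. This is only a sketch and does not use Flink's classes: {{ToySerializer}}, its registration list, and the helpers {{writeConfigSnapshot}} / {{readConfigSnapshot}} are hypothetical stand-ins, and the merge rule (restored registrations keep their positions, new ones are appended) is an assumption made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model only: none of these types are Flink classes. */
public class SerializerRestoreSketch {

    /** Hypothetical serializer whose wire format depends on registration order. */
    static final class ToySerializer {
        private List<String> registrations;

        ToySerializer(List<String> registrations) {
            this.registrations = new ArrayList<>(registrations);
        }

        List<String> registrations() {
            return new ArrayList<>(registrations);
        }

        @Override
        public boolean equals(Object other) {
            return other instanceof ToySerializer
                    && registrations.equals(((ToySerializer) other).registrations);
        }

        @Override
        public int hashCode() {
            return registrations.hashCode();
        }

        /** Assumed rule: restored registrations keep their positions, new ones are appended. */
        void reconfigureFrom(List<String> restoredRegistrations) {
            List<String> merged = new ArrayList<>(restoredRegistrations);
            for (String name : registrations) {
                if (!merged.contains(name)) {
                    merged.add(name);
                }
            }
            registrations = merged;
        }
    }

    /** Writes the serializer's configuration snapshot into the (toy) raw state bytes. */
    static byte[] writeConfigSnapshot(ToySerializer serializer) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            List<String> names = serializer.registrations();
            out.writeInt(names.size());
            for (String name : names) {
                out.writeUTF(name);
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the configuration snapshot back from the (toy) raw state bytes. */
    static List<String> readConfigSnapshot(byte[] rawState) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(rawState));
            int count = in.readInt();
            List<String> names = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                names.add(in.readUTF());
            }
            return names;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Serializer used when the state was written (older defaults).
        ToySerializer writtenWith = new ToySerializer(Arrays.asList("KeyType"));
        // Serializer provided on restore (newer defaults add a registration in front).
        ToySerializer providedOnRestore =
                new ToySerializer(Arrays.asList("NewDefault", "KeyType"));
        byte[] rawState = writeConfigSnapshot(writtenWith);

        // A plain equals check rejects the new serializer outright.
        System.out.println("equals check passes: " + writtenWith.equals(providedOnRestore));

        // With the snapshot available in raw state, the new serializer can be reconfigured.
        providedOnRestore.reconfigureFrom(readConfigSnapshot(rawState));
        System.out.println("after reconfiguration: " + providedOnRestore.registrations());
    }
}
```

In this toy model the {{equals}} check fails because the registration lists differ, while the snapshot-based path lets the newly provided serializer adopt the restored registration order. That is only possible because the snapshot was written alongside the state.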
Other remarks:
* We need to double-check all operators that checkpoint to / restore from
**raw** state. Apparently, the serializer compatibility checks were only
implemented for managed state.
* Migration ITCases apparently do not have enough coverage. A migration test
job that uses windows and a key type requiring the {{KryoSerializer}} would
have caught this issue.
was:
The {{HeapInternalTimerService}} still uses simple {{equals}} checks to compare
restored and newly provided serializers for compatibility. These should be
replaced with {{TypeSerializer::ensureCompatibility}} checks, so that new
serializers can be reconfigured.
For the Flink 1.4.0 release and current master, this is a critical bug because
the {{KryoSerializer}} has had different default base registrations since
FLINK-7420. That is, if the key of a window was serialized using the
{{KryoSerializer}} in 1.3.x, the restore would never succeed in 1.4.0.
For 1.3.x, this fix would be an improvement: the {{HeapInternalTimerService}}
restore would then make use of serializer reconfiguration.
Other remarks:
* We need to double-check all operators that checkpoint to / restore from
**raw** state. Apparently, the serializer compatibility checks were only
implemented for managed state.
* Migration ITCases apparently do not have enough coverage. A migration test
job that uses windows and a key type requiring the {{KryoSerializer}} would
have caught this issue.
> HeapInternalTimerService should reconfigure compatible key / namespace
> serializers on restore
> ---------------------------------------------------------------------------------------------
>
> Key: FLINK-8421
> URL: https://issues.apache.org/jira/browse/FLINK-8421
> Project: Flink
> Issue Type: Bug
> Affects Versions: 1.4.0, 1.5.0
> Reporter: Tzu-Li (Gordon) Tai
> Priority: Blocker
> Fix For: 1.3.3, 1.5.0, 1.4.1
>
>
> The {{HeapInternalTimerService}} still uses simple {{equals}} checks to
> compare restored and newly provided serializers for compatibility. These
> should be replaced with {{TypeSerializer::ensureCompatibility}} checks, so
> that new serializers can be reconfigured.
> This also requires writing the {{TypeSerializerConfigSnapshot}} of the key
> and namespace serializers in the {{HeapInternalTimerService}} to the raw
> state.
> For the Flink 1.4.0 release and current master, this is a critical bug
> because the {{KryoSerializer}} has had different default base registrations
> since FLINK-7420. That is, if the key of a window was serialized using the
> {{KryoSerializer}} in 1.3.x, the restore would never succeed in 1.4.0.
> For 1.3.x, this fix would be an improvement: the
> {{HeapInternalTimerService}} restore would then make use of serializer
> reconfiguration.
> Other remarks:
> * We need to double-check all operators that checkpoint to / restore from
> **raw** state. Apparently, the serializer compatibility checks were only
> implemented for managed state.
> * Migration ITCases apparently do not have enough coverage. A migration test
> job that uses windows and a key type requiring the {{KryoSerializer}} would
> have caught this issue.