[ 
https://issues.apache.org/jira/browse/FLINK-9376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tzu-Li (Gordon) Tai updated FLINK-9376:
---------------------------------------
    Description: 
Currently, users have access to upgrade state serializers on the restore run of 
a stateful job, as long as the upgraded new serializer remains backwards 
compatible with all previous written data in the savepoint (i.e. it can read 
all previous and current schema of serialized state objects).

What is still lacking is the ability to upgrade to incompatible serializers. 
Upon being registered an incompatible serializer for existing restored state, 
that state needs to go through the process of -
 1. read serialized state with the previous serializer
 2. passing each deserialized state object through a “migration map function”, 
and
 3. writing back the state with the new serializer

The availability of this process should be strictly limited to state 
registrations that occur before the actual processing begins (e.g. in the 
{{open}} or {{initializeState}} methods), so that we avoid performing these 
operations during processing.

How this procedure actually occurs, differs across different types of state 
backends.
For example, for state backends that eagerly deserialize / lazily serialize 
state (e.g. {{HeapStateBackend}}), the job execution itself can be seen as a 
"migration"; everything is deserialized to state objects on restore, and is 
only serialized again, with the new serializer, on checkpoints.
Therefore, for these state backends, the above process is irrelevant.

On the other hand, for state backends that lazily deserialize / eagerly 
serialize state (e.g. {{RocksDBStateBackend}}), the state evolution process 
needs to happen for every state with a newly registered incompatible serializer.

Procedure 2. will allow even state type migrations, but that is out-of-scope of 
this JIRA.
 This ticket focuses only on procedures 1. and 3., where we try to enable 
schema evolution without state type changes.

This is an umbrella JIRA ticket that overlooks this feature, including a few 
preliminary tasks that work towards enabling it.

  was:
Currently, users have access to upgrade state serializers on the restore run of 
a stateful job, as long as the upgraded new serializer remains backwards 
compatible with all previous written data in the savepoint (i.e. it can read 
all previous and current schema of serialized state objects).

What is still lacking is the ability to upgrade to incompatible serializers. 
Upon being registered an incompatible serializer for existing restored state, 
that state needs to go through the process of -
 1. read serialized state with the previous serializer
 2. passing each deserialized state object through a “migration map function”, 
and
 3. writing back the state with the new serializer

The availability of this process should be strictly limited to state 
registrations that occur before the actual processing begins (e.g. in the 
{{open}} or {{initializeState}} methods), so that we avoid performing these 
operations during processing.

Procedure 2. will allow even state type migrations, but that is out-of-scope of 
this JIRA.
 This ticket focuses only on procedures 1. and 3., where we try to enable 
schema evolution without state type changes.

This is an umbrella JIRA ticket that overlooks this feature, including a few 
preliminary tasks that work towards enabling it.


> Allow upgrading to incompatible state serializers (state schema evolution)
> --------------------------------------------------------------------------
>
>                 Key: FLINK-9376
>                 URL: https://issues.apache.org/jira/browse/FLINK-9376
>             Project: Flink
>          Issue Type: New Feature
>          Components: State Backends, Checkpointing, Type Serialization System
>            Reporter: Tzu-Li (Gordon) Tai
>            Priority: Critical
>             Fix For: 1.6.0
>
>
> Currently, users have access to upgrade state serializers on the restore run 
> of a stateful job, as long as the upgraded new serializer remains backwards 
> compatible with all previous written data in the savepoint (i.e. it can read 
> all previous and current schema of serialized state objects).
> What is still lacking is the ability to upgrade to incompatible serializers. 
> Upon being registered an incompatible serializer for existing restored state, 
> that state needs to go through the process of -
>  1. read serialized state with the previous serializer
>  2. passing each deserialized state object through a “migration map 
> function”, and
>  3. writing back the state with the new serializer
> The availability of this process should be strictly limited to state 
> registrations that occur before the actual processing begins (e.g. in the 
> {{open}} or {{initializeState}} methods), so that we avoid performing these 
> operations during processing.
> How this procedure actually occurs, differs across different types of state 
> backends.
> For example, for state backends that eagerly deserialize / lazily serialize 
> state (e.g. {{HeapStateBackend}}), the job execution itself can be seen as a 
> "migration"; everything is deserialized to state objects on restore, and is 
> only serialized again, with the new serializer, on checkpoints.
> Therefore, for these state backends, the above process is irrelevant.
> On the other hand, for state backends that lazily deserialize / eagerly 
> serialize state (e.g. {{RocksDBStateBackend}}), the state evolution process 
> needs to happen for every state with a newly registered incompatible 
> serializer.
> Procedure 2. will allow even state type migrations, but that is out-of-scope 
> of this JIRA.
>  This ticket focuses only on procedures 1. and 3., where we try to enable 
> schema evolution without state type changes.
> This is an umbrella JIRA ticket that overlooks this feature, including a few 
> preliminary tasks that work towards enabling it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to