tzulitai opened a new pull request #7329: [FLINK-11073] (part 1) Introduce 
COMPATIBLE_WITH_RECONFIGURED_SERIALIZER option in 
TypeSerializerSchemaCompatibility
URL: https://github.com/apache/flink/pull/7329
 
 
   ## What is the purpose of the change
   
   This commit introduces a new option to the 
`TypeSerializerSchemaCompatibility` class:
   ```
   TypeSerializerSchemaCompatibility.compatibleWithReconfiguredSerializer(TypeSerializer<T> reconfiguredSerializerInstance);
   ```
   
   The main reason for introducing this option is to allow cleaner serializer 
implementations that do not need to be mutable just because they sometimes 
have to be reconfigured (the `KryoSerializer` and `PojoSerializer` are examples 
of serializers that currently reconfigure themselves).
   This is a step towards the principle that all serializers in Flink should be 
immutable.
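   
   To illustrate the idea, here is a minimal, self-contained sketch. It does 
not use Flink's classes: `CompatibilitySketch`, `RegistrySerializerSketch` and 
`resolveAgainst` are hypothetical names invented for this example, and the 
"ordered list of registered types" is only an assumed stand-in for whatever 
configuration a real serializer carries. The point it shows is that the 
compatibility check hands back a new, reconfigured instance rather than 
mutating the serializer it was called on.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for a compatibility result; NOT Flink's actual class. */
final class CompatibilitySketch<S> {
    enum Kind { COMPATIBLE_AS_IS, COMPATIBLE_WITH_RECONFIGURED_SERIALIZER }

    final Kind kind;
    final S reconfiguredSerializer; // non-null only in the reconfigured case

    private CompatibilitySketch(Kind kind, S reconfiguredSerializer) {
        this.kind = kind;
        this.reconfiguredSerializer = reconfiguredSerializer;
    }

    static <S> CompatibilitySketch<S> compatibleAsIs() {
        return new CompatibilitySketch<>(Kind.COMPATIBLE_AS_IS, null);
    }

    static <S> CompatibilitySketch<S> compatibleWithReconfiguredSerializer(S reconfigured) {
        return new CompatibilitySketch<>(Kind.COMPATIBLE_WITH_RECONFIGURED_SERIALIZER, reconfigured);
    }
}

/** Hypothetical immutable serializer configured by an ordered list of type names. */
final class RegistrySerializerSketch {
    final List<String> registeredTypes;

    RegistrySerializerSketch(List<String> registeredTypes) {
        this.registeredTypes = Collections.unmodifiableList(new ArrayList<>(registeredTypes));
    }

    /** Resolves this (new) serializer against the order stored in a snapshot. */
    CompatibilitySketch<RegistrySerializerSketch> resolveAgainst(List<String> snapshotTypes) {
        if (registeredTypes.equals(snapshotTypes)) {
            return CompatibilitySketch.compatibleAsIs();
        }
        // Keep the restored order first, then append types new to this serializer.
        List<String> merged = new ArrayList<>(snapshotTypes);
        for (String type : registeredTypes) {
            if (!merged.contains(type)) {
                merged.add(type);
            }
        }
        // Return a fresh instance instead of mutating `this`.
        return CompatibilitySketch.compatibleWithReconfiguredSerializer(
                new RegistrySerializerSketch(merged));
    }
}

final class ReconfigureDemo {
    public static void main(String[] args) {
        RegistrySerializerSketch fresh =
                new RegistrySerializerSketch(Arrays.asList("B", "A"));
        CompatibilitySketch<RegistrySerializerSketch> result =
                fresh.resolveAgainst(Arrays.asList("A"));
        System.out.println(result.kind + " " + result.reconfiguredSerializer.registeredTypes);
        System.out.println("original unchanged: " + fresh.registeredTypes);
    }
}
```

   In this sketch the caller decides which instance to use from the returned 
result, so the serializer itself never needs a mutable "reconfigure" step.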
   
   After introducing this option, this PR also makes the state backend 
framework code respect it, i.e. use the returned reconfigured serializer 
instance to access state instead of the originally registered serializer. 
Serializers that need to take this into account include:
   
   1. State value serializers in keyed backends / operator backends
   2. Key serializers in keyed backends
   3. Namespace serializers in keyed backends
   
   For 1. and 3., the change is straightforward and requires only small 
changes to the `StateSerializerProvider`: when a new serializer is registered 
for restored state, we check for the reconfigured case there and, if it 
applies, reassign the registered serializer reference to the reconfigured one.
   
   For 2., things are a bit more involved, since for the key serializer we 
ALWAYS get the new serializer instance first, and only THEN get the previous 
key serializer snapshot if we are restoring.
   To support this scenario, the `StateSerializerProvider` needs to be modified 
further, as it previously assumed that for restored state we always get the 
serializer snapshot first.
   Simply put, the `StateSerializerProvider` had to be changed so that it 
supports both directions: getting either the new serializer or the old 
serializer snapshot first.
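   
   The two directions can be sketched as follows. This is a simplified, 
hypothetical model and not the actual `StateSerializerProvider`: the class 
`SerializerProviderSketch`, its method names, and the `resolver` function 
(which stands in for the compatibility check and returns either the new 
serializer or a reconfigured one) are all assumptions made for illustration.

```java
import java.util.function.BiFunction;

/**
 * Hypothetical, simplified provider; the real StateSerializerProvider differs.
 * S is the serializer type, P is the serializer snapshot type.
 */
final class SerializerProviderSketch<S, P> {
    /** Assumed resolver: returns the serializer that should be used for state access. */
    private final BiFunction<S, P, S> resolver;
    private S registeredSerializer; // serializer currently used to access state
    private P restoredSnapshot;     // snapshot of the previous serializer, if restored

    SerializerProviderSketch(BiFunction<S, P, S> resolver) {
        this.resolver = resolver;
    }

    /** Direction "snapshot first": the snapshot may already be present here. */
    void registerNewSerializer(S newSerializer) {
        if (registeredSerializer != null) {
            throw new IllegalStateException("A serializer has already been registered.");
        }
        registeredSerializer = (restoredSnapshot == null)
                ? newSerializer
                : resolver.apply(newSerializer, restoredSnapshot);
    }

    /** Direction "serializer first": the new serializer may already be present here. */
    void setRestoredSnapshot(P snapshot) {
        if (restoredSnapshot != null) {
            throw new IllegalStateException("A snapshot has already been set.");
        }
        restoredSnapshot = snapshot;
        if (registeredSerializer != null) {
            // Swap in the (possibly reconfigured) serializer for state access.
            registeredSerializer = resolver.apply(registeredSerializer, snapshot);
        }
    }

    S currentSerializer() {
        if (registeredSerializer == null) {
            throw new IllegalStateException("No serializer has been registered yet.");
        }
        return registeredSerializer;
    }
}

final class ProviderDemo {
    public static void main(String[] args) {
        // Toy resolver on strings: "reconfigure" whenever snapshot and serializer differ.
        BiFunction<String, String, String> resolver =
                (newSer, snap) -> newSer.equals(snap) ? newSer : newSer + " reconfigured for " + snap;

        SerializerProviderSketch<String, String> snapshotFirst = new SerializerProviderSketch<>(resolver);
        snapshotFirst.setRestoredSnapshot("v1");
        snapshotFirst.registerNewSerializer("v2");

        SerializerProviderSketch<String, String> serializerFirst = new SerializerProviderSketch<>(resolver);
        serializerFirst.registerNewSerializer("v2");
        serializerFirst.setRestoredSnapshot("v1");

        System.out.println(snapshotFirst.currentSerializer());
        System.out.println(serializerFirst.currentSerializer());
    }
}
```

   Under these assumptions both orderings end up with the same serializer for 
state access, which is the property the key serializer case (2.) relies on.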
   
   ## Brief change log
   
   - f5ab6b8: Extend the `TypeSerializerSchemaCompatibility` class to have the 
new option
   - c25b8fb: Preliminary extension of the test utilities. It introduces a 
`ReconfigurationRequiringTestTypeSerializer` to be used later on by various 
reconfiguration-related migration tests.
   - 1a7f5a8: This modifies the `StateSerializerProvider` to work for the 
serializer reconfiguration cases 1. and 3. mentioned above. Tests are added to 
`StateSerializerProviderTest` for the new feature scope.
   - e028f29: This modifies the `StateSerializerProvider` to work for both 
directions. Tests are added to `StateSerializerProviderTest` for the new 
feature scope.
   - b3ce23e: Let the key serializer in `AbstractKeyedStateBackend` be a 
`StateSerializerProvider`. This also makes accessing the key serializer safer 
and more well-defined.
   - 32485e1: Extend `StateBackendMigrationTestBase` to cover serializer 
reconfiguration cases for all kinds of state, including `ValueState` / 
`ListState` / key and namespace serializers in keyed state backends, and 
partitionable / union list state as well as broadcast state in operator state 
backends. This includes a major refactoring of the 
`StateBackendMigrationTestBase` to make the test cases much clearer.
   
   ## Verifying this change
   
   The changes in `StateBackendMigrationTestBase` and 
`StateSerializerProviderTest` should cover the new features and expected 
behaviours.
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): (yes / **no**)
     - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (yes / **no**)
     - The serializers: (**yes** / no / don't know)
     - The runtime per-record code paths (performance sensitive): (**yes (state 
access code paths are affected)** / no / don't know)
     - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: (**yes** / no / don't know)
     - The S3 file system connector: (yes / **no** / don't know)
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
