tzulitai opened a new pull request #7329: [FLINK-11073] (part 1) Introduce COMPATIBLE_WITH_RECONFIGURED_SERIALIZER option in TypeSerializerSchemaCompatibility
URL: https://github.com/apache/flink/pull/7329

## What is the purpose of the change

This commit introduces a new option to the `TypeSerializerSchemaCompatibility` class:

```
TypeSerializerSchemaCompatibility.compatibleWithReconfiguredSerializer(TypeSerializer<T> reconfiguredSerializerInstance);
```

The main reason for introducing this option is to allow cleaner serializer implementations that don't need to be mutable just because they sometimes need to be reconfigured (e.g. the `KryoSerializer` and `PojoSerializer` are serializers that reconfigure themselves).
This is a step towards the principle that all serializers in Flink should be immutable.

After introducing this option, this PR also lets the state backends framework code respect the new option, i.e. use the returned reconfigured serializer instance to access state instead of the originally registered serializer.

Serializers that need to take this into account include:
1. State value serializers in keyed backends / operator backends
2. Key serializers in keyed backends
3. Namespace serializers in keyed backends

For 1. and 3., the change is pretty straightforward and requires only small changes to the `StateSerializerProvider`: when we get a new registered serializer for restored state, we consider the reconfigured case there and, if needed, reassign the registered serializer reference to be the reconfigured one.

For 2., things are a bit more involved, since for the key serializer we ALWAYS get the new serializer instance first, and only THEN maybe get the previous key serializer snapshot if we're restoring. To support this scenario, the `StateSerializerProvider` needs to be modified further, as it previously assumed that for restored state we always get the serializer snapshot first.
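To illustrate how framework code might respect the new option, here is a minimal, self-contained sketch. It does not use Flink's actual classes: `SketchSerializer`, `SketchCompatibility` and `resolve` are hypothetical stand-ins invented for this example, and the real `StateSerializerProvider` logic in this PR may differ.

```java
import java.util.Objects;

// Hypothetical, simplified stand-ins for illustration only; NOT Flink's actual classes.
final class ReconfigurationSketch {

    /** Minimal immutable serializer stand-in that only carries a configuration label. */
    static final class SketchSerializer {
        final String config;

        SketchSerializer(String config) {
            this.config = Objects.requireNonNull(config);
        }
    }

    /** Outcome of checking a newly registered serializer against a restored snapshot. */
    static final class SketchCompatibility {
        enum Type { COMPATIBLE_AS_IS, COMPATIBLE_WITH_RECONFIGURED_SERIALIZER, INCOMPATIBLE }

        final Type type;
        final SketchSerializer reconfigured; // non-null only in the reconfigured case

        private SketchCompatibility(Type type, SketchSerializer reconfigured) {
            this.type = type;
            this.reconfigured = reconfigured;
        }

        static SketchCompatibility compatibleAsIs() {
            return new SketchCompatibility(Type.COMPATIBLE_AS_IS, null);
        }

        static SketchCompatibility compatibleWithReconfiguredSerializer(SketchSerializer s) {
            return new SketchCompatibility(
                    Type.COMPATIBLE_WITH_RECONFIGURED_SERIALIZER, Objects.requireNonNull(s));
        }

        static SketchCompatibility incompatible() {
            return new SketchCompatibility(Type.INCOMPATIBLE, null);
        }
    }

    /** Picks the serializer instance that should be used to access restored state. */
    static SketchSerializer resolve(SketchSerializer registered, SketchCompatibility result) {
        switch (result.type) {
            case COMPATIBLE_AS_IS:
                // The registered serializer can be used unchanged.
                return registered;
            case COMPATIBLE_WITH_RECONFIGURED_SERIALIZER:
                // Swap in the new instance instead of mutating the registered one.
                return result.reconfigured;
            default:
                throw new IllegalStateException(
                        "New serializer is incompatible with restored state");
        }
    }

    public static void main(String[] args) {
        SketchSerializer registered = new SketchSerializer("registered");
        SketchSerializer reconfigured = new SketchSerializer("reconfigured");

        SketchSerializer used = resolve(
                registered,
                SketchCompatibility.compatibleWithReconfiguredSerializer(reconfigured));

        System.out.println(used.config);
    }
}
```

The point of the sketch is that the registered serializer is never mutated: the compatibility result carries a fresh instance, and the caller reassigns its reference to that instance.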
Simply put, the `StateSerializerProvider` had to be changed so that it supports both directions: either getting the new serializer first, or getting the old serializer snapshot first.

## Brief change log

- f5ab6b8: Extend the `TypeSerializerSchemaCompatibility` class to have the new option.
- c25b8fb: Preliminary test utilities extension. It introduces a `ReconfigurationRequiringTestTypeSerializer` to be used later on by various reconfiguration-related migration tests.
- 1a7f5a8: Modify the `StateSerializerProvider` to work for the serializer reconfiguration cases 1. and 3. mentioned above. Tests are added to `StateSerializerProviderTest` for the new feature scope.
- e028f29: Modify the `StateSerializerProvider` to work for both directions. Tests are added to `StateSerializerProviderTest` for the new feature scope.
- b3ce23e: Let the key serializer in `AbstractKeyedStateBackend` be a `StateSerializerProvider`. This also makes accessing the key serializer more secure and well-defined.
- 32485e1: Extend `StateBackendMigrationTestBase` to cover serializer reconfiguration cases for all kinds of state, including `ValueState` / `ListState` / key and namespace serializers in keyed state backends, and partitionable / union list state as well as broadcast state in operator state backends. This includes a major refactoring of the `StateBackendMigrationTestBase` to make the test cases much clearer.

## Verifying this change

The changes in `StateBackendMigrationTestBase` and `StateSerializerProviderTest` should reflect the new features and expected behaviours.
## Does this pull request potentially affect one of the following parts:

- Dependencies (does it add or upgrade a dependency): (yes / **no**)
- The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (yes / **no**)
- The serializers: (**yes** / no / don't know)
- The runtime per-record code paths (performance sensitive): (**yes (state access code paths are affected)** / no / don't know)
- Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (**yes** / no / don't know)
- The S3 file system connector: (yes / **no** / don't know)

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

With regards,
Apache Git Services
