GitHub user tzulitai opened a pull request:
https://github.com/apache/flink/pull/3804
[FLINK-6190] [core] Reconfigurable TypeSerializers
Reconfigurable TypeSerializers is the first step towards allowing
serializer upgrades. The second step, activating serializer upgrades for
registered state, will be a separate follow-up PR based on this one.
This PR includes both
[FLINK-6190](https://issues.apache.org/jira/browse/FLINK-6190) (configuration
snapshot for all serializers) and
[FLINK-6191](https://issues.apache.org/jira/browse/FLINK-6191) (reconfiguration
logic for all serializers). They have been bundled together in this PR because
they share common concerns.
Since the change touches all serializers currently in Flink, to ease
review, the PR is broken up into incrementally buildable commits that are
independent for different kinds of serializers.
## Description
Reconfigurable serializers consist of 2 parts:
1. Creating a snapshot of a serializer's configuration, that can be written
to state with definable versioned format. A serializer's configuration snapshot
is a point-in-time view of the current state and parameters of the serializer
at the time of the snapshot. For example, for the `KryoSerializer`, its
configuration should contain the registration order of its classes and
serializers. Other serializers like the `PojoSerializer` contain configuration
parameters that change during runtime, e.g. the cached serializers for its
non-registered POJO subclasses (the explanation of why certain information
needs to be persisted for each serializer's configuration snapshot is included
as code comments and Javadocs in the PR).
2. Reconfiguring a new serializer with the configuration snapshot of a
preceding serializer, so that it may be compatible to read old data written by
the preceding serializer.
## New user interfaces
New methods in the `TypeSerializer` interface:
- `TypeSerializer#snapshotConfiguration()`: extracts a configuration
snapshot from the serializer
- `TypeSerializer#reconfigure(TypeSerializerConfigSnapshot)`: reconfigure
the serializer with its preceding serializer's configuration snapshot.
New classes:
- `TypeSerializerConfigSnapshot`: a `VersionedIOReadableWritable` base
class for serializers to implement their own configuration snapshot class. Each
serializer needs to encode its own information about its serialization format
and its required parameters within its own config snapshot class.
- `ForwardCompatibleSerializationFormatConfig`: a special marker config
that implementations of `TypeSerializer#snapshotConfiguration()` can return to
signal that new serializers for the data written by the serializer does not
need to be reconfigured for compatibility. For example, user custom serializers
using serialization frameworks with built-in compatibility mechanisms (e.g.
Thrift, Protobut etc.) can simply just return this marker.
- `ReconfigureResult`: enum representing the result of a reconfiguration,
returned from `TypeSerializer#reconfigure(TypeSerializerConfigSnapshot)`. 3
kinds of results are currently defined - please see the Javadoc of the class
for details.
## Tests
Specific tests have been added for the more complex serializers, namely:
- `EnumSerializer` (841298a)
- `KryoSerializer` (5b50505)
- `PojoSerializer` (7ca5496)
General `TypeSerializerConfigSnapshot` tests such as de-/serialization of
configuration snapshots is added in c2bb1ef.
Also, two fundamental unit tests related to config snapshots have also been
added to `SerializerTestBase` (b917941).
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/tzulitai/flink FLINK-6190
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/3804.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #3804
----
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---