[
https://issues.apache.org/jira/browse/FLINK-9377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16534549#comment-16534549
]
ASF GitHub Bot commented on FLINK-9377:
---------------------------------------
GitHub user tzulitai opened a pull request:
https://github.com/apache/flink/pull/6273
[FLINK-9377] [core] Implement restore serializer factory method for simple
composite serializers
## What is the purpose of the change
This PR is built on top of #6235 and is still a work in progress (WIP).
This PR implements the restore serializer factory method for all simple
composite serializers (i.e., Flink serializers with nested serializers). More
complex serializers such as the Scala serializers, POJO serializers,
KryoSerializer, AvroSerializer, etc. will be covered in a follow-up PR.
## Brief change log
- Introduce the `CompositeTypeSerializer` base class, which wraps the
configuration snapshotting logic and compatibility checks.
- Let all simple composite type serializers extend the
`CompositeTypeSerializer`.
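
As a rough illustration of what such a base class could centralize, here is a
minimal sketch. All names in it (`SketchSerializer`,
`CompositeSketchSerializer`, `PairSketchSerializer`, and the leaf serializers)
are hypothetical, and plain strings stand in for real config snapshots; it is
not the actual `CompositeTypeSerializer` from this PR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified stand-ins; NOT Flink's actual classes or signatures.
interface SketchSerializer {
    /** Returns a string that stands in for a real configuration snapshot. */
    String snapshotConfiguration();
}

final class IntSketchSerializer implements SketchSerializer {
    @Override
    public String snapshotConfiguration() { return "int"; }
}

final class StringSketchSerializer implements SketchSerializer {
    @Override
    public String snapshotConfiguration() { return "string"; }
}

// The base class owns the snapshotting logic and the compatibility check,
// so concrete composite serializers only declare their nested serializers.
abstract class CompositeSketchSerializer implements SketchSerializer {
    private final List<SketchSerializer> nested;

    protected CompositeSketchSerializer(SketchSerializer... nested) {
        this.nested = Arrays.asList(nested.clone());
    }

    protected abstract String name();

    @Override
    public final String snapshotConfiguration() {
        // The composite snapshot is assembled from the nested snapshots.
        List<String> parts = new ArrayList<>();
        for (SketchSerializer serializer : nested) {
            parts.add(serializer.snapshotConfiguration());
        }
        return name() + "<" + String.join(",", parts) + ">";
    }

    /** Compatible only if the restored snapshot matches the current one. */
    public final boolean isCompatibleWith(String restoredSnapshot) {
        return snapshotConfiguration().equals(restoredSnapshot);
    }
}

final class PairSketchSerializer extends CompositeSketchSerializer {
    PairSketchSerializer(SketchSerializer first, SketchSerializer second) {
        super(first, second);
    }

    @Override
    protected String name() { return "pair"; }
}
```

In this sketch a subclass such as `PairSketchSerializer` contains no
snapshotting or compatibility code of its own, which is the kind of
duplication a shared base class is meant to remove.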
## Does this pull request potentially affect one of the following parts:
- Dependencies (does it add or upgrade a dependency): (yes / **no**)
- The public API, i.e., is any changed class annotated with
`@Public(Evolving)`: (yes / **no**)
- The serializers: (**yes** / no / don't know)
- The runtime per-record code paths (performance sensitive): (yes /
**no** / don't know)
- Anything that affects deployment or recovery: JobManager (and its
components), Checkpointing, Yarn/Mesos, ZooKeeper: (**yes** / no / don't know)
- The S3 file system connector: (yes / **no** / don't know)
## Documentation
- Does this pull request introduce a new feature? (yes / **no**)
- If yes, how is the feature documented? (**not applicable** / docs /
JavaDocs / not documented)
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/tzulitai/flink FLINK-9377-composite
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/6273.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #6273
----
commit 5fc4a36a144c3f8f22be7e21a4e542d3042d10b1
Author: Tzu-Li (Gordon) Tai <tzulitai@...>
Date: 2018-06-13T11:43:53Z
[FLINK-9377] [core] (part 1) Extend TypeSerializerConfigSnapshot as a
factory for restoring serializers
This commit is the first step towards removing serializers from
checkpointed state meta info and making Flink checkpoints Java
serialization free.
Instead of writing serializers in checkpoints and trying to read them
back to obtain a restore serializer at restore time, we aim to write only
the config snapshot as the single source of truth and use it as a factory
to create a restore serializer.
This commit adds the method and signatures to the
TypeSerializerConfigSnapshot interface. Uses of the method, as well as
proper implementations of it for all serializers, will follow in later
commits.
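
The idea of using the config snapshot as a factory can be sketched roughly as
follows. This is only an illustration under simplifying assumptions:
`SimpleSerializer`, `SimpleConfigSnapshot`, `CharsetStringSerializer`,
`CharsetStringSnapshot`, and `RestoreDemo` are hypothetical names, not Flink
classes; only the method name `restoreSerializer` is taken from the commit
messages.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical, simplified stand-ins; these are NOT Flink's actual interfaces.
interface SimpleSerializer<T> {
    SimpleConfigSnapshot<T> snapshotConfiguration();
}

interface SimpleConfigSnapshot<T> {
    void write(DataOutputStream out) throws IOException;

    void read(DataInputStream in) throws IOException;

    /** Factory method: rebuilds a serializer from the snapshot alone. */
    SimpleSerializer<T> restoreSerializer();
}

final class CharsetStringSerializer implements SimpleSerializer<String> {
    private final String charsetName;

    CharsetStringSerializer(String charsetName) { this.charsetName = charsetName; }

    String charsetName() { return charsetName; }

    @Override
    public SimpleConfigSnapshot<String> snapshotConfiguration() {
        return new CharsetStringSnapshot(charsetName);
    }
}

final class CharsetStringSnapshot implements SimpleConfigSnapshot<String> {
    private String charsetName;

    CharsetStringSnapshot() {} // used when reading the snapshot back

    CharsetStringSnapshot(String charsetName) { this.charsetName = charsetName; }

    @Override
    public void write(DataOutputStream out) throws IOException { out.writeUTF(charsetName); }

    @Override
    public void read(DataInputStream in) throws IOException { charsetName = in.readUTF(); }

    @Override
    public SimpleSerializer<String> restoreSerializer() {
        return new CharsetStringSerializer(charsetName);
    }
}

final class RestoreDemo {
    /** Writes only the snapshot, reads it back, and restores a serializer from it. */
    static String restoredCharset(String charsetName) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            new CharsetStringSerializer(charsetName).snapshotConfiguration().write(out);
            out.flush();

            CharsetStringSnapshot restored = new CharsetStringSnapshot();
            restored.read(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            CharsetStringSerializer serializer =
                    (CharsetStringSerializer) restored.restoreSerializer();
            return serializer.charsetName();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Note that no serializer instance is ever written in this sketch: the snapshot
carries the parameters and knows how to build the serializer again, so Java
serialization is not needed.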
commit 661eb6d34da450ed096a77f166a4cc62ce3efdba
Author: Tzu-Li (Gordon) Tai <tzulitai@...>
Date: 2018-06-14T09:52:06Z
[FLINK-9377] [core] (part 2) Remove fallback deserializer option from
CompatibilityResult
Now that the config snapshot is used as a factory for the restore
serializer, it should be guaranteed that a restore serializer is always
available. This removes the need for the user to provide a "fallback"
convert serializer in the case where a migration is required.
commit c91d045c5eb6e355981e4edaa6d1a0d48e5d4a5e
Author: Tzu-Li (Gordon) Tai <tzulitai@...>
Date: 2018-06-14T14:41:45Z
[FLINK-9377] [core] (part 3) Deprecate TypeSerializerSerializationUtil
This commit deprecates all utility methods and classes related to
serializing serializers. The methods that remain in use, i.e. those for
writing config snapshots, are moved to a new, separate
TypeSerializerConfigSnapshotSerializationUtil class.
commit e09f91469fb6c86f5d2f05b78a9db3d9af8cce87
Author: Tzu-Li (Gordon) Tai <tzulitai@...>
Date: 2018-06-18T14:24:08Z
[FLINK-9377] [core] (part 4) Introduce BackwardsCompatibleConfigSnapshot
The BackwardsCompatibleConfigSnapshot is a dummy config snapshot that
wraps an actual config snapshot together with a pre-existing serializer
instance.
In previous versions, the config snapshot wasn't a serializer factory but
simply a container for serializer parameters, so the config snapshots of
existing serializers are not necessarily capable of creating the
corresponding restore serializer correctly.
In this case, since previous serializers still have serializers written
in the checkpoint, the backwards compatible solution would be to wrap
the written serializer and the config snapshot within the
BackwardsCompatibleConfigSnapshot dummy. When attempting to restore the
serializer, the wrapped serializer instance is returned instead of
actually calling the restoreSerializer method of the wrapped config
snapshot.
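
The wrapping described above can be sketched as follows. This is a simplified
illustration, not the actual BackwardsCompatibleConfigSnapshot:
`RestorableSerializer`, `RestoreSnapshot`, `ParametersOnlySnapshot`, and
`BackwardsCompatibleSnapshotSketch` are hypothetical names introduced only for
this example.

```java
// Hypothetical, simplified stand-ins; NOT Flink's actual classes or signatures.
interface RestorableSerializer<T> {
    String describe();
}

interface RestoreSnapshot<T> {
    RestorableSerializer<T> restoreSerializer();
}

// An old-style snapshot: it only stores parameters and cannot act as a factory.
final class ParametersOnlySnapshot<T> implements RestoreSnapshot<T> {
    @Override
    public RestorableSerializer<T> restoreSerializer() {
        throw new UnsupportedOperationException("legacy snapshot is not a serializer factory");
    }
}

// Wraps the old snapshot together with the serializer read from the checkpoint.
final class BackwardsCompatibleSnapshotSketch<T> implements RestoreSnapshot<T> {
    private final RestoreSnapshot<T> wrappedSnapshot;
    private final RestorableSerializer<T> serializerFromCheckpoint;

    BackwardsCompatibleSnapshotSketch(
            RestoreSnapshot<T> wrappedSnapshot,
            RestorableSerializer<T> serializerFromCheckpoint) {
        this.wrappedSnapshot = wrappedSnapshot;
        this.serializerFromCheckpoint = serializerFromCheckpoint;
    }

    RestoreSnapshot<T> wrappedSnapshot() { return wrappedSnapshot; }

    @Override
    public RestorableSerializer<T> restoreSerializer() {
        // Deliberately bypasses wrappedSnapshot.restoreSerializer().
        return serializerFromCheckpoint;
    }
}
```

Callers only ever see the `restoreSerializer` method, so in this sketch they
do not need to know whether the serializer came from a factory-capable
snapshot or from an old checkpoint.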
commit da84665a9b101a803f7446210afc34bbd4a71703
Author: Tzu-Li (Gordon) Tai <tzulitai@...>
Date: 2018-07-02T03:45:20Z
[FLINK-9377] [core] (part 5) Remove serializers from checkpoint state meta
infos
This commit removes the behaviour of writing serializers in the state
meta info of keyed state, operator state, and timer state.
This changes the serialization formats of the
KeyedBackendSerializationProxy, OperatorBackendSerializationProxy, and
InternalTimerServiceSerializationProxy, and therefore their versions are
all incremented.
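
Why a format change requires a version bump can be illustrated with a small
sketch of version-dependent reading. The class `VersionedMetaInfoSketch`, its
byte layout, and the version numbers 1 and 2 are assumptions made for this
example and do not reflect the actual proxy formats.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical layout and version numbers, chosen for illustration only.
final class VersionedMetaInfoSketch {
    static final int VERSION_WITH_SERIALIZER = 1; // old: serializer bytes + snapshot
    static final int VERSION_SNAPSHOT_ONLY = 2;   // new: snapshot only

    static byte[] write(int version, byte[] serializerBytes, String snapshot) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(version);
            if (version < VERSION_SNAPSHOT_ONLY) {
                // Only the old format carries the serialized serializer.
                out.writeInt(serializerBytes.length);
                out.write(serializerBytes);
            }
            out.writeUTF(snapshot);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static String readSnapshot(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            int version = in.readInt();
            if (version < VERSION_SNAPSHOT_ONLY) {
                // This sketch skips the old serializer bytes; a real reader
                // would have to decide what to do with them.
                in.readFully(new byte[in.readInt()]);
            }
            return in.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The version written at the front is what lets a single reader handle both
layouts: without it, bytes written in the old format would be misread as a
snapshot.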
commit cd298ddd44b8fb19ca956e0193a731bacc9bc38d
Author: Tzu-Li (Gordon) Tai <tzulitai@...>
Date: 2018-06-18T14:24:35Z
[FLINK-9377] [core] (part 6) Properly implement restoreSerializer for
simple composite serializer config snapshots
----
> Remove writing serializers as part of the checkpoint meta information
> ---------------------------------------------------------------------
>
> Key: FLINK-9377
> URL: https://issues.apache.org/jira/browse/FLINK-9377
> Project: Flink
> Issue Type: Sub-task
> Components: State Backends, Checkpointing
> Reporter: Tzu-Li (Gordon) Tai
> Assignee: Tzu-Li (Gordon) Tai
> Priority: Critical
> Labels: pull-request-available
> Fix For: 1.6.0
>
>
> When writing meta information of a state in savepoints, we currently write
> both the state serializer as well as the state serializer's configuration
> snapshot.
> Writing both is actually redundant, as most of the time they have identical
> information.
> Moreover, the fact that we use Java serialization to write the serializer
> and rely on it to be re-readable on the restore run, already poses problems
> for serializers such as the {{AvroSerializer}} (see discussion in FLINK-9202)
> to perform even a compatible upgrade.
> The proposal here is to leave only the config snapshot as meta information,
> and use that as the single source of truth of information about the schema of
> serialized state.
> The config snapshot should be treated as a factory (or provided to a
> factory) to re-create serializers capable of reading old, serialized state.