[
https://issues.apache.org/jira/browse/FLINK-9377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16618634#comment-16618634
]
ASF GitHub Bot commented on FLINK-9377:
---------------------------------------
tzulitai opened a new pull request #6711: [FLINK-9377] [core, state backends]
Remove serializers from checkpoints
URL: https://github.com/apache/flink/pull/6711
## What is the purpose of the change
The goal of this PR is the following:
- Extend the `TypeSerializerConfigSnapshot` to also be the factory for the
originating serializer. This introduces a new `restoreSerializer()` method on
the interface (API-breaking change).
- Since interface would inevitably be broken, we also use the opportunity to
move the responsibility of serializer schema compatibility checks from
`TypeSerializer` to the `TypeSerializerConfigSnapshot`. This is an improvement
to move away from the previous design of having to reconfigure serializers,
leading to mutable serializers as well as a ill-defined separation of concerns
(serialization v.s. compatibility).
- Without the need to touch all `TypeSerializerConfigSnapshot` subclasses at
once while still letting the codebase to build and pass tests, the new methods
on the `TypeSerializerConfigSnapshot` interface should have "workaround" base
implementations. These base implementations should eventually be removed once
all subclasses have the proper implementations in place.
The next steps after this PR is merged would be the following:
- Introduce a `SerializerUpgradeTestBase` that covers test concerns for
upgrading a serializer and its interplay with different state backends.
Ultimately, a check needs to be added that ALL Flink-shipped serializers have a
corresponding test subclass.
- Incrementally implement the new methods for all serializers (will be
several PRs).
- Use the new interfaces to implement state migration in the state backends.
## Brief change log
#### b0151cd Add the `TypeSerializerConfigSnapshot#restoreSerializer()`
method
The workaround base implementation for this is to just return the config's
originating serializer.
There are two scenarios in our current tests that needs to be covered by
this: 1) taking a snapshot of current serializers, and then trying to restore
the serializer, and 2) migration tests which read from savepoint files and try
to restore serializers.
For 1), the workaround is to inject the originating serializer just before
the config is written, and then write / read that serializer as part of writing
/ reading the config.
For 2), that is already covered by our backwards compatibility path for
Flink <= 1.6.x, which still contained both serializer and config. The backwards
compatibility resolution would be to wrap the serializer and config inside a
`BackwardsCompatibleConfigSnapshot`, introduced in 80c3043.
#### 6073a0d Add the
`TypeSerializerConfigSnapshot#resolveSchemaCompatibility(TypeSerializer)` method
The workaround base implementation for this is to forward the compatibility
check to the to-be-removed
`TypeSerializer#ensureCompatibility(TypeSerializerConfigSnapshot)` method.
We also define a new set of compatibility results, defined in
`TypeSerializerSchemaCompatibility`, which will eventually completely replace
`CompatibilityResult`.
#### 48124d7 Deprecate `TypeSerializerSerializationUtil`
This is a simple code cleanup.
#### 80c3043 Introduce `BackwardsCompatibleConfigSnapshot`
This is our backwards compatibility resolution path when reading from
savepoint versions which still contained both the serializer and their config.
A `BackwardsCompatibleConfigSnapshot` wraps the serializer and config
together, and on `restoreSerializer`, instead of directly calling
`restoreSerializer` on the wrapped config (which for previous versions may not
be capable of actually instantiating the serializer), return the wrapped
serializer instead.
#### f9a0dee Remove serializers from checkpoints
This commit glues everything together to remove serializers from checkpoints.
This commit changes the serialization formats for state backends (while
remaining backwards compatibility).
## Verifying this change
All current state migration tests, state backend tests, serializer tests
etc. should still be working.
No new tests have been added yet.
## Does this pull request potentially affect one of the following parts:
- Dependencies (does it add or upgrade a dependency): (yes / **no**)
- The public API, i.e., is any changed class annotated with
`@Public(Evolving)`: (**yes** / no)
- The serializers: (**yes** / no / don't know)
- The runtime per-record code paths (performance sensitive): (yes / **no**
/ don't know)
- Anything that affects deployment or recovery: JobManager (and its
components), Checkpointing, Yarn/Mesos, ZooKeeper: (**yes** / no / don't know)
- The S3 file system connector: (yes / **no** / don't know)
## Documentation
- Does this pull request introduce a new feature? (**yes** / no)
- If yes, how is the feature documented? (not applicable / docs /
**JavaDocs** / not documented)
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
> Remove writing serializers as part of the checkpoint meta information
> ---------------------------------------------------------------------
>
> Key: FLINK-9377
> URL: https://issues.apache.org/jira/browse/FLINK-9377
> Project: Flink
> Issue Type: Sub-task
> Components: State Backends, Checkpointing
> Reporter: Tzu-Li (Gordon) Tai
> Assignee: Tzu-Li (Gordon) Tai
> Priority: Critical
> Labels: pull-request-available
> Fix For: 1.7.0
>
>
> When writing meta information of a state in savepoints, we currently write
> both the state serializer as well as the state serializer's configuration
> snapshot.
> Writing both is actually redundant, as most of the time they have identical
> information.
> Moreover, the fact that we use Java serialization to write the serializer
> and rely on it to be re-readable on the restore run, already poses problems
> for serializers such as the {{AvroSerializer}} (see discussion in FLINK-9202)
> to perform even a compatible upgrade.
> The proposal here is to leave only the config snapshot as meta information,
> and use that as the single source of truth of information about the schema of
> serialized state.
> The config snapshot should be treated as a factory (or provided to a
> factory) to re-create serializers capable of reading old, serialized state.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)