GitHub user tzulitai opened a pull request:
https://github.com/apache/flink/pull/5950
[FLINK-9169] [state-backend] Allow absence of old serializers when
restoring RocksDBStateBackend
## What is the purpose of the change
This PR contains 2 commits that overall allows absence of old state
serializers when restoring the RocksDB state backend. It also eliminates the
possibility of confusing NPEs which comes with the fact that previously,
restored serializers may be `null`.
Allowing old state serializers to be absent for the RocksDB state backend
will allow for users to perform state evolutions that they couldn't before.
## Brief change log
- 2f9f0d9 Always use dummy serializer instead of null when old state
serializer cannot be read
Previously, the behaviour of `TypeSerializerSerializationUtil` read methods
in the case when serializers cannot be read, is quite mixed up. For some
exceptions (e.g. `ClassNotFoundException,
InvalidClassException`), a dummy serializer will be used as a replacement.
In other cases, `null` is used.
This commit fixes this by always using dummy serializers if a
`useDummyPlaceholder` flag is set to true. Otherwise, an `IOException` is
thrown. This makes it clear that users should use dummy serializers if they
want the deserialization to be tolerant to failures.
Another benefit of this is that there will no longer be `null` serializers
after restore; they will either be an actual serializer, or a dummy if the old
serializer cannot be restored.
- 95223fc Adds a `isSerializerPresenceRequired` flag to the
`KeyedBackendSerializationProxy`.
If true, restored serializers cannot be the dummy serializer, otherwise an
`IOException` will be thrown to fail the restore. Heap backends set this to
true, while RocksDB sets this to false.
## Verifying this change
There are two main test classes that already have coverage of this issue:
- `SerializationProxiesTest`
- `StateBackendTestBase`
A new test, `StateBackendTestBase#testSerializerPresenceOnRestore`,
additionally verifies the restore behaviour of heap / rocksdb state backends
when serializers are not present.
## Does this pull request potentially affect one of the following parts:
- Dependencies (does it add or upgrade a dependency): (yes / **no**)
- The public API, i.e., is any changed class annotated with
`@Public(Evolving)`: (yes / **no**)
- The serializers: (yes / **no** / don't know)
- The runtime per-record code paths (performance sensitive): (yes /
**no** / don't know)
- Anything that affects deployment or recovery: JobManager (and its
components), Checkpointing, Yarn/Mesos, ZooKeeper: (**yes** / no / don't know)
- The S3 file system connector: (yes / **no** / don't know)
## Documentation
- Does this pull request introduce a new feature? (yes / **no**)
- If yes, how is the feature documented? (**not applicable** / docs /
JavaDocs / not documented)
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/tzulitai/flink FLINK-9169-approach2
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/5950.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #5950
----
commit 2f9f0d9ab02c0c207e4ac887e958f8de9c057310
Author: Tzu-Li (Gordon) Tai <tzulitai@...>
Date: 2018-05-02T05:37:22Z
[FLINK-9169] [runtime] Always use dummy serializer instead of null when old
state serializer cannot be read
Prreviously, the behaviour of TypeSerializerSerializationUtil read
methods in the case when serializers cannot be read, is quite mixed up.
For some exceptions (e.g. ClassNotFoundException,
InvalidClassException), a dummy serializer will be used as a
replacement. In other cases, null is used.
This commit fixes this by always using dummy serializers if a
'useDummyPlaceholder' flag is set to true. Otherwise, an IOException is
thrown. This makes it clear that users should use dummy serializers if
they want the deserialization to be tolerant to failures.
Another benefit of this is that there will no longer be 'null'
serializers after restore; they will either be an actual serializer, or
a dummy if the old serializer cannot be restored.
commit 95223fc129ed0439b3f14636721cb72bc7560876
Author: Tzu-Li (Gordon) Tai <tzulitai@...>
Date: 2018-05-03T07:30:55Z
[FLINK-9169] [runtime] Allow specifiying serializer presence requirement in
KeyedBackendSerializationProxy
This commit consolidates logic of whether old serializers are required
to be present at restore time in the KeyedBackendSerializationProxy, via
a isSerializerPresenceRequired flag.
Heap-based backends typically set this to true, while RocksDB state
backend will set this to false. If set to true, restored serializers
cannot be the UnloadableDummyTypeSerializer, otherwise an IOException
will be thrown to fail the restore.
----
---