GitHub user tzulitai opened a pull request:

    https://github.com/apache/flink/pull/5950

    [FLINK-9169] [state-backend] Allow absence of old serializers when 
restoring RocksDBStateBackend

    ## What is the purpose of the change
    
    This PR contains 2 commits that overall allows absence of old state 
serializers when restoring the RocksDB state backend. It also eliminates the 
possibility of confusing NPEs which comes with the fact that previously, 
restored serializers may be `null`.
    
    Allowing old state serializers to be absent for the RocksDB state backend 
will allow for users to perform state evolutions that they couldn't before.
    
    ## Brief change log
    
    - 2f9f0d9 Always use dummy serializer instead of null when old state 
serializer cannot be read
    Previously, the behaviour of `TypeSerializerSerializationUtil` read methods 
in the case when serializers cannot be read, is quite mixed up. For some 
exceptions (e.g. `ClassNotFoundException,
    InvalidClassException`), a dummy serializer will be used as a replacement. 
In other cases, `null` is used.
    This commit fixes this by always using dummy serializers if a 
`useDummyPlaceholder` flag is set to true. Otherwise, an `IOException` is 
thrown. This makes it clear that users should use dummy serializers if they 
want the deserialization to be tolerant to failures.
    
    Another benefit of this is that there will no longer be `null` serializers 
after restore; they will either be an actual serializer, or a dummy if the old 
serializer cannot be restored.
    
    - 95223fc Adds a `isSerializerPresenceRequired` flag to the 
`KeyedBackendSerializationProxy`.
    If true, restored serializers cannot be the dummy serializer, otherwise an 
`IOException` will be thrown to fail the restore. Heap backends set this to 
true, while RocksDB sets this to false.
    
    ## Verifying this change
    
    There are two main test classes that already have coverage of this issue:
    - `SerializationProxiesTest`
    - `StateBackendTestBase`
    
    A new test, `StateBackendTestBase#testSerializerPresenceOnRestore`, 
additionally verifies the restore behaviour of heap / rocksdb state backends 
when serializers are not present.
    
    ## Does this pull request potentially affect one of the following parts:
    
      - Dependencies (does it add or upgrade a dependency): (yes / **no**)
      - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (yes / **no**)
      - The serializers: (yes / **no** / don't know)
      - The runtime per-record code paths (performance sensitive): (yes / 
**no** / don't know)
      - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: (**yes** / no / don't know)
      - The S3 file system connector: (yes / **no** / don't know)
    
    ## Documentation
    
      - Does this pull request introduce a new feature? (yes / **no**)
      - If yes, how is the feature documented? (**not applicable** / docs / 
JavaDocs / not documented)


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tzulitai/flink FLINK-9169-approach2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/5950.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #5950
    
----
commit 2f9f0d9ab02c0c207e4ac887e958f8de9c057310
Author: Tzu-Li (Gordon) Tai <tzulitai@...>
Date:   2018-05-02T05:37:22Z

    [FLINK-9169] [runtime] Always use dummy serializer instead of null when old 
state serializer cannot be read
    
    Prreviously, the behaviour of TypeSerializerSerializationUtil read
    methods in the case when serializers cannot be read, is quite mixed up.
    For some exceptions (e.g. ClassNotFoundException,
    InvalidClassException), a dummy serializer will be used as a
    replacement. In other cases, null is used.
    
    This commit fixes this by always using dummy serializers if a
    'useDummyPlaceholder' flag is set to true. Otherwise, an IOException is
    thrown. This makes it clear that users should use dummy serializers if
    they want the deserialization to be tolerant to failures.
    
    Another benefit of this is that there will no longer be 'null'
    serializers after restore; they will either be an actual serializer, or
    a dummy if the old serializer cannot be restored.

commit 95223fc129ed0439b3f14636721cb72bc7560876
Author: Tzu-Li (Gordon) Tai <tzulitai@...>
Date:   2018-05-03T07:30:55Z

    [FLINK-9169] [runtime] Allow specifiying serializer presence requirement in 
KeyedBackendSerializationProxy
    
    This commit consolidates logic of whether old serializers are required
    to be present at restore time in the KeyedBackendSerializationProxy, via
    a isSerializerPresenceRequired flag.
    
    Heap-based backends typically set this to true, while RocksDB state
    backend will set this to false. If set to true, restored serializers
    cannot be the UnloadableDummyTypeSerializer, otherwise an IOException
    will be thrown to fail the restore.

----


---

Reply via email to