[
https://issues.apache.org/jira/browse/FLINK-8715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16413453#comment-16413453
]
Arvid Heise commented on FLINK-8715:
------------------------------------
I don't have an overview for all the different use cases, so the following
might be too simplistic to work.
From a user's perspective, the serializer is just a plugin to the state
descriptor that customizes serialization and deserialization. I don't see
the need for a user ever to call StateDescriptor#getSerializer. I also clearly
see the ownership of the serializer instance with the "framework": I don't want
to keep an outside reference to it, and I wouldn't expect to be able to
change the serializer after handing it over. So from a user's perspective, a
serializer is immutable.
If you follow that thought, StateDescriptor#getSerializer should have been a
non-public API, used solely by the state backend and the state query engine.
Then you are free to return a direct reference to the serializer instead of a
copy.
The direct reference is then configured from the snapshot state during
initialization. After that it is effectively immutable, and the various copies
can be created for the different threads accessing the state backend or the
state query engine.
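The lifecycle described above can be sketched roughly as follows. This is a minimal, self-contained illustration, not Flink code: SketchSerializer, SketchDescriptor, SketchBackend, and the integer "format version" standing in for the snapshot state are all hypothetical simplifications.

```java
// Hypothetical, heavily simplified stand-in for a serializer; NOT Flink's TypeSerializer.
// It carries one piece of configuration (a format version) that a restore may change.
class SketchSerializer {
    private int formatVersion;

    SketchSerializer(int formatVersion) {
        this.formatVersion = formatVersion;
    }

    // Reconfigure from a (hypothetical) snapshot of the previously written format.
    void reconfigure(int snapshotFormatVersion) {
        this.formatVersion = snapshotFormatVersion;
    }

    // Independent copy with identical configuration, e.g. for another thread.
    SketchSerializer duplicate() {
        return new SketchSerializer(formatVersion);
    }

    int formatVersion() {
        return formatVersion;
    }
}

// Hypothetical descriptor: the serializer is handed over once and owned by the framework.
class SketchDescriptor {
    private final SketchSerializer serializer;

    SketchDescriptor(SketchSerializer serializer) {
        this.serializer = serializer;
    }

    // Framework-internal accessor: returns the direct reference, not a copy.
    SketchSerializer serializerInternal() {
        return serializer;
    }
}

// Hypothetical backend: configures the direct reference once, then only hands out copies.
class SketchBackend {
    private final SketchSerializer configured;

    SketchBackend(SketchDescriptor descriptor, int snapshotFormatVersion) {
        configured = descriptor.serializerInternal();
        configured.reconfigure(snapshotFormatVersion); // one-time step during initialization
    }

    // Each accessing thread gets its own copy of the already-reconfigured serializer.
    SketchSerializer serializerForThread() {
        return configured.duplicate();
    }
}
```

After construction, nothing touches the shared instance again, so it is effectively immutable. Threads only ever see duplicates that inherit the reconfigured state.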
A quick way to amend the current public #getSerializer is to mark it as
deprecated and introduce a non-public #getSerializerInternally (I failed to find
a better name) that returns the direct reference instead of a copy. Then replace
all references to getSerializer with getSerializerInternally and make copies of
the serializer only where needed. I think that goes along the lines of what
[~StephanEwen] wrote at the end. By the way, if you are making an exception for
Kryo, it's hard to argue why it's not possible for Avro as well. Either you make
sure a serializer is reconfigurable, or you don't support reconfiguration at all.
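A rough sketch of that split follows. It assumes a hypothetical ConfigurableSerializer with a single mutable setting in place of Flink's TypeSerializer. Only the method names getSerializer and getSerializerInternally come from this comment; ProposedDescriptor and everything else are illustrative.

```java
// Hypothetical stand-in for a serializer with mutable configuration (not a Flink class).
class ConfigurableSerializer {
    private String schema;

    ConfigurableSerializer(String schema) {
        this.schema = schema;
    }

    void reconfigure(String newSchema) {
        this.schema = newSchema;
    }

    ConfigurableSerializer duplicate() {
        return new ConfigurableSerializer(schema);
    }

    String schema() {
        return schema;
    }
}

// Hypothetical descriptor showing the proposed split between the two accessors.
class ProposedDescriptor {
    private final ConfigurableSerializer serializer;

    ProposedDescriptor(ConfigurableSerializer serializer) {
        this.serializer = serializer;
    }

    /** Public accessor kept for compatibility: still returns a defensive copy. */
    @Deprecated
    public ConfigurableSerializer getSerializer() {
        return serializer.duplicate();
    }

    /** Framework-only accessor (placeholder name): returns the direct reference. */
    ConfigurableSerializer getSerializerInternally() {
        return serializer;
    }
}
```

A change made through the deprecated copy is silently dropped, while a change made through the internal reference sticks. The proposal relies on exactly that difference.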
> RocksDB does not propagate reconfiguration of serializer to the states
> ----------------------------------------------------------------------
>
> Key: FLINK-8715
> URL: https://issues.apache.org/jira/browse/FLINK-8715
> Project: Flink
> Issue Type: Bug
> Components: State Backends, Checkpointing
> Affects Versions: 1.3.2
> Reporter: Arvid Heise
> Priority: Blocker
> Fix For: 1.5.0
>
>
> Any changes to the serializer made in #ensureCompatibility are lost during
> state creation.
> In particular,
> [https://github.com/apache/flink/blob/master/flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBValueState.java#L68]
> always uses a fresh copy of the StateDescriptor.
> An easy fix is to pass the reconfigured serializer as an additional parameter
> at
> [https://github.com/apache/flink/blob/master/flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBKeyedStateBackend.java#L1681].
> The reconfigured serializer can be retrieved through the side output of getColumnFamily:
> {code:java}
> kvStateInformation.get(stateDesc.getName()).f1.getStateSerializer()
> {code}
> I encountered it in 1.3.2, but the code on master seems unchanged (hence
> the pointer into master). I encountered it in ValueState, but I suspect the
> same issue affects all kinds of RocksDB state.
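The bug and the proposed fix can be modeled in miniature. This sketch assumes that the backend keeps a reconfigured serializer per state name (a plain Map stands in for kvStateInformation) while state creation asks the descriptor for a fresh copy. All classes here are hypothetical and do not mirror Flink's actual signatures.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins; none of these are Flink classes.
class Ser {
    private String config;

    Ser(String config) {
        this.config = config;
    }

    void reconfigure(String newConfig) {
        this.config = newConfig;
    }

    String config() {
        return config;
    }

    Ser duplicate() {
        return new Ser(config);
    }
}

class Desc {
    final String name;
    private final Ser serializer;

    Desc(String name, Ser serializer) {
        this.name = name;
        this.serializer = serializer;
    }

    // Like the current public accessor: every call returns a fresh copy.
    Ser getSerializer() {
        return serializer.duplicate();
    }
}

class ValueStateSketch {
    final Ser serializer;

    private ValueStateSketch(Ser serializer) {
        this.serializer = serializer;
    }

    // Buggy variant: takes a fresh copy from the descriptor, losing any reconfiguration.
    static ValueStateSketch buggy(Desc desc) {
        return new ValueStateSketch(desc.getSerializer());
    }

    // Fixed variant: the reconfigured serializer is passed in explicitly.
    static ValueStateSketch fixed(Ser reconfigured) {
        return new ValueStateSketch(reconfigured);
    }
}

class BackendSketch {
    // Simplified analogue of kvStateInformation: state name -> registered serializer.
    final Map<String, Ser> kvStateInformation = new HashMap<>();

    // Restore path: the backend reconfigures its own copy in place and registers it.
    void restore(Desc desc, String restoredConfig) {
        Ser registered = desc.getSerializer();
        registered.reconfigure(restoredConfig);
        kvStateInformation.put(desc.name, registered);
    }

    ValueStateSketch createBuggy(Desc desc) {
        return ValueStateSketch.buggy(desc);
    }

    ValueStateSketch createFixed(Desc desc) {
        return ValueStateSketch.fixed(kvStateInformation.get(desc.name));
    }
}
```

In this model, the buggy path ends up with the descriptor's original configuration. The fixed path reuses the serializer that the restore step actually reconfigured.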