[jira] [Commented] (FLINK-9574) Add a dedicated documentation page for state evolution

ASF GitHub Bot (JIRA) Tue, 20 Nov 2018 04:54:28 -0800


    [ 
https://issues.apache.org/jira/browse/FLINK-9574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16693186#comment-16693186
 ]


ASF GitHub Bot commented on FLINK-9574:
---------------------------------------

igalshilman commented on a change in pull request #7124: [FLINK-9574] [doc]  
Rework documentation for custom state serializers and state evolution
URL: https://github.com/apache/flink/pull/7124#discussion_r234973231
 
 

 ##########
 File path: docs/dev/stream/state/custom_serialization.md
 ##########
 @@ -66,125 +69,166 @@ checkpointedState = 
getRuntimeContext.getListState(descriptor)
 </div>
 </div>
 
-Note that Flink writes state serializers along with the state as metadata. In 
certain cases on restore (see following
-subsections), the written serializer needs to be deserialized and used. 
Therefore, it is recommended to avoid using
-anonymous classes as your state serializers. Anonymous classes do not have a 
guarantee on the generated classname,
-which varies across compilers and depends on the order that they are 
instantiated within the enclosing class, which can 
-easily cause the previously written serializer to be unreadable (since the 
original class can no longer be found in the
-classpath).
-
-### Handling serializer upgrades and compatibility
-
-Flink allows changing the serializers used to read and write managed state, so 
that users are not locked in to any
-specific serialization. When state is restored, the new serializer registered 
for the state (i.e., the serializer
-that comes with the `StateDescriptor` used to access the state in the restored 
job) will be checked for compatibility,
-and is replaced as the new serializer for the state.
-
-A compatible serializer would mean that the serializer is capable of reading 
previous serialized bytes of the state,
-and the written binary format of the state also remains identical. The means 
to check the new serializer's compatibility
-is provided through the following two methods of the `TypeSerializer` 
interface:
-
-{% highlight java %}
-public abstract TypeSerializerConfigSnapshot snapshotConfiguration();
-public abstract CompatibilityResult 
ensureCompatibility(TypeSerializerConfigSnapshot configSnapshot);
-{% endhighlight %}
-
-Briefly speaking, every time a checkpoint is performed, the 
`snapshotConfiguration` method is called to create a
-point-in-time view of the state serializer's configuration. The returned 
configuration snapshot is stored along with the
-checkpoint as the state's metadata. When the checkpoint is used to restore a 
job, that serializer configuration snapshot
-will be provided to the _new_ serializer of the same state via the counterpart 
method, `ensureCompatibility`, to verify
-compatibility of the new serializer. This method serves as a check for whether 
or not the new serializer is compatible,
-as well as a hook to possibly reconfigure the new serializer in the case that 
it is incompatible.
+## State serializers and schema evolution
 
-Note that Flink's own serializers are implemented such that they are at least 
compatible with themselves, i.e. when the
-same serializer is used for the state in the restored job, the serializer's 
will reconfigure themselves to be compatible
-with their previous configuration.
+This section explains the user-facing abstractions related to state 
serialization and schema evolution, and necessary
+internal details about how Flink interacts with these abstractions.
 
-The following subsections illustrate guidelines to implement these two methods 
when using custom serializers.
+When restoring from savepoints, Flink allows changing the serializers used to 
read and write prior registered state,
+so that users are not locked in to any specific serialization schema. When 
state is restored, a new serializer will be
+registered for the state (i.e., the serializer that comes with the 
`StateDescriptor` used to access the state in the
+restored job). This new serializer may have a different schema than that of 
the prior serializer. Therefore, when
+implementing state serializers, besides the basic logic of reading / writing 
data, another important thing to keep in
+mind is how the serialization schema can be changed in the future.
 
-#### Implementing the `snapshotConfiguration` method
+When speaking of *schema*, in this context the term is interchangeable between 
referring to the *data model* of a state
+type and the *serialized binary format* of a state type. The schema, generally 
speaking, can change for a few cases:
 
-The serializer's configuration snapshot should capture enough information such 
that on restore, the information
-carried over to the new serializer for the state is sufficient for it to 
determine whether or not it is compatible.
-This could typically contain information about the serializer's parameters or 
binary format of the serialized data;
-generally, anything that allows the new serializer to decide whether or not it 
can be used to read previous serialized
-bytes, and that it writes in the same binary format.
+ 1. Data schema of the state type has evolved, i.e. adding or removing a field 
from a POJO that is used as state.
+ 2. Following the above, generally speaking, the serialization format of the 
serializer is intended to be upgraded.
+ 3. Configuration of the serializer has changed, i.e. Kryo types are 
registered in a different order than the
 
 Review comment:
   suggestion:  remove Kryo from here since it might confuse the reader (the 
page deals with custom serializers, and we support Kryo)

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add a dedicated documentation page for state evolution
> ------------------------------------------------------
>
>                 Key: FLINK-9574
>                 URL: https://issues.apache.org/jira/browse/FLINK-9574
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Documentation, State Backends, Checkpointing, Type 
> Serialization System
>            Reporter: Tzu-Li (Gordon) Tai
>            Assignee: Tzu-Li (Gordon) Tai
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 1.8.0
>
>
> Currently, the only bit of documentation about serializer upgrades / state 
> evolution, is 
> [https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/stream/state/custom_serialization.html#handling-serializer-upgrades-and-compatibility.],
>  which only explains things at an API level.
> State evolution over the time has proved to be a rather complex topic that is 
> often overlooked by users. Users would probably benefit from a actual 
> full-grown dedicated page that covers both API, some necessary internal 
> details regarding interplay of state serializers, best practices, and caution 
> notices.
> I propose to add this documentation as a subpage under Streaming/State & 
> Fault-Tolerance/.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (FLINK-9574) Add a dedicated documentation page for state evolution

Reply via email to