igalshilman commented on a change in pull request #7124: [FLINK-9574] [doc] Rework documentation for custom state serializers and state evolution URL: https://github.com/apache/flink/pull/7124#discussion_r234964879
########## File path: docs/dev/stream/state/schema_evolution.md ##########
@@ -0,0 +1,92 @@
+---
+title: "State Schema Evolution"
+nav-parent_id: streaming_state
+nav-pos: 6
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+* ToC
+{:toc}
+
+## Overview
+
+Apache Flink streaming applications are typically designed to run continuously for long periods of time.
+As with all long-running services, the applications need to be updated to adapt to changing requirements.
+The same goes for the data schemas that the applications work against; they evolve along with the application.
+
+This page provides an overview of how you can evolve your state type's data schema.
+The current restrictions vary across different types and state structures (e.g. `ValueState`, `ListState`).
+
+Note that the information on this page is relevant only if you are using state serializers that are
+generated by Flink's own [type serialization framework]({{ site.baseurl }}/dev/types_serialization.html).
+That is, when declaring your state, the provided state descriptor is not configured to use a specific `TypeSerializer`
+or `TypeInformation`, thereby allowing Flink to infer information about the state type:
+
+<div data-lang="java" markdown="1">
+{% highlight java %}
+ListStateDescriptor<MyPojoType> descriptor =
+    new ListStateDescriptor<>(
+        "state-name",
+        MyPojoType.class);
+
+checkpointedState = getRuntimeContext().getListState(descriptor);
+{% endhighlight %}
+</div>
+
+Under the hood, whether or not the schema of state can be evolved depends on the serializer used to read / write
+persisted state bytes. Simply put, a registered state's schema can only be evolved if its serializer properly
+supports it. This is handled transparently by serializers generated by Flink's type serialization framework
+(the current scope of support is listed [below]({{ site.baseurl }}/dev/stream/state/schema_evolution#supported-data-types-for-schema-evolution)).
+
+If you intend to implement a custom `TypeSerializer` for your state type and would like to learn how to implement
+the serializer to support state schema evolution, please refer to
+[Custom State Serialization]({{ site.baseurl }}/dev/stream/state/custom_serialization).
+The documentation there also covers the necessary internal details about the interplay between state serializers and
+Flink's state backends that is required to support state schema evolution.
+
+## Evolving state schema
+
+To evolve the schema of a given state type, you would take the following steps:
+
+ 1. Take a savepoint of your Flink streaming job.
+ 2. Update state types in your application (e.g., modifying your Avro / POJO type schema).
+ 3. Restore the job from the savepoint. When accessing state for the first time, Flink will assess whether or not
+    the schema has changed for the state, and migrate the state schema if necessary.
+
+The process of migrating state to adapt to changed schemas happens automatically and independently for each state.
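+As a concrete illustration of step 2, consider a hypothetical Avro record `MyAvroType` used as a state type (the
+record and field names below are purely illustrative). Adding a new field with a declared default value, such as the
+`email` field here, is a compatible change under Avro's schema resolution rules:
+
+<div data-lang="json" markdown="1">
+{% highlight json %}
+{
+  "type": "record",
+  "name": "MyAvroType",
+  "fields": [
+    {"name": "id", "type": "long"},
+    {"name": "name", "type": "string"},
+    {"name": "email", "type": ["null", "string"], "default": null}
+  ]
+}
+{% endhighlight %}
+</div>
+
+Because the added field carries a default, records written with the previous schema (without `email`) can still be
+read after the change.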
+Further details about the migration process are outside the scope of this documentation; please refer to the
+[Custom State Serialization]({{ site.baseurl }}/dev/stream/state/custom_serialization) page.
+
+## Supported data types for schema evolution
+
+Currently, schema evolution is supported only for Avro. Therefore, if you care about schema evolution for
+state, it is currently recommended to always use Avro for state data types.
+
+There are plans to extend support to more composite types, such as POJOs; for more details,
+please refer to [FLINK-10897](https://issues.apache.org/jira/browse/FLINK-10897).
+
+### Avro types
+
+Flink fully supports evolving the schema of Avro-type state, as long as the schema change is considered compatible by
+[Avro's rules for schema resolution](http://avro.apache.org/docs/current/spec.html#Schema+Resolution).
+
+Moreover, it is possible on restore to switch from using Avro-generated `SpecificRecord`s to `GenericRecord`s,

Review comment:
   Would it make sense to mention the limitations here? For example, you can't change the namespace, or relocate the Avro-generated classes.
