Repository: flink
Updated Branches:
  refs/heads/master 354201930 -> fe6b83585


[hotfix] [docs] Add a rouch description about internal types of states and 
state backends


Project: http://git-wip-us.apache.org/repos/asf/flink/repo
Commit: http://git-wip-us.apache.org/repos/asf/flink/commit/fe6b8358
Tree: http://git-wip-us.apache.org/repos/asf/flink/tree/fe6b8358
Diff: http://git-wip-us.apache.org/repos/asf/flink/diff/fe6b8358

Branch: refs/heads/master
Commit: fe6b835855e0e376d35f48cf208bf3901fe040b8
Parents: 3542019
Author: Stephan Ewen <[email protected]>
Authored: Mon Nov 28 14:55:31 2016 +0100
Committer: Stephan Ewen <[email protected]>
Committed: Mon Nov 28 14:55:31 2016 +0100

----------------------------------------------------------------------
 docs/internals/state_backends.md       | 84 +++++++++++++++++++++++++++++
 docs/internals/stream_checkpointing.md | 13 +----
 2 files changed, 85 insertions(+), 12 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/flink/blob/fe6b8358/docs/internals/state_backends.md
----------------------------------------------------------------------
diff --git a/docs/internals/state_backends.md b/docs/internals/state_backends.md
new file mode 100644
index 0000000..e9f9fd8
--- /dev/null
+++ b/docs/internals/state_backends.md
@@ -0,0 +1,84 @@
+---
+title:  "State and State Backends"
+nav-title: State Backends
+nav-parent_id: internals
+nav-pos: 4
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+* This will be replaced by the TOC
+{:toc}
+
+**NOTE** This document is only a sketch of some bullet points, to be fleshed out.
+
+**NOTE** The structure of State Backends changed heavily between version 1.1 and 1.2. This documentation is only applicable
+to Apache Flink version 1.2 and later.
+
+
+## Keyed State and Operator State
+
+There are two basic kinds of state: `Keyed State` and `Operator State`.
+
+#### Keyed State
+
+*Keyed State* is always relative to keys and can only be used in functions and operators on a `KeyedStream`.
+Examples of keyed state are the `ValueState` or `ListState` that one can create in a function on a `KeyedStream`, as
+well as the state of a keyed window operator.
+
+Keyed State is organized in so-called *Key Groups*. Key Groups are the unit in which keyed state can be redistributed and
+there are as many key groups as the defined maximum parallelism.
+During execution, each parallel instance of an operator gets one or more key groups.
+
+#### Operator State
+
+*Operator State* is state per parallel subtask. It subsumes the `Checkpointed` interface in Flink 1.0 and Flink 1.1.
+The new `CheckpointedFunction` interface is basically a shortcut (syntactic sugar) for the Operator State.
+
+Operator State needs special re-distribution schemes when parallelism is changed. There can be different variations of such
+schemes, out of which the following are currently defined:
+
+  - **List-style redistribution:** Each operator returns a List of state elements. The whole state is logically a concatenation of
+    all lists. On restore/redistribution, the list is evenly divided into as many sublists as there are parallel operators.
+    Each operator gets a sublist, which can be empty, or contain one or more elements.
+
+
+## Raw and Managed State
+
+*Keyed State* and *Operator State* exist in two forms: *managed* and *raw*.
+
+*Managed State* is represented in data structures controlled by the Flink runtime, such as internal hash tables, or RocksDB.
+Examples are the `ValueState`, `ListState`, etc. Flink's runtime encodes the states and writes them into the checkpoints.
+
+*Raw State* is state that users and operators keep in their own data structures. Upon checkpoints, they only write a sequence of bytes into
+the checkpoint. Flink knows nothing about the state's data structures and sees only the raw bytes.
+
+
+## Checkpointing Procedure
+
+When operator snapshots are taken, there are two parts: the **synchronous** and the **asynchronous** part.
+
+Operators and state backends provide their snapshots as a Java `FutureTask`. That task contains the state where the *synchronous* part
+is completed and the *asynchronous* part is pending. The asynchronous part is then executed by a background thread for that checkpoint.
+
+Operators that checkpoint purely synchronously return an already completed `FutureTask`.
+If an asynchronous operation needs to be performed, it is executed in the `run()` method of that `FutureTask`.
+
+The tasks are cancelable, in order to release streams and other resource-consuming handles.
+
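The list-style redistribution described in the state_backends.md hunk above can be illustrated in plain Java. This is only a sketch and not Flink's implementation: the class `ListRedistribution` and the method `redistribute` are hypothetical names, and the choice to hand any remainder elements to the first sublists is an assumption, since the text only says the list is divided evenly.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ListRedistribution {

    /**
     * Concatenates the per-subtask lists into one logical list and splits it
     * into newParallelism sublists whose sizes differ by at most one.
     * Assumption: remainder elements go to the first sublists.
     */
    static <T> List<List<T>> redistribute(List<List<T>> oldSubtaskStates, int newParallelism) {
        if (newParallelism <= 0) {
            throw new IllegalArgumentException("parallelism must be positive");
        }
        // The whole state is logically the concatenation of all lists.
        List<T> all = new ArrayList<>();
        for (List<T> subtaskState : oldSubtaskStates) {
            all.addAll(subtaskState);
        }
        int base = all.size() / newParallelism;
        int remainder = all.size() % newParallelism;

        List<List<T>> result = new ArrayList<>();
        int pos = 0;
        for (int i = 0; i < newParallelism; i++) {
            int size = base + (i < remainder ? 1 : 0);
            result.add(new ArrayList<>(all.subList(pos, pos + size)));
            pos += size;
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> old = Arrays.asList(
                Arrays.asList("a", "b", "c"), Arrays.asList("d"), Arrays.asList("e"));
        System.out.println(redistribute(old, 2)); // [[a, b, c], [d, e]]
        System.out.println(redistribute(old, 7)); // [[a], [b], [c], [d], [e], [], []]
    }
}
```

Scaling out beyond the number of elements leaves some sublists empty, which matches the remark that a sublist "can be empty".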

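The checkpointing procedure described in the state_backends.md hunk (synchronous part done when the `FutureTask` is handed out, asynchronous part executed in `run()`) can be sketched with the JDK alone. The class `SnapshotFutureSketch`, its methods, and the comma-joined byte encoding are hypothetical and chosen for illustration only. They are not Flink APIs, and the sketch holds no streams that cancellation would need to release.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.FutureTask;

final class SnapshotFutureSketch {

    /** Synchronous part: copy the state now. Asynchronous part: encode the copy in run(). */
    static FutureTask<byte[]> asyncSnapshot(List<String> liveState) {
        final List<String> copy = new ArrayList<>(liveState); // done on the caller's thread
        return new FutureTask<>(() -> String.join(",", copy).getBytes(StandardCharsets.UTF_8));
    }

    /** Purely synchronous snapshot: returns a FutureTask that is already completed. */
    static FutureTask<byte[]> syncSnapshot(List<String> liveState) {
        final byte[] bytes = String.join(",", liveState).getBytes(StandardCharsets.UTF_8);
        FutureTask<byte[]> done = new FutureTask<>(() -> bytes);
        done.run(); // completes immediately, nothing is left for a background thread
        return done;
    }

    public static void main(String[] args) throws Exception {
        List<String> state = new ArrayList<>(Arrays.asList("a", "b"));
        FutureTask<byte[]> pending = asyncSnapshot(state);
        state.add("c"); // processing continues and does not affect the snapshot

        ExecutorService background = Executors.newSingleThreadExecutor();
        background.execute(pending); // asynchronous part runs on a background thread
        System.out.println(new String(pending.get(), StandardCharsets.UTF_8)); // a,b
        System.out.println(syncSnapshot(state).isDone()); // true

        FutureTask<byte[]> abandoned = asyncSnapshot(state);
        abandoned.cancel(true); // an aborted checkpoint cancels its pending task
        System.out.println(abandoned.isCancelled()); // true
        background.shutdown();
    }
}
```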
http://git-wip-us.apache.org/repos/asf/flink/blob/fe6b8358/docs/internals/stream_checkpointing.md
----------------------------------------------------------------------
diff --git a/docs/internals/stream_checkpointing.md b/docs/internals/stream_checkpointing.md
index aaf7386..75493ca 100644
--- a/docs/internals/stream_checkpointing.md
+++ b/docs/internals/stream_checkpointing.md
@@ -133,7 +133,6 @@ of the data after checkpoint *n*.
 Because of that, dataflows with only embarrassingly parallel streaming operations (`map()`, `flatMap()`, `filter()`, ...) actually give *exactly once* guarantees even
 in *at least once* mode.
 
-<!--
 
 ### Asynchronous State Snapshots
 
@@ -143,17 +142,7 @@ It is possible to let an operator continue processing while it stores its state
 
 After receiving the checkpoint barriers on its inputs, the operator starts the asynchronous snapshot copying of its state. It immediately emits the barrier to its outputs and continues with the regular stream processing. Once the background copy process has completed, it acknowledges the checkpoint to the checkpoint coordinator (the JobManager). The checkpoint is now only complete after all sinks received the barriers and all stateful operators acknowledged their completed backup (which may be later than the barriers reaching the sinks).
 
-User-defined state that is used through the key/value state abstraction can be snapshotted *asynchronously*.
-User functions that implement the interface {% gh_link /flink-FIXME/flink-streaming/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/checkpoint/Checkpointed.java "Checkpointed" %} will be snapshotted *synchronously*, while functions that implement {% gh_link /flink-FIXME/flink-streaming/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/checkpoint/CheckpointedAsynchronously.java "CheckpointedAsynchronously" %} will be snapshotted *asynchronously*. Note that for the latter, the user function must guarantee that any future modifications to its state to not affect the state object returned by the `snapshotState()` method.
-
-
-
-### Incremental State Snapshots
-
-For large state, taking a snapshot copy of the entire state can be costly, and may prohibit very frequent checkpoints. This problem can be solved by drawing *incremental state snapshots*.
-For incremental snapshots, only the changes since the last snapshot are stored in the current snapshot. The state can then be reconstructed by taking the latest full snapshot and applying the incremental changes to the state.
-
--->
+See [State Backends]({{ site.baseurl }}/internals/state_backends.html) for details on the state snapshots.
 
 
 ## Recovery
