RocMarshal commented on a change in pull request #12727:
URL: https://github.com/apache/flink/pull/12727#discussion_r446596356



##########
File path: docs/learn-flink/fault_tolerance.zh.md
##########
@@ -29,180 +29,156 @@ under the License.
 
 ## State Backends
 
-The keyed state managed by Flink is a sort of sharded, key/value store, and the working copy of each
-item of keyed state is kept somewhere local to the taskmanager responsible for that key. Operator
-state is also local to the machine(s) that need(s) it. Flink periodically takes persistent snapshots
-of all the state and copies these snapshots somewhere more durable, such as a distributed file
-system.
+The keyed state managed by Flink is a sort of sharded key/value store, and the working copy of each item of keyed state is kept somewhere local to the taskmanager responsible for that key.
+Operator state is also local to the machine(s) that need it. Flink periodically takes persistent snapshots of all of this state and copies them somewhere more durable, such as a distributed file system.
 
-In the event of the failure, Flink can restore the complete state of your application and resume
-processing as though nothing had gone wrong.
+In the event of a failure, Flink can restore the complete state of your application and resume processing as though nothing had gone wrong.
 
-This state that Flink manages is stored in a _state backend_. Two implementations of state backends
-are available -- one based on RocksDB, an embedded key/value store that keeps its working state on
-disk, and another heap-based state backend that keeps its working state in memory, on the Java heap.
-This heap-based state backend comes in two flavors: the FsStateBackend that persists its state
-snapshots to a distributed file system, and the MemoryStateBackend that uses the JobManager's heap.
+The state that Flink manages is stored in a _state backend_.
+Two implementations of state backends are available -- one based on RocksDB, an embedded key/value store that keeps its working state on disk, and another heap-based state backend that keeps its working state in memory, on the Java heap.
+This heap-based state backend comes in two flavors: the FsStateBackend, which persists its state snapshots to a distributed file system, and the MemoryStateBackend, which uses the JobManager's heap.
 
 <table class="table table-bordered">
   <thead>
     <tr class="alert alert-info">
-      <th class="text-left">Name</th>
+      <th class="text-left">Name</th>
       <th class="text-left">Working State</th>
-      <th class="text-left">State Backup</th>
-      <th class="text-left">Snapshotting</th>
+      <th class="text-left">State Backup</th>
+      <th class="text-left">Snapshotting</th>
     </tr>
   </thead>
   <tbody>
     <tr>
       <th class="text-left">RocksDBStateBackend</th>
-      <td class="text-left">Local disk (tmp dir)</td>
-      <td class="text-left">Distributed file system</td>
-      <td class="text-left">Full / Incremental</td>
+      <td class="text-left">Local disk (tmp dir)</td>
+      <td class="text-left">Distributed file system</td>
+      <td class="text-left">Full / Incremental</td>
     </tr>
     <tr>
       <td colspan="4" class="text-left">
         <ul>
-          <li>Supports state larger than available memory</li>
-          <li>Rule of thumb: 10x slower than heap-based backends</li>
+          <li>Supports state larger than available memory</li>
+          <li>Rule of thumb: 10x slower than heap-based backends</li>
         </ul>
       </td>
     </tr>
     <tr>
       <th class="text-left">FsStateBackend</th>
       <td class="text-left">JVM Heap</td>
-      <td class="text-left">Distributed file system</td>
-      <td class="text-left">Full</td>
+      <td class="text-left">Distributed file system</td>
+      <td class="text-left">Full</td>
     </tr>
     <tr>
       <td colspan="4" class="text-left">
         <ul>
-          <li>Fast, requires large heap</li>
-          <li>Subject to GC</li>
+          <li>Fast, requires large heap</li>
+          <li>Subject to GC</li>
         </ul>
       </td>
     </tr>
     <tr>
       <th class="text-left">MemoryStateBackend</th>
       <td class="text-left">JVM Heap</td>
       <td class="text-left">JobManager JVM Heap</td>
-      <td class="text-left">Full</td>
+      <td class="text-left">Full</td>
     </tr>
     <tr>
       <td colspan="4" class="text-left">
         <ul>
-          <li>Good for testing and experimentation with small state (locally)</li>
+          <li>Good for testing and experimentation with small state (locally)</li>
         </ul>
       </td>
     </tr>
   </tbody>
 </table>
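
To make the comparison in the table concrete, here is a minimal sketch (not part of this patch) of selecting a state backend programmatically, assuming the Flink 1.11-era DataStream API; the checkpoint path and job name are made up for illustration:

```java
import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class StateBackendExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // With the default MemoryStateBackend, working state lives on the JVM heap and
        // snapshots go to the JobManager heap. Switching to the FsStateBackend keeps
        // working state on the heap but writes snapshots to a durable file system.
        env.setStateBackend(new FsStateBackend("hdfs:///flink/checkpoints"));

        env.fromElements(1, 2, 3).print();
        env.execute("state-backend-example");
    }
}
```

The state backend can also be configured cluster-wide in the Flink configuration; setting it on the environment, as above, overrides that default for a single job.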
 
-When working with state kept in a heap-based state backend, accesses and updates involve reading and
-writing objects on the heap. But for objects kept in the `RocksDBStateBackend`, accesses and updates
-involve serialization and deserialization, and so are much more expensive. But the amount of state
-you can have with RocksDB is limited only by the size of the local disk. Note also that only the
-`RocksDBStateBackend` is able to do incremental snapshotting, which is a significant benefit for
-applications with large amounts of slowly changing state.
+When working with state kept in a heap-based state backend, accesses and updates involve reading and writing objects on the heap.
+But for objects kept in the `RocksDBStateBackend`, accesses and updates involve serialization and deserialization, and so are much more expensive.
+But the amount of state you can have with RocksDB is limited only by the size of the local disk.
+Note also that only the `RocksDBStateBackend` is able to do incremental snapshotting, which is a significant benefit for applications with large amounts of slowly changing state.
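
As a rough sketch of how incremental snapshotting is switched on (assuming the flink-statebackend-rocksdb dependency is on the classpath; the checkpoint URI is illustrative):

```java
import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RocksDBIncrementalExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // The second constructor argument enables incremental checkpoints: each snapshot
        // uploads only the state files created since the previous checkpoint, instead of
        // re-uploading all of the state.
        env.setStateBackend(new RocksDBStateBackend("hdfs:///flink/checkpoints", true));
    }
}
```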
 
-All of these state backends are able to do asynchronous snapshotting, meaning that they can take a
-snapshot without impeding the ongoing stream processing.
+All of these state backends are able to do asynchronous snapshotting, meaning that they can take a snapshot without impeding the ongoing stream processing.
 
 {% top %}
 
-## State Snapshots
+## State Snapshots
 
-### Definitions
+### Definitions
 
-* _Snapshot_ -- a generic term referring to a global, consistent image of the state of a Flink job.
-  A snapshot includes a pointer into each of the data sources (e.g., an offset into a file or Kafka
-  partition), as well as a copy of the state from each of the job's stateful operators that resulted
-  from having processed all of the events up to those positions in the sources.
-* _Checkpoint_ -- a snapshot taken automatically by Flink for the purpose of being able to recover
-  from faults. Checkpoints can be incremental, and are optimized for being restored quickly.
-* _Externalized Checkpoint_ -- normally checkpoints are not intended to be manipulated by users.
-  Flink retains only the _n_-most-recent checkpoints (_n_ being configurable) while a job is
-  running, and deletes them when a job is cancelled. But you can configure them to be retained
-  instead, in which case you can manually resume from them.
-* _Savepoint_ -- a snapshot triggered manually by a user (or an API call) for some operational
-  purpose, such as a stateful redeploy/upgrade/rescaling operation. Savepoints are always complete,
-  and are optimized for operational flexibility.
+* _Snapshot_ -- a generic term referring to a global, consistent image of the state of a Flink job.
+A snapshot includes a pointer into each of the data sources (e.g., an offset into a file or Kafka partition), as well as a copy of the state from each of the job's stateful operators that resulted from having processed all of the events up to those positions in the sources.
+
+* _Checkpoint_ -- a snapshot taken automatically by Flink for the purpose of being able to recover from faults.
+Checkpoints can be incremental, and are optimized for being restored quickly.
 
-### How does State Snapshotting Work?
+* _Externalized Checkpoint_ -- normally checkpoints are not intended to be manipulated by users.
+Flink retains only the _n_-most-recent checkpoints (_n_ being configurable) while a job is running, and deletes them when a job is cancelled.
+But you can configure them to be retained instead, in which case you can manually resume from them (see the configuration sketch after these definitions).
 
-Flink uses a variant of the [Chandy-Lamport
-algorithm](https://en.wikipedia.org/wiki/Chandy-Lamport_algorithm) known as _asynchronous barrier
-snapshotting_.
+* _Savepoint_ -- a snapshot triggered manually by a user (or an API call) for some operational purpose, such as a stateful redeploy/upgrade/rescaling operation. Savepoints are always complete, and are optimized for operational flexibility.
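
A minimal configuration sketch of retaining checkpoints on cancellation (assuming the Flink 1.11-era DataStream API; the interval and cleanup mode are illustrative choices, not prescribed by this page):

```java
import org.apache.flink.streaming.api.environment.CheckpointConfig.ExternalizedCheckpointCleanup;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RetainedCheckpointsExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Take a checkpoint every 60 seconds.
        env.enableCheckpointing(60_000);

        // Keep the latest checkpoint when the job is cancelled, so it can be used
        // as an externalized checkpoint to manually resume the job later.
        env.getCheckpointConfig().enableExternalizedCheckpoints(
                ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);
    }
}
```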
 
-When a task manager is instructed by the checkpoint coordinator (part of the job manager) to begin a
-checkpoint, it has all of the sources record their offsets and insert numbered _checkpoint barriers_
-into their streams. These barriers flow through the job graph, indicating the part of the stream
-before and after each checkpoint.
+### How does State Snapshotting Work?
+
+Flink uses a variant of the [Chandy-Lamport algorithm](https://en.wikipedia.org/wiki/Chandy-Lamport_algorithm) known as _asynchronous barrier snapshotting_.
+
+When a task manager is instructed by the checkpoint coordinator (part of the job manager) to begin a checkpoint, it has all of the sources record their offsets and insert numbered _checkpoint barriers_ into their streams. These barriers flow through the job graph, marking the part of the stream before and after each checkpoint.
 
<img src="{{ site.baseurl }}/fig/stream_barriers.svg" alt="Checkpoint barriers are inserted into the streams" class="center" width="80%" />
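
For orientation, a small sketch of turning checkpointing on, which is what causes the checkpoint coordinator to start injecting barriers; the interval, mode, and timeout values here are illustrative assumptions:

```java
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointingExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Checkpoint every 10 seconds with exactly-once barrier alignment.
        env.enableCheckpointing(10_000, CheckpointingMode.EXACTLY_ONCE);

        // Give slow checkpoints up to 2 minutes before they are discarded.
        env.getCheckpointConfig().setCheckpointTimeout(120_000);
    }
}
```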

Review comment:
       @wuchong, OK.
       Thank you for your reply.



