xintongsong commented on a change in pull request #9805: [FLINK-14227] translate dev/stream/state/checkpointing into Chinese
URL: https://github.com/apache/flink/pull/9805#discussion_r329423687
 
 

 ##########
 File path: docs/dev/stream/state/checkpointing.zh.md
 ##########
 @@ -25,146 +25,138 @@ under the License.
 * ToC
 {:toc}
 
-Every function and operator in Flink can be **stateful** (see [working with state](state.html) for details).
-Stateful functions store data across the processing of individual elements/events, making state a critical building block for
-any type of more elaborate operation.
+Every function and operator in Flink can be **stateful** (see [working with state](state.html) for details).
+Stateful functions store data across the processing of individual elements/events, making state a critical building block for any type of more elaborate operation.
+To make state fault tolerant, Flink needs to **checkpoint** the state. Checkpoints allow Flink to recover state and positions in the streams, giving the application the same semantics as a failure-free execution.
 
-In order to make state fault tolerant, Flink needs to **checkpoint** the state. Checkpoints allow Flink to recover state and positions
-in the streams to give the application the same semantics as a failure-free execution.
+The [documentation on streaming fault tolerance]({{ site.baseurl }}/internals/stream_checkpointing.html) describes the internal technical principles behind Flink's streaming fault tolerance mechanism.
 
-The [documentation on streaming fault tolerance]({{ site.baseurl }}/internals/stream_checkpointing.html) describes in detail the technique behind Flink's streaming fault tolerance mechanism.
 
+## Prerequisites
 
-## Prerequisites
+Flink's checkpointing mechanism interacts with durable storage for streams and state. In general, it requires:
 
-Flink's checkpointing mechanism interacts with durable storage for streams and state. In general, it requires:
+  - A *persistent* (or *durable*) data source that can replay records for a certain amount of time, such as a persistent message queue (e.g., Apache Kafka, RabbitMQ, Amazon Kinesis, Google PubSub) or a file system (e.g., HDFS, S3, GFS, NFS, Ceph, ...).
+  - A persistent storage for state, typically a distributed file system (e.g., HDFS, S3, GFS, NFS, Ceph, ...).
 
-  - A *persistent* (or *durable*) data source that can replay records for a certain amount of time. Examples for such sources are persistent messages queues (e.g., Apache Kafka, RabbitMQ, Amazon Kinesis, Google PubSub) or file systems (e.g., HDFS, S3, GFS, NFS, Ceph, ...).
-  - A persistent storage for state, typically a distributed filesystem (e.g., HDFS, S3, GFS, NFS, Ceph, ...)
+## Enabling and Configuring Checkpointing
 
+By default, checkpointing is disabled. To enable checkpointing, call `enableCheckpointing(n)` on the `StreamExecutionEnvironment`, where *n* is the checkpoint interval in milliseconds.
 
-## Enabling and Configuring Checkpointing
+Other properties of checkpointing include:
 
-By default, checkpointing is disabled. To enable checkpointing, call `enableCheckpointing(n)` on the `StreamExecutionEnvironment`, where *n* is the checkpoint interval in milliseconds.
-
-Other parameters for checkpointing include:
-
-  - *exactly-once vs. at-least-once*: You can optionally pass a mode to the `enableCheckpointing(n)` method to choose between the two guarantee levels.
-    Exactly-once is preferable for most applications. At-least-once may be relevant for certain super-low-latency (consistently few milliseconds) applications.
-
-  - *checkpoint timeout*: The time after which a checkpoint-in-progress is aborted, if it did not complete by then.
-
-  - *minimum time between checkpoints*: To make sure that the streaming application makes a certain amount of progress between checkpoints,
-    one can define how much time needs to pass between checkpoints. If this value is set for example to *5000*, the next checkpoint will be
-    started no sooner than 5 seconds after the previous checkpoint completed, regardless of the checkpoint duration and the checkpoint interval.
-    Note that this implies that the checkpoint interval will never be smaller than this parameter.
+  - *exactly-once vs. at-least-once*: You can optionally pass a mode to the `enableCheckpointing(n)` method to choose between the two guarantee levels.
+    Exactly-once is preferable for most applications. At-least-once may be relevant for certain super-low-latency (consistently a few milliseconds) applications.
+  
+  - *checkpoint timeout*: The time after which a checkpoint-in-progress is aborted if it has not completed by then.
+  
+  - *minimum time between checkpoints*: To make sure that the streaming application makes a certain amount of progress between checkpoints, one can define how much time needs to pass between checkpoints. If this value is set to *5000*, for example,
+    the next checkpoint will start no sooner than 5 seconds after the previous checkpoint completed, regardless of the checkpoint duration and the checkpoint interval.
     
-    It is often easier to configure applications by defining the "time between checkpoints" than the checkpoint interval, because the "time between checkpoints"
-    is not susceptible to the fact that checkpoints may sometimes take longer than on average (for example if the target storage system is temporarily slow).
-
-    Note that this value also implies that the number of concurrent checkpoints is *one*.
-
-  - *number of concurrent checkpoints*: By default, the system will not trigger another checkpoint while one is still in progress.
-    This ensures that the topology does not spend too much time on checkpoints and not make progress with processing the streams.
-    It is possible to allow for multiple overlapping checkpoints, which is interesting for pipelines that have a certain processing delay
-    (for example because the functions call external services that need some time to respond) but that still want to do very frequent checkpoints
-    (100s of milliseconds) to re-process very little upon failures.
-
-    This option cannot be used when a minimum time between checkpoints is defined.
-
-  - *externalized checkpoints*: You can configure periodic checkpoints to be persisted externally. Externalized checkpoints write their meta data out to persistent storage and are *not* automatically cleaned up when the job fails. This way, you will have a checkpoint around to resume from if your job fails. There are more details in the [deployment notes on externalized checkpoints]({{ site.baseurl }}/ops/state/checkpoints.html#externalized-checkpoints).
-
-  - *fail/continue task on checkpoint errors*: This determines if a task will be failed if an error occurs in the execution of the task's checkpoint procedure. This is the default behaviour. Alternatively, when this is disabled, the task will simply decline the checkpoint to the checkpoint coordinator and continue running.
+    It is often easier to configure applications by defining the "time between checkpoints" than the checkpoint interval, because the "time between checkpoints" is not affected by checkpoints that occasionally take longer than average (for example if the target storage system is temporarily slow).
+    
+    Note that this value also implies that the number of concurrent checkpoints is *one*.
 
-  - *prefer checkpoint for recovery*: This determines if a job will fallback to latest checkpoint even when there are more recent savepoints available to potentially reduce recovery time.
+  - *number of concurrent checkpoints*: By default, the system will not trigger another checkpoint while one is still in progress. This ensures that the topology does not spend too much time on checkpoints and does not stall progress in processing the streams.
+    It is possible to allow multiple overlapping checkpoints, which is useful for pipelines that have a certain processing delay (for example because functions call external services that take some time to respond) but that still want to checkpoint very frequently to reprocess as little as possible upon failures.
 
 Review comment:
   Here, "interesting" means that "allow for multiple overlapping checkpoints" is meaningful for "pipelines that have certain processing delay".
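
For reference, the properties discussed in this excerpt map onto setters reached via `StreamExecutionEnvironment#getCheckpointConfig()`. Below is a minimal Java sketch of that configuration, assuming the Flink 1.9 DataStream API; it is not part of this pull request, and the concrete interval, pause, and timeout values are illustrative assumptions only.

```java
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.CheckpointConfig.ExternalizedCheckpointCleanup;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointConfigSketch {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Start a checkpoint every 1000 ms (illustrative interval).
        env.enableCheckpointing(1000);

        // Choose the exactly-once guarantee level (this is the default).
        env.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE);

        // Require at least 500 ms between the end of one checkpoint and the start of the next.
        env.getCheckpointConfig().setMinPauseBetweenCheckpoints(500);

        // Abort checkpoints that do not complete within one minute.
        env.getCheckpointConfig().setCheckpointTimeout(60000);

        // Allow only one checkpoint to be in progress at the same time.
        env.getCheckpointConfig().setMaxConcurrentCheckpoints(1);

        // Retain externalized checkpoint metadata so the job can be resumed after cancellation.
        env.getCheckpointConfig().enableExternalizedCheckpoints(
                ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);
    }
}
```

If retained checkpoints are not needed after cancellation, `ExternalizedCheckpointCleanup.DELETE_ON_CANCELLATION` can be used instead; the other options discussed above (failing the task on checkpoint errors, preferring checkpoints for recovery) have their own `CheckpointConfig` setters and are omitted here for brevity.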
