RocMarshal commented on a change in pull request #16304:
URL: https://github.com/apache/flink/pull/16304#discussion_r670344867



##########
File path: docs/content.zh/docs/internals/task_lifecycle.md
##########
@@ -24,181 +24,123 @@ specific language governing permissions and limitations
 under the License.
 -->
 
+<a name='task-lifecycle'> </a>
+
 # Task 生命周期
 
-A task in Flink is the basic unit of execution. It is the place where each 
parallel instance of an
-operator is executed. As an example, an operator with a parallelism of *5* 
will have each of its
-instances executed by a separate task.
+Task 是 Flink 的基本执行单元。算子的每个并行实例都在 task 里执行。例如,一个并行度为 5 的算子,它的每个实例都由一个单独的 task 
来执行。

Review comment:
       ```suggestion
   Task 是 Flink 的基本执行单元。算子的每个并行实例都在 task 里执行。例如,一个并行度为 *5* 的算子,它的每个实例都由一个单独的 
task 来执行。
   ```

##########
File path: docs/content.zh/docs/internals/task_lifecycle.md
##########
@@ -24,181 +24,123 @@ specific language governing permissions and limitations
 under the License.
 -->
 
+<a name='task-lifecycle'> </a>
+
 # Task 生命周期
 
-A task in Flink is the basic unit of execution. It is the place where each 
parallel instance of an
-operator is executed. As an example, an operator with a parallelism of *5* 
will have each of its
-instances executed by a separate task.
+Task 是 Flink 的基本执行单元。算子的每个并行实例都在 task 里执行。例如,一个并行度为 5 的算子,它的每个实例都由一个单独的 task 
来执行。
+
+在 Flink 流式计算引擎里,`StreamTask` 是所有不同 task 子类的基础。本文会深入讲解  `StreamTask` 
生命周期的不同阶段,并描述每个阶段的主要方法。
+
+<a name="operator-lifecycle-in-a-nutshell"> </a>
 
-The `StreamTask` is the base for all different task sub-types in Flink's 
streaming engine. This
-document goes through the different phases in the lifecycle of the 
`StreamTask` and describes the
-main methods representing each of these phases.
+## 算子生命周期简介
 
-## Operator Lifecycle in a nutshell
+因为 task 是算子并行实例的执行实体,所以它的生命周期跟算子的生命周期紧密联系在一起。因此,在深入介绍 `StreamTask` 
生命周期之前,先简要介绍一下代表算子生命周期的各个基本方法。这些方法按调用的先后顺序如下所示。考虑到算子可能是用户自定义函数(*UDF*),在每个算子的下方也展示(以缩进的方式)了
 UDF 生命周期里调用的各个方法。如果算子继承了 `AbstractUdfStreamOperator` 
的话,这些方法都是可用的,`AbstractUdfStreamOperator` 是所有继承 UDF 算子的基类。
 
-Because the task is the entity that executes a parallel instance of an 
operator, its lifecycle is tightly integrated 
-with that of an operator. So, we will briefly mention the basic methods 
representing the lifecycle of an operator before 
-diving into those of the `StreamTask` itself. The list is presented below in 
the order that each of the methods is called. 
-Given that an operator can have a user-defined function (*UDF*), below each of 
the operator methods we also present 
-(indented) the methods in the lifecycle of the UDF that it calls. These 
methods are available if your operator extends 
-the `AbstractUdfStreamOperator`, which is the basic class for all operators 
that execute UDFs.
 
-        // initialization phase
+ 
+
+        // 初始化阶段
         OPERATOR::setup
             UDF::setRuntimeContext
         OPERATOR::initializeState
         OPERATOR::open
             UDF::open
-
-        // processing phase (called on every element/watermark)
+        
+        // 调用处理阶段(通过每条数据或 watermark 来调用)
         OPERATOR::processElement
             UDF::run
         OPERATOR::processWatermark
         
-        // checkpointing phase (called asynchronously on every checkpoint)
+        // checkpointing 阶段(通过每个 checkpoint 异步调用)
         OPERATOR::snapshotState
-
-        // notify the operator about the end of processing records
+        
+        // 通知 operator 处理记录的过程结束
         OPERATOR::finish
-
-        // termination phase
+                
+        // 结束阶段
         OPERATOR::close
             UDF::close
-    
-In a nutshell, the `setup()` is called to initialize some operator-specific 
machinery, such as its `RuntimeContext` and 
-its metric collection data-structures. After this, the `initializeState()` 
gives an operator its initial state, and the 
- `open()` method executes any operator-specific initialization, such as 
opening the user-defined function in the case of 
-the `AbstractUdfStreamOperator`. 
+
+简而言之,`setup()` 在算子初始化时被调用,比如 `RuntimeContext` 和指标收集的数据结构。在这之后,算子通过 
`initializeState()` 初始化状态,算子的所有初始化工作在 `open()` 方法中执行,比如在继承 
`AbstractUdfStreamOperator` 的情况下,初始化用户定义函数。

Review comment:
       ```suggestion
   简而言之,在算子初始化时调用 `setup()` 来初始化算子的特定设置,比如 `RuntimeContext` 
和指标收集的数据结构。在这之后,算子通过 `initializeState()` 初始化状态,算子的所有初始化工作在 `open()` 方法中执行,比如在继承 
`AbstractUdfStreamOperator` 的情况下,初始化用户自定义函数。
   ```
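
   (An aside for readers of this hunk, not a change request: the UDF hooks in the lifecycle listing above correspond to the rich-function methods sketched below. This is only an illustrative example, assuming the classic `RichMapFunction` API with the `Configuration`-based `open()`; the class name and metric name are invented.)

```java
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.metrics.Counter;

// Illustrative UDF: its open()/close() are the UDF::open / UDF::close hooks above,
// and the RuntimeContext it uses was injected earlier via UDF::setRuntimeContext.
public class UppercaseMap extends RichMapFunction<String, String> {

    private transient Counter processed;

    @Override
    public void open(Configuration parameters) throws Exception {
        // Runs during the operator's open() phase, e.g. to register metrics.
        processed = getRuntimeContext().getMetricGroup().counter("processed");
    }

    @Override
    public String map(String value) {
        // Invoked from the operator's processElement() for every record.
        processed.inc();
        return value.toUpperCase();
    }

    @Override
    public void close() throws Exception {
        // Termination phase: release anything acquired in open().
    }
}
```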

##########
File path: docs/content.zh/docs/internals/task_lifecycle.md
##########
@@ -24,181 +24,123 @@ specific language governing permissions and limitations
 under the License.
 -->
 
+<a name='task-lifecycle'> </a>
+
 # Task 生命周期
 
-A task in Flink is the basic unit of execution. It is the place where each 
parallel instance of an
-operator is executed. As an example, an operator with a parallelism of *5* 
will have each of its
-instances executed by a separate task.
+Task 是 Flink 的基本执行单元。算子的每个并行实例都在 task 里执行。例如,一个并行度为 5 的算子,它的每个实例都由一个单独的 task 
来执行。
+
+在 Flink 流式计算引擎里,`StreamTask` 是所有不同 task 子类的基础。本文会深入讲解  `StreamTask` 
生命周期的不同阶段,并描述每个阶段的主要方法。
+
+<a name="operator-lifecycle-in-a-nutshell"> </a>
 
-The `StreamTask` is the base for all different task sub-types in Flink's 
streaming engine. This
-document goes through the different phases in the lifecycle of the 
`StreamTask` and describes the
-main methods representing each of these phases.
+## 算子生命周期简介
 
-## Operator Lifecycle in a nutshell
+因为 task 是算子并行实例的执行实体,所以它的生命周期跟算子的生命周期紧密联系在一起。因此,在深入介绍 `StreamTask` 
生命周期之前,先简要介绍一下代表算子生命周期的各个基本方法。这些方法按调用的先后顺序如下所示。考虑到算子可能是用户自定义函数(*UDF*),在每个算子的下方也展示(以缩进的方式)了
 UDF 生命周期里调用的各个方法。如果算子继承了 `AbstractUdfStreamOperator` 
的话,这些方法都是可用的,`AbstractUdfStreamOperator` 是所有继承 UDF 算子的基类。
 
-Because the task is the entity that executes a parallel instance of an 
operator, its lifecycle is tightly integrated 
-with that of an operator. So, we will briefly mention the basic methods 
representing the lifecycle of an operator before 
-diving into those of the `StreamTask` itself. The list is presented below in 
the order that each of the methods is called. 
-Given that an operator can have a user-defined function (*UDF*), below each of 
the operator methods we also present 
-(indented) the methods in the lifecycle of the UDF that it calls. These 
methods are available if your operator extends 
-the `AbstractUdfStreamOperator`, which is the basic class for all operators 
that execute UDFs.
 
-        // initialization phase
+ 
+
+        // 初始化阶段
         OPERATOR::setup
             UDF::setRuntimeContext
         OPERATOR::initializeState
         OPERATOR::open
             UDF::open
-
-        // processing phase (called on every element/watermark)
+        
+        // 调用处理阶段(通过每条数据或 watermark 来调用)
         OPERATOR::processElement
             UDF::run
         OPERATOR::processWatermark
         
-        // checkpointing phase (called asynchronously on every checkpoint)
+        // checkpointing 阶段(通过每个 checkpoint 异步调用)

Review comment:
       ```suggestion
           // checkpointing 阶段(对每个 checkpoint 异步调用)
   ```

##########
File path: docs/content.zh/docs/internals/task_lifecycle.md
##########
@@ -24,181 +24,123 @@ specific language governing permissions and limitations
 under the License.
 -->
 
+<a name='task-lifecycle'> </a>
+
 # Task 生命周期
 
-A task in Flink is the basic unit of execution. It is the place where each 
parallel instance of an
-operator is executed. As an example, an operator with a parallelism of *5* 
will have each of its
-instances executed by a separate task.
+Task 是 Flink 的基本执行单元。算子的每个并行实例都在 task 里执行。例如,一个并行度为 5 的算子,它的每个实例都由一个单独的 task 
来执行。
+
+在 Flink 流式计算引擎里,`StreamTask` 是所有不同 task 子类的基础。本文会深入讲解  `StreamTask` 
生命周期的不同阶段,并描述每个阶段的主要方法。
+
+<a name="operator-lifecycle-in-a-nutshell"> </a>
 
-The `StreamTask` is the base for all different task sub-types in Flink's 
streaming engine. This
-document goes through the different phases in the lifecycle of the 
`StreamTask` and describes the
-main methods representing each of these phases.
+## 算子生命周期简介
 
-## Operator Lifecycle in a nutshell
+因为 task 是算子并行实例的执行实体,所以它的生命周期跟算子的生命周期紧密联系在一起。因此,在深入介绍 `StreamTask` 
生命周期之前,先简要介绍一下代表算子生命周期的各个基本方法。这些方法按调用的先后顺序如下所示。考虑到算子可能是用户自定义函数(*UDF*),在每个算子的下方也展示(以缩进的方式)了
 UDF 生命周期里调用的各个方法。如果算子继承了 `AbstractUdfStreamOperator` 
的话,这些方法都是可用的,`AbstractUdfStreamOperator` 是所有继承 UDF 算子的基类。
 
-Because the task is the entity that executes a parallel instance of an 
operator, its lifecycle is tightly integrated 
-with that of an operator. So, we will briefly mention the basic methods 
representing the lifecycle of an operator before 
-diving into those of the `StreamTask` itself. The list is presented below in 
the order that each of the methods is called. 
-Given that an operator can have a user-defined function (*UDF*), below each of 
the operator methods we also present 
-(indented) the methods in the lifecycle of the UDF that it calls. These 
methods are available if your operator extends 
-the `AbstractUdfStreamOperator`, which is the basic class for all operators 
that execute UDFs.
 
-        // initialization phase
+ 
+
+        // 初始化阶段
         OPERATOR::setup
             UDF::setRuntimeContext
         OPERATOR::initializeState
         OPERATOR::open
             UDF::open
-
-        // processing phase (called on every element/watermark)
+        
+        // 调用处理阶段(通过每条数据或 watermark 来调用)
         OPERATOR::processElement
             UDF::run
         OPERATOR::processWatermark
         
-        // checkpointing phase (called asynchronously on every checkpoint)
+        // checkpointing 阶段(通过每个 checkpoint 异步调用)
         OPERATOR::snapshotState
-
-        // notify the operator about the end of processing records
+        
+        // 通知 operator 处理记录的过程结束
         OPERATOR::finish
-
-        // termination phase
+                

Review comment:
       ```suggestion
   
   ```

##########
File path: docs/content.zh/docs/internals/task_lifecycle.md
##########
@@ -24,181 +24,123 @@ specific language governing permissions and limitations
 under the License.
 -->
 
+<a name='task-lifecycle'> </a>
+
 # Task 生命周期
 
-A task in Flink is the basic unit of execution. It is the place where each 
parallel instance of an
-operator is executed. As an example, an operator with a parallelism of *5* 
will have each of its
-instances executed by a separate task.
+Task 是 Flink 的基本执行单元。算子的每个并行实例都在 task 里执行。例如,一个并行度为 5 的算子,它的每个实例都由一个单独的 task 
来执行。
+
+在 Flink 流式计算引擎里,`StreamTask` 是所有不同 task 子类的基础。本文会深入讲解  `StreamTask` 
生命周期的不同阶段,并描述每个阶段的主要方法。
+
+<a name="operator-lifecycle-in-a-nutshell"> </a>
 
-The `StreamTask` is the base for all different task sub-types in Flink's 
streaming engine. This
-document goes through the different phases in the lifecycle of the 
`StreamTask` and describes the
-main methods representing each of these phases.
+## 算子生命周期简介
 
-## Operator Lifecycle in a nutshell
+因为 task 是算子并行实例的执行实体,所以它的生命周期跟算子的生命周期紧密联系在一起。因此,在深入介绍 `StreamTask` 
生命周期之前,先简要介绍一下代表算子生命周期的各个基本方法。这些方法按调用的先后顺序如下所示。考虑到算子可能是用户自定义函数(*UDF*),在每个算子的下方也展示(以缩进的方式)了
 UDF 生命周期里调用的各个方法。如果算子继承了 `AbstractUdfStreamOperator` 
的话,这些方法都是可用的,`AbstractUdfStreamOperator` 是所有继承 UDF 算子的基类。
 
-Because the task is the entity that executes a parallel instance of an 
operator, its lifecycle is tightly integrated 
-with that of an operator. So, we will briefly mention the basic methods 
representing the lifecycle of an operator before 
-diving into those of the `StreamTask` itself. The list is presented below in 
the order that each of the methods is called. 
-Given that an operator can have a user-defined function (*UDF*), below each of 
the operator methods we also present 
-(indented) the methods in the lifecycle of the UDF that it calls. These 
methods are available if your operator extends 
-the `AbstractUdfStreamOperator`, which is the basic class for all operators 
that execute UDFs.
 
-        // initialization phase
+ 
+
+        // 初始化阶段
         OPERATOR::setup
             UDF::setRuntimeContext
         OPERATOR::initializeState
         OPERATOR::open
             UDF::open
-
-        // processing phase (called on every element/watermark)
+        
+        // 调用处理阶段(通过每条数据或 watermark 来调用)
         OPERATOR::processElement
             UDF::run
         OPERATOR::processWatermark
         
-        // checkpointing phase (called asynchronously on every checkpoint)
+        // checkpointing 阶段(通过每个 checkpoint 异步调用)
         OPERATOR::snapshotState
-
-        // notify the operator about the end of processing records
+        
+        // 通知 operator 处理记录的过程结束
         OPERATOR::finish
-
-        // termination phase
+                
+        // 结束阶段
         OPERATOR::close
             UDF::close
-    
-In a nutshell, the `setup()` is called to initialize some operator-specific 
machinery, such as its `RuntimeContext` and 
-its metric collection data-structures. After this, the `initializeState()` 
gives an operator its initial state, and the 
- `open()` method executes any operator-specific initialization, such as 
opening the user-defined function in the case of 
-the `AbstractUdfStreamOperator`. 
+
+简而言之,`setup()` 在算子初始化时被调用,比如 `RuntimeContext` 和指标收集的数据结构。在这之后,算子通过 
`initializeState()` 初始化状态,算子的所有初始化工作在 `open()` 方法中执行,比如在继承 
`AbstractUdfStreamOperator` 的情况下,初始化用户定义函数。
 
 {{< hint info >}}
-The `initializeState()` contains both the logic for initializing the 
-state of the operator during its initial execution (*e.g.* register any keyed 
state), and also the logic to retrieve its
-state from a checkpoint after a failure. More about this on the rest of this 
page.
+
+`initializeState()` 既包含状态的初始化逻辑(比如注册 keyed 状态),又包含从 checkpoint 
中恢复原有状态的逻辑。在接下来的篇幅会更详细的介绍这些。
 {{< /hint >}}
 
-Now that everything is set, the operator is ready to process incoming data. 
Incoming elements can be one of the following: 
-input elements, watermark, and checkpoint barriers. Each one of them has a 
special element for handling it. Elements are 
-processed by the `processElement()` method, watermarks by the 
`processWatermark()`, and checkpoint barriers trigger a 
-checkpoint which invokes (asynchronously) the `snapshotState()` method, which 
we describe below. For each incoming element,
-depending on its type one of the aforementioned methods is called. Note that 
the `processElement()` is also the place 
-where the UDF's logic is invoked, *e.g.* the `map()` method of your 
`MapFunction`.
-
-Finally, in the case of a normal, fault-free termination of the operator 
(*e.g.* if the stream is
-finite and its end is reached), the `finish()` method is called to perform any 
final bookkeeping
-action required by the operator's logic (*e.g.* flush any buffered data, or 
emit data to mark end of
-procesing), and the `close()` is called after that to free any resources held 
by the operator
-(*e.g.* open network connections, io streams, or native memory held by the 
operator's data).
-
-In the case of a termination due to a failure or due to manual cancellation, 
the execution jumps directly to the `close()`
-and skips any intermediate phases between the phase the operator was in when 
the failure happened and the `close()`.
-
-**Checkpoints:** The `snapshotState()` method of the operator is called 
asynchronously to the rest of the methods described 
-above whenever a checkpoint barrier is received. Checkpoints are performed 
during the processing phase, *i.e.* after the 
-operator is opened and before it is closed. The responsibility of this method 
is to store the current state of the operator 
-to the specified [state backend]({{< ref "docs/ops/state/state_backends" >}}) 
from where it will be retrieved when 
-the job resumes execution after a failure. Below we include a brief 
description of Flink's checkpointing mechanism, 
-and for a more detailed discussion on the principles around checkpointing in 
Flink please read the corresponding documentation: 
-[Data Streaming Fault Tolerance]({{< ref "docs/learn-flink/fault_tolerance" 
>}}).
-
-## Task Lifecycle
-
-Following that brief introduction on the operator's main phases, this section 
describes in more detail how a task calls 
-the respective methods during its execution on a cluster. The sequence of the 
phases described here is mainly included 
-in the `invoke()` method of the `StreamTask` class. The remainder of this 
document is split into two subsections, one 
-describing the phases during a regular, fault-free execution of a task (see 
[Normal Execution](#normal-execution)), and 
-(a shorter) one describing the different sequence followed in case the task is 
cancelled (see [Interrupted Execution](#interrupted-execution)), 
-either manually, or due some other reason, *e.g.* an exception thrown during 
execution.
-
-### Normal Execution
-
-The steps a task goes through when executed until completion without being 
interrupted are illustrated below:
-
-        TASK::setInitialState
-        TASK::invoke
-            create basic utils (config, etc) and load the chain of operators
-            setup-operators
-            task-specific-init
-            initialize-operator-states
-            open-operators
-            run
-            finish-operators
-            close-operators
-            task-specific-cleanup
-            common-cleanup
-
-As shown above, after recovering the task configuration and initializing some 
important runtime parameters, the very 
-first step for the task is to retrieve its initial, task-wide state. This is 
done in the `setInitialState()`, and it is 
-particularly important in two cases:
-
-1. when the task is recovering from a failure and restarts from the last 
successful checkpoint
-2. when resuming from a [savepoint]({{< ref "docs/ops/state/savepoints" >}}). 
-
-If it is the first time the task is executed, the initial task state is empty. 
-
-After recovering any initial state, the task goes into its `invoke()` method. 
There, it first initializes the operators 
-involved in the local computation by calling the `setup()` method of each one 
of them and then performs its task-specific 
-initialization by calling the local `init()` method. By task-specific, we mean 
that depending on the type of the task 
-(`SourceTask`, `OneInputStreamTask` or `TwoInputStreamTask`, etc), this step 
may differ, but in any case, here is where 
-the necessary task-wide resources are acquired. As an example, the 
`OneInputStreamTask`, which represents a task that 
-expects to have a single input stream, initializes the connection(s) to the 
location(s) of the different partitions of 
-the input stream that are relevant to the local task.
-
-Having acquired the necessary resources, it is time for the different 
operators and user-defined functions to acquire 
-their individual state from the task-wide state retrieved above. This is done 
in the `initializeState()` method, which 
-calls the `initializeState()` of each individual operator. This method should 
be overridden by every stateful operator 
-and should contain the state initialization logic, both for the first time a 
job is executed, and also for the case when 
-the task recovers from a failure or when using a savepoint.
-
-Now that all operators in the task have been initialized, the `open()` method 
of each individual operator is called by 
-the `openAllOperators()` method of the `StreamTask`. This method performs all 
the operational initialization, 
-such as registering any retrieved timers with the timer service. A single task 
may be executing multiple operators with one 
-consuming the output of its predecessor. In this case, the `open()` method is 
called from the last operator, *i.e.* the 
-one whose output is also the output of the task itself, to the first. This is 
done so that when the first operator starts 
-processing the task's input, all downstream operators are ready to receive its 
output.
+当所有初始化都完成之后,算子开始处理流入的数据。流入的数据可以分为三种类型:用户数据、watermark 和 checkpoint 
barriers。每种类型的数据都有单独的方法来处理。用户数据通过 `processElement()` 方法来处理,watermark 通过 
`processWatermark()` 来处理,checkpoint barriers 会触发异步执行的 `snapshotState()` 方法来进行 
checkpoint。对于每个流入的数据,根据其类型调用上述方法之一。`processElement()`方法也是用户自定义函数逻辑执行的地方,比如用户自定义 
`MapFunction` 里的  `map()` 方法。
+
+最后,在正常无失败的情况下(比如,如果流式数据是有限的,并且最后一个数据已经到达),会调用 `finish()` 
方法结束算子并进行必要的清理工作(比如刷新所有缓冲数据,或发送处理结束的标记数据)。在这之后会调用 `close()` 
方法来释放算子持有的资源(比如算子数据持有的本地内存)。
+
+在作业失败或手动取消的情况下,会略过中间所有步骤,直接跳到 `close()` 方法结束算子。

Review comment:
       ```suggestion
   在作业失败或手动取消的情况下,会略过从算子异常位置到 `close()` 中间的所有步骤,直接跳到 `close()` 方法结束算子。
   ```

##########
File path: docs/content.zh/docs/internals/task_lifecycle.md
##########
@@ -24,181 +24,123 @@ specific language governing permissions and limitations
 under the License.
 -->
 
+<a name='task-lifecycle'> </a>
+
 # Task 生命周期
 
-A task in Flink is the basic unit of execution. It is the place where each 
parallel instance of an
-operator is executed. As an example, an operator with a parallelism of *5* 
will have each of its
-instances executed by a separate task.
+Task 是 Flink 的基本执行单元。算子的每个并行实例都在 task 里执行。例如,一个并行度为 5 的算子,它的每个实例都由一个单独的 task 
来执行。
+
+在 Flink 流式计算引擎里,`StreamTask` 是所有不同 task 子类的基础。本文会深入讲解  `StreamTask` 
生命周期的不同阶段,并描述每个阶段的主要方法。
+
+<a name="operator-lifecycle-in-a-nutshell"> </a>
 
-The `StreamTask` is the base for all different task sub-types in Flink's 
streaming engine. This
-document goes through the different phases in the lifecycle of the 
`StreamTask` and describes the
-main methods representing each of these phases.
+## 算子生命周期简介
 
-## Operator Lifecycle in a nutshell
+因为 task 是算子并行实例的执行实体,所以它的生命周期跟算子的生命周期紧密联系在一起。因此,在深入介绍 `StreamTask` 
生命周期之前,先简要介绍一下代表算子生命周期的各个基本方法。这些方法按调用的先后顺序如下所示。考虑到算子可能是用户自定义函数(*UDF*),在每个算子的下方也展示(以缩进的方式)了
 UDF 生命周期里调用的各个方法。如果算子继承了 `AbstractUdfStreamOperator` 
的话,这些方法都是可用的,`AbstractUdfStreamOperator` 是所有继承 UDF 算子的基类。
 
-Because the task is the entity that executes a parallel instance of an 
operator, its lifecycle is tightly integrated 
-with that of an operator. So, we will briefly mention the basic methods 
representing the lifecycle of an operator before 
-diving into those of the `StreamTask` itself. The list is presented below in 
the order that each of the methods is called. 
-Given that an operator can have a user-defined function (*UDF*), below each of 
the operator methods we also present 
-(indented) the methods in the lifecycle of the UDF that it calls. These 
methods are available if your operator extends 
-the `AbstractUdfStreamOperator`, which is the basic class for all operators 
that execute UDFs.
 
-        // initialization phase
+ 
+
+        // 初始化阶段
         OPERATOR::setup
             UDF::setRuntimeContext
         OPERATOR::initializeState
         OPERATOR::open
             UDF::open
-
-        // processing phase (called on every element/watermark)
+        

Review comment:
       ```suggestion
   
   ```

##########
File path: docs/content.zh/docs/internals/task_lifecycle.md
##########
@@ -24,181 +24,123 @@ specific language governing permissions and limitations
 under the License.
 -->
 
+<a name='task-lifecycle'> </a>
+
 # Task 生命周期
 
-A task in Flink is the basic unit of execution. It is the place where each 
parallel instance of an
-operator is executed. As an example, an operator with a parallelism of *5* 
will have each of its
-instances executed by a separate task.
+Task 是 Flink 的基本执行单元。算子的每个并行实例都在 task 里执行。例如,一个并行度为 5 的算子,它的每个实例都由一个单独的 task 
来执行。
+
+在 Flink 流式计算引擎里,`StreamTask` 是所有不同 task 子类的基础。本文会深入讲解  `StreamTask` 
生命周期的不同阶段,并描述每个阶段的主要方法。
+
+<a name="operator-lifecycle-in-a-nutshell"> </a>
 
-The `StreamTask` is the base for all different task sub-types in Flink's 
streaming engine. This
-document goes through the different phases in the lifecycle of the 
`StreamTask` and describes the
-main methods representing each of these phases.
+## 算子生命周期简介
 
-## Operator Lifecycle in a nutshell
+因为 task 是算子并行实例的执行实体,所以它的生命周期跟算子的生命周期紧密联系在一起。因此,在深入介绍 `StreamTask` 
生命周期之前,先简要介绍一下代表算子生命周期的各个基本方法。这些方法按调用的先后顺序如下所示。考虑到算子可能是用户自定义函数(*UDF*),在每个算子的下方也展示(以缩进的方式)了
 UDF 生命周期里调用的各个方法。如果算子继承了 `AbstractUdfStreamOperator` 
的话,这些方法都是可用的,`AbstractUdfStreamOperator` 是所有继承 UDF 算子的基类。

Review comment:
       ```suggestion
   因为 task 是算子并行实例的执行实体,所以它的生命周期跟算子的生命周期紧密联系在一起。因此,在深入介绍 `StreamTask` 
生命周期之前,先简要介绍一下代表算子生命周期的基本方法。这些方法按调用的先后顺序如下所示。考虑到算子可能是用户自定义函数(*UDF*),因此我们在每个算子下也展示(以缩进的方式)了
 UDF 生命周期中调用的各个方法。`AbstractUdfStreamOperator` 是所有执行 UDF 的算子的基类,如果算子继承了 
`AbstractUdfStreamOperator`,那么这些方法都是可用的。
   ```

##########
File path: docs/content.zh/docs/internals/task_lifecycle.md
##########
@@ -24,181 +24,123 @@ specific language governing permissions and limitations
 under the License.
 -->
 
+<a name='task-lifecycle'> </a>
+

Review comment:
       Maybe we should not add the `a` tag. The new documentation engine has already been updated to `hugo`.
   

##########
File path: docs/content.zh/docs/internals/task_lifecycle.md
##########
@@ -24,181 +24,123 @@ specific language governing permissions and limitations
 under the License.
 -->
 
+<a name='task-lifecycle'> </a>
+
 # Task 生命周期
 
-A task in Flink is the basic unit of execution. It is the place where each 
parallel instance of an
-operator is executed. As an example, an operator with a parallelism of *5* 
will have each of its
-instances executed by a separate task.
+Task 是 Flink 的基本执行单元。算子的每个并行实例都在 task 里执行。例如,一个并行度为 5 的算子,它的每个实例都由一个单独的 task 
来执行。
+
+在 Flink 流式计算引擎里,`StreamTask` 是所有不同 task 子类的基础。本文会深入讲解  `StreamTask` 
生命周期的不同阶段,并描述每个阶段的主要方法。

Review comment:
       What about ``` `StreamTask` 是 Flink 流式计算引擎中所有不同子类型 task 的基础。本文会深入讲解  
`StreamTask` 生命周期的不同阶段,并阐述每个阶段的主要方法。```
   It's just a minor comment. Maybe we could translate it in a better way.

##########
File path: docs/content.zh/docs/internals/task_lifecycle.md
##########
@@ -24,181 +24,123 @@ specific language governing permissions and limitations
 under the License.
 -->
 
+<a name='task-lifecycle'> </a>
+
 # Task 生命周期
 
-A task in Flink is the basic unit of execution. It is the place where each 
parallel instance of an
-operator is executed. As an example, an operator with a parallelism of *5* 
will have each of its
-instances executed by a separate task.
+Task 是 Flink 的基本执行单元。算子的每个并行实例都在 task 里执行。例如,一个并行度为 5 的算子,它的每个实例都由一个单独的 task 
来执行。
+
+在 Flink 流式计算引擎里,`StreamTask` 是所有不同 task 子类的基础。本文会深入讲解  `StreamTask` 
生命周期的不同阶段,并描述每个阶段的主要方法。
+
+<a name="operator-lifecycle-in-a-nutshell"> </a>
 
-The `StreamTask` is the base for all different task sub-types in Flink's 
streaming engine. This
-document goes through the different phases in the lifecycle of the 
`StreamTask` and describes the
-main methods representing each of these phases.
+## 算子生命周期简介
 
-## Operator Lifecycle in a nutshell
+因为 task 是算子并行实例的执行实体,所以它的生命周期跟算子的生命周期紧密联系在一起。因此,在深入介绍 `StreamTask` 
生命周期之前,先简要介绍一下代表算子生命周期的各个基本方法。这些方法按调用的先后顺序如下所示。考虑到算子可能是用户自定义函数(*UDF*),在每个算子的下方也展示(以缩进的方式)了
 UDF 生命周期里调用的各个方法。如果算子继承了 `AbstractUdfStreamOperator` 
的话,这些方法都是可用的,`AbstractUdfStreamOperator` 是所有继承 UDF 算子的基类。
 
-Because the task is the entity that executes a parallel instance of an 
operator, its lifecycle is tightly integrated 
-with that of an operator. So, we will briefly mention the basic methods 
representing the lifecycle of an operator before 
-diving into those of the `StreamTask` itself. The list is presented below in 
the order that each of the methods is called. 
-Given that an operator can have a user-defined function (*UDF*), below each of 
the operator methods we also present 
-(indented) the methods in the lifecycle of the UDF that it calls. These 
methods are available if your operator extends 
-the `AbstractUdfStreamOperator`, which is the basic class for all operators 
that execute UDFs.
 
-        // initialization phase
+ 
+
+        // 初始化阶段
         OPERATOR::setup
             UDF::setRuntimeContext
         OPERATOR::initializeState
         OPERATOR::open
             UDF::open
-
-        // processing phase (called on every element/watermark)
+        
+        // 调用处理阶段(通过每条数据或 watermark 来调用)

Review comment:
       ```suggestion
           // 处理阶段(对每个 element 或 watermark 调用)
   ```
   Please let me know what you think.

##########
File path: docs/content.zh/docs/internals/task_lifecycle.md
##########
@@ -24,181 +24,123 @@ specific language governing permissions and limitations
 under the License.
 -->
 
+<a name='task-lifecycle'> </a>
+
 # Task 生命周期
 
-A task in Flink is the basic unit of execution. It is the place where each 
parallel instance of an
-operator is executed. As an example, an operator with a parallelism of *5* 
will have each of its
-instances executed by a separate task.
+Task 是 Flink 的基本执行单元。算子的每个并行实例都在 task 里执行。例如,一个并行度为 5 的算子,它的每个实例都由一个单独的 task 
来执行。
+
+在 Flink 流式计算引擎里,`StreamTask` 是所有不同 task 子类的基础。本文会深入讲解  `StreamTask` 
生命周期的不同阶段,并描述每个阶段的主要方法。
+
+<a name="operator-lifecycle-in-a-nutshell"> </a>
 
-The `StreamTask` is the base for all different task sub-types in Flink's 
streaming engine. This
-document goes through the different phases in the lifecycle of the 
`StreamTask` and describes the
-main methods representing each of these phases.
+## 算子生命周期简介
 
-## Operator Lifecycle in a nutshell
+因为 task 是算子并行实例的执行实体,所以它的生命周期跟算子的生命周期紧密联系在一起。因此,在深入介绍 `StreamTask` 
生命周期之前,先简要介绍一下代表算子生命周期的各个基本方法。这些方法按调用的先后顺序如下所示。考虑到算子可能是用户自定义函数(*UDF*),在每个算子的下方也展示(以缩进的方式)了
 UDF 生命周期里调用的各个方法。如果算子继承了 `AbstractUdfStreamOperator` 
的话,这些方法都是可用的,`AbstractUdfStreamOperator` 是所有继承 UDF 算子的基类。
 
-Because the task is the entity that executes a parallel instance of an 
operator, its lifecycle is tightly integrated 
-with that of an operator. So, we will briefly mention the basic methods 
representing the lifecycle of an operator before 
-diving into those of the `StreamTask` itself. The list is presented below in 
the order that each of the methods is called. 
-Given that an operator can have a user-defined function (*UDF*), below each of 
the operator methods we also present 
-(indented) the methods in the lifecycle of the UDF that it calls. These 
methods are available if your operator extends 
-the `AbstractUdfStreamOperator`, which is the basic class for all operators 
that execute UDFs.
 
-        // initialization phase
+ 
+
+        // 初始化阶段
         OPERATOR::setup
             UDF::setRuntimeContext
         OPERATOR::initializeState
         OPERATOR::open
             UDF::open
-
-        // processing phase (called on every element/watermark)
+        
+        // 调用处理阶段(通过每条数据或 watermark 来调用)
         OPERATOR::processElement
             UDF::run
         OPERATOR::processWatermark
         
-        // checkpointing phase (called asynchronously on every checkpoint)
+        // checkpointing 阶段(通过每个 checkpoint 异步调用)
         OPERATOR::snapshotState
-
-        // notify the operator about the end of processing records
+        
+        // 通知 operator 处理记录的过程结束
         OPERATOR::finish
-
-        // termination phase
+                
+        // 结束阶段
         OPERATOR::close
             UDF::close
-    
-In a nutshell, the `setup()` is called to initialize some operator-specific 
machinery, such as its `RuntimeContext` and 
-its metric collection data-structures. After this, the `initializeState()` 
gives an operator its initial state, and the 
- `open()` method executes any operator-specific initialization, such as 
opening the user-defined function in the case of 
-the `AbstractUdfStreamOperator`. 
+
+简而言之,`setup()` 在算子初始化时被调用,比如 `RuntimeContext` 和指标收集的数据结构。在这之后,算子通过 
`initializeState()` 初始化状态,算子的所有初始化工作在 `open()` 方法中执行,比如在继承 
`AbstractUdfStreamOperator` 的情况下,初始化用户定义函数。
 
 {{< hint info >}}
-The `initializeState()` contains both the logic for initializing the 
-state of the operator during its initial execution (*e.g.* register any keyed 
state), and also the logic to retrieve its
-state from a checkpoint after a failure. More about this on the rest of this 
page.
+
+`initializeState()` 既包含状态的初始化逻辑(比如注册 keyed 状态),又包含从 checkpoint 
中恢复原有状态的逻辑。在接下来的篇幅会更详细的介绍这些。
 {{< /hint >}}
 
-Now that everything is set, the operator is ready to process incoming data. 
Incoming elements can be one of the following: 
-input elements, watermark, and checkpoint barriers. Each one of them has a 
special element for handling it. Elements are 
-processed by the `processElement()` method, watermarks by the 
`processWatermark()`, and checkpoint barriers trigger a 
-checkpoint which invokes (asynchronously) the `snapshotState()` method, which 
we describe below. For each incoming element,
-depending on its type one of the aforementioned methods is called. Note that 
the `processElement()` is also the place 
-where the UDF's logic is invoked, *e.g.* the `map()` method of your 
`MapFunction`.
-
-Finally, in the case of a normal, fault-free termination of the operator 
(*e.g.* if the stream is
-finite and its end is reached), the `finish()` method is called to perform any 
final bookkeeping
-action required by the operator's logic (*e.g.* flush any buffered data, or 
emit data to mark end of
-procesing), and the `close()` is called after that to free any resources held 
by the operator
-(*e.g.* open network connections, io streams, or native memory held by the 
operator's data).
-
-In the case of a termination due to a failure or due to manual cancellation, 
the execution jumps directly to the `close()`
-and skips any intermediate phases between the phase the operator was in when 
the failure happened and the `close()`.
-
-**Checkpoints:** The `snapshotState()` method of the operator is called 
asynchronously to the rest of the methods described 
-above whenever a checkpoint barrier is received. Checkpoints are performed 
during the processing phase, *i.e.* after the 
-operator is opened and before it is closed. The responsibility of this method 
is to store the current state of the operator 
-to the specified [state backend]({{< ref "docs/ops/state/state_backends" >}}) 
from where it will be retrieved when 
-the job resumes execution after a failure. Below we include a brief 
description of Flink's checkpointing mechanism, 
-and for a more detailed discussion on the principles around checkpointing in 
Flink please read the corresponding documentation: 
-[Data Streaming Fault Tolerance]({{< ref "docs/learn-flink/fault_tolerance" 
>}}).
-
-## Task Lifecycle
-
-Following that brief introduction on the operator's main phases, this section 
describes in more detail how a task calls 
-the respective methods during its execution on a cluster. The sequence of the 
phases described here is mainly included 
-in the `invoke()` method of the `StreamTask` class. The remainder of this 
document is split into two subsections, one 
-describing the phases during a regular, fault-free execution of a task (see 
[Normal Execution](#normal-execution)), and 
-(a shorter) one describing the different sequence followed in case the task is 
cancelled (see [Interrupted Execution](#interrupted-execution)), 
-either manually, or due some other reason, *e.g.* an exception thrown during 
execution.
-
-### Normal Execution
-
-The steps a task goes through when executed until completion without being 
interrupted are illustrated below:
-
-        TASK::setInitialState
-        TASK::invoke
-            create basic utils (config, etc) and load the chain of operators
-            setup-operators
-            task-specific-init
-            initialize-operator-states
-            open-operators
-            run
-            finish-operators
-            close-operators
-            task-specific-cleanup
-            common-cleanup
-
-As shown above, after recovering the task configuration and initializing some 
important runtime parameters, the very 
-first step for the task is to retrieve its initial, task-wide state. This is 
done in the `setInitialState()`, and it is 
-particularly important in two cases:
-
-1. when the task is recovering from a failure and restarts from the last 
successful checkpoint
-2. when resuming from a [savepoint]({{< ref "docs/ops/state/savepoints" >}}). 
-
-If it is the first time the task is executed, the initial task state is empty. 
-
-After recovering any initial state, the task goes into its `invoke()` method. 
There, it first initializes the operators 
-involved in the local computation by calling the `setup()` method of each one 
of them and then performs its task-specific 
-initialization by calling the local `init()` method. By task-specific, we mean 
that depending on the type of the task 
-(`SourceTask`, `OneInputStreamTask` or `TwoInputStreamTask`, etc), this step 
may differ, but in any case, here is where 
-the necessary task-wide resources are acquired. As an example, the 
`OneInputStreamTask`, which represents a task that 
-expects to have a single input stream, initializes the connection(s) to the 
location(s) of the different partitions of 
-the input stream that are relevant to the local task.
-
-Having acquired the necessary resources, it is time for the different 
operators and user-defined functions to acquire 
-their individual state from the task-wide state retrieved above. This is done 
in the `initializeState()` method, which 
-calls the `initializeState()` of each individual operator. This method should 
be overridden by every stateful operator 
-and should contain the state initialization logic, both for the first time a 
job is executed, and also for the case when 
-the task recovers from a failure or when using a savepoint.
-
-Now that all operators in the task have been initialized, the `open()` method 
of each individual operator is called by 
-the `openAllOperators()` method of the `StreamTask`. This method performs all 
the operational initialization, 
-such as registering any retrieved timers with the timer service. A single task 
may be executing multiple operators with one 
-consuming the output of its predecessor. In this case, the `open()` method is 
called from the last operator, *i.e.* the 
-one whose output is also the output of the task itself, to the first. This is 
done so that when the first operator starts 
-processing the task's input, all downstream operators are ready to receive its 
output.
+当所有初始化都完成之后,算子开始处理流入的数据。流入的数据可以分为三种类型:用户数据、watermark 和 checkpoint 
barriers。每种类型的数据都有单独的方法来处理。用户数据通过 `processElement()` 方法来处理,watermark 通过 
`processWatermark()` 来处理,checkpoint barriers 会触发异步执行的 `snapshotState()` 方法来进行 
checkpoint。对于每个流入的数据,根据其类型调用上述方法之一。`processElement()`方法也是用户自定义函数逻辑执行的地方,比如用户自定义 
`MapFunction` 里的  `map()` 方法。
+
+最后,在正常无失败的情况下(比如,如果流式数据是有限的,并且最后一个数据已经到达),会调用 `finish()` 
方法结束算子并进行必要的清理工作(比如刷新所有缓冲数据,或发送处理结束的标记数据)。在这之后会调用 `close()` 
方法来释放算子持有的资源(比如算子数据持有的本地内存)。

Review comment:
       ```suggestion
   最后,在算子正常无故障的情况下(比如,如果流式数据是有限的,并且最后一个数据已经到达),会调用 `finish()` 
方法结束算子并进行必要的清理工作(比如刷新所有缓冲数据,或发送处理结束的标记数据)。在这之后会调用 `close()` 
方法来释放算子持有的资源(比如算子数据持有的本地内存)。
   ```
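
   (Aside, not a change request: a hypothetical example of the "flush buffered data in `finish()`, release resources in `close()`" split described above, assuming the post-FLIP-147 `StreamOperator#finish()` API. The class is invented for illustration and ignores state and object-reuse concerns.)

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.flink.streaming.api.operators.AbstractStreamOperator;
import org.apache.flink.streaming.api.operators.OneInputStreamOperator;
import org.apache.flink.streaming.runtime.streamrecord.StreamRecord;

// Hypothetical operator: buffers records, flushes them in finish(), cleans up in close().
public class BufferingOperator<T> extends AbstractStreamOperator<T>
        implements OneInputStreamOperator<T, T> {

    private final List<T> buffer = new ArrayList<>();

    @Override
    public void processElement(StreamRecord<T> element) {
        buffer.add(element.getValue());
    }

    @Override
    public void finish() throws Exception {
        // Normal end of a bounded stream: final bookkeeping, e.g. emit buffered data.
        for (T value : buffer) {
            output.collect(new StreamRecord<>(value));
        }
        buffer.clear();
    }

    @Override
    public void close() throws Exception {
        // Always called to free resources; after a failure or cancellation the
        // execution jumps straight here, skipping finish().
        buffer.clear();
        super.close();
    }
}
```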

##########
File path: docs/content.zh/docs/internals/task_lifecycle.md
##########
@@ -24,181 +24,123 @@ specific language governing permissions and limitations
 under the License.
 -->
 
+<a name='task-lifecycle'> </a>
+
 # Task 生命周期
 
-A task in Flink is the basic unit of execution. It is the place where each 
parallel instance of an
-operator is executed. As an example, an operator with a parallelism of *5* 
will have each of its
-instances executed by a separate task.
+Task 是 Flink 的基本执行单元。算子的每个并行实例都在 task 里执行。例如,一个并行度为 5 的算子,它的每个实例都由一个单独的 task 
来执行。
+
+在 Flink 流式计算引擎里,`StreamTask` 是所有不同 task 子类的基础。本文会深入讲解  `StreamTask` 
生命周期的不同阶段,并描述每个阶段的主要方法。
+
+<a name="operator-lifecycle-in-a-nutshell"> </a>
 
-The `StreamTask` is the base for all different task sub-types in Flink's 
streaming engine. This
-document goes through the different phases in the lifecycle of the 
`StreamTask` and describes the
-main methods representing each of these phases.
+## 算子生命周期简介
 
-## Operator Lifecycle in a nutshell
+因为 task 是算子并行实例的执行实体,所以它的生命周期跟算子的生命周期紧密联系在一起。因此,在深入介绍 `StreamTask` 
生命周期之前,先简要介绍一下代表算子生命周期的各个基本方法。这些方法按调用的先后顺序如下所示。考虑到算子可能是用户自定义函数(*UDF*),在每个算子的下方也展示(以缩进的方式)了
 UDF 生命周期里调用的各个方法。如果算子继承了 `AbstractUdfStreamOperator` 
的话,这些方法都是可用的,`AbstractUdfStreamOperator` 是所有继承 UDF 算子的基类。
 
-Because the task is the entity that executes a parallel instance of an 
operator, its lifecycle is tightly integrated 
-with that of an operator. So, we will briefly mention the basic methods 
representing the lifecycle of an operator before 
-diving into those of the `StreamTask` itself. The list is presented below in 
the order that each of the methods is called. 
-Given that an operator can have a user-defined function (*UDF*), below each of 
the operator methods we also present 
-(indented) the methods in the lifecycle of the UDF that it calls. These 
methods are available if your operator extends 
-the `AbstractUdfStreamOperator`, which is the basic class for all operators 
that execute UDFs.
 
-        // initialization phase
+ 
+
+        // 初始化阶段
         OPERATOR::setup
             UDF::setRuntimeContext
         OPERATOR::initializeState
         OPERATOR::open
             UDF::open
-
-        // processing phase (called on every element/watermark)
+        
+        // 调用处理阶段(通过每条数据或 watermark 来调用)
         OPERATOR::processElement
             UDF::run
         OPERATOR::processWatermark
         
-        // checkpointing phase (called asynchronously on every checkpoint)
+        // checkpointing 阶段(通过每个 checkpoint 异步调用)
         OPERATOR::snapshotState
-
-        // notify the operator about the end of processing records
+        
+        // 通知 operator 处理记录的过程结束
         OPERATOR::finish
-
-        // termination phase
+                
+        // 结束阶段
         OPERATOR::close
             UDF::close
-    
-In a nutshell, the `setup()` is called to initialize some operator-specific 
machinery, such as its `RuntimeContext` and 
-its metric collection data-structures. After this, the `initializeState()` 
gives an operator its initial state, and the 
- `open()` method executes any operator-specific initialization, such as 
opening the user-defined function in the case of 
-the `AbstractUdfStreamOperator`. 
+
+简而言之,`setup()` 在算子初始化时被调用,比如 `RuntimeContext` 和指标收集的数据结构。在这之后,算子通过 
`initializeState()` 初始化状态,算子的所有初始化工作在 `open()` 方法中执行,比如在继承 
`AbstractUdfStreamOperator` 的情况下,初始化用户定义函数。
 
 {{< hint info >}}
-The `initializeState()` contains both the logic for initializing the 
-state of the operator during its initial execution (*e.g.* register any keyed 
state), and also the logic to retrieve its
-state from a checkpoint after a failure. More about this on the rest of this 
page.
+

Review comment:
       ```suggestion
   ```

##########
File path: docs/content.zh/docs/internals/task_lifecycle.md
##########
@@ -24,181 +24,123 @@ specific language governing permissions and limitations
 under the License.
 -->
 
+<a name='task-lifecycle'> </a>
+
 # Task 生命周期
 
-A task in Flink is the basic unit of execution. It is the place where each 
parallel instance of an
-operator is executed. As an example, an operator with a parallelism of *5* 
will have each of its
-instances executed by a separate task.
+Task 是 Flink 的基本执行单元。算子的每个并行实例都在 task 里执行。例如,一个并行度为 5 的算子,它的每个实例都由一个单独的 task 
来执行。
+
+在 Flink 流式计算引擎里,`StreamTask` 是所有不同 task 子类的基础。本文会深入讲解  `StreamTask` 
生命周期的不同阶段,并描述每个阶段的主要方法。
+
+<a name="operator-lifecycle-in-a-nutshell"> </a>
 
-The `StreamTask` is the base for all different task sub-types in Flink's 
streaming engine. This
-document goes through the different phases in the lifecycle of the 
`StreamTask` and describes the
-main methods representing each of these phases.
+## 算子生命周期简介
 
-## Operator Lifecycle in a nutshell
+因为 task 是算子并行实例的执行实体,所以它的生命周期跟算子的生命周期紧密联系在一起。因此,在深入介绍 `StreamTask` 
生命周期之前,先简要介绍一下代表算子生命周期的各个基本方法。这些方法按调用的先后顺序如下所示。考虑到算子可能是用户自定义函数(*UDF*),在每个算子的下方也展示(以缩进的方式)了
 UDF 生命周期里调用的各个方法。如果算子继承了 `AbstractUdfStreamOperator` 
的话,这些方法都是可用的,`AbstractUdfStreamOperator` 是所有继承 UDF 算子的基类。
 
-Because the task is the entity that executes a parallel instance of an 
operator, its lifecycle is tightly integrated 
-with that of an operator. So, we will briefly mention the basic methods 
representing the lifecycle of an operator before 
-diving into those of the `StreamTask` itself. The list is presented below in 
the order that each of the methods is called. 
-Given that an operator can have a user-defined function (*UDF*), below each of 
the operator methods we also present 
-(indented) the methods in the lifecycle of the UDF that it calls. These 
methods are available if your operator extends 
-the `AbstractUdfStreamOperator`, which is the basic class for all operators 
that execute UDFs.
 
-        // initialization phase
+ 
+
+        // 初始化阶段
         OPERATOR::setup
             UDF::setRuntimeContext
         OPERATOR::initializeState
         OPERATOR::open
             UDF::open
-
-        // processing phase (called on every element/watermark)
+        
+        // 调用处理阶段(通过每条数据或 watermark 来调用)
         OPERATOR::processElement
             UDF::run
         OPERATOR::processWatermark
         
-        // checkpointing phase (called asynchronously on every checkpoint)
+        // checkpointing 阶段(通过每个 checkpoint 异步调用)
         OPERATOR::snapshotState
-
-        // notify the operator about the end of processing records
+        
+        // 通知 operator 处理记录的过程结束
         OPERATOR::finish
-
-        // termination phase
+                
+        // 结束阶段
         OPERATOR::close
             UDF::close
-    
-In a nutshell, the `setup()` is called to initialize some operator-specific 
machinery, such as its `RuntimeContext` and 
-its metric collection data-structures. After this, the `initializeState()` 
gives an operator its initial state, and the 
- `open()` method executes any operator-specific initialization, such as 
opening the user-defined function in the case of 
-the `AbstractUdfStreamOperator`. 
+
+简而言之,`setup()` 在算子初始化时被调用,比如 `RuntimeContext` 和指标收集的数据结构。在这之后,算子通过 
`initializeState()` 初始化状态,算子的所有初始化工作在 `open()` 方法中执行,比如在继承 
`AbstractUdfStreamOperator` 的情况下,初始化用户定义函数。
 
 {{< hint info >}}
-The `initializeState()` contains both the logic for initializing the 
-state of the operator during its initial execution (*e.g.* register any keyed 
state), and also the logic to retrieve its
-state from a checkpoint after a failure. More about this on the rest of this 
page.
+
+`initializeState()` 既包含状态的初始化逻辑(比如注册 keyed 状态),又包含从 checkpoint 
中恢复原有状态的逻辑。在接下来的篇幅会更详细的介绍这些。

Review comment:
       ```suggestion
   `initializeState()` 既包含在初始化过程中算子状态的初始化逻辑(比如注册 keyed 状态),又包含异常后从 checkpoint 
中恢复原有状态的逻辑。在接下来的篇幅会进行更详细的介绍。
   ```
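
   (Another aside, not a change request: a minimal sketch of the two sides of `initializeState()` that this hint describes, using the public `CheckpointedFunction` interface. The class name and state name are invented for illustration.)

```java
import java.util.Collections;

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ListStateDescriptor;
import org.apache.flink.runtime.state.FunctionInitializationContext;
import org.apache.flink.runtime.state.FunctionSnapshotContext;
import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction;

// Sketch: initializeState() both registers state (first run) and restores it (recovery).
public class CountingMap implements MapFunction<String, String>, CheckpointedFunction {

    private transient ListState<Long> checkpointedCount;
    private long count;

    @Override
    public void initializeState(FunctionInitializationContext context) throws Exception {
        // Always (re)acquire the state handle...
        checkpointedCount = context.getOperatorStateStore()
                .getListState(new ListStateDescriptor<>("count", Long.class));
        // ...and only read back values when recovering from a checkpoint or savepoint.
        if (context.isRestored()) {
            for (Long c : checkpointedCount.get()) {
                count += c;
            }
        }
    }

    @Override
    public void snapshotState(FunctionSnapshotContext context) throws Exception {
        // Triggered by every checkpoint barrier to capture the current state.
        checkpointedCount.update(Collections.singletonList(count));
    }

    @Override
    public String map(String value) {
        count++;
        return value;
    }
}
```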

##########
File path: docs/content.zh/docs/internals/task_lifecycle.md
##########
@@ -24,181 +24,123 @@ specific language governing permissions and limitations
 under the License.
 -->
 
+<a name='task-lifecycle'> </a>
+
 # Task 生命周期
 
-A task in Flink is the basic unit of execution. It is the place where each 
parallel instance of an
-operator is executed. As an example, an operator with a parallelism of *5* 
will have each of its
-instances executed by a separate task.
+Task 是 Flink 的基本执行单元。算子的每个并行实例都在 task 里执行。例如,一个并行度为 5 的算子,它的每个实例都由一个单独的 task 
来执行。
+
+在 Flink 流式计算引擎里,`StreamTask` 是所有不同 task 子类的基础。本文会深入讲解  `StreamTask` 
生命周期的不同阶段,并描述每个阶段的主要方法。
+
+<a name="operator-lifecycle-in-a-nutshell"> </a>
 
-The `StreamTask` is the base for all different task sub-types in Flink's 
streaming engine. This
-document goes through the different phases in the lifecycle of the 
`StreamTask` and describes the
-main methods representing each of these phases.
+## 算子生命周期简介
 
-## Operator Lifecycle in a nutshell
+因为 task 是算子并行实例的执行实体,所以它的生命周期跟算子的生命周期紧密联系在一起。因此,在深入介绍 `StreamTask` 
生命周期之前,先简要介绍一下代表算子生命周期的各个基本方法。这些方法按调用的先后顺序如下所示。考虑到算子可能是用户自定义函数(*UDF*),在每个算子的下方也展示(以缩进的方式)了
 UDF 生命周期里调用的各个方法。如果算子继承了 `AbstractUdfStreamOperator` 
的话,这些方法都是可用的,`AbstractUdfStreamOperator` 是所有继承 UDF 算子的基类。
 
-Because the task is the entity that executes a parallel instance of an 
operator, its lifecycle is tightly integrated 
-with that of an operator. So, we will briefly mention the basic methods 
representing the lifecycle of an operator before 
-diving into those of the `StreamTask` itself. The list is presented below in 
the order that each of the methods is called. 
-Given that an operator can have a user-defined function (*UDF*), below each of 
the operator methods we also present 
-(indented) the methods in the lifecycle of the UDF that it calls. These 
methods are available if your operator extends 
-the `AbstractUdfStreamOperator`, which is the basic class for all operators 
that execute UDFs.
 
-        // initialization phase
+ 
+
+        // 初始化阶段
         OPERATOR::setup
             UDF::setRuntimeContext
         OPERATOR::initializeState
         OPERATOR::open
             UDF::open
-
-        // processing phase (called on every element/watermark)
+        
+        // 调用处理阶段(通过每条数据或 watermark 来调用)
         OPERATOR::processElement
             UDF::run
         OPERATOR::processWatermark
         
-        // checkpointing phase (called asynchronously on every checkpoint)
+        // checkpointing 阶段(通过每个 checkpoint 异步调用)
         OPERATOR::snapshotState
-
-        // notify the operator about the end of processing records
+        

Review comment:
       ```suggestion
   
   ```

##########
File path: docs/content.zh/docs/internals/task_lifecycle.md
##########
@@ -24,181 +24,123 @@ specific language governing permissions and limitations
 under the License.
 -->
 
+<a name='task-lifecycle'> </a>
+
 # Task 生命周期
 
-A task in Flink is the basic unit of execution. It is the place where each 
parallel instance of an
-operator is executed. As an example, an operator with a parallelism of *5* 
will have each of its
-instances executed by a separate task.
+Task 是 Flink 的基本执行单元。算子的每个并行实例都在 task 里执行。例如,一个并行度为 5 的算子,它的每个实例都由一个单独的 task 
来执行。
+
+在 Flink 流式计算引擎里,`StreamTask` 是所有不同 task 子类的基础。本文会深入讲解  `StreamTask` 
生命周期的不同阶段,并描述每个阶段的主要方法。
+
+<a name="operator-lifecycle-in-a-nutshell"> </a>
 
-The `StreamTask` is the base for all different task sub-types in Flink's 
streaming engine. This
-document goes through the different phases in the lifecycle of the 
`StreamTask` and describes the
-main methods representing each of these phases.
+## 算子生命周期简介
 
-## Operator Lifecycle in a nutshell
+因为 task 是算子并行实例的执行实体,所以它的生命周期跟算子的生命周期紧密联系在一起。因此,在深入介绍 `StreamTask` 
生命周期之前,先简要介绍一下代表算子生命周期的各个基本方法。这些方法按调用的先后顺序如下所示。考虑到算子可能是用户自定义函数(*UDF*),在每个算子的下方也展示(以缩进的方式)了
 UDF 生命周期里调用的各个方法。如果算子继承了 `AbstractUdfStreamOperator` 
的话,这些方法都是可用的,`AbstractUdfStreamOperator` 是所有继承 UDF 算子的基类。
 
-Because the task is the entity that executes a parallel instance of an 
operator, its lifecycle is tightly integrated 
-with that of an operator. So, we will briefly mention the basic methods 
representing the lifecycle of an operator before 
-diving into those of the `StreamTask` itself. The list is presented below in 
the order that each of the methods is called. 
-Given that an operator can have a user-defined function (*UDF*), below each of 
the operator methods we also present 
-(indented) the methods in the lifecycle of the UDF that it calls. These 
methods are available if your operator extends 
-the `AbstractUdfStreamOperator`, which is the basic class for all operators 
that execute UDFs.
 
-        // initialization phase
+ 
+
+        // 初始化阶段
         OPERATOR::setup
             UDF::setRuntimeContext
         OPERATOR::initializeState
         OPERATOR::open
             UDF::open
-
-        // processing phase (called on every element/watermark)
+        
+        // 调用处理阶段(通过每条数据或 watermark 来调用)
         OPERATOR::processElement
             UDF::run
         OPERATOR::processWatermark
         
-        // checkpointing phase (called asynchronously on every checkpoint)
+        // checkpointing 阶段(通过每个 checkpoint 异步调用)
         OPERATOR::snapshotState
-
-        // notify the operator about the end of processing records
+        
+        // 通知 operator 处理记录的过程结束
         OPERATOR::finish
-
-        // termination phase
+                
+        // 结束阶段
         OPERATOR::close
             UDF::close
-    
-In a nutshell, the `setup()` is called to initialize some operator-specific 
machinery, such as its `RuntimeContext` and 
-its metric collection data-structures. After this, the `initializeState()` 
gives an operator its initial state, and the 
- `open()` method executes any operator-specific initialization, such as 
opening the user-defined function in the case of 
-the `AbstractUdfStreamOperator`. 
+
+简而言之,`setup()` 在算子初始化时被调用,比如 `RuntimeContext` 和指标收集的数据结构。在这之后,算子通过 
`initializeState()` 初始化状态,算子的所有初始化工作在 `open()` 方法中执行,比如在继承 
`AbstractUdfStreamOperator` 的情况下,初始化用户定义函数。
 
 {{< hint info >}}
-The `initializeState()` contains both the logic for initializing the 
-state of the operator during its initial execution (*e.g.* register any keyed 
state), and also the logic to retrieve its
-state from a checkpoint after a failure. More about this on the rest of this 
page.
+
+`initializeState()` 既包含状态的初始化逻辑(比如注册 keyed 状态),又包含从 checkpoint 
中恢复原有状态的逻辑。在接下来的篇幅会更详细的介绍这些。
 {{< /hint >}}
 
-Now that everything is set, the operator is ready to process incoming data. 
Incoming elements can be one of the following: 
-input elements, watermark, and checkpoint barriers. Each one of them has a 
special element for handling it. Elements are 
-processed by the `processElement()` method, watermarks by the 
`processWatermark()`, and checkpoint barriers trigger a 
-checkpoint which invokes (asynchronously) the `snapshotState()` method, which 
we describe below. For each incoming element,
-depending on its type one of the aforementioned methods is called. Note that 
the `processElement()` is also the place 
-where the UDF's logic is invoked, *e.g.* the `map()` method of your 
`MapFunction`.
-
-Finally, in the case of a normal, fault-free termination of the operator 
(*e.g.* if the stream is
-finite and its end is reached), the `finish()` method is called to perform any 
final bookkeeping
-action required by the operator's logic (*e.g.* flush any buffered data, or 
emit data to mark end of
-procesing), and the `close()` is called after that to free any resources held 
by the operator
-(*e.g.* open network connections, io streams, or native memory held by the 
operator's data).
-
-In the case of a termination due to a failure or due to manual cancellation, 
the execution jumps directly to the `close()`
-and skips any intermediate phases between the phase the operator was in when 
the failure happened and the `close()`.
-
-**Checkpoints:** The `snapshotState()` method of the operator is called 
asynchronously to the rest of the methods described 
-above whenever a checkpoint barrier is received. Checkpoints are performed 
during the processing phase, *i.e.* after the 
-operator is opened and before it is closed. The responsibility of this method 
is to store the current state of the operator 
-to the specified [state backend]({{< ref "docs/ops/state/state_backends" >}}) 
from where it will be retrieved when 
-the job resumes execution after a failure. Below we include a brief 
description of Flink's checkpointing mechanism, 
-and for a more detailed discussion on the principles around checkpointing in 
Flink please read the corresponding documentation: 
-[Data Streaming Fault Tolerance]({{< ref "docs/learn-flink/fault_tolerance" 
>}}).
-
-## Task Lifecycle
-
-Following that brief introduction on the operator's main phases, this section 
describes in more detail how a task calls 
-the respective methods during its execution on a cluster. The sequence of the 
phases described here is mainly included 
-in the `invoke()` method of the `StreamTask` class. The remainder of this 
document is split into two subsections, one 
-describing the phases during a regular, fault-free execution of a task (see 
[Normal Execution](#normal-execution)), and 
-(a shorter) one describing the different sequence followed in case the task is 
cancelled (see [Interrupted Execution](#interrupted-execution)), 
-either manually, or due some other reason, *e.g.* an exception thrown during 
execution.
-
-### Normal Execution
-
-The steps a task goes through when executed until completion without being 
interrupted are illustrated below:
-
-        TASK::setInitialState
-        TASK::invoke
-            create basic utils (config, etc) and load the chain of operators
-            setup-operators
-            task-specific-init
-            initialize-operator-states
-            open-operators
-            run
-            finish-operators
-            close-operators
-            task-specific-cleanup
-            common-cleanup
-
-As shown above, after recovering the task configuration and initializing some 
important runtime parameters, the very 
-first step for the task is to retrieve its initial, task-wide state. This is 
done in the `setInitialState()`, and it is 
-particularly important in two cases:
-
-1. when the task is recovering from a failure and restarts from the last 
successful checkpoint
-2. when resuming from a [savepoint]({{< ref "docs/ops/state/savepoints" >}}). 
-
-If it is the first time the task is executed, the initial task state is empty. 
-
-After recovering any initial state, the task goes into its `invoke()` method. 
There, it first initializes the operators 
-involved in the local computation by calling the `setup()` method of each one 
of them and then performs its task-specific 
-initialization by calling the local `init()` method. By task-specific, we mean 
that depending on the type of the task 
-(`SourceTask`, `OneInputStreamTask` or `TwoInputStreamTask`, etc), this step 
may differ, but in any case, here is where 
-the necessary task-wide resources are acquired. As an example, the 
`OneInputStreamTask`, which represents a task that 
-expects to have a single input stream, initializes the connection(s) to the 
location(s) of the different partitions of 
-the input stream that are relevant to the local task.
-
-Having acquired the necessary resources, it is time for the different 
operators and user-defined functions to acquire 
-their individual state from the task-wide state retrieved above. This is done 
in the `initializeState()` method, which 
-calls the `initializeState()` of each individual operator. This method should 
be overridden by every stateful operator 
-and should contain the state initialization logic, both for the first time a 
job is executed, and also for the case when 
-the task recovers from a failure or when using a savepoint.
-
-Now that all operators in the task have been initialized, the `open()` method 
of each individual operator is called by 
-the `openAllOperators()` method of the `StreamTask`. This method performs all 
the operational initialization, 
-such as registering any retrieved timers with the timer service. A single task 
may be executing multiple operators with one 
-consuming the output of its predecessor. In this case, the `open()` method is 
called from the last operator, *i.e.* the 
-one whose output is also the output of the task itself, to the first. This is 
done so that when the first operator starts 
-processing the task's input, all downstream operators are ready to receive its 
output.
+当所有初始化都完成之后,算子开始处理流入的数据。流入的数据可以分为三种类型:用户数据、watermark 和 checkpoint 
barriers。每种类型的数据都有单独的方法来处理。用户数据通过 `processElement()` 方法来处理,watermark 通过 
`processWatermark()` 来处理,checkpoint barriers 会触发异步执行的 `snapshotState()` 方法来进行 
checkpoint。对于每个流入的数据,根据其类型调用上述方法之一。`processElement()`方法也是用户自定义函数逻辑执行的地方,比如用户自定义 
`MapFunction` 里的  `map()` 方法。

Review comment:
       ```suggestion
   当所有初始化都完成之后,算子开始处理流入的数据。流入的数据可以分为三种类型:用户数据、watermark 和 checkpoint 
barriers。每种类型的数据都有单独的方法来处理。用户数据通过 `processElement()` 方法来处理,watermark 通过 
`processWatermark()` 来处理,checkpoint barriers 会调用(异步)`snapshotState()` 方法触发 
checkpoint。对于每个流入的数据,根据其类型调用上述相应的方法。注意,`processElement()` 
方法也是用户自定义函数逻辑执行的地方,比如用户自定义 `MapFunction` 里的  `map()` 方法。
   ```
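
   (Aside, not a change request: a simplified paraphrase of how a map operator's `processElement()` drives the UDF, in the spirit of `org.apache.flink.streaming.api.operators.StreamMap`; this is a sketch, not the verbatim Flink source.)

```java
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator;
import org.apache.flink.streaming.api.operators.OneInputStreamOperator;
import org.apache.flink.streaming.runtime.streamrecord.StreamRecord;

// Simplified sketch of a map operator: processElement() is where the UDF's map() runs.
public class StreamMapSketch<IN, OUT> extends AbstractUdfStreamOperator<OUT, MapFunction<IN, OUT>>
        implements OneInputStreamOperator<IN, OUT> {

    public StreamMapSketch(MapFunction<IN, OUT> mapper) {
        super(mapper);
    }

    @Override
    public void processElement(StreamRecord<IN> element) throws Exception {
        // Call the user-defined map() and forward its result downstream.
        output.collect(element.replace(userFunction.map(element.getValue())));
    }
}
```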



