Zakelly commented on code in PR #23496:
URL: https://github.com/apache/flink/pull/23496#discussion_r1495572575


##########
docs/content.zh/docs/libs/state_processor_api.md:
##########
@@ -27,77 +27,62 @@ under the License.
 
 # State Processor API
 
-Apache Flink's State Processor API provides powerful functionality to reading, writing, and modifying savepoints and checkpoints using Flink’s DataStream API under `BATCH` execution.
-Due to the [interoperability of DataStream and Table API]({{< ref "docs/dev/table/data_stream_api" >}}), you can even use relational Table API or SQL queries to analyze and process state data.
+Apache Flink 的 State Processor API 提供了批模式 (BATCH) 下使用 DataStream API 读取、写入、修改 savepoint 和 checkpoint 的强大能力。
+由于 [DataStream 和 Table API 是等价的]({{< ref "docs/dev/table/data_stream_api" >}}),也可以使用 Table API 或 SQL 来分析和处理 savepoint 或 checkpoint 中的状态数据。
 
-For example, you can take a savepoint of a running stream processing application and analyze it with a DataStream batch program to verify that the application behaves correctly.
-Or you can read a batch of data from any store, preprocess it, and write the result to a savepoint that you use to bootstrap the state of a streaming application.
-It is also possible to fix inconsistent state entries.
-Finally, the State Processor API opens up many ways to evolve a stateful application that was previously blocked by parameter and design choices that could not be changed without losing all the state of the application after it was started.
-For example, you can now arbitrarily modify the data types of states, adjust the maximum parallelism of operators, split or merge operator state, re-assign operator UIDs, and so on.
+例如,可以获取一个正在运行的流应用程序的 savepoint,使用 State Processor API 在批模式下对该 savepoint 进行分析,以验证应用程序的行为是否正确;
+还可以从任意存储中读取并预处理一批数据后将结果写入一个 savepoint,然后基于这个 savepoint 初始化流应用程序的状态; State Processor API 也可以用来修复不一致的状态条目。
+State Processor API 为有状态应用程序的演化提供了新的方式,以前只能通过无状态重启的方式来更新一个有状态应用程序的状态,现在可以通过 State Processor API 修改状态的数据类型、调整操作符的最大并行度、拆分或合并操作符状态、重新分配操作符UID等。
 
-To get started with the state processor api, include the following library in your application.
+请在应用程序中包含以下库以使用 State Processor API。
 
 {{< artifact flink-state-processor-api >}}
 
-## Mapping Application State to DataSets
+## 从应用状态到逻辑表
 
-The State Processor API maps the state of a streaming application to one or more data sets that can be processed separately.
-In order to be able to use the API, you need to understand how this mapping works.
+State Processor API 将流应用程序的状态映射到若干个可以单独处理的逻辑表中,为了能使用 API,您需要先理解这种映射是如何工作的。

Review Comment:
   I'd suggest not calling it "逻辑表" (logical table); "数据集" (data set) is probably easier for users to understand.



##########
docs/content.zh/docs/libs/state_processor_api.md:
##########
@@ -27,77 +27,62 @@ under the License.
 
 # State Processor API
 
-Apache Flink's State Processor API provides powerful functionality to reading, writing, and modifying savepoints and checkpoints using Flink’s DataStream API under `BATCH` execution.
-Due to the [interoperability of DataStream and Table API]({{< ref "docs/dev/table/data_stream_api" >}}), you can even use relational Table API or SQL queries to analyze and process state data.
+Apache Flink 的 State Processor API 提供了批模式 (BATCH) 下使用 DataStream API 读取、写入、修改 savepoint 和 checkpoint 的强大能力。
+由于 [DataStream 和 Table API 是等价的]({{< ref "docs/dev/table/data_stream_api" >}}),也可以使用 Table API 或 SQL 来分析和处理 savepoint 或 checkpoint 中的状态数据。
 
-For example, you can take a savepoint of a running stream processing application and analyze it with a DataStream batch program to verify that the application behaves correctly.
-Or you can read a batch of data from any store, preprocess it, and write the result to a savepoint that you use to bootstrap the state of a streaming application.
-It is also possible to fix inconsistent state entries.
-Finally, the State Processor API opens up many ways to evolve a stateful application that was previously blocked by parameter and design choices that could not be changed without losing all the state of the application after it was started.
-For example, you can now arbitrarily modify the data types of states, adjust the maximum parallelism of operators, split or merge operator state, re-assign operator UIDs, and so on.
+例如,可以获取一个正在运行的流应用程序的 savepoint,使用 State Processor API 在批模式下对该 savepoint 进行分析,以验证应用程序的行为是否正确;
+还可以从任意存储中读取并预处理一批数据后将结果写入一个 savepoint,然后基于这个 savepoint 初始化流应用程序的状态; State Processor API 也可以用来修复不一致的状态条目。
+State Processor API 为有状态应用程序的演化提供了新的方式,以前只能通过无状态重启的方式来更新一个有状态应用程序的状态,现在可以通过 State Processor API 修改状态的数据类型、调整操作符的最大并行度、拆分或合并操作符状态、重新分配操作符UID等。
 
-To get started with the state processor api, include the following library in your application.
+请在应用程序中包含以下库以使用 State Processor API。
 
 {{< artifact flink-state-processor-api >}}
 
-## Mapping Application State to DataSets
+## 从应用状态到逻辑表

Review Comment:
   ```suggestion
   ## 将状态转化为数据集
   ```



##########
docs/content.zh/docs/libs/state_processor_api.md:
##########
@@ -321,22 +303,19 @@ savepoint
 
 ```
 
-Additionally, trigger state - from `CountTrigger`s or custom triggers - can be read using the method
-`Context#triggerState` inside the `WindowReaderFunction`.
+另外,可以通过`WindowReaderFunction`里的`Context#triggerState`方法读取`CountTrigger`或自定义触发器的状态。

Review Comment:
   Do we need spaces here, before and after the inline English/code terms?
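
For context, reading trigger state as this hunk describes might look like the following minimal sketch, assuming a custom trigger that kept a per-window count in a `ValueState<Long>` named "count" (the uid-agnostic reader below and all names and type parameters are illustrative, not from the PR):

```java
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.state.api.functions.WindowReaderFunction;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;

// Emits the counter that a (hypothetical) counting trigger kept for each window.
public class TriggerStateReader extends WindowReaderFunction<Long, Long, String, TimeWindow> {

    // Must match the descriptor the trigger used in the original job.
    private final ValueStateDescriptor<Long> countDescriptor =
            new ValueStateDescriptor<>("count", Types.LONG);

    @Override
    public void readWindow(String key, Context<TimeWindow> context, Iterable<Long> elements, Collector<Long> out) throws Exception {
        ValueState<Long> count = context.triggerState(countDescriptor);
        out.collect(count.value());
    }
}
```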



##########
docs/content.zh/docs/libs/state_processor_api.md:
##########
@@ -145,13 +130,13 @@ DataStream<Integer> listState = savepoint.readListState<>(
     new MyCustomIntSerializer());
 ```
 
-### Keyed State
+### 分区状态 Keyed State
 
-[Keyed state]({{< ref "docs/dev/datastream/fault-tolerance/state" >}}#keyed-state), or partitioned state, is any state that is partitioned relative to a key.
-When reading a keyed state, users specify the operator id and a `KeyedStateReaderFunction<KeyType, OutputType>`.
+[Keyed state]({{< ref "docs/dev/datastream/fault-tolerance/state" >}}#keyed-state),又叫分区状态(partitioned state),是与 key 相对应的状态。

Review Comment:
   ```suggestion
   [Keyed state]({{< ref "docs/dev/datastream/fault-tolerance/state" >}}#keyed-state),又叫分区状态(partitioned state),是使用一个 key 进行分区的状态。
   ```
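
For context, the keyed-state read that this section leads up to might look roughly like this sketch ("my-uid", `MyOutput`, and `MyKeyedStateReader` are illustrative placeholders):

```java
// Read keyed state from the operator registered under the given uid; the reader
// is a user-defined KeyedStateReaderFunction<KeyType, OutputType> (see below).
DataStream<MyOutput> keyedState = savepoint.readKeyedState(
    OperatorIdentifier.forUid("my-uid"),
    new MyKeyedStateReader());
```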



##########
docs/content.zh/docs/libs/state_processor_api.md:
##########
@@ -321,22 +303,19 @@ savepoint
 
 ```
 
-Additionally, trigger state - from `CountTrigger`s or custom triggers - can be read using the method
-`Context#triggerState` inside the `WindowReaderFunction`.
+另外,可以通过`WindowReaderFunction`里的`Context#triggerState`方法读取`CountTrigger`或自定义触发器的状态。
 
-## Writing New Savepoints
+## 通过 State Processor API 写出状态
 
-`Savepoint`'s may also be written, which allows such use cases as bootstrapping state based on historical data.
-Each savepoint is made up of one or more `StateBootstrapTransformation`'s (explained below), each of which defines the state for an individual operator.
+State processor API 可以用来生成 savepoint,这使得用户可以基于历史数据进行状态的初始化。
+每个 savepoint 可以由若干个 `StateBootstrapTransformation` 生成,每个 `StateBootstrapTransformation` 定义了一个 operator 的状态。

Review Comment:
   Should "operator" here be translated as "算子"?
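
For context, the write path these lines describe might look roughly like the following sketch (the bootstrap function, uid, and paths are illustrative assumptions):

```java
// Build a StateBootstrapTransformation from a batch DataStream of historical data,
// then write it out as the state of the operator registered under "account-uid".
StateBootstrapTransformation<Account> transformation = OperatorTransformation
    .bootstrapWith(historicalAccounts)              // DataStream<Account> read from any store
    .transform(new AccountBootstrapFunction());     // user-defined StateBootstrapFunction<Account>

SavepointWriter
    .newSavepoint(env, new HashMapStateBackend(), 128)  // 128 = max parallelism
    .withOperator(OperatorIdentifier.forUid("account-uid"), transformation)
    .write("hdfs://path/to/new-savepoint");
```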



##########
docs/content.zh/docs/libs/state_processor_api.md:
##########
@@ -321,22 +303,19 @@ savepoint
 
 ```
 
-Additionally, trigger state - from `CountTrigger`s or custom triggers - can be read using the method
-`Context#triggerState` inside the `WindowReaderFunction`.
+另外,可以通过`WindowReaderFunction`里的`Context#triggerState`方法读取`CountTrigger`或自定义触发器的状态。
 
-## Writing New Savepoints
+## 通过 State Processor API 写出状态

Review Comment:
   ```suggestion
   ## 通过 State Processor API 写出新状态
   ```



##########
docs/content.zh/docs/libs/state_processor_api.md:
##########
@@ -133,9 +118,9 @@ DataStream<Tuple2<Integer, Integer>> broadcastState = savepoint.readBroadcastSta
     Types.INT);
 ```
 
-#### Using Custom Serializers
+#### 使用自定义序列化器
 
-Each of the operator state readers support using custom `TypeSerializers` if one was used to define the `StateDescriptor` that wrote out the state.
+Operator state readers 支持使用自定义的 `TypeSerializers`,如果在写出状态时 `StateDescriptor` 使用了自定义的 `TypeSerializer`。

Review Comment:
   ```suggestion
   如果在写出状态时 `StateDescriptor` 使用了自定义的 `TypeSerializer`,Operator state 也支持使用自定义的 `TypeSerializers`进行读取。
   ```
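
For context, such a read might look like this sketch (`MyCustomIntSerializer` is the hypothetical serializer from the surrounding example; uid and state name are illustrative):

```java
// The serializer passed here must be the same one the StateDescriptor
// used when the state was originally written.
DataStream<Integer> listState = savepoint.readListState(
    OperatorIdentifier.forUid("my-uid"),
    "list-state",
    Types.INT,
    new MyCustomIntSerializer());
```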



##########
docs/content.zh/docs/libs/state_processor_api.md:
##########
@@ -27,77 +27,62 @@ under the License.
 
 # State Processor API
 
-Apache Flink's State Processor API provides powerful functionality to reading, writing, and modifying savepoints and checkpoints using Flink’s DataStream API under `BATCH` execution.
-Due to the [interoperability of DataStream and Table API]({{< ref "docs/dev/table/data_stream_api" >}}), you can even use relational Table API or SQL queries to analyze and process state data.
+Apache Flink 的 State Processor API 提供了批模式 (BATCH) 下使用 DataStream API 读取、写入、修改 savepoint 和 checkpoint 的强大能力。
+由于 [DataStream 和 Table API 是等价的]({{< ref "docs/dev/table/data_stream_api" >}}),也可以使用 Table API 或 SQL 来分析和处理 savepoint 或 checkpoint 中的状态数据。
 
-For example, you can take a savepoint of a running stream processing application and analyze it with a DataStream batch program to verify that the application behaves correctly.
-Or you can read a batch of data from any store, preprocess it, and write the result to a savepoint that you use to bootstrap the state of a streaming application.
-It is also possible to fix inconsistent state entries.
-Finally, the State Processor API opens up many ways to evolve a stateful application that was previously blocked by parameter and design choices that could not be changed without losing all the state of the application after it was started.
-For example, you can now arbitrarily modify the data types of states, adjust the maximum parallelism of operators, split or merge operator state, re-assign operator UIDs, and so on.
+例如,可以获取一个正在运行的流应用程序的 savepoint,使用 State Processor API 在批模式下对该 savepoint 进行分析,以验证应用程序的行为是否正确;
+还可以从任意存储中读取并预处理一批数据后将结果写入一个 savepoint,然后基于这个 savepoint 初始化流应用程序的状态; State Processor API 也可以用来修复不一致的状态条目。
+State Processor API 为有状态应用程序的演化提供了新的方式,以前只能通过无状态重启的方式来更新一个有状态应用程序的状态,现在可以通过 State Processor API 修改状态的数据类型、调整操作符的最大并行度、拆分或合并操作符状态、重新分配操作符UID等。
 
-To get started with the state processor api, include the following library in your application.
+请在应用程序中包含以下库以使用 State Processor API。
 
 {{< artifact flink-state-processor-api >}}
 
-## Mapping Application State to DataSets
+## 从应用状态到逻辑表
 
-The State Processor API maps the state of a streaming application to one or more data sets that can be processed separately.
-In order to be able to use the API, you need to understand how this mapping works.
+State Processor API 将流应用程序的状态映射到若干个可以单独处理的逻辑表中,为了能使用 API,您需要先理解这种映射是如何工作的。
 
-But let us first have a look at what a stateful Flink job looks like.
-A Flink job is composed of operators; typically one or more source operators, a few operators for the actual processing, and one or more sink operators.
-Each operator runs in parallel in one or more tasks and can work with different types of state.
-An operator can have zero, one, or more *“operator states”* which are organized as lists that are scoped to the operator's tasks.
-If the operator is applied on a keyed stream, it can also have zero, one, or more *“keyed states”* which are scoped to a key that is extracted from each processed record.
-You can think of keyed state as a distributed key-value map.
+让我们先看看有状态的 Flink 作业是什么样子的。Flink 作业由算子 (Operator) 组成: 一个作业通常包括若干个 Source 算子,一些实际用于计算处理的算子以及若干个 Sink 算子。
+每个算子由若干个子任务并行运行,一个算子中可以有不同类型的 State。一个算子可以有若干个 operator state,这些状态被组织成列表,每个子任务的 State 对应列表中的一个元素。
+如果一个算子是 keyed stream 中的,则它可以有若干个 keyed state,用来存储从 record 中提取出的 key,keyed state 可以看作分布式键值映射。
 
-The following figure shows the application “MyApp” which consists of three operators called “Src”, “Proc”, and “Snk”.
-Src has one operator state (os1), Proc has one operator state (os2) and two keyed states (ks1, ks2) and Snk is stateless.
+下图展示了应用程序 MyApp 中的状态,它由三个名为 Src、Proc 和 Snk 的算子组成。Src 算子有一个 operator state (os1),Proc 算子有一个 operator state (os2) 和两个 keyed state (ks1、ks2),Snk 算子是无状态的。
 
 {{< img src="/fig/application-my-app-state-processor-api.png" width="600px" alt="Application: MyApp" >}}
 
-A savepoint or checkpoint of MyApp consists of the data of all states, organized in a way that the states of each task can be restored.
-When processing the data of a savepoint (or checkpoint) with a batch job, we need a mental model that maps the data of the individual tasks' states into data sets or tables.
-In fact, we can think of a savepoint as a database. Every operator (identified by its UID) represents a namespace.
-Each operator state of an operator is mapped to a dedicated table in the namespace with a single column that holds the state's data of all tasks.
-All keyed states of an operator are mapped to a single table consisting of a column for the key, and one column for each keyed state.
-The following figure shows how a savepoint of MyApp is mapped to a database.
+MyApp 的 savepoint 或 checkpoint 包含了所有状态数据,可以用来恢复每个子任务的状态。当使用批处理作业处理 savepoint/checkpoint 的数据时,我们需要一个逻辑映射模型,将各个任务的状态数据映射到逻辑表中。
+事实上,可以将 savepoint 视为数据库,每个算子(由其 UID 标识)代表一个命名空间。算子的 operator state 可以映射为命名空间中一个单列的表,表中的一行代表一个子任务。
+算子所有的 keyed state 可以看作一个多列的表,每一列表示一个 keyed state。下图展示了 MyApp 的 savepoint 和逻辑表间的映射关系。
 
 {{< img src="/fig/database-my-app-state-processor-api.png" width="600px" alt="Database: MyApp" >}}
 
-The figure shows how the values of Src's operator state are mapped to a table with one column and five rows, one row for each of the list entries across all parallel tasks of Src.
-Operator state os2 of the operator “Proc” is similarly mapped to an individual table.
-The keyed states ks1 and ks2 are combined to a single table with three columns, one for the key, one for ks1 and one for ks2.
-The keyed table holds one row for each distinct key of both keyed states.
-Since the operator “Snk” does not have any state, its namespace is empty.
+上图显示了 Src 算子的 operator state 与逻辑表的映射,逻辑表的每一行表示一个 Src 算子的子任务的状态。
+Proc 算子的 os2 也类似地映射到一个单列的表。Proc 算子的 ks1 和 ks2 组合成一个三列的表,第一列表示key,第二列表示 ks1,第三列表示 ks2,每一行表示一个key的状态。
+Snk 算子没有状态,因此它的命名空间是空的。
 
-## Identifying operators
+## 算子的标识
 
-The State Processor API allows you to identify operators using [UIDs]({{< ref "docs/concepts/glossary" >}}#UID) or [UID hashes]({{< ref "docs/concepts/glossary" >}}#UID-hashes) via `OperatorIdentifier#forUid/forUidHash`.
-Hashes should only be used when the use of `UIDs` is not possible, for example when the application that created the [savepoint]({{< ref "docs/ops/state/savepoints" >}}) did not specify them or when the `UID` is unknown.
+State Processor API 允许使用 [UIDs]({{< ref "docs/concepts/glossary" >}}#UID) 或 [UID hash]({{< ref "docs/concepts/glossary" >}}#UID-hashes)来识别算子:`OperatorIdentifier#forUid/forUidHash`。
+仅当无法使用 UID 时才应使用 UID hash,例如,当创建 [savepoint]({{< ref "docs/ops/state/savepoints" >}}) 的应用程序未指定 UID 或算子 UID 未知时。
 
-## Reading State
+## 通过 State Processor API 读取状态
 
-Reading state begins by specifying the path to a valid savepoint or checkpoint along with the `StateBackend` that should be used to restore the data.
-The compatibility guarantees for restoring state are identical to those when restoring a `DataStream` application.
+读取状态首先需要指定 savepoint 或 checkpoint 的路径以及用于恢复数据的 `状态存储后端(StateBackend)`。兼容性保证了 state processor API 恢复的状态与 DataStream 应用恢复的状态是一致的。

Review Comment:
   ```suggestion
   读取状态首先需要指定 savepoint 或 checkpoint 的路径以及用于恢复数据的 `状态存储后端(StateBackend)`。 State processor API 恢复的状态与 DataStream 应用恢复的状态是一致的。
   ```
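
For context, the two ways of identifying an operator covered by this hunk might look like the following sketch (the uid string and hash value are made-up placeholders):

```java
// Preferred: the uid assigned via uid("...") in the original job.
OperatorIdentifier byUid = OperatorIdentifier.forUid("my-operator-uid");

// Fallback when only the hash is known, e.g. because the job never set a uid.
OperatorIdentifier byHash = OperatorIdentifier.forUidHash("90bea66de1c231edf33913ecd54406c1");
```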



##########
docs/content.zh/docs/libs/state_processor_api.md:
##########
@@ -145,13 +130,13 @@ DataStream<Integer> listState = savepoint.readListState<>(
     new MyCustomIntSerializer());
 ```
 
-### Keyed State
+### 分区状态 Keyed State
 
-[Keyed state]({{< ref "docs/dev/datastream/fault-tolerance/state" >}}#keyed-state), or partitioned state, is any state that is partitioned relative to a key.
-When reading a keyed state, users specify the operator id and a `KeyedStateReaderFunction<KeyType, OutputType>`.
+[Keyed state]({{< ref "docs/dev/datastream/fault-tolerance/state" >}}#keyed-state),又叫分区状态(partitioned state),是与 key 相对应的状态。
+当读取 keyed state 时,需要指定算子 id 和一个 `KeyedStateReaderFunction<KeyType, OutputType>`。
 
-The `KeyedStateReaderFunction` allows users to read arbitrary columns and complex state types such as ListState, MapState, and AggregatingState.
-This means if an operator contains a stateful process function such as:
+`KeyedStateReaderFunction` 允许用户读取任意列和复杂的状态类型,如 ListState, MapState, 和 AggregatingState。
+这意味着如果一个算子包含一个 stateful process function,如:

Review Comment:
   ```suggestion
   这意味着如果一个算子包含一个带状态的处理函数,如:
   ```
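
For context, a minimal sketch of the kind of stateful process function this sentence introduces (simplified from the sort of example the full page continues with; all names are illustrative):

```java
public class StatefulFunctionWithTime extends KeyedProcessFunction<Integer, Integer, Void> {

    ValueState<Integer> state;

    @Override
    public void open(Configuration parameters) {
        state = getRuntimeContext().getState(
                new ValueStateDescriptor<>("state", Types.INT));
    }

    @Override
    public void processElement(Integer value, Context ctx, Collector<Void> out) throws Exception {
        state.update(value + 1);
    }
}
```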



##########
docs/content.zh/docs/libs/state_processor_api.md:
##########
@@ -145,13 +130,13 @@ DataStream<Integer> listState = savepoint.readListState<>(
     new MyCustomIntSerializer());
 ```
 
-### Keyed State
+### 分区状态 Keyed State

Review Comment:
   Wouldn't it be fine to just call this "Keyed State" here?



##########
docs/content.zh/docs/libs/state_processor_api.md:
##########
@@ -223,19 +208,16 @@ public class ReaderFunction extends KeyedStateReaderFunction<Integer, KeyedState
 }
 ```
 
-Along with reading registered state values, each key has access to a `Context` with metadata such as registered event time and processing time timers.
-
-**Note:** When using a `KeyedStateReaderFunction`, all state descriptors must be registered eagerly inside of open. Any attempt to call a `RuntimeContext#get*State` will result in a `RuntimeException`.
+除了读取注册的状态之外,每个 key 还可以访问包括 event time 和 processing time [计时器](https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/concepts/time/)等元数据的 `Context`。
 
-### Window State
+**注意:** 当使用 `KeyedStateReaderFunction` 时,所有状态描述符必须在 open 函数中注册。 否则任何尝试调用 `RuntimeContext#get*State` 将导致 `RuntimeException`。

Review Comment:
   ```suggestion
   **注意:** 当使用 `KeyedStateReaderFunction` 时,所有状态描述符必须在 `open` 函数中注册。 否则任何尝试调用 `RuntimeContext#get*State` 将导致 `RuntimeException`。
   ```
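
For context, the eager registration this note refers to might look like the following sketch (names and types are illustrative):

```java
public class ReaderFunction extends KeyedStateReaderFunction<Integer, Integer> {

    ValueState<Integer> state;

    @Override
    public void open(Configuration parameters) {
        // Register every descriptor eagerly here; calling
        // getRuntimeContext().getState(...) later, inside readKey,
        // would result in a RuntimeException.
        state = getRuntimeContext().getState(
                new ValueStateDescriptor<>("state", Types.INT));
    }

    @Override
    public void readKey(Integer key, Context ctx, Collector<Integer> out) throws Exception {
        out.collect(state.value());
    }
}
```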



##########
docs/content.zh/docs/libs/state_processor_api.md:
##########
@@ -223,19 +208,16 @@ public class ReaderFunction extends KeyedStateReaderFunction<Integer, KeyedState
 }
 ```
 
-Along with reading registered state values, each key has access to a `Context` with metadata such as registered event time and processing time timers.
-
-**Note:** When using a `KeyedStateReaderFunction`, all state descriptors must be registered eagerly inside of open. Any attempt to call a `RuntimeContext#get*State` will result in a `RuntimeException`.
+除了读取注册的状态之外,每个 key 还可以访问包括 event time 和 processing time [计时器](https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/concepts/time/)等元数据的 `Context`。
 
-### Window State
+**注意:** 当使用 `KeyedStateReaderFunction` 时,所有状态描述符必须在 open 函数中注册。 否则任何尝试调用 `RuntimeContext#get*State` 将导致 `RuntimeException`。
 
-The state processor api supports reading state from a [window operator]({{< ref "docs/dev/datastream/operators/windows" >}}).
-When reading a window state, users specify the operator id, window assigner, and aggregation type.
+### 窗口状态 Window State
 
-Additionally, a `WindowReaderFunction` can be specified to enrich each read with additional information similar
-to a `WindowFunction` or `ProcessWindowFunction`.
+State Processor api 支持读取[窗口算子]({{< ref "docs/dev/datastream/operators/windows" >}})的状态,当读取窗口状态时,需要指定算子 id,窗口分配器和聚合类型。

Review Comment:
   I'd suggest capitalizing "API" consistently.
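
For context, a window-state read as described here might look like this sketch, assuming a String-uid `reduce` overload and a user-defined `ReduceFunction` (uid, assigner, and types are illustrative):

```java
// Specify the operator uid, the window assigner, and the aggregation type.
DataStream<Integer> windowState = savepoint
    .window(TumblingEventTimeWindows.of(Time.minutes(1)))
    .reduce("window-uid", new MaxReducer(), Types.INT, Types.INT);
```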



##########
docs/content.zh/docs/libs/state_processor_api.md:
##########
@@ -27,77 +27,62 @@ under the License.
 
 # State Processor API
 
-Apache Flink's State Processor API provides powerful functionality to reading, writing, and modifying savepoints and checkpoints using Flink’s DataStream API under `BATCH` execution.
-Due to the [interoperability of DataStream and Table API]({{< ref "docs/dev/table/data_stream_api" >}}), you can even use relational Table API or SQL queries to analyze and process state data.
+Apache Flink 的 State Processor API 提供了批模式 (BATCH) 下使用 DataStream API 读取、写入、修改 savepoint 和 checkpoint 的强大能力。
+由于 [DataStream 和 Table API 是等价的]({{< ref "docs/dev/table/data_stream_api" >}}),也可以使用 Table API 或 SQL 来分析和处理 savepoint 或 checkpoint 中的状态数据。
 
-For example, you can take a savepoint of a running stream processing application and analyze it with a DataStream batch program to verify that the application behaves correctly.
-Or you can read a batch of data from any store, preprocess it, and write the result to a savepoint that you use to bootstrap the state of a streaming application.
-It is also possible to fix inconsistent state entries.
-Finally, the State Processor API opens up many ways to evolve a stateful application that was previously blocked by parameter and design choices that could not be changed without losing all the state of the application after it was started.
-For example, you can now arbitrarily modify the data types of states, adjust the maximum parallelism of operators, split or merge operator state, re-assign operator UIDs, and so on.
+例如,可以获取一个正在运行的流应用程序的 savepoint,使用 State Processor API 在批模式下对该 savepoint 进行分析,以验证应用程序的行为是否正确;
+还可以从任意存储中读取并预处理一批数据后将结果写入一个 savepoint,然后基于这个 savepoint 初始化流应用程序的状态; State Processor API 也可以用来修复不一致的状态条目。
+State Processor API 为有状态应用程序的演化提供了新的方式,以前只能通过无状态重启的方式来更新一个有状态应用程序的状态,现在可以通过 State Processor API 修改状态的数据类型、调整操作符的最大并行度、拆分或合并操作符状态、重新分配操作符UID等。
 
-To get started with the state processor api, include the following library in your application.
+请在应用程序中包含以下库以使用 State Processor API。
 
 {{< artifact flink-state-processor-api >}}
 
-## Mapping Application State to DataSets
+## 从应用状态到逻辑表
 
-The State Processor API maps the state of a streaming application to one or more data sets that can be processed separately.
-In order to be able to use the API, you need to understand how this mapping works.
+State Processor API 将流应用程序的状态映射到若干个可以单独处理的逻辑表中,为了能使用 API,您需要先理解这种映射是如何工作的。
 
-But let us first have a look at what a stateful Flink job looks like.
-A Flink job is composed of operators; typically one or more source operators, a few operators for the actual processing, and one or more sink operators.
-Each operator runs in parallel in one or more tasks and can work with different types of state.
-An operator can have zero, one, or more *“operator states”* which are organized as lists that are scoped to the operator's tasks.
-If the operator is applied on a keyed stream, it can also have zero, one, or more *“keyed states”* which are scoped to a key that is extracted from each processed record.
-You can think of keyed state as a distributed key-value map.
+让我们先看看有状态的 Flink 作业是什么样子的。Flink 作业由算子 (Operator) 组成: 一个作业通常包括若干个 Source 算子,一些实际用于计算处理的算子以及若干个 Sink 算子。
+每个算子由若干个子任务并行运行,一个算子中可以有不同类型的 State。一个算子可以有若干个 operator state,这些状态被组织成列表,每个子任务的 State 对应列表中的一个元素。
+如果一个算子是 keyed stream 中的,则它可以有若干个 keyed state,用来存储从 record 中提取出的 key,keyed state 可以看作分布式键值映射。
 
-The following figure shows the application “MyApp” which consists of three operators called “Src”, “Proc”, and “Snk”.
-Src has one operator state (os1), Proc has one operator state (os2) and two keyed states (ks1, ks2) and Snk is stateless.
+下图展示了应用程序 MyApp 中的状态,它由三个名为 Src、Proc 和 Snk 的算子组成。Src 算子有一个 operator state (os1),Proc 算子有一个 operator state (os2) 和两个 keyed state (ks1、ks2),Snk 算子是无状态的。
 
 {{< img src="/fig/application-my-app-state-processor-api.png" width="600px" alt="Application: MyApp" >}}
 
-A savepoint or checkpoint of MyApp consists of the data of all states, organized in a way that the states of each task can be restored.
-When processing the data of a savepoint (or checkpoint) with a batch job, we need a mental model that maps the data of the individual tasks' states into data sets or tables.
-In fact, we can think of a savepoint as a database. Every operator (identified by its UID) represents a namespace.
-Each operator state of an operator is mapped to a dedicated table in the namespace with a single column that holds the state's data of all tasks.
-All keyed states of an operator are mapped to a single table consisting of a column for the key, and one column for each keyed state.
-The following figure shows how a savepoint of MyApp is mapped to a database.
+MyApp 的 savepoint 或 checkpoint 包含了所有状态数据,可以用来恢复每个子任务的状态。当使用批处理作业处理 savepoint/checkpoint 的数据时,我们需要一个逻辑映射模型,将各个任务的状态数据映射到逻辑表中。
+事实上,可以将 savepoint 视为数据库,每个算子(由其 UID 标识)代表一个命名空间。算子的 operator state 可以映射为命名空间中一个单列的表,表中的一行代表一个子任务。
+算子所有的 keyed state 可以看作一个多列的表,每一列表示一个 keyed state。下图展示了 MyApp 的 savepoint 和逻辑表间的映射关系。
 
 {{< img src="/fig/database-my-app-state-processor-api.png" width="600px" alt="Database: MyApp" >}}
 
-The figure shows how the values of Src's operator state are mapped to a table with one column and five rows, one row for each of the list entries across all parallel tasks of Src.
-Operator state os2 of the operator “Proc” is similarly mapped to an individual table.
-The keyed states ks1 and ks2 are combined to a single table with three columns, one for the key, one for ks1 and one for ks2.
-The keyed table holds one row for each distinct key of both keyed states.
-Since the operator “Snk” does not have any state, its namespace is empty.
+上图显示了 Src 算子的 operator state 与逻辑表的映射,逻辑表的每一行表示一个 Src 算子的子任务的状态。
+Proc 算子的 os2 也类似地映射到一个单列的表。Proc 算子的 ks1 和 ks2 组合成一个三列的表,第一列表示key,第二列表示 ks1,第三列表示 ks2,每一行表示一个key的状态。
+Snk 算子没有状态,因此它的命名空间是空的。
 
-## Identifying operators
+## 算子的标识
 
-The State Processor API allows you to identify operators using [UIDs]({{< ref "docs/concepts/glossary" >}}#UID) or [UID hashes]({{< ref "docs/concepts/glossary" >}}#UID-hashes) via `OperatorIdentifier#forUid/forUidHash`.
-Hashes should only be used when the use of `UIDs` is not possible, for example when the application that created the [savepoint]({{< ref "docs/ops/state/savepoints" >}}) did not specify them or when the `UID` is unknown.
+State Processor API 允许使用 [UIDs]({{< ref "docs/concepts/glossary" >}}#UID) 或 [UID hash]({{< ref "docs/concepts/glossary" >}}#UID-hashes)来识别算子:`OperatorIdentifier#forUid/forUidHash`。
+仅当无法使用 UID 时才应使用 UID hash,例如,当创建 [savepoint]({{< ref "docs/ops/state/savepoints" >}}) 的应用程序未指定 UID 或算子 UID 未知时。
 
-## Reading State
+## 通过 State Processor API 读取状态
 
-Reading state begins by specifying the path to a valid savepoint or checkpoint along with the `StateBackend` that should be used to restore the data.
-The compatibility guarantees for restoring state are identical to those when restoring a `DataStream` application.
+读取状态首先需要指定 savepoint 或 checkpoint 的路径以及用于恢复数据的 `状态存储后端(StateBackend)`。兼容性保证了 state processor API 恢复的状态与 DataStream 应用恢复的状态是一致的。
 
 ```java
 StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
 SavepointReader savepoint = SavepointReader.read(env, "hdfs://path/", new HashMapStateBackend());
 ```
 
-
 ### Operator State
 
-[Operator state]({{< ref "docs/dev/datastream/fault-tolerance/state" >}}#operator-state) is any non-keyed state in Flink.
-This includes, but is not limited to, any use of `CheckpointedFunction` or `BroadcastState` within an application.
-When reading operator state, users specify the operator uid, the state name, and the type information.
+Flink 中的 non-keyed state 被称为 [operator state]({{< ref "docs/dev/datastream/fault-tolerance/state" >}}#operator-state)。
+在应用程序中使用 `CheckpointedFunction` 或 `BroadcastState` 会生成 operator State。 读取 operator state 时,需要指定算子 uid、状态名称和类型信息。

Review Comment:
   Should this "uid" be written in uppercase (UID)?
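
For context, an operator-state read as described on the last line might look like this sketch (uid and state name are illustrative):

```java
// Specify the operator uid, the state name, and the type information.
DataStream<Integer> listState = savepoint.readListState(
    OperatorIdentifier.forUid("my-uid"),
    "list-state",
    Types.INT);
```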



##########
docs/content.zh/docs/libs/state_processor_api.md:
##########
@@ -27,77 +27,62 @@ under the License.
 
 # State Processor API
 
-Apache Flink's State Processor API provides powerful functionality to reading, writing, and modifying savepoints and checkpoints using Flink’s DataStream API under `BATCH` execution.
-Due to the [interoperability of DataStream and Table API]({{< ref "docs/dev/table/data_stream_api" >}}), you can even use relational Table API or SQL queries to analyze and process state data.
+Apache Flink 的 State Processor API 提供了批模式 (BATCH) 下使用 DataStream API 读取、写入、修改 savepoint 和 checkpoint 的强大能力。
+由于 [DataStream 和 Table API 是等价的]({{< ref "docs/dev/table/data_stream_api" >}}),也可以使用 Table API 或 SQL 来分析和处理 savepoint 或 checkpoint 中的状态数据。
 
-For example, you can take a savepoint of a running stream processing application and analyze it with a DataStream batch program to verify that the application behaves correctly.
-Or you can read a batch of data from any store, preprocess it, and write the result to a savepoint that you use to bootstrap the state of a streaming application.
-It is also possible to fix inconsistent state entries.
-Finally, the State Processor API opens up many ways to evolve a stateful application that was previously blocked by parameter and design choices that could not be changed without losing all the state of the application after it was started.
-For example, you can now arbitrarily modify the data types of states, adjust the maximum parallelism of operators, split or merge operator state, re-assign operator UIDs, and so on.
+例如,可以获取一个正在运行的流应用程序的 savepoint,使用 State Processor API 在批模式下对该 savepoint 进行分析,以验证应用程序的行为是否正确;
+还可以从任意存储中读取并预处理一批数据后将结果写入一个 savepoint,然后基于这个 savepoint 初始化流应用程序的状态; State Processor API 也可以用来修复不一致的状态条目。
+State Processor API 为有状态应用程序的演化提供了新的方式,以前只能通过无状态重启的方式来更新一个有状态应用程序的状态,现在可以通过 State Processor API 修改状态的数据类型、调整操作符的最大并行度、拆分或合并操作符状态、重新分配操作符UID等。

Review Comment:
   This sentence seems to be missing some of the information in the original text.
   
   ```suggestion
   State Processor API 为有状态应用程序的演化提供了新的方式。以前有状态应用程序不能够进行更改,否则会丢失所有状态。现在可以通过 State Processor API 修改状态的数据类型、调整操作符的最大并行度、拆分或合并操作符状态、重新分配操作符UID等。
   ```
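
For context, one of the evolutions this paragraph mentions, reassigning an operator UID, might look roughly like this sketch (a guess at usage; paths and uids are placeholders):

```java
// Rewrite an existing savepoint, moving one operator's state to a new uid.
SavepointWriter
    .fromExistingSavepoint(env, "hdfs://path/to/old-savepoint", new HashMapStateBackend())
    .changeOperatorIdentifier(
        OperatorIdentifier.forUid("old-uid"),
        OperatorIdentifier.forUid("new-uid"))
    .write("hdfs://path/to/new-savepoint");
```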


