Myasuka commented on a change in pull request #18460:
URL: https://github.com/apache/flink/pull/18460#discussion_r793284042
##########
File path: docs/content.zh/docs/dev/datastream/fault-tolerance/state.md
##########
@@ -85,15 +76,12 @@ keyed = words.key_by(lambda row: row[0])
{{< /tab >}}
{{< /tabs >}}
-#### Tuple Keys and Expression Keys
+#### Tuple Keys 和 Expression Keys
-Flink also has two alternative ways of defining keys: tuple keys and expression
-keys in the Java/Scala API(still not supported in the Python API). With this you can
-specify keys using tuple field indices or expressions
-for selecting fields of objects. We don't recommend using these today but you
-can refer to the Javadoc of DataStream to learn about them. Using a KeySelector
-function is strictly superior: with Java lambdas they are easy to use and they
-have potentially less overhead at runtime.
+Flink 也有两种不同定义 key 的方式:Java/Scala API 的 Tuple key 和 Expression key (Python API 仍未支持)。
+借此你可以通过 tuple 字段索引,或者是选取对象字段的表达式来指定 key。
+如今我们不建议这样使用,但你可以参考 `DataStream` 的 Javadoc 来了解它们。
+使用 KeySelector 函数绝对是更好的。它配合 Java Lambda 更易于使用,运行时可能具有更小的开销。
Review comment:
```suggestion
使用 KeySelector 函数显然是更好的。以几乎可以忽略的额外开销为代价,结合 Java Lambda 表达式,我们可以更方便地使用 KeySelector。
```
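The point under discussion (a KeySelector function beats field-index keys) can be sketched outside of Flink. A plain-Python illustration, not Flink's actual API; `key_by` and the sample data are made up for this sketch:

```python
from collections import defaultdict

# A key selector is just a function from a record to its key. Grouping
# word-count records by that key is what keyBy does conceptually.
def key_by(records, key_selector):
    """Group records by the key returned by key_selector."""
    groups = defaultdict(list)
    for record in records:
        groups[key_selector(record)].append(record)
    return dict(groups)

words = [("hello", 1), ("world", 1), ("hello", 1)]

# Lambda-based key selector: explicit about which field is the key,
# unlike an opaque tuple index such as 0.
by_word = key_by(words, lambda row: row[0])
```

With a lambda the key derivation is visible at the call site, which is the readability argument the review comment is making.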
##########
File path: docs/content.zh/docs/dev/datastream/fault-tolerance/state.md
##########
@@ -605,41 +593,27 @@ val counts: DataStream[(String, Int)] = stream
})
```
-## Operator State
+## 算子状态 (Operator State)
-*Operator State* (or *non-keyed state*) is state that is bound to one
-parallel operator instance. The [Kafka Connector]({{< ref "docs/connectors/datastream/kafka" >}}) is a good motivating example for the use of
-Operator State in Flink. Each parallel instance of the Kafka consumer maintains
-a map of topic partitions and offsets as its Operator State.
+*算子状态*(或者*非 keyed 状态*)是绑定到一个并行算子实例的状态。[Kafka Connector]({{< ref "docs/connectors/datastream/kafka" >}}) 是 Flink 中使用算子状态一个很具有启发性的例子。Kafka consumer 每个并行实例维护了 topic partitions 和偏移量的 map 作为它的算子状态。
-The Operator State interfaces support redistributing state among parallel
-operator instances when the parallelism is changed. There are different schemes
-for doing this redistribution.
+当并行度改变的时候,算子状态接口支持将状态重新分发给各并行算子实例。处理重分发过程有多种不同的方案。
-In a typical stateful Flink Application you don't need operators state. It is
-mostly a special type of state that is used in source/sink implementations and
-scenarios where you don't have a key by which state can be partitioned.
+在典型的有状态 Flink 应用中你无需使用算子状态。它大都作为一种特殊类型的状态使用。用于实现 source/sink,以及你没有用来对 state 分区的 key 这类场景中。
Review comment:
```suggestion
在典型的有状态 Flink 应用中你无需使用算子状态。它大都作为一种特殊类型的状态使用。用于实现 source/sink,以及无法对 state 进行分区而没有主键的这类场景中。
```
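The Kafka-consumer example from this hunk (each parallel source instance keeping a map of topic partitions to offsets as its operator state) can be sketched in plain Python. `SourceInstance`, `poll`, and `snapshot_state` are illustrative names, not Flink's connector classes:

```python
class SourceInstance:
    """One parallel source subtask holding non-keyed (operator) state."""

    def __init__(self):
        # Operator state: (topic, partition) -> number of records consumed,
        # i.e. the next offset to read from.
        self.offsets = {}

    def poll(self, partition, records):
        start = self.offsets.get(partition, 0)
        self.offsets[partition] = start + len(records)
        return records

    def snapshot_state(self):
        # What a checkpoint would persist for this instance.
        return dict(self.offsets)

src = SourceInstance()
src.poll(("topic", 0), ["a", "b"])
src.poll(("topic", 0), ["c"])
src.poll(("topic", 1), ["x"])
```

Note the state is bound to the instance, not to any key in the records, which is exactly why it is "non-keyed".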
##########
File path: docs/content.zh/docs/dev/datastream/fault-tolerance/state.md
##########
@@ -85,15 +76,12 @@ keyed = words.key_by(lambda row: row[0])
{{< /tab >}}
{{< /tabs >}}
-#### Tuple Keys and Expression Keys
+#### Tuple Keys 和 Expression Keys
-Flink also has two alternative ways of defining keys: tuple keys and expression
-keys in the Java/Scala API(still not supported in the Python API). With this you can
-specify keys using tuple field indices or expressions
-for selecting fields of objects. We don't recommend using these today but you
-can refer to the Javadoc of DataStream to learn about them. Using a KeySelector
-function is strictly superior: with Java lambdas they are easy to use and they
-have potentially less overhead at runtime.
+Flink 也有两种不同定义 key 的方式:Java/Scala API 的 Tuple key 和 Expression key (Python API 仍未支持)。
Review comment:
I'd still suggest adding a short explanation of what Tuple keys and Expression keys are.
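The explanation the reviewer asks for can be sketched as follows: a tuple key is a field index, an expression key is a field name, and both are just shorthands that get normalized into a key selector. Plain-Python sketch with a made-up `to_key_selector` helper, not Flink's internals:

```python
def to_key_selector(key_spec):
    """Normalize the three ways of specifying a key into one function."""
    if isinstance(key_spec, int):
        # Tuple key, e.g. keyBy(0): select a tuple field by index.
        return lambda record: record[key_spec]
    if isinstance(key_spec, str):
        # Expression key, e.g. keyBy("word"): select an object field by name.
        return lambda record: getattr(record, key_spec)
    # Already a KeySelector-style function: use it directly.
    return key_spec

sel = to_key_selector(0)   # tuple key on field 0
key = sel(("hello", 1))    # -> "hello"
```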
##########
File path: docs/content.zh/docs/dev/datastream/fault-tolerance/state.md
##########
@@ -605,41 +593,27 @@ val counts: DataStream[(String, Int)] = stream
})
```
-## Operator State
+## 算子状态 (Operator State)
-*Operator State* (or *non-keyed state*) is state that is bound to one
-parallel operator instance. The [Kafka Connector]({{< ref "docs/connectors/datastream/kafka" >}}) is a good motivating example for the use of
-Operator State in Flink. Each parallel instance of the Kafka consumer maintains
-a map of topic partitions and offsets as its Operator State.
+*算子状态*(或者*非 keyed 状态*)是绑定到一个并行算子实例的状态。[Kafka Connector]({{< ref "docs/connectors/datastream/kafka" >}}) 是 Flink 中使用算子状态一个很具有启发性的例子。Kafka consumer 每个并行实例维护了 topic partitions 和偏移量的 map 作为它的算子状态。
-The Operator State interfaces support redistributing state among parallel
-operator instances when the parallelism is changed. There are different schemes
-for doing this redistribution.
+当并行度改变的时候,算子状态接口支持将状态重新分发给各并行算子实例。处理重分发过程有多种不同的方案。
Review comment:
```suggestion
当并行度改变的时候,算子状态支持将状态重新分发给各并行算子实例。处理重分发过程有多种不同的方案。
```
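The "different schemes for redistribution" this hunk mentions correspond, in Flink, to even-split and union redistribution of list-style operator state. A plain-Python sketch of the two schemes on bare lists (illustrative only, not Flink code):

```python
def even_split_redistribute(states, new_parallelism):
    """Each state element lands on exactly one of the new instances."""
    merged = [element for state in states for element in state]
    return [merged[i::new_parallelism] for i in range(new_parallelism)]

def union_redistribute(states, new_parallelism):
    """Every new instance receives the full merged state and must
    filter out what it does not need."""
    merged = [element for state in states for element in state]
    return [list(merged) for _ in range(new_parallelism)]

# State of 2 parallel instances, rescaled to parallelism 3:
old_states = [[1, 2], [3]]
split = even_split_redistribute(old_states, 3)   # disjoint slices
union = union_redistribute(old_states, 3)        # full copy everywhere
```

Even-split keeps state volume constant across rescaling; union trades duplication for letting each instance decide what to keep.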
##########
File path: docs/content.zh/docs/dev/datastream/fault-tolerance/state.md
##########
@@ -25,32 +25,23 @@ specific language governing permissions and limitations
under the License.
-->
-# Working with State
+# 使用状态
-In this section you will learn about the APIs that Flink provides for writing
-stateful programs. Please take a look at [Stateful Stream
-Processing]({{< ref "docs/concepts/stateful-stream-processing" >}})
-to learn about the concepts behind stateful stream processing.
+本章节您将了解 Flink 用于编写有状态程序的 API。要了解有状态流处理背后的概念,请参阅[Stateful Stream
+Processing]({{< ref "docs/concepts/stateful-stream-processing" >}})。
## Keyed DataStream
-If you want to use keyed state, you first need to specify a key on a
-`DataStream` that should be used to partition the state (and also the records
-in the stream themselves). You can specify a key using `keyBy(KeySelector)`
-in Java/Scala API or `key_by(KeySelector)` in Python API on a `DataStream`.
-This will yield a `KeyedStream`, which then allows operations that use keyed state.
+如果你希望使用 keyed state,首先需要为`DataStream`指定 key。这个 key 用于状态分区(也会给数据流中的记录本身分区)。
+你可以使用 `DataStream` 中 Java/Scala API 的 `keyBy(KeySelector)` 或者是 Python API 的 `key_by(KeySelector)` 来指定 key。
+它将生成 `KeyedStream`,接下来允许使用 keyed state 操作。
-A key selector function takes a single record as input and returns the key for
-that record. The key can be of any type and **must** be derived from
-deterministic computations.
+Key selector 函数接收单条记录作为输入,返回这条记录的 key。该 key 可以为任何类型,它**必须**源于确定计算。
-The data model of Flink is not based on key-value pairs. Therefore, you do not
-need to physically pack the data set types into keys and values. Keys are
-"virtual": they are defined as functions over the actual data to guide the
-grouping operator.
+Flink 的数据模型不基于 key-value 对,因此实际上将数据集的类型打包为 key 和 value 是没有必要的。
Review comment:
```suggestion
Flink 的数据模型不基于 key-value 对,因此实际上将数据集在物理上封装成 key 和 value 是没有必要的。
```
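The "virtual key" idea in this hunk (keys are functions over the data, records are never physically repacked into key-value pairs) can be sketched in plain Python; the records and `key_of` function are made up for illustration:

```python
# The record type stays as-is; the key is computed on demand by a
# function and never stored alongside the record.
records = [{"user": "alice", "clicks": 3}, {"user": "bob", "clicks": 1}]

def key_of(record):
    # Derived from the record's fields; no physical (key, value) repacking.
    return record["user"]

# The grouping operator consults key_of when it needs the key:
partitions = {}
for r in records:
    partitions.setdefault(key_of(r), []).append(r)
```

Swapping in a different `key_of` regroups the same unchanged records, which is what "virtual" means here.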
##########
File path: docs/content.zh/docs/dev/datastream/fault-tolerance/state.md
##########
@@ -25,32 +25,23 @@ specific language governing permissions and limitations
under the License.
-->
-# Working with State
+# 使用状态
-In this section you will learn about the APIs that Flink provides for writing
-stateful programs. Please take a look at [Stateful Stream
-Processing]({{< ref "docs/concepts/stateful-stream-processing" >}})
-to learn about the concepts behind stateful stream processing.
+本章节您将了解 Flink 用于编写有状态程序的 API。要了解有状态流处理背后的概念,请参阅[Stateful Stream
+Processing]({{< ref "docs/concepts/stateful-stream-processing" >}})。
## Keyed DataStream
-If you want to use keyed state, you first need to specify a key on a
-`DataStream` that should be used to partition the state (and also the records
-in the stream themselves). You can specify a key using `keyBy(KeySelector)`
-in Java/Scala API or `key_by(KeySelector)` in Python API on a `DataStream`.
-This will yield a `KeyedStream`, which then allows operations that use keyed state.
+如果你希望使用 keyed state,首先需要为`DataStream`指定 key。这个 key 用于状态分区(也会给数据流中的记录本身分区)。
+你可以使用 `DataStream` 中 Java/Scala API 的 `keyBy(KeySelector)` 或者是 Python API 的 `key_by(KeySelector)` 来指定 key。
+它将生成 `KeyedStream`,接下来允许使用 keyed state 操作。
-A key selector function takes a single record as input and returns the key for
-that record. The key can be of any type and **must** be derived from
-deterministic computations.
+Key selector 函数接收单条记录作为输入,返回这条记录的 key。该 key 可以为任何类型,它**必须**源于确定计算。
Review comment:
```suggestion
Key selector 函数接收单条记录作为输入,返回这条记录的 key。该 key 可以为任何类型,但是它的计算产生方式**必须**是具备确定性的。
```
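Why the determinism requirement in this suggestion matters can be shown with a two-line counterexample. Flink may re-evaluate the selector (e.g., on recovery or when routing later records), so the same record must always yield the same key; `good_selector` and `bad_selector` are illustrative names:

```python
import random

def good_selector(record):
    # Same input -> same key, every time it is evaluated.
    return record[0]

def bad_selector(record):
    # Same input -> a different "key" on each call: later records for this
    # key, and recovered state, would scatter across partitions.
    return random.random()

record = ("hello", 1)
stable = good_selector(record) == good_selector(record)      # True
unstable = bad_selector(record) == bad_selector(record)      # virtually always False
```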
##########
File path: docs/content.zh/docs/dev/datastream/fault-tolerance/state.md
##########
@@ -25,32 +25,23 @@ specific language governing permissions and limitations
under the License.
-->
-# Working with State
+# 使用状态
-In this section you will learn about the APIs that Flink provides for writing
-stateful programs. Please take a look at [Stateful Stream
-Processing]({{< ref "docs/concepts/stateful-stream-processing" >}})
-to learn about the concepts behind stateful stream processing.
+本章节您将了解 Flink 用于编写有状态程序的 API。要了解有状态流处理背后的概念,请参阅[Stateful Stream
+Processing]({{< ref "docs/concepts/stateful-stream-processing" >}})。
## Keyed DataStream
-If you want to use keyed state, you first need to specify a key on a
-`DataStream` that should be used to partition the state (and also the records
-in the stream themselves). You can specify a key using `keyBy(KeySelector)`
-in Java/Scala API or `key_by(KeySelector)` in Python API on a `DataStream`.
-This will yield a `KeyedStream`, which then allows operations that use keyed state.
+如果你希望使用 keyed state,首先需要为`DataStream`指定 key。这个 key 用于状态分区(也会给数据流中的记录本身分区)。
Review comment:
```suggestion
如果你希望使用 keyed state,首先需要为`DataStream`指定 key(主键)。这个主键用于状态分区(也会给数据流中的记录本身分区)。
```
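What the suggested sentence means by "the key partitions the state and the records themselves" can be sketched as hash routing: every record is sent to a parallel subtask chosen from its key, so a key's records and its keyed state co-locate. Illustrative only; `subtask_for` is not Flink's actual key-group hashing:

```python
import zlib

def subtask_for(key, parallelism):
    # A stable hash (zlib.crc32) so the same key always routes to the
    # same subtask, across calls and across runs.
    return zlib.crc32(str(key).encode("utf-8")) % parallelism

words = [("hello", 1), ("world", 1), ("hello", 1)]
routed = [subtask_for(row[0], parallelism=4) for row in words]
# Both "hello" records land on the same subtask, where that key's
# state also lives.
```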
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]