twentyworld commented on a change in pull request #12237:
URL: https://github.com/apache/flink/pull/12237#discussion_r437403068
##########
File path: docs/training/streaming_analytics.zh.md
##########
@@ -29,123 +29,104 @@ under the License.
## Event Time and Watermarks
-### Introduction
+<a name="Introduction"></a>
+### 概要
-Flink explicitly supports three different notions of time:
+Flink 明确支持以下三种时间语义:
-* _event time:_ the time when an event occurred, as recorded by the device producing (or storing) the event
+* _事件时间(event time):_ 事件产生的时间,记录的是设备生产(或者存储)事件的时间
-* _ingestion time:_ a timestamp recorded by Flink at the moment it ingests the event
+* _摄取时间(ingestion time):_ Flink 读取事件时记录的时间
-* _processing time:_ the time when a specific operator in your pipeline is processing the event
+* _处理时间(processing time):_ Flink pipeline 中具体算子处理事件的时间
-For reproducible results, e.g., when computing the maximum price a stock reached during the first
-hour of trading on a given day, you should use event time. In this way the result won't depend on
-when the calculation is performed. This kind of real-time application is sometimes performed using
-processing time, but then the results are determined by the events that happen to be processed
-during that hour, rather than the events that occurred then. Computing analytics based on processing
-time causes inconsistencies, and makes it difficult to re-analyze historic data or test new
-implementations.
+为了获得可重现的结果,例如在计算过去的特定一天里第一个小时股票的最高价格时,我们应该使用事件时间。这样的话,无论
+什么时间去计算都不会影响输出结果。然而有些人,在实时计算应用中使用处理时间,这样的话,输出结果就会被处理时间点所决
+定,而不是生产事件的时间。基于处理时间会导致多次计算的结果不一致,也可能会导致再次分析历史数据或者测试新代码变得异常困难。
Review comment:
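Yes, I went back and forth on this point and still didn't get it right. Adding `的计算` ("the computation of") in the middle here would make it much better.
Or something like this: `多次运行基于 processing time 的实时程序,可能得到的结果都不相同` ("running a processing-time-based real-time program several times may give a different result each time")?
I think that reads clearly and plainly.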
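For weighing the wording of this paragraph, here is a minimal plain-Java sketch of the point it makes. It is not Flink API: the `Trade` class, the minute-based timestamps, and the method names are hypothetical. Each trade carries both the time it happened and the time a particular run processed it. A first-hour maximum keyed on event time is identical across runs, while one keyed on processing time changes as soon as a single trade is delayed.

```java
import java.util.List;

final class FirstHourMax {
    // Maximum price among trades whose chosen timestamp falls in the first hour [0, 60).
    private static double maxInFirstHour(List<Trade> trades, boolean useEventTime) {
        double max = Double.NEGATIVE_INFINITY;
        for (Trade t : trades) {
            long ts = useEventTime ? t.eventTime : t.processingTime;
            if (ts >= 0 && ts < 60) {
                max = Math.max(max, t.price);
            }
        }
        return max;
    }

    // Event time: depends only on when each trade actually happened.
    static double maxByEventTime(List<Trade> trades) {
        return maxInFirstHour(trades, true);
    }

    // Processing time: depends on when this particular run got around to each trade.
    static double maxByProcessingTime(List<Trade> trades) {
        return maxInFirstHour(trades, false);
    }

    public static void main(String[] args) {
        // Run A: every trade is processed shortly after it happens.
        List<Trade> runA = List.of(new Trade(5, 6, 101.0), new Trade(42, 44, 107.5), new Trade(59, 59, 103.0));
        // Run B: the same trades, but the one from minute 42 is only processed at minute 72.
        List<Trade> runB = List.of(new Trade(5, 6, 101.0), new Trade(42, 72, 107.5), new Trade(59, 59, 103.0));
        System.out.println("event time:      " + maxByEventTime(runA) + " vs " + maxByEventTime(runB));
        System.out.println("processing time: " + maxByProcessingTime(runA) + " vs " + maxByProcessingTime(runB));
    }
}

// Hypothetical event type; timestamps are minutes since the market opened.
final class Trade {
    final long eventTime;      // when the trade actually happened
    final long processingTime; // when a given run processed it
    final double price;

    Trade(long eventTime, long processingTime, double price) {
        this.eventTime = eventTime;
        this.processingTime = processingTime;
        this.price = price;
    }
}
```

Both runs report 107.5 for event time. In run B the processing-time result drops to 103.0, because the 107.5 trade slipped out of the processing-time hour.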
##########
File path: docs/training/streaming_analytics.zh.md
##########
@@ -29,123 +29,104 @@ under the License.
-### Working with Event Time
+<a name="Working-with-Event-Time"></a>
+### 使用 Event Time
-By default, Flink will use processing time. To change this, you can set the Time Characteristic:
+Flink 在默认情况下是使用处理时间。也可以通过下面配置来告诉 Flink 选择哪种时间语义:
{% highlight java %}
final StreamExecutionEnvironment env =
StreamExecutionEnvironment.getExecutionEnvironment();
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
{% endhighlight %}
-If you want to use event time, you will also need to supply a Timestamp Extractor and Watermark
-Generator that Flink will use to track the progress of event time. This will be covered in the
-section below on [Working with Watermarks]({% link
-training/streaming_analytics.zh.md %}#working-with-watermarks), but first we should explain what
-watermarks are.
+如果想要使用事件时间,需要额外给 Flink 提供一个时间戳的提取器和 Watermark 生成器,Flink 将使用它们来跟踪事件时间的进度。这
+将在选节[使用 Watermarks]({% link training/streaming_analytics.zh.md %}#Working-with-Watermarks)中介绍,但是首先我们需要解释一下
+ watermarks 是什么。
### Watermarks
-Let's work through a simple example that will show why watermarks are needed, and how they work.
+让我们通过一个简单的示例来演示为什么需要 watermarks 及其工作方式。
-In this example you have a stream of timestamped events that arrive somewhat out of order, as shown
-below. The numbers shown are timestamps that indicate when these events actually occurred. The first
-event to arrive happened at time 4, and it is followed by an event that happened earlier, at time 2,
-and so on:
+在此示例中,我们将看到带有混乱时间戳的事件流,如下所示。显示的数字表达的是这些事件实际发生时间的时间戳。到达的
+第一个事件发生在时间4,随后发生的事件发生在更早的时间2,依此类推:
<div class="text-center" style="font-size: x-large; word-spacing: 0.5em; margin: 1em 0em;">
··· 23 19 22 24 21 14 17 13 12 15 9 11 7 2 4 →
</div>
-Now imagine that you are trying create a stream sorter. This is meant to be an application that
-processes each event from a stream as it arrives, and emits a new stream containing the same events,
-but ordered by their timestamps.
+假设我们要对数据流排序,我们想要达到的目的是:应用程序应该在数据流里的事件到达时就处理每个事件,并发出包含相同
+事件但按其时间戳排序的新流。
Review comment:
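This sentence is an interesting one to translate, and I don't feel I've quite got it right. Maybe:
应用程序应该在数据流里的事件到达时就有一个算子(我们暂且称之为排序)开始处理事件,这个算子所输出的流是按照时间戳排序好的。
(Roughly: as events in the stream arrive, an operator, which we'll call the sorter for now, starts processing them, and the stream this operator emits is ordered by timestamp.)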
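As a reference for the sentence under discussion, here is a minimal plain-Java sketch of the "stream sorter" the passage describes. It is not Flink API, and `WatermarkSorter`, `onEvent`, and `onWatermark` are hypothetical names. Arriving events are buffered in a min-heap because an earlier one may still show up. A watermark for time t then releases, smallest first, every buffered event with timestamp <= t.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical stream sorter: buffers events and emits them in timestamp order
// only once a watermark says no earlier event is expected.
final class WatermarkSorter {
    private final PriorityQueue<Long> buffer = new PriorityQueue<>(); // min-heap by timestamp
    private final List<Long> output = new ArrayList<>();

    // An arriving event cannot be emitted yet: an earlier one might still show up.
    void onEvent(long timestamp) {
        buffer.add(timestamp);
    }

    // A watermark for time t asserts that no more events with timestamp <= t should arrive,
    // so every buffered event up to t can safely be emitted, smallest first.
    void onWatermark(long watermark) {
        while (!buffer.isEmpty() && buffer.peek() <= watermark) {
            output.add(buffer.poll());
        }
    }

    // Events emitted so far, in timestamp order.
    List<Long> sorted() {
        return output;
    }

    public static void main(String[] args) {
        WatermarkSorter sorter = new WatermarkSorter();
        sorter.onEvent(4);     // first arrival from the example stream
        sorter.onEvent(2);     // an earlier event arrives afterwards
        sorter.onWatermark(2); // releases the 2; the 4 stays buffered
        sorter.onWatermark(4); // now the 4 can follow
        System.out.println(sorter.sorted());
    }
}
```

Take the first two arrivals of the example stream (4, then 2). Nothing is emitted until a watermark of 2 arrives, and that watermark releases only the 2. The 4 stays buffered until a watermark of 4 or more arrives.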
##########
File path: docs/training/streaming_analytics.zh.md
##########
@@ -29,123 +29,104 @@ under the License.
-Some observations:
+让我们重新审视这些数据:
-(1) The first element your stream sorter sees is the 4, but you can't just immediately release it as
-the first element of the sorted stream. It may have arrived out of order, and an earlier event might
-yet arrive. In fact, you have the benefit of some god-like knowledge of this stream's future, and
-you can see that your stream sorter should wait at least until the 2 arrives before producing any
-results.
+(1) 我们的排序器第一个看到的数据是4,但是我们不能立即将其作为已排序流的第一个元素释放。因为我们并不能确定它是
+有序的,并且较早的事件有可能并未到达。事实上,如果站在上帝视角,我们知道,必须要等到2到来时,排序器才可以有事件输出。
-*Some buffering, and some delay, is necessary.*
+*需要一些缓冲,需要一些时间,但这都是值得的*
-(2) If you do this wrong, you could end up waiting forever. First the sorter saw an event from time
-4, and then an event from time 2. Will an event with a timestamp less than 2 ever arrive? Maybe.
-Maybe not. You could wait forever and never see a 1.
+(2) 接下来的这一步,如果我们选择的是固执的等待,我们永远不会有结果。首先,我们从时间4看到了一个事件,然后从时
+间2看到了一个事件。可是,时间戳小于2的事件接下来会不会到来呢?可能会,也可能不会。再次站在上帝视角,我们知道,我
+们永远不会看到1。
-*Eventually you have to be courageous and emit the 2 as the start of the sorted stream.*
+*最终,我们必须勇于承担责任,并发出指令,把2作为已排序的事件流的开始*
-(3) What you need then is some sort of policy that defines when, for any given timestamped event, to
-stop waiting for the arrival of earlier events.
+(3)然后,我们需要一种策略,该策略定义:对于任何给定时间戳的事件,Flink何时停止等待较早事件的到来。
-*This is precisely what watermarks do* — they define when to stop waiting for earlier events.
+*这正是 watermarks 的作用* — 它们定义何时停止等待较早的事件。
-Event time processing in Flink depends on *watermark generators* that insert special timestamped
-elements into the stream, called *watermarks*. A watermark for time _t_ is an assertion that the
-stream is (probably) now complete up through time _t_.
+Flink 中事件时间的处理取决于 *watermark 生成器*,后者将带有时间戳的特殊元素插入流中形成 *watermarks*。事件
+时间 _t_ 的 watermark 代表 _t_ 之后(很可能)不会有新的元素到达。
-When should this stream sorter stop waiting, and push out the 2 to start the sorted stream? When a
-watermark arrives with a timestamp of 2, or greater.
+事件流的排序器应何时停止等待,并推出2以启动已分类的流?当 watermark 以2或更大的时间戳到达时!
Review comment:
当 watermark 以 2 或更大的时间戳到达时,事件流的排序器应何时停止等待,并推出 2 作为已经排序好的流。
I think this reads a bit better.
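This plain-Java sketch makes the "watermark with a timestamp of 2, or greater" condition concrete. It feeds the example stream, in arrival order 4, 2, 7, 11, ..., through a sorter whose watermarks follow a simplified bounded-out-of-orderness policy. Three things are assumptions for illustration and are not taken from Flink's code: the policy `watermark = maxSeen - bound - 1`, the class name, and treating any event at or behind the current watermark as late.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Sketch: a stream sorter driven by a simplified bounded-out-of-orderness watermark.
final class BoundedOutOfOrdernessSorter {
    final List<Long> sorted = new ArrayList<>(); // emitted in timestamp order
    final List<Long> late = new ArrayList<>();   // arrived at or behind the current watermark

    private final PriorityQueue<Long> buffer = new PriorityQueue<>();
    private final long bound;
    private long maxSeen = Long.MIN_VALUE;
    private long watermark = Long.MIN_VALUE;

    BoundedOutOfOrdernessSorter(long bound) {
        this.bound = bound;
    }

    void onEvent(long ts) {
        if (ts <= watermark) {
            // The watermark already asserted that nothing <= watermark would arrive.
            late.add(ts);
            return;
        }
        buffer.add(ts);
        maxSeen = Math.max(maxSeen, ts);
        // Assumed policy: accept events up to `bound` units older than the newest one seen.
        watermark = Math.max(watermark, maxSeen - bound - 1);
        release(watermark);
    }

    // End of input acts like a watermark at +infinity: flush everything still buffered.
    void endOfInput() {
        release(Long.MAX_VALUE);
    }

    private void release(long upTo) {
        while (!buffer.isEmpty() && buffer.peek() <= upTo) {
            sorted.add(buffer.poll());
        }
    }

    static BoundedOutOfOrdernessSorter run(long bound, long... arrivals) {
        BoundedOutOfOrdernessSorter sorter = new BoundedOutOfOrdernessSorter(bound);
        for (long ts : arrivals) {
            sorter.onEvent(ts);
        }
        sorter.endOfInput();
        return sorter;
    }

    public static void main(String[] args) {
        // Arrival order of the example stream (it is drawn right to left in the text).
        long[] arrivals = {4, 2, 7, 11, 9, 15, 12, 13, 17, 14, 21, 24, 22, 19, 23};
        for (long bound : new long[] {3, 5}) {
            BoundedOutOfOrdernessSorter s = run(bound, arrivals);
            System.out.println("bound=" + bound + " sorted=" + s.sorted + " late=" + s.late);
        }
    }
}
```

With `bound = 3`, the 2 is emitted as soon as the 7 arrives, because the watermark then advances to 3. The 19 shows up 5 units behind the 24, so it lands in `late`. With `bound = 5`, every event comes out in timestamp order and nothing is late. A larger bound means fewer late events, paid for with more buffering and delay.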
##########
File path: docs/training/streaming_analytics.zh.md
##########
@@ -29,123 +29,104 @@ under the License.
+事件流的排序器应何时停止等待,并推出2以启动已分类的流?当 watermark 以2或更大的时间戳到达时!
Review comment:
```suggestion
事件流的排序器应何时停止等待,并推出2以启动已分类的流?当 watermark 以2或更大的时间戳到达时!
```
当 watermark 以 2 或更大的时间戳到达时,事件流的排序器应何时停止等待,并输出 2 作为已经排序好的流。
##########
File path: docs/training/streaming_analytics.zh.md
##########
@@ -397,36 +369,34 @@ stream.
.process(...);
{% endhighlight %}
-When the allowed lateness is greater than zero, only those events that are so late that they would
-be dropped are sent to the side output (if it has been configured).
+当允许的延迟大于零时,只有那些超过最大无序边界以至于会被丢弃的事件才会被发送到侧输出流(如果已配置)。
-### Surprises
+<a name="Surprises"></a>
+### 什么是惊喜
Review comment:
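`莫名其妙吗? 不 是有原因的` ("Baffling? No, there's a reason"): I once even used this translation, but it made me laugh at myself, so I changed it.
This word is the most interesting line in the whole page. It doesn't really mean "surprise"; within the doc it is more about takeaways, really closer to clearing up confusion. So when I first translated it, I was hoping a reviewer would catch it so we could talk it over.
I think any of the following would be much better:
- 答疑解惑 (answering questions, clearing up confusion)
- 送分题 (an "easy points" question)
- 莫名其妙? (baffling?)
- 惊喜 (surprise)
But none of them quite captures the meaning.
@klion26, how do you think it could best be translated?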
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]