[GitHub] [flink] twentyworld commented on a change in pull request #12237: [FLINK-17290] [chinese-translation, Documentation / Training] Transla…

GitBox Thu, 21 May 2020 05:43:20 -0700


twentyworld commented on a change in pull request #12237:
URL: https://github.com/apache/flink/pull/12237#discussion_r428627111




##########
File path: docs/training/streaming_analytics.zh.md
##########
@@ -27,125 +27,101 @@ under the License.
 * This will be replaced by the TOC
 {:toc}
 
-## Event Time and Watermarks
+## 事件时间和水印
 
-### Introduction
+### 简介
 
-Flink explicitly supports three different notions of time:
+Flink 明确的支持以下三种事件时间:
 
-* _event time:_ the time when an event occurred, as recorded by the device producing (or storing) the event
+* _事件时间:_ 事件产生的时间，记录的是设备生产(或者存储)事件的时间
 
-* _ingestion time:_ a timestamp recorded by Flink at the moment it ingests the event
+* _摄取时间:_ Flink 提取事件时记录的时间戳
 
-* _processing time:_ the time when a specific operator in your pipeline is processing the event
+* _处理时间:_ Flink 中通过特定的操作处理事件的时间
 
-For reproducible results, e.g., when computing the maximum price a stock reached during the first
-hour of trading on a given day, you should use event time. In this way the result won't depend on
-when the calculation is performed. This kind of real-time application is sometimes performed using
-processing time, but then the results are determined by the events that happen to be processed
-during that hour, rather than the events that occurred then. Computing analytics based on processing
-time causes inconsistencies, and makes it difficult to re-analyze historic data or test new
-implementations.
+为了获得可重现的结果，例如在计算过去的特定一天里第一个小时股票的最高价格时，我们应该使用事件时间。这样的话，无论
+什么时间去计算都不会影响输出结果。然而有些人，在实时计算应用时使用处理时间，这样的话，输出结果就会被处理时间点所决
+定，而不是事件的生成时间。基于处理时间会导致多次计算的结果不一致，也可能会导致重新分析历史数据和测试变得异常困难。
 
-### Working with Event Time
+### 使用事件时间
 
-By default, Flink will use processing time. To change this, you can set the Time Characteristic:
+Flink 在默认情况下使用处理时间。也可以通过如下配置来告诉 Flink 选择哪种事件时间:
 
 {% highlight java %}
 final StreamExecutionEnvironment env =
     StreamExecutionEnvironment.getExecutionEnvironment();
 env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
 {% endhighlight %}
 
-If you want to use event time, you will also need to supply a Timestamp Extractor and Watermark
-Generator that Flink will use to track the progress of event time. This will be covered in the
-section below on [Working with Watermarks]({% link
-training/streaming_analytics.zh.md %}#working-with-watermarks), but first we should explain what
-watermarks are.
+如果想要使用事件时间，则需要额外给 Flink 提供一个时间戳的提取器和水印，Flink 将使用它们来跟踪事件时间的进度。这
+将在选节[使用水印]({% linktutorials/streaming_analytics.md %}#使用水印)中介绍，但是首先我们需要解释一下
+水印是什么。
 
-### Watermarks
+### 水印
 
-Let's work through a simple example that will show why watermarks are needed, and how they work.
+让我们通过一个简单的示例来演示，该示例将说明为什么需要水印及其工作方式。
 
-In this example you have a stream of timestamped events that arrive somewhat out of order, as shown
-below. The numbers shown are timestamps that indicate when these events actually occurred. The first
-event to arrive happened at time 4, and it is followed by an event that happened earlier, at time 2,
-and so on:
+在此示例中，我们将看到带有混乱时间戳的事件流，如下所示。显示的数字表达的是这些事件实际发生时间的时间戳。到达的
+第一个事件发生在时间4，随后发生的事件发生在更早的时间2，依此类推：
 
 <div class="text-center" style="font-size: x-large; word-spacing: 0.5em; margin: 1em 0em;">
 ··· 23 19 22 24 21 14 17 13 12 15 9 11 7 2 4 →
 </div>
 
-Now imagine that you are trying create a stream sorter. This is meant to be an application that
-processes each event from a stream as it arrives, and emits a new stream containing the same events,
-but ordered by their timestamps.
+假设我们要对数据流排序，我们想要达到的目的是：应用程序应该在数据流里的事件到达时就处理每个事件，并发出包含相同
+事件但按其时间戳排序的新流。
 
-Some observations:
+让我们重新审视这些数据:
 
-(1) The first element your stream sorter sees is the 4, but you can't just immediately release it as
-the first element of the sorted stream. It may have arrived out of order, and an earlier event might
-yet arrive. In fact, you have the benefit of some god-like knowledge of this stream's future, and
-you can see that your stream sorter should wait at least until the 2 arrives before producing any
-results.
+(1) 我们的排序器第一个看到的数据是4，但是我们不能立即将其作为已排序流的第一个元素释放。因为我们并不能确定它是
+有序的，并且较早的事件有可能并未到达。事实上，如果站在上帝视角，我们知道，必须要等到2到来时，排序器才可以有事件输出。
 
-*Some buffering, and some delay, is necessary.*
+*需要一些缓冲，需要一些时间，但这都是值得的*
 
-(2) If you do this wrong, you could end up waiting forever. First the sorter saw an event from time
-4, and then an event from time 2. Will an event with a timestamp less than 2 ever arrive? Maybe.
-Maybe not. You could wait forever and never see a 1.
+(2) 接下来的这一步，如果我们选择的是固执的等待，我们永远不会有结果。首先，我们从时间4看到了一个事件，然后从时
+间2看到了一个事件。可是，时间戳小于2的事件接下来会不会到来呢？可能会，也可能不会。再次站在上帝视角，我们知道，我
+们永远不会看到1。
 
-*Eventually you have to be courageous and emit the 2 as the start of the sorted stream.*
+*最终，我们必须勇于承担责任，并发出指令，把2作为已排序的事件流的开始*
 
-(3) What you need then is some sort of policy that defines when, for any given timestamped event, to
-stop waiting for the arrival of earlier events.
+(3)然后，我们需要一种策略，该策略定义：对于任何给定时间戳的事件，Flink何时停止等待较早事件的到来。
 
-*This is precisely what watermarks do* — they define when to stop waiting for earlier events.
+*这正是水印的作用* — 它们定义何时停止等待较早的事件。
 
-Event time processing in Flink depends on *watermark generators* that insert special timestamped
-elements into the stream, called *watermarks*. A watermark for time _t_ is an assertion that the
-stream is (probably) now complete up through time _t_.
+Flink中事件时间的处理取决于 *水印生成器*，后者将带有时间戳的特殊元素插入流中，称为 *水印*。时间 _t_ 的水印是

Review comment:
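       Thanks, I would also like to discuss this question with everyone.
   If `Watermarks` does need to be translated in some contexts (granted, it has already been suggested that it is best left untranslated to avoid misreading), I know that in some other frameworks, such as `Kafka`, people call it 水位线 ("water level line"), which is also the more vivid image. In `Flink`, however (my exposure is limited to my work and to reading the source code), and going by some in-depth articles I have read, I personally feel that 水印 ("watermark", in the sense of a stamp) better matches the concept as `Flink` defines it:
   1. Every event is stamped with a timestamp.
   2. Flink processes the event stream according to those timestamps.
   If anything here is wrong or incomplete (or if there are better documents or code to refer to), please correct me.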
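
   To make the two points above concrete, here is a minimal plain-Java sketch of the stream sorter described in the diff. It is not Flink API: the class `WatermarkSorterSketch`, the method `sortWithWatermarks`, and the `maxOutOfOrderness` parameter are hypothetical names chosen for illustration, and the watermark rule (largest timestamp seen so far minus a fixed delay) is just one simple policy assumed here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration only; none of these names come from Flink.
public class WatermarkSorterSketch {

    // Buffers timestamps and releases them in order once the watermark passes them.
    // Assumed policy: watermark = (largest timestamp seen so far) - maxOutOfOrderness.
    public static List<Long> sortWithWatermarks(long[] arrivals, long maxOutOfOrderness) {
        PriorityQueue<Long> buffer = new PriorityQueue<>(); // smallest timestamp first
        List<Long> sorted = new ArrayList<>();
        long maxSeen = Long.MIN_VALUE;
        long watermark = Long.MIN_VALUE;

        for (long ts : arrivals) {
            if (ts <= watermark) {
                // The watermark already asserted the stream is complete up to this
                // time, so the event is late. This sketch simply drops it.
                continue;
            }
            buffer.add(ts);
            maxSeen = Math.max(maxSeen, ts);
            watermark = maxSeen - maxOutOfOrderness;

            // Stop waiting for anything at or before the watermark and emit it.
            while (!buffer.isEmpty() && buffer.peek() <= watermark) {
                sorted.add(buffer.poll());
            }
        }

        // End of bounded input: nothing earlier can arrive, so flush the rest.
        while (!buffer.isEmpty()) {
            sorted.add(buffer.poll());
        }
        return sorted;
    }

    public static void main(String[] args) {
        // Arrival order of the example stream (read right to left in the doc).
        long[] arrivals = {4, 2, 7, 11, 9, 15, 12, 13, 17, 14, 21, 24, 22, 19, 23};
        System.out.println(sortWithWatermarks(arrivals, 6));
    }
}
```

   With a delay of 6, every event in the example stream arrives before the watermark passes it, so the output is fully sorted. With a delay of 0, the sorter emits the 4 immediately and the 2 that follows is treated as late, which is the trade-off between latency and completeness that the watermark policy controls.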




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
