LakeShen commented on issue #9097: [FLINK-11529][docs-zh] Translate the "DataStream API Tutorial" page into Chinese
URL: https://github.com/apache/flink/pull/9097#issuecomment-511271265

Hi, thanks for your review, I will do that now.

------------------ Original message ------------------
From: "Congxian Qiu" <[email protected]>
Date: Sunday, July 14, 2019, 5:32 PM
To: "apache/flink" <[email protected]>
Cc: "lakeshen" <[email protected]>; "Mention" <[email protected]>
Subject: Re: [apache/flink] [FLINK-11529][docs-zh] Translate the "DataStream API Tutorial" page into Chinese (#9097)

@klion26 commented on this pull request.

@LakeShen thanks for your contribution. I have done a first-pass review and left some comments. You can preview the translation locally by executing `sh docs/build_docs.sh -p` in the Flink project and opening http://localhost:4000 in your browser.

In docs/getting-started/tutorials/datastream_api.zh.md:

> @@ -26,19 +26,14 @@ under the License.
> * This will be replaced by the TOC
> {:toc}
> -In this guide we will start from scratch and go from setting up a Flink project to running
> -a streaming analysis program on a Flink cluster.
> +在本节指南中,我们将从零开始创建一个在 flink 集群上面进行流分析的 Flink 项目。

Suggested change:

> -在本节指南中,我们将从零开始创建一个在 flink 集群上面进行流分析的 Flink 项目。
> +在本节指南中,我们将在 Flink 集群上从零开始创建一个流分析项目。

In docs/getting-started/tutorials/datastream_api.zh.md:

> -Wikipedia provides an IRC channel where all edits to the wiki are logged. We are going to
> -read this channel in Flink and count the number of bytes that each user edits within
> -a given window of time. This is easy enough to implement in a few minutes using Flink, but it will
> -give you a good foundation from which to start building more complex analysis programs on your own.
> +维基百科提供了一个能够记录所有对 wiki 编辑的 IRC 通道。我们将使用 Flink 读取该通道的数据,同时

Suggested change:

> -维基百科提供了一个能够记录所有对 wiki 编辑的 IRC 通道。我们将使用 Flink 读取该通道的数据,同时
> +维基百科提供了一个记录所有 wiki 编辑历史的 IRC 通道。我们将使用 Flink 读取该通道的数据,同时

In docs/getting-started/tutorials/datastream_api.zh.md:

> +维基百科提供了一个能够记录所有对 wiki 编辑的 IRC 通道。我们将使用 Flink 读取该通道的数据,同时
> +在给定的时间窗口,计算出每个用户在其中编辑的字节数。这使用 Flink 很容易就能实现,但它会为你提供一个良好的基础去开始构建你自己更为复杂的分析程序。

"计算出每个用户在给定时间窗口内的编辑字节数"?

In docs/getting-started/tutorials/datastream_api.zh.md:

> -We are going to use a Flink Maven Archetype for creating our project structure. Please
> -see [Java API Quickstart]({{ site.baseurl }}/dev/projectsetup/java_api_quickstart.html) for more details
> -about this. For our purposes, the command to run is this:
> +我们准备使用 Flink Maven Archetype 创建项目结构。更多细节请查看[Java API 快速指南]({{ site.baseurl }}/zh/dev/projectsetup/java_api_quickstart.html)。项目运行命令如下:

Suggested change:

> -我们准备使用 Flink Maven Archetype 创建项目结构。更多细节请查看[Java API 快速指南]({{ site.baseurl }}/zh/dev/projectsetup/java_api_quickstart.html)。项目运行命令如下:
> +我们准备使用 Flink Maven Archetype 创建项目结构。更多细节请查看 [Java API 快速指南]({{ site.baseurl }}/zh/dev/projectsetup/java_api_quickstart.html)。项目运行命令如下:

Do we need to translate "Maven Archetype" here?
In docs/getting-started/tutorials/datastream_api.zh.md:

> @@ -59,8 +54,7 @@ $ mvn archetype:generate \
> </p>

I think we need to translate the Note as well.

In docs/getting-started/tutorials/datastream_api.zh.md:

> -<a href="{{ site.baseurl }}/page/img/quickstart-example/jobmanager-job.png" ><img class="img-responsive" src="{{ site.baseurl }}/page/img/quickstart-example/jobmanager-job.png" alt="Example Job View"/></a>
> +<a href="{{ site.baseurl }}/zh/page/img/quickstart-example/jobmanager-job.png" ><img class="img-responsive" src="{{ site.baseurl }}/zh/page/img/quickstart-example/jobmanager-job.png" alt="样例作业视图"/></a>

Maybe we should not change the URL of the image?

In docs/getting-started/tutorials/datastream_api.zh.md:

> @@ -59,8 +54,7 @@ $ mvn archetype:generate \
> </p>
> {% endunless %}
> -You can edit the `groupId`, `artifactId` and `package` if you like. With the above parameters,
> -Maven will create a project structure that looks like this:
> +你可以根据自己需求编辑 `groupId`、`artifactId` 以及 `package`。对于上面的参数,Maven 将会创建一个这样的项目结构:

"你可以按需修改 `groupId`、`artifactId` 以及 `package`"? "对于上面的参数,Maven 将会创建一个这样的项目结构" seems a little odd to me; do you think we can make it better?

In docs/getting-started/tutorials/datastream_api.zh.md:

> @@ -76,16 +70,13 @@ wiki-edits/
> └── log4j.properties
> {% endhighlight %}
> -There is our `pom.xml` file that already has the Flink dependencies added in the root directory and
> -several example Flink programs in `src/main/java`. We can delete the example programs, since
> -we are going to start from scratch:
> +项目根目录下的 `pom.xml` 文件已经将 Flink 依赖添加进来,同时在 `src/main/java` 目录下也有几个 Flink 程序实例。由于我们从头开始创建,我们可以删除程序实例:

"Flink 依赖已经添加到根目录下的 `pom.xml` 文件中"? "Flink 程序实例" -> "Flink 实例程序"? "由于我们将从头开始创建,因此可以删除这些实例程序"?

In docs/getting-started/tutorials/datastream_api.zh.md:

> {% highlight bash %}
> $ rm wiki-edits/src/main/java/wikiedits/*.java
> {% endhighlight %}
> -As a last step we need to add the Flink Wikipedia connector as a dependency so that we can
> -use it in our program. Edit the `dependencies` section of the `pom.xml` so that it looks like this:
> +作为最后一步,我们需要添加 Flink 维基百科连接器作为依赖项,这样就可以在我们的项目中进行使用。编辑 `pom.xml` 的 `dependencies` 部分,使它看起来像这样:

Suggested change:

> -作为最后一步,我们需要添加 Flink 维基百科连接器作为依赖项,这样就可以在我们的项目中进行使用。编辑 `pom.xml` 的 `dependencies` 部分,使它看起来像这样:
> +作为最后一步,我们需要添加 Flink 维基百科连接器的依赖,从而可以在项目中进行使用。修改 `pom.xml` 的 `dependencies` 部分,使它看起来像这样:

In docs/getting-started/tutorials/datastream_api.zh.md:

> -It's coding time. Fire up your favorite IDE and import the Maven project or open a text editor and
> -create the file `src/main/java/wikiedits/WikipediaAnalysis.java`:
> +现在是编程时间。启动你最喜欢的 IDE 并导入 Maven 项目或打开文本编辑器创建文件 `src/main/java/wikiedits/WikipediaAnalysis.java`:

Suggested change:

> -现在是编程时间。启动你最喜欢的 IDE 并导入 Maven 项目或打开文本编辑器创建文件 `src/main/java/wikiedits/WikipediaAnalysis.java`:
> +现在是编程时间。启动你最喜欢的 IDE 并导入 Maven 项目或打开文本编辑器,然后创建文件 `src/main/java/wikiedits/WikipediaAnalysis.java`:

In docs/getting-started/tutorials/datastream_api.zh.md:

> -This concludes our little tour of Flink. If you have any questions, please don't hesitate to ask on our [Mailing Lists](http://flink.apache.org/community.html#mailing-lists).
> +这就结束了 Flink 项目构建之旅. 如果你有任何问题, 你可以在我们的 [邮件组](http://flink.apache.org/community.html#mailing-lists)提出.

Suggested change:

> -这就结束了 Flink 项目构建之旅. 如果你有任何问题, 你可以在我们的 [邮件组](http://flink.apache.org/community.html#mailing-lists)提出.
> +这就结束了 Flink 项目构建之旅. 如果你有任何问题, 可以在我们的[邮件组](http://flink.apache.org/community.html#mailing-lists)提出.
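For context on the `src/main/java/wikiedits/WikipediaAnalysis.java` step reviewed above: the file starts out as a bare skeleton, along the lines of the following minimal sketch (class and method names match the hunk headers quoted below; the tutorial fills it in step by step):

```java
package wikiedits;

// Starting skeleton of the tutorial program; the IDE adds imports
// automatically as the class grows, and the tutorial shows the full
// import list only in its final listing.
public class WikipediaAnalysis {

  public static void main(String[] args) throws Exception {
    // The streaming job is assembled here in the following steps.
  }
}
```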
In docs/getting-started/tutorials/datastream_api.zh.md:

> @@ -131,32 +120,24 @@ public class WikipediaAnalysis {
> }
> {% endhighlight %}
> -The program is very basic now, but we will fill it in as we go. Note that I'll not give
> -import statements here since IDEs can add them automatically. At the end of this section I'll show
> -the complete code with import statements if you simply want to skip ahead and enter that in your
> -editor.
> +这个程序现在很基础,但我们会边做边进行补充。注意我不会给出导入语句,因为 IDE 会自动添加它们。在本节的最后,我将展示带有导入语句的完整代码

"边做边完善"?

In docs/getting-started/tutorials/datastream_api.zh.md:

> @@ -131,32 +120,24 @@ public class WikipediaAnalysis {
> }
> {% endhighlight %}
> -The program is very basic now, but we will fill it in as we go. Note that I'll not give
> -import statements here since IDEs can add them automatically. At the end of this section I'll show
> -the complete code with import statements if you simply want to skip ahead and enter that in your
> -editor.
> +这个程序现在很基础,但我们会边做边进行补充。注意我不会给出导入语句,因为 IDE 会自动添加它们。在本节的最后,我将展示带有导入语句的完整代码
> +如果您只是想跳过并在您的编辑器中编辑他们。

",如果需要你可以将他们复制到你的编辑器中"?

In docs/getting-started/tutorials/datastream_api.zh.md:

> -The first step in a Flink program is to create a `StreamExecutionEnvironment`
> -(or `ExecutionEnvironment` if you are writing a batch job). This can be used to set execution
> -parameters and create sources for reading from external systems. So let's go ahead and add
> -this to the main method:
> +在一个 Flink 程序中,首先你需要创建一个 `StreamExecutionEnvironment` (或者处理批作业环境的 `ExecutionEnvironment`)。这可以用来设置程序运行参数,同时也能够创建从外部系统读取的源。我们把这个添加到 main 方法中:

"这可以用来设置程序运行参数、创建从外部系统读取的源"?

In docs/getting-started/tutorials/datastream_api.zh.md:

> {% highlight java %}
> StreamExecutionEnvironment see = StreamExecutionEnvironment.getExecutionEnvironment();
> {% endhighlight %}
> -Next we will create a source that reads from the Wikipedia IRC log:
> +接下来我们将创建一个读取维基百科 IRC 数据源:

Suggested change:

> -接下来我们将创建一个读取维基百科 IRC 数据源:
> +接下来我们将创建一个读取维基百科 IRC 数据的源:

In docs/getting-started/tutorials/datastream_api.zh.md:

> {% highlight java %}
> DataStream<WikipediaEditEvent> edits = see.addSource(new WikipediaEditsSource());
> {% endhighlight %}
> -This creates a `DataStream` of `WikipediaEditEvent` elements that we can further process. For
> -the purposes of this example we are interested in determining the number of added or removed
> -bytes that each user causes in a certain time window, let's say five seconds. For this we first
> -have to specify that we want to key the stream on the user name, that is to say that operations
> -on this stream should take the user name into account. In our case the summation of edited bytes in the windows
> -should be per unique user. For keying a Stream we have to provide a `KeySelector`, like this:
> +上面代码创建了一个 `WikipediaEditEvent` 事件的`DataStream`,我们可以进一步处理它。这个代码实例的目的是为了确定每个用户在特定时间窗口中添加或删除的字节数,比如5秒一个时间窗口。首先

Suggested change:

> -上面代码创建了一个 `WikipediaEditEvent` 事件的`DataStream`,我们可以进一步处理它。这个代码实例的目的是为了确定每个用户在特定时间窗口中添加或删除的字节数,比如5秒一个时间窗口。首先
> +上面代码创建了一个 `WikipediaEditEvent` 事件的 `DataStream`,我们可以进一步处理它。这个代码实例的目的是为了确定每个用户在特定时间窗口中添加或删除的字节数,比如 5 秒一个时间窗口。首先
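Putting the two reviewed snippets together, the state of `main` after this step is roughly the following (a sketch; the connector classes come from the flink-connector-wikiedits dependency added earlier, and the package names shown are the ones that dependency provides):

```java
package wikiedits;

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.wikiedits.WikipediaEditEvent;
import org.apache.flink.streaming.connectors.wikiedits.WikipediaEditsSource;

public class WikipediaAnalysis {

  public static void main(String[] args) throws Exception {
    // The execution environment: used to set execution parameters
    // and to create sources that read from external systems.
    StreamExecutionEnvironment see = StreamExecutionEnvironment.getExecutionEnvironment();

    // A source that reads the Wikipedia IRC log of wiki edits.
    DataStream<WikipediaEditEvent> edits = see.addSource(new WikipediaEditsSource());
  }
}
```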
In docs/getting-started/tutorials/datastream_api.zh.md:

> {% highlight java %}
> DataStream<WikipediaEditEvent> edits = see.addSource(new WikipediaEditsSource());
> {% endhighlight %}
> -This creates a `DataStream` of `WikipediaEditEvent` elements that we can further process. For
> -the purposes of this example we are interested in determining the number of added or removed
> -bytes that each user causes in a certain time window, let's say five seconds. For this we first
> -have to specify that we want to key the stream on the user name, that is to say that operations
> -on this stream should take the user name into account. In our case the summation of edited bytes in the windows
> -should be per unique user. For keying a Stream we have to provide a `KeySelector`, like this:
> +上面代码创建了一个 `WikipediaEditEvent` 事件的`DataStream`,我们可以进一步处理它。这个代码实例的目的是为了确定每个用户在特定时间窗口中添加或删除的字节数,比如5秒一个时间窗口。首先
> +我们必须指定用户名来划分我们的数据流,也就是说这个流上的操作应该考虑用户名。

"根据用户名来划分"?

In docs/getting-started/tutorials/datastream_api.zh.md:

> {% highlight java %}
> DataStream<WikipediaEditEvent> edits = see.addSource(new WikipediaEditsSource());
> {% endhighlight %}
> -This creates a `DataStream` of `WikipediaEditEvent` elements that we can further process. For
> -the purposes of this example we are interested in determining the number of added or removed
> -bytes that each user causes in a certain time window, let's say five seconds. For this we first
> -have to specify that we want to key the stream on the user name, that is to say that operations
> -on this stream should take the user name into account. In our case the summation of edited bytes in the windows
> -should be per unique user. For keying a Stream we have to provide a `KeySelector`, like this:
> +上面代码创建了一个 `WikipediaEditEvent` 事件的`DataStream`,我们可以进一步处理它。这个代码实例的目的是为了确定每个用户在特定时间窗口中添加或删除的字节数,比如5秒一个时间窗口。首先
> +我们必须指定用户名来划分我们的数据流,也就是说这个流上的操作应该考虑用户名。
> +在我们这个统计窗口编辑的字节数的例子中,每个用户应该唯一的。对于划分一个数据流,我们必须提供一个 `KeySelector`,像这样:

I don't think this means "每个用户应该是唯一的"; it means "每个不同的用户每个窗口都应该计算一个结果".

In docs/getting-started/tutorials/datastream_api.zh.md:

> @@ -203,26 +180,20 @@ DataStream<Tuple2<String, Long>> result = keyedEdits
> });
> {% endhighlight %}
> -The first call, `.timeWindow()`, specifies that we want to have tumbling (non-overlapping) windows
> -of five seconds. The second call specifies a *Aggregate transformation* on each window slice for
> -each unique key. In our case we start from an initial value of `("", 0L)` and add to it the byte
> -difference of every edit in that time window for a user. The resulting Stream now contains
> -a `Tuple2<String, Long>` for every user which gets emitted every five seconds.
> +首先调用 `.timeWindow()` 方法指定五秒翻滚(非重叠)窗口。第二个调用方法对于每一个唯一关键字指定每个窗口片`聚合转换`。
> +在本例中,我们从`("",0L)`初始值开始,并将每个用户编辑的字节添加到该时间窗口中。对于每个用户来说,结果流现在包含的元素为 `Tuple2<String, Long>`,它每5秒发出一次。

Suggested change:

> -在本例中,我们从`("",0L)`初始值开始,并将每个用户编辑的字节添加到该时间窗口中。对于每个用户来说,结果流现在包含的元素为 `Tuple2<String, Long>`,它每5秒发出一次。
> +在本例中,我们从 `("",0L)` 初始值开始,并将每个用户编辑的字节添加到该时间窗口中。对于每个用户来说,结果流现在包含的元素为 `Tuple2<String, Long>`,它每5秒发出一次。

In docs/getting-started/tutorials/datastream_api.zh.md:

> -The only thing left to do is print the stream to the console and start execution:
> +唯一剩下要做的就是将打印流输出到控制台并开始执行:

Suggested change:

> -唯一剩下要做的就是将打印流输出到控制台并开始执行:
> +唯一剩下的就是将结果输出到控制台并开始执行:

In docs/getting-started/tutorials/datastream_api.zh.md:

> -This should get you started with writing your own Flink programs. To learn more
> -you can check out our guides
> -about [basic concepts]({{ site.baseurl }}/dev/api_concepts.html) and the
> -[DataStream API]({{ site.baseurl }}/dev/datastream_api.html). Stick
> -around for the bonus exercise if you want to learn about setting up a Flink cluster on
> -your own machine and writing results to [Kafka](http://kafka.apache.org).
> +这可以让你开始创建你自己的 Flink 项目。你可以查看[基本概念]({{ site.baseurl }}/zh/dev/api_concepts.html)和[DataStream API]
> +({{ site.baseurl }}/zh/dev/datastream_api.html)指南。如果你想学习了解更多关于 Flink 集群安装以及写入数据到 [Kafka](http://kafka.apache.org),

`[DataStream API]` and `({{ site.baseurl }}.....` have to be on the same line.

In docs/getting-started/tutorials/datastream_api.zh.md:

> -Please follow our [local setup tutorial](local_setup.html) for setting up a Flink distribution
> -on your machine and refer to the [Kafka quickstart](https://kafka.apache.org/0110/documentation.html#quickstart)
> -for setting up a Kafka installation before we proceed.
> +请按照我们的[本地安装教程](local_setup.html)在你的机器上构建一个Flink分布式环境,同时参考[Kafka快速指南](https://kafka.apache.org/0110/documentation.html#quickstart)安装一个我们需要使用的Kafka环境。

Suggested change:

> -请按照我们的[本地安装教程](local_setup.html)在你的机器上构建一个Flink分布式环境,同时参考[Kafka快速指南](https://kafka.apache.org/0110/documentation.html#quickstart)安装一个我们需要使用的Kafka环境。
> +请按照我们的[本地安装教程](local_setup.html)在你的机器上构建一个Flink分布式环境,同时参考 [Kafka快速指南](https://kafka.apache.org/0110/documentation.html#quickstart)安装一个我们需要使用的Kafka环境。

In docs/getting-started/tutorials/datastream_api.zh.md:

> {% highlight bash %}
> bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic wiki-result
> {% endhighlight %}
> -You can also check out the Flink dashboard which should be running at [http://localhost:8081](http://localhost:8081).
> -You get an overview of your cluster resources and running jobs:
> +你还可以查看运行在[http://localhost:8081](http://localhost:8081)上的 Flink 作业仪表盘。你可以概览集群资源以及正在运行的作业:

Suggested change:

> -你还可以查看运行在[http://localhost:8081](http://localhost:8081)上的 Flink 作业仪表盘。你可以概览集群资源以及正在运行的作业:
> +你还可以查看运行在 [http://localhost:8081](http://localhost:8081) 上的 Flink 作业仪表盘。你可以概览集群资源以及正在运行的作业:

In docs/getting-started/tutorials/datastream_api.zh.md:

> @@ -168,12 +149,8 @@ KeyedStream<WikipediaEditEvent, String> keyedEdits = edits
> });
> {% endhighlight %}
> -This gives us a Stream of `WikipediaEditEvent` that has a `String` key, the user name.
> -We can now specify that we want to have windows imposed on this stream and compute a
> -result based on elements in these windows. A window specifies a slice of a Stream
> -on which to perform a computation. Windows are required when computing aggregations
> -on an infinite stream of elements. In our example we will say
> -that we want to aggregate the sum of edited bytes for every five seconds:
> +这给了我们一个 `WikipediaEditEvent` 数据流,它有一个 `String` 键,即用户名。

Maybe we can find a better translation for this paragraph.

In docs/getting-started/tutorials/datastream_api.zh.md:

> -This should get you started with writing your own Flink programs. To learn more
> -you can check out our guides
> -about [basic concepts]({{ site.baseurl }}/dev/api_concepts.html) and the
> -[DataStream API]({{ site.baseurl }}/dev/datastream_api.html). Stick
> -around for the bonus exercise if you want to learn about setting up a Flink cluster on
> -your own machine and writing results to [Kafka](http://kafka.apache.org).
> +这可以让你开始创建你自己的 Flink 项目。你可以查看[基本概念]({{ site.baseurl }}/zh/dev/api_concepts.html)和[DataStream API]
> +({{ site.baseurl }}/zh/dev/datastream_api.html)指南。如果你想学习了解更多关于 Flink 集群安装以及写入数据到 [Kafka](http://kafka.apache.org),
> +你可以自己多加以练习尝试。

Where is the source of this translation?

In docs/getting-started/tutorials/datastream_api.zh.md:

> @@ -309,24 +279,17 @@ similar to this:
> 4> (KasparBot,-245)
> {% endhighlight %}
> -The number in front of each line tells you on which parallel instance of the print sink the output
> -was produced.
> +每行数据前面的数字代表着打印接收器在哪个并行实例上产生的输出数据。

"每行数据前面的数字代表着打印接收器运行的并行实例"?
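For reference while reading the comments above, the keying and windowing code under discussion looks roughly like this (a sketch reconstructed from the quoted hunks; following the tutorial's own convention, imports such as `KeySelector`, `KeyedStream`, `Time`, `AggregateFunction`, and `Tuple2` are left to the IDE, and the exact aggregate method bodies may differ from the tutorial's final listing):

```java
// Key the stream on the user name, so that every windowed result
// below is computed per unique user.
KeyedStream<WikipediaEditEvent, String> keyedEdits = edits
    .keyBy(new KeySelector<WikipediaEditEvent, String>() {
      @Override
      public String getKey(WikipediaEditEvent event) {
        return event.getUser();
      }
    });

// Tumbling (non-overlapping) five-second windows; starting from the
// initial value ("", 0L), add up the byte difference of every edit a
// user makes within the window.
DataStream<Tuple2<String, Long>> result = keyedEdits
    .timeWindow(Time.seconds(5))
    .aggregate(new AggregateFunction<WikipediaEditEvent, Tuple2<String, Long>, Tuple2<String, Long>>() {
      @Override
      public Tuple2<String, Long> createAccumulator() {
        return new Tuple2<>("", 0L);
      }

      @Override
      public Tuple2<String, Long> add(WikipediaEditEvent value, Tuple2<String, Long> accumulator) {
        accumulator.f0 = value.getUser();
        accumulator.f1 += value.getByteDiff();
        return accumulator;
      }

      @Override
      public Tuple2<String, Long> getResult(Tuple2<String, Long> accumulator) {
        return accumulator;
      }

      @Override
      public Tuple2<String, Long> merge(Tuple2<String, Long> a, Tuple2<String, Long> b) {
        return new Tuple2<>(a.f0, a.f1 + b.f1);
      }
    });
```

This yields one `Tuple2<String, Long>` per user, emitted every five seconds, which is what the translated paragraph describes.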
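As a note on the `4> (KasparBot,-245)` output discussed in the last comment: the job ends by printing the result stream and starting execution, roughly as follows (a sketch; when the sink runs with parallelism greater than one, each printed line is prefixed with the index of the parallel print-sink instance that produced it):

```java
// Print each windowed (user, byte-diff) tuple to stdout; the "N>"
// prefix on every line is the parallel instance of the print sink.
result.print();

// Nothing runs until execution is started explicitly.
see.execute();
```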
