ChengkaiYang2022 commented on code in PR #20510:
URL: https://github.com/apache/flink/pull/20510#discussion_r946579423
##########
docs/content.zh/docs/try-flink/table_api.md:
##########
@@ -109,25 +107,25 @@ report(transactions).executeInsert("spend_report");
```
-## Breaking Down The Code
+## 代码分析
-#### The Execution Environment
+#### 执行环境
-The first two lines set up your `TableEnvironment`.
-The table environment is how you can set properties for your Job, specify whether you are writing a batch or a streaming application, and create your sources.
-This walkthrough creates a standard table environment that uses the streaming execution.
+前两行创建的是 `TableEnvironment`(表环境)。
Review Comment:
前两行代码创建了 `TableEnvironment`(表环境)。
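For readers following the thread, the two lines this passage refers to look roughly like the following in the walkthrough (a streaming-mode `TableEnvironment`; exact details may differ by Flink version):

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

// Create a table environment that runs in streaming execution mode.
EnvironmentSettings settings = EnvironmentSettings.inStreamingMode();
TableEnvironment tEnv = TableEnvironment.create(settings);
```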
##########
docs/content.zh/docs/try-flink/table_api.md:
##########
@@ -163,44 +161,41 @@ tEnv.executeSql("CREATE TABLE spend_report (\n" +
")");
```
-The second table, `spend_report`, stores the final results of the aggregation.
-Its underlying storage is a table in a MySql database.
+第二张 `spend_report` 表存储聚合后的最终结果,底层存储是 MySQL 数据库中的一张表。
-#### The Query
+#### 查询数据
-With the environment configured and tables registered, you are ready to build your first application.
-From the `TableEnvironment` you can read `from` an input table to read its rows and then write those results into an output table using `executeInsert`.
-The `report` function is where you will implement your business logic.
-It is currently unimplemented.
+配置好环境并注册好表后,你就可以开始开发你的第一个应用了。
+通过 `TableEnvironment` ,你可以 `from` 输入表读取数据,然后将结果调用 `executeInsert` 写入到输出表。
+函数 `report` 用于实现具体的业务逻辑,这里暂时未实现。
```java
Table transactions = tEnv.from("transactions");
report(transactions).executeInsert("spend_report");
```
-## Testing
+## 测试
-The project contains a secondary testing class `SpendReportTest` that validates the logic of the report.
-It creates a table environment in batch mode.
+项目还包含一个测试类 `SpendReportTest`,辅助验证报表逻辑。
+该测试类的表环境使用的是批处理模式。
```java
EnvironmentSettings settings = EnvironmentSettings.inBatchMode();
TableEnvironment tEnv = TableEnvironment.create(settings);
```
-One of Flink's unique properties is that it provides consistent semantics across batch and streaming.
-This means you can develop and test applications in batch mode on static datasets, and deploy to production as streaming applications.
+提供批流统一的语义是 Flink 的特性,这意味着应用的开发和测试可以在批模式下使用静态数据集完成,而实际部署到生产时再切换为流式。
Review Comment:
提供批流统一的语义是 Flink 的(?关键特性?)(It could be better if we translate the word 'unique'),这意味着可以在批模式下使用静态数据集完成应用的开发和测试,部署到生产环境时使用流模式。
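As a side note on the batch/stream point above: a minimal sketch of exercising the same `report` logic in batch mode against a static dataset. The class name `SpendReportBatchSketch`, the literal values, and the unnamed columns are illustrative only, not taken from the PR or the test class:

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableEnvironment;

import static org.apache.flink.table.api.Expressions.row;

public class SpendReportBatchSketch {
    public static void main(String[] args) {
        // Batch mode: the same Table API program runs over a bounded, static dataset.
        EnvironmentSettings settings = EnvironmentSettings.inBatchMode();
        TableEnvironment tEnv = TableEnvironment.create(settings);

        // A small in-memory stand-in for the Kafka-backed transactions table.
        Table transactions = tEnv.fromValues(
                row(1L, java.time.LocalDateTime.parse("2019-06-01T01:23:47"), 188L),
                row(1L, java.time.LocalDateTime.parse("2019-06-01T01:45:02"), 374L));

        // report(transactions).execute().print();  // same business logic as the streaming job
    }
}
```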
##########
docs/content.zh/docs/try-flink/table_api.md:
##########
@@ -216,10 +211,10 @@ public static Table report(Table transactions) {
}
```
-## User Defined Functions
+## 用户定义函数
Review Comment:
用户自定义函数
##########
docs/content.zh/docs/try-flink/table_api.md:
##########
@@ -163,44 +161,41 @@ tEnv.executeSql("CREATE TABLE spend_report (\n" +
")");
```
-The second table, `spend_report`, stores the final results of the aggregation.
-Its underlying storage is a table in a MySql database.
+第二张 `spend_report` 表存储聚合后的最终结果,底层存储是 MySQL 数据库中的一张表。
-#### The Query
+#### 查询数据
-With the environment configured and tables registered, you are ready to build your first application.
-From the `TableEnvironment` you can read `from` an input table to read its rows and then write those results into an output table using `executeInsert`.
-The `report` function is where you will implement your business logic.
-It is currently unimplemented.
+配置好环境并注册好表后,你就可以开始开发你的第一个应用了。
+通过 `TableEnvironment` ,你可以 `from` 输入表读取数据,然后将结果调用 `executeInsert` 写入到输出表。
+函数 `report` 用于实现具体的业务逻辑,这里暂时未实现。
```java
Table transactions = tEnv.from("transactions");
report(transactions).executeInsert("spend_report");
```
-## Testing
+## 测试
-The project contains a secondary testing class `SpendReportTest` that validates the logic of the report.
-It creates a table environment in batch mode.
+项目还包含一个测试类 `SpendReportTest`,辅助验证报表逻辑。
+该测试类的表环境使用的是批处理模式。
```java
EnvironmentSettings settings = EnvironmentSettings.inBatchMode();
TableEnvironment tEnv = TableEnvironment.create(settings);
```
-One of Flink's unique properties is that it provides consistent semantics across batch and streaming.
-This means you can develop and test applications in batch mode on static datasets, and deploy to production as streaming applications.
+提供批流统一的语义是 Flink 的特性,这意味着应用的开发和测试可以在批模式下使用静态数据集完成,而实际部署到生产时再切换为流式。
-## Attempt One
+## 尝试下
-Now with the skeleton of a Job set-up, you are ready to add some business logic.
-The goal is to build a report that shows the total spend for each account across each hour of the day.
-This means the timestamp column needs be be rounded down from millisecond to hour granularity.
+在作业拉起来的大体处理框架下,你可以再添加一些业务逻辑。
+现在的目标是创建一个报表,报表按照账户显示一天中每个小时的总支出。因此,毫秒粒度的时间戳字段需要向下舍入到小时。
-Flink supports developing relational applications in pure [SQL]({{< ref "docs/dev/table/sql/overview" >}}) or using the [Table API]({{< ref "docs/dev/table/tableApi" >}}).
-The Table API is a fluent DSL inspired by SQL, that can be written in Python, Java, or Scala and supports strong IDE integration.
-Just like a SQL query, Table programs can select the required fields and group by your keys.
-These features, along with [built-in functions]({{< ref "docs/dev/table/functions/systemFunctions" >}}) like `floor` and `sum`, you can write this report.
+Flink 支持纯 [SQL]({{< ref "docs/dev/table/sql/overview" >}}) 或者 [Table API]({{< ref "docs/dev/table/tableApi" >}}) 开发关系型数据应用。
Review Comment:
Flink 支持**使用**纯 [SQL]({{< ref "docs/dev/table/sql/overview" >}}) 或者 [Table API]({{< ref "docs/dev/table/tableApi" >}}) 开发关系型数据应用。
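For context, one way the `report` function described in this hunk can be written with the Table API's `floor` and `sum` built-ins. This is a sketch along the lines of the walkthrough's first attempt, not code from this PR, so exact expressions may differ:

```java
import static org.apache.flink.table.api.Expressions.$;

import org.apache.flink.table.api.Table;
import org.apache.flink.table.expressions.TimeIntervalUnit;

public static Table report(Table transactions) {
    return transactions.select(
            $("account_id"),
            // Round the millisecond timestamp down to the hour.
            $("transaction_time").floor(TimeIntervalUnit.HOUR).as("log_ts"),
            $("amount"))
        .groupBy($("account_id"), $("log_ts"))
        .select(
            $("account_id"),
            $("log_ts"),
            // Total spend per account per hour.
            $("amount").sum().as("amount"));
}
```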
##########
docs/content.zh/docs/try-flink/table_api.md:
##########
@@ -275,38 +270,38 @@ public static Table report(Table transactions) {
}
```
-This defines your application as using one hour tumbling windows based on the timestamp column.
-So a row with timestamp `2019-06-01 01:23:47` is put in the `2019-06-01 01:00:00` window.
+上面的代码含义为:应用使用滚动窗口,窗口按照指定的时间戳字段划分,区间为一小时。
Review Comment:
上面的代码含义为:使用滚动窗口,窗口按照指定的时间戳字段划分,区间为一小时。
##########
docs/content.zh/docs/try-flink/table_api.md:
##########
@@ -275,38 +270,38 @@ public static Table report(Table transactions) {
}
```
-This defines your application as using one hour tumbling windows based on the timestamp column.
-So a row with timestamp `2019-06-01 01:23:47` is put in the `2019-06-01 01:00:00` window.
+上面的代码含义为:应用使用滚动窗口,窗口按照指定的时间戳字段划分,区间为一小时。
+所以,时间戳为 `2019-06-01 01:23:47` 的行会进入窗口 `2019-06-01 01:00:00`中。
+不同于其他属性,时间在一个持续不断的流式应用中总是向前移动,因此基于时间的聚合总是不重复的。
-Aggregations based on time are unique because time, as opposed to other attributes, generally moves forward in a continuous streaming application.
-Unlike `floor` and your UDF, window functions are [intrinsics](https://en.wikipedia.org/wiki/Intrinsic_function), which allows the runtime to apply additional optimizations.
-In a batch context, windows offer a convenient API for grouping records by a timestamp attribute.
+不同于 `floor` 以及 UDF,窗口函数是 [内部的][intrinsics](https://en.wikipedia.org/wiki/Intrinsic_function),可以运行时优化。
+批环境中,如果需要按照时间属性分组数据,窗口函数也有便利的 API。
-Running the test with this implementation will also pass.
+按此逻辑实现,测试也可以通过。
-## Once More, With Streaming!
+## 再用流式处理一次!
-And that's it, a fully functional, stateful, distributed streaming application!
-The query continuously consumes the stream of transactions from Kafka, computes the hourly spendings, and emits results as soon as they are ready.
-Since the input is unbounded, the query keeps running until it is manually stopped.
-And because the Job uses time window-based aggregations, Flink can perform specific optimizations such as state clean up when the framework knows that no more records will arrive for a particular window.
+这次的编写的应用是一个功能齐全、有状态的分布式流式应用!
Review Comment:
这次编写的应用是一个功能齐全、有状态的分布式流式应用!
##########
docs/content.zh/docs/try-flink/table_api.md:
##########
@@ -163,44 +161,41 @@ tEnv.executeSql("CREATE TABLE spend_report (\n" +
")");
```
-The second table, `spend_report`, stores the final results of the aggregation.
-Its underlying storage is a table in a MySql database.
+第二张 `spend_report` 表存储聚合后的最终结果,底层存储是 MySQL 数据库中的一张表。
-#### The Query
+#### 查询数据
-With the environment configured and tables registered, you are ready to build your first application.
-From the `TableEnvironment` you can read `from` an input table to read its rows and then write those results into an output table using `executeInsert`.
-The `report` function is where you will implement your business logic.
-It is currently unimplemented.
+配置好环境并注册好表后,你就可以开始开发你的第一个应用了。
+通过 `TableEnvironment` ,你可以 `from` 输入表读取数据,然后将结果调用 `executeInsert` 写入到输出表。
Review Comment:
通过 `TableEnvironment` ,你可以从输入表中读取数据,然后通过 `executeInsert` 将结果写入到输出表。
##########
docs/content.zh/docs/try-flink/table_api.md:
##########
@@ -109,25 +107,25 @@ report(transactions).executeInsert("spend_report");
```
-## Breaking Down The Code
+## 代码分析
-#### The Execution Environment
+#### 执行环境
-The first two lines set up your `TableEnvironment`.
-The table environment is how you can set properties for your Job, specify whether you are writing a batch or a streaming application, and create your sources.
-This walkthrough creates a standard table environment that uses the streaming execution.
+前两行创建的是 `TableEnvironment`(表环境)。
+通过表环境,你可以设置作业属性,定义应用的批流模式,以及创建数据源。
Review Comment:
创建表环境时,你可以设置作业属性,定义应用的批流模式,以及创建数据源。
##########
docs/content.zh/docs/try-flink/table_api.md:
##########
@@ -216,10 +211,10 @@ public static Table report(Table transactions) {
}
```
-## User Defined Functions
+## 用户定义函数
-Flink contains a limited number of built-in functions, and sometimes you need to extend it with a [user-defined function]({{< ref "docs/dev/table/functions/udfs" >}}).
-If `floor` wasn't predefined, you could implement it yourself.
+Flink 内置的函数是有限的,有时是需要通过 [用户自定义函数]({{< ref "docs/dev/table/functions/udfs" >}})来拓展这些函数。
+假如没有预设好的 `floor` 函数,也可以自己实现一个。
Review Comment:
假如 `floor` 函数不是系统预设函数,你也可以自己实现。
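For context, if `floor` were not built in, a scalar UDF along these lines could perform the same hour truncation. This is a sketch in the spirit of the walkthrough's `MyFloor` example; details may vary by Flink version:

```java
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;

import org.apache.flink.table.annotation.DataTypeHint;
import org.apache.flink.table.functions.ScalarFunction;

public class MyFloor extends ScalarFunction {

    public @DataTypeHint("TIMESTAMP(3)") LocalDateTime eval(
            @DataTypeHint("TIMESTAMP(3)") LocalDateTime timestamp) {
        // Truncate a millisecond timestamp down to the start of its hour.
        return timestamp.truncatedTo(ChronoUnit.HOURS);
    }
}
```

It would then be invoked in the query with something like `call(MyFloor.class, $("transaction_time")).as("log_ts")`.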
##########
docs/content.zh/docs/try-flink/table_api.md:
##########
@@ -254,14 +249,14 @@ public static Table report(Table transactions) {
}
```
-This query consumes all records from the `transactions` table, calculates the report, and outputs the results in an efficient, scalable manner.
-Running the test with this implementation will pass.
+这条查询会从表 `transactions` 消费所有的记录,然后计算报表所需内容,最后将结果以高效、可拓展的方式输出。
+按此逻辑实现,可以通过测试。
-## Adding Windows
+## 添加窗口函数
-Grouping data based on time is a typical operation in data processing, especially when working with infinite streams.
-A grouping based on time is called a [window]({{< ref "docs/dev/datastream/operators/windows" >}}) and Flink offers flexible windowing semantics.
-The most basic type of window is called a `Tumble` window, which has a fixed size and whose buckets do not overlap.
+在数据处理中,按照时间做分组是一个典型的操作,尤其是在处理无限流时。
Review Comment:
在数据处理中,按照时间做分组是常见操作,在处理无限流时更是如此。
##########
docs/content.zh/docs/try-flink/table_api.md:
##########
@@ -254,14 +249,14 @@ public static Table report(Table transactions) {
}
```
-This query consumes all records from the `transactions` table, calculates the report, and outputs the results in an efficient, scalable manner.
-Running the test with this implementation will pass.
+这条查询会从表 `transactions` 消费所有的记录,然后计算报表所需内容,最后将结果以高效、可拓展的方式输出。
+按此逻辑实现,可以通过测试。
-## Adding Windows
+## 添加窗口函数
-Grouping data based on time is a typical operation in data processing, especially when working with infinite streams.
-A grouping based on time is called a [window]({{< ref "docs/dev/datastream/operators/windows" >}}) and Flink offers flexible windowing semantics.
-The most basic type of window is called a `Tumble` window, which has a fixed size and whose buckets do not overlap.
+在数据处理中,按照时间做分组是一个典型的操作,尤其是在处理无限流时。
+按时间分组的函数叫 [window]({{< ref "docs/dev/datastream/operators/windows" >}}),Flink 提供了灵活的窗口函数语法。
+最常见的窗口是 `Tumble` ,此窗口固定窗口区间并且每个区间都不重叠。
Review Comment:
最常见的窗口是 `Tumble` ,窗口区间长度固定,并且区间不重叠。
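For context, the windowed version of `report` that this hunk is describing looks roughly like the following, using a one-hour `Tumble` window over the timestamp column. This is a sketch in the spirit of the walkthrough's final solution, not code from this PR:

```java
import static org.apache.flink.table.api.Expressions.$;
import static org.apache.flink.table.api.Expressions.lit;

import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.Tumble;

public static Table report(Table transactions) {
    return transactions
        // One-hour tumbling windows keyed on the transaction timestamp.
        .window(Tumble.over(lit(1).hour()).on($("transaction_time")).as("log_ts"))
        .groupBy($("account_id"), $("log_ts"))
        .select(
            $("account_id"),
            // Report the window start, e.g. 2019-06-01 01:00:00 for 01:23:47.
            $("log_ts").start().as("log_ts"),
            $("amount").sum().as("amount"));
}
```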
##########
docs/content.zh/docs/try-flink/table_api.md:
##########
@@ -275,38 +270,38 @@ public static Table report(Table transactions) {
}
```
-This defines your application as using one hour tumbling windows based on the timestamp column.
-So a row with timestamp `2019-06-01 01:23:47` is put in the `2019-06-01 01:00:00` window.
+上面的代码含义为:应用使用滚动窗口,窗口按照指定的时间戳字段划分,区间为一小时。
+所以,时间戳为 `2019-06-01 01:23:47` 的行会进入窗口 `2019-06-01 01:00:00`中。
Review Comment:
比如,一条时间戳为 `2019-06-01 01:23:47` 的数据会进入到窗口 `2019-06-01 01:00:00` 中。
Anyway, how about translating 'so' as '比如' instead of '所以'? It's up to you :)
##########
docs/content.zh/docs/try-flink/table_api.md:
##########
@@ -275,38 +270,38 @@ public static Table report(Table transactions) {
}
```
-This defines your application as using one hour tumbling windows based on the timestamp column.
-So a row with timestamp `2019-06-01 01:23:47` is put in the `2019-06-01 01:00:00` window.
+上面的代码含义为:应用使用滚动窗口,窗口按照指定的时间戳字段划分,区间为一小时。
+所以,时间戳为 `2019-06-01 01:23:47` 的行会进入窗口 `2019-06-01 01:00:00`中。
+不同于其他属性,时间在一个持续不断的流式应用中总是向前移动,因此基于时间的聚合总是不重复的。
-Aggregations based on time are unique because time, as opposed to other attributes, generally moves forward in a continuous streaming application.
-Unlike `floor` and your UDF, window functions are [intrinsics](https://en.wikipedia.org/wiki/Intrinsic_function), which allows the runtime to apply additional optimizations.
-In a batch context, windows offer a convenient API for grouping records by a timestamp attribute.
+不同于 `floor` 以及 UDF,窗口函数是 [内部的][intrinsics](https://en.wikipedia.org/wiki/Intrinsic_function),可以运行时优化。
+批环境中,如果需要按照时间属性分组数据,窗口函数也有便利的 API。
-Running the test with this implementation will also pass.
+按此逻辑实现,测试也可以通过。
-## Once More, With Streaming!
+## 再用流式处理一次!
-And that's it, a fully functional, stateful, distributed streaming application!
-The query continuously consumes the stream of transactions from Kafka, computes the hourly spendings, and emits results as soon as they are ready.
-Since the input is unbounded, the query keeps running until it is manually stopped.
-And because the Job uses time window-based aggregations, Flink can perform specific optimizations such as state clean up when the framework knows that no more records will arrive for a particular window.
+这次的编写的应用是一个功能齐全、有状态的分布式流式应用!
+查询语句持续消费 Kafka 中流式的交易数据,然后计算每小时的消费,最后当窗口结束时立刻提交结果。
Review Comment:
查询语句持续消费 Kafka 中的交易数据流,然后计算每小时的消费,最后当窗口结束时立刻提交结果。
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]