wuchong commented on a change in pull request #11423:
[Flink-16083][chinese-translation]Translate "Dynamic Table" page of "…
URL: https://github.com/apache/flink/pull/11423#discussion_r394768887
##########
File path: docs/dev/table/streaming/dynamic_tables.zh.md
##########
@@ -22,135 +22,135 @@ specific language governing permissions and limitations
under the License.
-->
-SQL and the relational algebra have not been designed with streaming data in
mind. As a consequence, there are few conceptual gaps between relational
algebra (and SQL) and stream processing.
+SQL 和关系代数在设计时并未考虑流数据。因此,在关系代数(和 SQL)之间几乎没有概念上的差异。
-This page discusses these differences and explains how Flink can achieve the
same semantics on unbounded data as a regular database engine on bounded data.
+本文会讨论这种差异,并介绍 Flink 如何在无界数据集上实现与数据库引擎在有界数据上的处理具有相同的语义。
* This will be replaced by the TOC
{:toc}
-Relational Queries on Data Streams
+DataStream 上的关系查询
----------------------------------
-The following table compares traditional relational algebra and stream
processing with respect to input data, execution, and output results.
+下表比较了传统的关系代数和流处理与输入数据、执行和输出结果的关系。
<table class="table table-bordered">
<tr>
- <th>Relational Algebra / SQL</th>
- <th>Stream Processing</th>
+ <th>关系代数 / SQL</th>
+ <th>流处理</th>
</tr>
<tr>
- <td>Relations (or tables) are bounded (multi-)sets of
tuples.</td>
- <td>A stream is an infinite sequences of tuples.</td>
+ <td>关系(或表)是有界(多)元组集合。</td>
+ <td>流是一个无限元组序列。</td>
</tr>
<tr>
- <td>A query that is executed on batch data (e.g., a table in a
relational database) has access to the complete input data.</td>
- <td>A streaming query cannot access all data when it is started
and has to "wait" for data to be streamed in.</td>
+ <td>对批数据(例如关系数据库中的表)执行的查询可以访问完整的输入数据。</td>
+ <td>流式查询在启动时不能访问所有数据,必须“等待”数据流入。</td>
</tr>
<tr>
- <td>A batch query terminates after it produced a fixed sized
result.</td>
- <td>A streaming query continuously updates its result based on
the received records and never completes.</td>
+ <td>批处理查询在产生固定大小的结果后终止。</td>
+ <td>流查询不断地根据接收到的记录更新其结果,并且始终不会结束。</td>
</tr>
</table>
-Despite these differences, processing streams with relational queries and SQL
is not impossible. Advanced relational database systems offer a feature called
*Materialized Views*. A materialized view is defined as a SQL query, just like
a regular virtual view. In contrast to a virtual view, a materialized view
caches the result of the query such that the query does not need to be
evaluated when the view is accessed. A common challenge for caching is to
prevent a cache from serving outdated results. A materialized view becomes
outdated when the base tables of its definition query are modified. *Eager View
Maintenance* is a technique to update a materialized view as soon as its base
tables are updated.
+尽管存在这些差异,但是使用关系查询和 SQL 处理流并不是不可能的。高级关系数据库系统提供了一个称为 *物化视图(Materialized Views)*
的特性。物化视图被定义为一条 SQL
查询,就像常规的虚拟视图一样。与虚拟视图相反,物化视图缓存查询的结果,因此在访问视图时不需要对查询进行计算。缓存的一个常见难题是防止缓存为过期的结果提供服务。当其定义查询的基表被修改时,物化视图将过期。
*即时视图维护(Eager View Maintenance)* 是一种一旦更新了物化视图的基表就立即更新视图的技术。
-The connection between eager view maintenance and SQL queries on streams
becomes obvious if we consider the following:
+如果我们考虑以下问题,那么即时视图维护和流上的SQL查询之间的联系就会变得显而易见:
-- A database table is the result of a *stream* of `INSERT`, `UPDATE`, and
`DELETE` DML statements, often called *changelog stream*.
-- A materialized view is defined as a SQL query. In order to update the view,
the query continuously processes the changelog streams of the view's base
relations.
-- The materialized view is the result of the streaming SQL query.
+- 数据库表是 `INSERT`、`UPDATE` 和 `DELETE` DML 语句的 *stream* 的结果,通常称为 *changelog
stream* 。
+- 物化视图被定义为一条 SQL 查询。为了更新视图,查询不断地处理视图的基本关系的changelog 流。
+- 物化视图是流式 SQL 查询的结果。
-With these points in mind, we introduce following concept of *Dynamic tables*
in the next section.
+了解了这些要点之后,我们将在下一节中介绍 *动态表(Dynamic tables)* 的概念。
-Dynamic Tables & Continuous Queries
+动态表 & 连续查询(Continuous Query)
---------------------------------------
-*Dynamic tables* are the core concept of Flink's Table API and SQL support for
streaming data. In contrast to the static tables that represent batch data,
dynamic tables are changing over time. They can be queried like static batch
tables. Querying dynamic tables yields a *Continuous Query*. A continuous query
never terminates and produces a dynamic table as result. The query continuously
updates its (dynamic) result table to reflect the changes on its (dynamic)
input tables. Essentially, a continuous query on a dynamic table is very
similar to a query that defines a materialized view.
+*动态表* 是 Flink 的支持流数据的 Table API 和 SQL
的核心概念。与表示批处理数据的静态表不同,动态表是随时间变化的。可以像查询静态批处理表一样查询它们。查询动态表将生成一个 *连续查询*
。一个连续查询永远不会终止,结果会生成一个动态表。查询不断更新其(动态)结果表,以反映其(动态)输入表上的更改。本质上,动态表上的连续查询非常类似于定义物化视图的查询。
-It is important to note that the result of a continuous query is always
semantically equivalent to the result of the same query being executed in batch
mode on a snapshot of the input tables.
+需要注意的是,连续查询的结果在语义上总是等价于以批处理模式在输入表快照上执行的相同查询的结果。
-The following figure visualizes the relationship of streams, dynamic tables,
and continuous queries:
+下图显示了流、动态表和连续查询之间的关系:
<center>
<img alt="Dynamic tables" src="{{ site.baseurl
}}/fig/table-streaming/stream-query-stream.png" width="80%">
</center>
-1. A stream is converted into a dynamic table.
-1. A continuous query is evaluated on the dynamic table yielding a new dynamic
table.
-1. The resulting dynamic table is converted back into a stream.
+1. 将流转换为动态表。
+2. 在动态表上计算一个连续查询,生成一个新的动态表。
+3. 生成的动态表被转换回流。
-**Note:** Dynamic tables are foremost a logical concept. Dynamic tables are
not necessarily (fully) materialized during query execution.
+**注意:** 动态表首先是一个逻辑概念。在查询执行期间不一定(完全)物化动态表。
-In the following, we will explain the concepts of dynamic tables and
continuous queries with a stream of click events that have the following schema:
+在下面,我们将解释动态表和连续查询的概念,并使用具有以下模式的单击事件流:
{% highlight plain %}
[
- user: VARCHAR, // the name of the user
- cTime: TIMESTAMP, // the time when the URL was accessed
- url: VARCHAR // the URL that was accessed by the user
+ user: VARCHAR, // 用户名
+ cTime: TIMESTAMP, // 访问 URL 的时间
+ url: VARCHAR // 用户访问的 URL
]
{% endhighlight %}
-Defining a Table on a Stream
+在流上定义表
----------------------------
-In order to process a stream with a relational query, it has to be converted
into a `Table`. Conceptually, each record of the stream is interpreted as an
`INSERT` modification on the resulting table. Essentially, we are building a
table from an `INSERT`-only changelog stream.
+为了使用关系查询处理流,必须将其转换成 `Table`。从概念上讲,流的每条记录都被解释为对结果表的 `INSERT` 操作。本质上我们正在从一个
`INSERT`-only 的 changelog 流构建表。
-The following figure visualizes how the stream of click event (left-hand side)
is converted into a table (right-hand side). The resulting table is
continuously growing as more records of the click stream are inserted.
+下图显示了单击事件流(左侧)如何转换为表(右侧)。当插入更多的单击流记录时,结果表将不断增长。
<center>
<img alt="Append mode" src="{{ site.baseurl
}}/fig/table-streaming/append-mode.png" width="60%">
</center>
-**Note:** A table which is defined on a stream is internally not materialized.
+**注意:** 在流上定义的表在内部没有物化。
-### Continuous Queries
+### 连续查询
----------------------
-A continuous query is evaluated on a dynamic table and produces a new dynamic
table as result. In contrast to a batch query, a continuous query never
terminates and updates its result table according to the updates on its input
tables. At any point in time, the result of a continuous query is semantically
equivalent to the result of the same query being executed in batch mode on a
snapshot of the input tables.
+在动态表上计算一个连续查询,并生成一个新的动态表。与批处理查询不同,连续查询从不终止,并根据其输入表上的更新更新其结果表。在任何时候,连续查询的结果在语义上与以批处理模式在输入表快照上执行的相同查询的结果相同。
-In the following we show two example queries on a `clicks` table that is
defined on the stream of click events.
+在接下来的代码中,我们将展示 `clicks` 表上的两个示例查询,这个表是在点击事件流上定义的。
-The first query is a simple `GROUP-BY COUNT` aggregation query. It groups the
`clicks` table on the `user` field and counts the number of visited URLs. The
following figure shows how the query is evaluated over time as the `clicks`
table is updated with additional rows.
+第一个查询是一个简单的 `GROUP-BY COUNT` 聚合查询。它基于 `user` 字段对 `clicks` 表进行分组,并统计访问的 URL
的数量。下面的图显示了当 `clicks` 表被附加的行更新时,查询是如何被评估的。
<center>
<img alt="Continuous Non-Windowed Query" src="{{ site.baseurl
}}/fig/table-streaming/query-groupBy-cnt.png" width="90%">
</center>
-When the query is started, the `clicks` table (left-hand side) is empty. The
query starts to compute the result table, when the first row is inserted into
the `clicks` table. After the first row `[Mary, ./home]` was inserted, the
result table (right-hand side, top) consists of a single row `[Mary, 1]`. When
the second row `[Bob, ./cart]` is inserted into the `clicks` table, the query
updates the result table and inserts a new row `[Bob, 1]`. The third row
`[Mary, ./prod?id=1]` yields an update of an already computed result row such
that `[Mary, 1]` is updated to `[Mary, 2]`. Finally, the query inserts a third
row `[Liz, 1]` into the result table, when the fourth row is appended to the
`clicks` table.
+当查询开始,`clicks` 表(左侧)是空的。当第一行数据被插入到 `clicks` 表时,查询开始计算结果表。第一行数据 `[Mary,./home]`
插入后,结果表(右侧,上部)由一行 `[Mary, 1]` 组成。当第二行 `[Bob, ./cart]` 插入到 `clicks`
表时,查询会更新结果表并插入了一行新数据 `[Bob, 1]`。第三行 `[Mary, ./prod?id=1]` 将产生已计算的结果行的更新,`[Mary,
1]` 更新成 `[Mary, 2]`。最后,当第四行数据加入 `clicks` 表时,查询将第三行 `[Liz, 1]` 插入到结果表中。
-The second query is similar to the first one but groups the `clicks` table in
addition to the `user` attribute also on an [hourly tumbling window]({{
site.baseurl }}/dev/table/sql/index.html#group-windows) before it counts the
number of URLs (time-based computations such as windows are based on special
[time attributes](time_attributes.html), which are discussed later.). Again,
the figure shows the input and output at different points in time to visualize
the changing nature of dynamic tables.
+第二条查询与第一条类似,但是除了用户属性之外,还将 `clicks` 分组至[每小时滚动窗口]({{ site.baseurl
}}/zh/dev/table/sql/index.html#group-windows)中,然后计算 url
数量(基于时间的计算,例如基于特定[时间属性](time_attributes.html)的窗口,后面会讨论)。同样,该图显示了不同时间点的输入和输出,以可视化动态表的变化特性。
<center>
<img alt="Continuous Group-Window Query" src="{{ site.baseurl
}}/fig/table-streaming/query-groupBy-window-cnt.png" width="100%">
</center>
-As before, the input table `clicks` is shown on the left. The query
continuously computes results every hour and updates the result table. The
clicks table contains four rows with timestamps (`cTime`) between `12:00:00`
and `12:59:59`. The query computes two results rows from this input (one for
each `user`) and appends them to the result table. For the next window between
`13:00:00` and `13:59:59`, the `clicks` table contains three rows, which
results in another two rows being appended to the result table. The result
table is updated, as more rows are appended to `clicks` over time.
+与前面一样,左边显示了输入表 `clicks`。查询每小时持续计算结果并更新结果表。clicks表包含四行带有时间戳(`cTime`)的数据,时间戳在
`12:00:00` 和 `12:59:59` 之间。查询从这个输入计算出两个结果行(每个 `user` 一个),并将它们附加到结果表中。对于
`13:00:00` 和 `13:59:59` 之间的下一个窗口,`clicks`
表包含三行,这将导致另外两行被追加到结果表。随着时间的推移,更多的行被添加到 `click` 中,结果表将被更新。
-### Update and Append Queries
+### 更新和追加查询
-Although the two example queries appear to be quite similar (both compute a
grouped count aggregate), they differ in one important aspect:
-- The first query updates previously emitted results, i.e., the changelog
stream that defines the result table contains `INSERT` and `UPDATE` changes.
-- The second query only appends to the result table, i.e., the changelog
stream of the result table only consists of `INSERT` changes.
+虽然这两个示例查询看起来非常相似(都计算分组计数聚合),但它们在一个重要方面不同:
+- 第一个查询更新先前输出的结果,即定义结果表的 changelog 流包含 `INSERT` 和 `UPDATE` 操作。
+- 第二个查询只附加到结果表,即结果表的 changelog 流只包含 `INSERT` 操作。
-Whether a query produces an append-only table or an updated table has some
implications:
-- Queries that produce update changes usually have to maintain more state (see
the following section).
-- The conversion of an append-only table into a stream is different from the
conversion of an updated table (see the [Table to Stream
Conversion](#table-to-stream-conversion) section).
+一个查询是产生一个只追加的表还是一个更新的表有一些含义:
+- 产生更新更改的查询通常必须维护更多的状态(请参阅以下部分)。
+- 将 append-only 的表转换为流与将已更新的表转换为流是不同的(参阅
[表到流的转换](#table-to-stream-conversion)章节)。
Review comment:
The anchor link `#table-to-stream-conversion` is broken because the title of the section it points to has been translated into Chinese.
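To make this kind of breakage easier to spot, here is a minimal sketch of a checker for in-page anchor links in a Markdown file. It is not part of the Flink build. The function names (`slugify`, `defined_anchors`, `broken_anchor_links`) are hypothetical, and the slug rule is a simplified assumption (keep ASCII letters, digits, spaces and dashes, lowercase, spaces become dashes), so it only approximates what the docs generator really does. It also assumes that an explicit `<a name="..."></a>` placed before the translated heading is an acceptable way to keep the old anchor alive.

```python
import re


def slugify(title: str) -> str:
    """Simplified, ASSUMED slug rule: ASCII letters/digits/spaces/dashes only."""
    kept = re.sub(r"[^a-zA-Z0-9 -]", "", title).strip().lower()
    return re.sub(r" +", "-", kept)


def defined_anchors(markdown: str) -> set:
    """Collect explicit <a name/id="..."> anchors and slugs of headings."""
    anchors = set(re.findall(r'<a\s+(?:name|id)="([^"]+)"', markdown))
    lines = markdown.splitlines()
    for i, line in enumerate(lines):
        atx = re.match(r"#{1,6}\s+(.*)", line)
        if atx:
            # ATX heading: "### Title"
            anchors.add(slugify(atx.group(1)))
        elif i > 0 and re.fullmatch(r"(-{3,}|={3,})\s*", line) and lines[i - 1].strip():
            # Setext heading: a title line underlined with dashes or equals signs
            anchors.add(slugify(lines[i - 1]))
    anchors.discard("")  # a heading with no ASCII characters yields no usable slug
    return anchors


def broken_anchor_links(markdown: str) -> list:
    """Return in-page link targets ("](#target)") that have no matching anchor."""
    targets = re.findall(r"\]\(#([^)\s]+)\)", markdown)
    known = defined_anchors(markdown)
    return [t for t in targets if t not in known]


if __name__ == "__main__":
    translated = (
        "表到流的转换\n"
        "--------------\n\n"
        "参阅 [表到流的转换](#table-to-stream-conversion)章节\n"
    )
    print(broken_anchor_links(translated))
    fixed = '<a name="table-to-stream-conversion"></a>\n\n' + translated
    print(broken_anchor_links(fixed))
```

Under these assumptions, the translated heading alone no longer defines `table-to-stream-conversion`, so the link is reported as broken. Adding the explicit anchor before the heading makes the report come back empty.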
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
With regards,
Apache Git Services