klion26 commented on a change in pull request #12798: URL: https://github.com/apache/flink/pull/12798#discussion_r451286413
########## File path: docs/dev/table/streaming/match_recognize.zh.md ########## @@ -180,99 +158,68 @@ FROM Ticker ) MR; {% endhighlight %} -The query partitions the `Ticker` table by the `symbol` column and orders it by the `rowtime` -time attribute. +此查询将 `Ticker` 表按照 `symbol` 列进行分区并按照 `rowtime` 属性进行排序。 -The `PATTERN` clause specifies that we are interested in a pattern with a starting event `START_ROW` -that is followed by one or more `PRICE_DOWN` events and concluded with a `PRICE_UP` event. If such -a pattern can be found, the next pattern match will be seeked at the last `PRICE_UP` event as -indicated by the `AFTER MATCH SKIP TO LAST` clause. +`PATTERN` 子句指定我们对以下模式感兴趣:该模式具有开始事件 `START_ROW`,然后是一个或多个 `PRICE_DOWN` 事件,并以 `PRICE_UP` 事件结束。如果可以找到这样的模式,如 `AFTER MATCH SKIP TO LAST` 子句所示,则从最后一个 `PRICE_UP` 事件开始寻找下一个模式匹配。 -The `DEFINE` clause specifies the conditions that need to be met for a `PRICE_DOWN` and `PRICE_UP` -event. Although the `START_ROW` pattern variable is not present it has an implicit condition that -is evaluated always as `TRUE`. +`DEFINE` 子句指定 `PRICE_DOWN` 和 `PRICE_UP` 事件需要满足的条件。尽管不存在 `START_ROW` 模式变量,但它具有一个始终被评估为 `TRUE` 隐式条件。 -A pattern variable `PRICE_DOWN` is defined as a row with a price that is smaller than the price of -the last row that met the `PRICE_DOWN` condition. For the initial case or when there is no last row -that met the `PRICE_DOWN` condition, the price of the row should be smaller than the price of the -preceding row in the pattern (referenced by `START_ROW`). +模式变量 `PRICE_DOWN` 定义为价格小于满足 `PRICE_DOWN` 条件的最后一行价格的行。对于初始情况或没有满足 `PRICE_DOWN` 条件的最后一行时,该行的价格应小于该模式中前一行(由 `START_ROW` 引用)的价格。 Review comment: Would it work to change `的最后一行价格的行` to `的最后一行`? ########## File path: docs/dev/table/streaming/match_recognize.zh.md ########## @@ -24,28 +24,17 @@ specific language governing permissions and limitations under the License. --> -It is a common use case to search for a set of event patterns, especially in case of data streams.
-Flink comes with a [complex event processing (CEP) library]({{ site.baseurl }}/dev/libs/cep.html) -which allows for pattern detection in event streams. Furthermore, Flink's SQL API provides a -relational way of expressing queries with a large set of built-in functions and rule-based -optimizations that can be used out of the box. - -In December 2016, the International Organization for Standardization (ISO) released a new version -of the SQL standard which includes _Row Pattern Recognition in SQL_ -([ISO/IEC TR 19075-5:2016](https://standards.iso.org/ittf/PubliclyAvailableStandards/c065143_ISO_IEC_TR_19075-5_2016.zip)). -It allows Flink to consolidate CEP and SQL API using the `MATCH_RECOGNIZE` clause for complex event -processing in SQL. - -A `MATCH_RECOGNIZE` clause enables the following tasks: -* Logically partition and order the data that is used with the `PARTITION BY` and `ORDER BY` - clauses. -* Define patterns of rows to seek using the `PATTERN` clause. These patterns use a syntax similar to - that of regular expressions. -* The logical components of the row pattern variables are specified in the `DEFINE` clause. -* Define measures, which are expressions usable in other parts of the SQL query, in the `MEASURES` - clause. 
- -The following example illustrates the syntax for basic pattern recognition: +搜索一组事件模式(event pattern)是一种常见的用例,尤其是在数据流的情况下。Flink 提供 [复杂事件处理(CEP)库]({% link dev/libs/cep.zh.md %}),该库允许在事件流中进行模式检测。此外,Flink 的 SQL API 提供了一种关系式的查询表达方式,其中包含大量内置函数和基于规则的优化,可以开箱即用。 + +2016 年 12 月,国际标准化组织(ISO)发布了新版本的 SQL 标准,其中包括在 _SQL 中的行模式识别(Row Pattern Recognition in SQL)_([ISO/IEC TR 19075-5:2016](https://standards.iso.org/ittf/PubliclyAvailableStandards/c065143_ISO_IEC_TR_19075-5_2016.zip))。它允许 Flink 使用 `MATCH_RECOGNIZE` 子句融合 CEP 和 SQL API,以便在 SQL 中进行复杂事件处理。 + +`MATCH_RECOGNIZE` 子句启用以下任务: +* 使用 `PARTITION BY` 和 `ORDER BY` 子句对数据进行逻辑分区和排序。 +* 使用 `PATTERN` 子句定义要查找的行的模式。这些模式使用类似于正则表达式的语法。 +* 行模式变量的逻辑组件在 `DEFINE` 子句中指定。 +* measures 是指在 `MEASURES` 子句中定义的表达式,这些表达式可用在 SQL 查询中的其他部分。 Review comment: Could this sentence be improved a bit more? As written, something seems to be missing. It introduces "measures" but never says what they are actually used for. Alternatively, could the `measures` at the start and the `表达式` (expressions) later in the sentence be made consistent? ########## File path: docs/dev/table/streaming/dynamic_tables.zh.md ########## @@ -128,6 +128,8 @@ DataStream 上的关系查询 与前面一样,左边显示了输入表 `clicks`。查询每小时持续计算结果并更新结果表。clicks表包含四行带有时间戳(`cTime`)的数据,时间戳在 `12:00:00` 和 `12:59:59` 之间。查询从这个输入计算出两个结果行(每个 `user` 一个),并将它们附加到结果表中。对于 `13:00:00` 和 `13:59:59` 之间的下一个窗口,`clicks` 表包含三行,这将导致另外两行被追加到结果表。随着时间的推移,更多的行被添加到 `click` 中,结果表将被更新。 +<a name="update-and-append-queries"></a> Review comment: I suggest making this change in a separate hotfix commit. The current PR should only translate the detecting patterns document. ########## File path: docs/dev/table/streaming/match_recognize.md ########## @@ -230,14 +230,14 @@ Order of Events --------------- Apache Flink allows for searching for patterns based on time; either -[processing time or event time](time_attributes.html). +[processing time or event time]({% link dev/table/streaming/time_attributes.md %}).
Review comment: I suggest putting these changes in a separate PR. This PR should only do the translation. ########## File path: docs/dev/table/streaming/match_recognize.zh.md ########## @@ -180,99 +158,68 @@ FROM Ticker ) MR; {% endhighlight %} -The query partitions the `Ticker` table by the `symbol` column and orders it by the `rowtime` -time attribute. +此查询将 `Ticker` 表按照 `symbol` 列进行分区并按照 `rowtime` 属性进行排序。 -The `PATTERN` clause specifies that we are interested in a pattern with a starting event `START_ROW` -that is followed by one or more `PRICE_DOWN` events and concluded with a `PRICE_UP` event. If such -a pattern can be found, the next pattern match will be seeked at the last `PRICE_UP` event as -indicated by the `AFTER MATCH SKIP TO LAST` clause. +`PATTERN` 子句指定我们对以下模式感兴趣:该模式具有开始事件 `START_ROW`,然后是一个或多个 `PRICE_DOWN` 事件,并以 `PRICE_UP` 事件结束。如果可以找到这样的模式,如 `AFTER MATCH SKIP TO LAST` 子句所示,则从最后一个 `PRICE_UP` 事件开始寻找下一个模式匹配。 -The `DEFINE` clause specifies the conditions that need to be met for a `PRICE_DOWN` and `PRICE_UP` -event. Although the `START_ROW` pattern variable is not present it has an implicit condition that -is evaluated always as `TRUE`. +`DEFINE` 子句指定 `PRICE_DOWN` 和 `PRICE_UP` 事件需要满足的条件。尽管不存在 `START_ROW` 模式变量,但它具有一个始终被评估为 `TRUE` 隐式条件。 -A pattern variable `PRICE_DOWN` is defined as a row with a price that is smaller than the price of -the last row that met the `PRICE_DOWN` condition. For the initial case or when there is no last row -that met the `PRICE_DOWN` condition, the price of the row should be smaller than the price of the -preceding row in the pattern (referenced by `START_ROW`). +模式变量 `PRICE_DOWN` 定义为价格小于满足 `PRICE_DOWN` 条件的最后一行价格的行。对于初始情况或没有满足 `PRICE_DOWN` 条件的最后一行时,该行的价格应小于该模式中前一行(由 `START_ROW` 引用)的价格。 -A pattern variable `PRICE_UP` is defined as a row with a price that is larger than the price of the -last row that met the `PRICE_DOWN` condition. +模变变量 `PRICE_UP` 定义为价格大于满足 `PRICE_DOWN` 条件的最后一行价格的行。 -This query produces a summary row for each period in which the price of a stock was continuously -decreasing.
+此查询为股票价格持续下跌的每个期间生成摘要行。 -The exact representation of the output rows is defined in the `MEASURES` part of the query. The -number of output rows is defined by the `ONE ROW PER MATCH` output mode. +输出行的确切表示在查询的 `MEASURES` 部分中定义。输出行数由 `ONE ROW PER MATCH` 输出方式定义。 {% highlight text %} symbol start_tstamp bottom_tstamp end_tstamp ========= ================== ================== ================== ACME 01-APR-11 10:00:04 01-APR-11 10:00:07 01-APR-11 10:00:08 {% endhighlight %} -The resulting row describes a period of falling prices that started at `01-APR-11 10:00:04` and -achieved the lowest price at `01-APR-11 10:00:07` that increased again at `01-APR-11 10:00:08`. +该行结果描述了从 `01-APR-11 10:00:04` 开始的价格下跌期,在 `01-APR-11 10:00:07` 达到最低价格,到 `01-APR-11 10:00:08` 再次上涨。 -Partitioning +<a name="partitioning"></a> + +分区 ------------ -It is possible to look for patterns in partitioned data, e.g., trends for a single ticker or a -particular user. This can be expressed using the `PARTITION BY` clause. The clause is similar to -using `GROUP BY` for aggregations. +可以在分区数据中寻找模式,例如单个股票行情或特定用户的趋势。这可以用 `PARTITION BY` 子句来表示。该子句类似于对聚合使用 `GROUP BY`。 + +<span class="label label-danger">注意</span> 强烈建议对传入的数据进行分区,否则 `MATCH_RECOGNIZE` 子句将被转换为非并行算子,以确保全局排序。 -<span class="label label-danger">Attention</span> It is highly advised to partition the incoming -data because otherwise the `MATCH_RECOGNIZE` clause will be translated into a non-parallel operator -to ensure global ordering. +<a name="order-of-events"></a> -Order of Events +事件顺序 --------------- -Apache Flink allows for searching for patterns based on time; either -[processing time or event time](time_attributes.html). +Apache Flink 可以根据时间([处理时间或者事件时间]({% link dev/table/streaming/time_attributes.zh.md %}))进行模式搜索。 -In case of event time, the events are sorted before they are passed to the internal pattern state -machine. As a consequence, the produced output will be correct regardless of the order in which -rows are appended to the table. 
Instead, the pattern is evaluated in the order specified by the -time contained in each row. +如果是事件时间,则在将事件传递到内部模式状态机之前对其进行排序。所以,无论行添加到表的顺序如何,生成的输出都是正确的。相反,模式是按照每行中包含的时间指定的顺序计算的。 -The `MATCH_RECOGNIZE` clause assumes a [time attribute](time_attributes.html) with ascending -ordering as the first argument to `ORDER BY` clause. +`MATCH_RECOGNIZE` 子句假定升序的 [时间属性]({% link dev/table/streaming/time_attributes.zh.md %}) 是 `ORDER BY` 子句的第一个参数。 -For the example `Ticker` table, a definition like `ORDER BY rowtime ASC, price DESC` is valid but -`ORDER BY price, rowtime` or `ORDER BY rowtime DESC, price ASC` is not. +对于示例 `Ticker` 表,诸如 `ORDER BY rowtime ASC, price DESC` 的定义是有效的,但 `ORDER BY price, rowtime` 或者 `ORDER BY rowtime DESC, price ASC` 是无效的。 Define & Measures ----------------- -The `DEFINE` and `MEASURES` keywords have similar meanings to the `WHERE` and `SELECT` clauses in a -simple SQL query. +`DEFINE` 和 `MEASURES` 关键字与简单 SQL 查询中的 `WHERE` 和 `SELECT` 子句具有相近的含义。 -The `MEASURES` clause defines what will be included in the output of a matching pattern. It can -project columns and define expressions for evaluation. The number of produced rows depends on the -[output mode](#output-mode) setting. +`MEASURES` 子句定义匹配模式的输出中要包含哪些内容。它可以投影列并定义表达式进行计算。产生的行数取决于 [output mode](#output-mode) 设置。 Review comment: `output mode` is translated further down, so I suggest translating it here too. The same applies to other terms, which are not always easy to spot in text like this. After finishing the translation locally, you can run `./docs/docker/run.sh` and then `./build_docs.sh -p` to check the final rendered result. ########## File path: docs/dev/table/streaming/match_recognize.zh.md ########## @@ -478,9 +406,9 @@ FROM Ticker ) {% endhighlight %} -The query detects a price drop of `10` that happens within an interval of 1 hour. +该查询检测到在1小时的间隔内价格下降了`10`。 Review comment: ```suggestion 该查询检测到在 1 小时的间隔内价格下降了 `10`。 ``` ########## File path: docs/dev/table/streaming/match_recognize.zh.md ########## @@ -976,16 +877,15 @@ The second result matched against the rows #5, #6.
XYZ 17 2018-09-17 10:00:05 2018-09-17 10:00:06 {% endhighlight %} -Again, the first result matched against the rows #1, #2, #3, #4. +同样,第一个结果与#1,#2,#3,#4行匹配。 -Compared to the previous strategy, the next match includes row #2 again for the next matching. -Therefore, the second result matched against the rows #2, #3, #4, #5. +与上一个策略相比,下一个匹配再次包含#2行匹配。因此,第二个结果与#2,#3,#4,#5行匹配。 Review comment: ```suggestion 与上一个策略相比,下一个匹配再次包含 #2 行匹配。因此,第二个结果与 #2,#3,#4,#5 行匹配。 ``` ########## File path: docs/dev/table/streaming/match_recognize.zh.md ########## @@ -960,9 +861,9 @@ The query will produce different results based on which `AFTER MATCH` strategy w XYZ 17 2018-09-17 10:00:05 2018-09-17 10:00:06 {% endhighlight %} -The first result matched against the rows #1, #2, #3, #4. +第一个结果与#1,#2,#3,#4行匹配。 Review comment: ```suggestion 第一个结果与 #1,#2,#3,#4 行匹配。 ``` ########## File path: docs/dev/table/streaming/match_recognize.zh.md ########## @@ -180,99 +158,68 @@ FROM Ticker ) MR; {% endhighlight %} -The query partitions the `Ticker` table by the `symbol` column and orders it by the `rowtime` -time attribute. +此查询将 `Ticker` 表按照 `symbol` 列进行分区并按照 `rowtime` 属性进行排序。 -The `PATTERN` clause specifies that we are interested in a pattern with a starting event `START_ROW` -that is followed by one or more `PRICE_DOWN` events and concluded with a `PRICE_UP` event. If such -a pattern can be found, the next pattern match will be seeked at the last `PRICE_UP` event as -indicated by the `AFTER MATCH SKIP TO LAST` clause. +`PATTERN` 子句指定我们对以下模式感兴趣:该模式具有开始事件 `START_ROW`,然后是一个或多个 `PRICE_DOWN` 事件,并以 `PRICE_UP` 事件结束。如果可以找到这样的模式,如 `AFTER MATCH SKIP TO LAST` 子句所示,则从最后一个 `PRICE_UP` 事件开始寻找下一个模式匹配。 -The `DEFINE` clause specifies the conditions that need to be met for a `PRICE_DOWN` and `PRICE_UP` -event. Although the `START_ROW` pattern variable is not present it has an implicit condition that -is evaluated always as `TRUE`. 
+`DEFINE` 子句指定 `PRICE_DOWN` 和 `PRICE_UP` 事件需要满足的条件。尽管不存在 `START_ROW` 模式变量,但它具有一个始终被评估为 `TRUE` 隐式条件。 -A pattern variable `PRICE_DOWN` is defined as a row with a price that is smaller than the price of -the last row that met the `PRICE_DOWN` condition. For the initial case or when there is no last row -that met the `PRICE_DOWN` condition, the price of the row should be smaller than the price of the -preceding row in the pattern (referenced by `START_ROW`). +模式变量 `PRICE_DOWN` 定义为价格小于满足 `PRICE_DOWN` 条件的最后一行价格的行。对于初始情况或没有满足 `PRICE_DOWN` 条件的最后一行时,该行的价格应小于该模式中前一行(由 `START_ROW` 引用)的价格。 -A pattern variable `PRICE_UP` is defined as a row with a price that is larger than the price of the -last row that met the `PRICE_DOWN` condition. +模变变量 `PRICE_UP` 定义为价格大于满足 `PRICE_DOWN` 条件的最后一行价格的行。 Review comment: Would it work to change `的最后一行价格的行` to `的最后一行`? ########## File path: docs/dev/table/streaming/match_recognize.zh.md ########## @@ -976,16 +877,15 @@ The second result matched against the rows #5, #6. XYZ 17 2018-09-17 10:00:05 2018-09-17 10:00:06 {% endhighlight %} -Again, the first result matched against the rows #1, #2, #3, #4. +同样,第一个结果与#1,#2,#3,#4行匹配。 Review comment: ```suggestion 同样,第一个结果与 #1,#2,#3,#4 行匹配。 ``` ########## File path: docs/dev/table/streaming/match_recognize.zh.md ########## @@ -180,99 +158,68 @@ FROM Ticker ) MR; {% endhighlight %} -The query partitions the `Ticker` table by the `symbol` column and orders it by the `rowtime` -time attribute. +此查询将 `Ticker` 表按照 `symbol` 列进行分区并按照 `rowtime` 属性进行排序。 -The `PATTERN` clause specifies that we are interested in a pattern with a starting event `START_ROW` -that is followed by one or more `PRICE_DOWN` events and concluded with a `PRICE_UP` event. If such -a pattern can be found, the next pattern match will be seeked at the last `PRICE_UP` event as -indicated by the `AFTER MATCH SKIP TO LAST` clause.
+`PATTERN` 子句指定我们对以下模式感兴趣:该模式具有开始事件 `START_ROW`,然后是一个或多个 `PRICE_DOWN` 事件,并以 `PRICE_UP` 事件结束。如果可以找到这样的模式,如 `AFTER MATCH SKIP TO LAST` 子句所示,则从最后一个 `PRICE_UP` 事件开始寻找下一个模式匹配。 -The `DEFINE` clause specifies the conditions that need to be met for a `PRICE_DOWN` and `PRICE_UP` -event. Although the `START_ROW` pattern variable is not present it has an implicit condition that -is evaluated always as `TRUE`. +`DEFINE` 子句指定 `PRICE_DOWN` 和 `PRICE_UP` 事件需要满足的条件。尽管不存在 `START_ROW` 模式变量,但它具有一个始终被评估为 `TRUE` 隐式条件。 -A pattern variable `PRICE_DOWN` is defined as a row with a price that is smaller than the price of -the last row that met the `PRICE_DOWN` condition. For the initial case or when there is no last row -that met the `PRICE_DOWN` condition, the price of the row should be smaller than the price of the -preceding row in the pattern (referenced by `START_ROW`). +模式变量 `PRICE_DOWN` 定义为价格小于满足 `PRICE_DOWN` 条件的最后一行价格的行。对于初始情况或没有满足 `PRICE_DOWN` 条件的最后一行时,该行的价格应小于该模式中前一行(由 `START_ROW` 引用)的价格。 -A pattern variable `PRICE_UP` is defined as a row with a price that is larger than the price of the -last row that met the `PRICE_DOWN` condition. +模变变量 `PRICE_UP` 定义为价格大于满足 `PRICE_DOWN` 条件的最后一行价格的行。 -This query produces a summary row for each period in which the price of a stock was continuously -decreasing. +此查询为股票价格持续下跌的每个期间生成摘要行。 -The exact representation of the output rows is defined in the `MEASURES` part of the query. The -number of output rows is defined by the `ONE ROW PER MATCH` output mode. +输出行的确切表示在查询的 `MEASURES` 部分中定义。输出行数由 `ONE ROW PER MATCH` 输出方式定义。 {% highlight text %} symbol start_tstamp bottom_tstamp end_tstamp ========= ================== ================== ================== ACME 01-APR-11 10:00:04 01-APR-11 10:00:07 01-APR-11 10:00:08 {% endhighlight %} -The resulting row describes a period of falling prices that started at `01-APR-11 10:00:04` and -achieved the lowest price at `01-APR-11 10:00:07` that increased again at `01-APR-11 10:00:08`. 
+该行结果描述了从 `01-APR-11 10:00:04` 开始的价格下跌期,在 `01-APR-11 10:00:07` 达到最低价格,到 `01-APR-11 10:00:08` 再次上涨。 -Partitioning +<a name="partitioning"></a> + +分区 ------------ -It is possible to look for patterns in partitioned data, e.g., trends for a single ticker or a -particular user. This can be expressed using the `PARTITION BY` clause. The clause is similar to -using `GROUP BY` for aggregations. +可以在分区数据中寻找模式,例如单个股票行情或特定用户的趋势。这可以用 `PARTITION BY` 子句来表示。该子句类似于对聚合使用 `GROUP BY`。 + +<span class="label label-danger">注意</span> 强烈建议对传入的数据进行分区,否则 `MATCH_RECOGNIZE` 子句将被转换为非并行算子,以确保全局排序。 -<span class="label label-danger">Attention</span> It is highly advised to partition the incoming -data because otherwise the `MATCH_RECOGNIZE` clause will be translated into a non-parallel operator -to ensure global ordering. +<a name="order-of-events"></a> -Order of Events +事件顺序 --------------- -Apache Flink allows for searching for patterns based on time; either -[processing time or event time](time_attributes.html). +Apache Flink 可以根据时间([处理时间或者事件时间]({% link dev/table/streaming/time_attributes.zh.md %}))进行模式搜索。 -In case of event time, the events are sorted before they are passed to the internal pattern state -machine. As a consequence, the produced output will be correct regardless of the order in which -rows are appended to the table. Instead, the pattern is evaluated in the order specified by the -time contained in each row. +如果是事件时间,则在将事件传递到内部模式状态机之前对其进行排序。所以,无论行添加到表的顺序如何,生成的输出都是正确的。相反,模式是按照每行中包含的时间指定的顺序计算的。 -The `MATCH_RECOGNIZE` clause assumes a [time attribute](time_attributes.html) with ascending -ordering as the first argument to `ORDER BY` clause. +`MATCH_RECOGNIZE` 子句假定升序的 [时间属性]({% link dev/table/streaming/time_attributes.zh.md %}) 是 `ORDER BY` 子句的第一个参数。 -For the example `Ticker` table, a definition like `ORDER BY rowtime ASC, price DESC` is valid but -`ORDER BY price, rowtime` or `ORDER BY rowtime DESC, price ASC` is not. 
+对于示例 `Ticker` 表,诸如 `ORDER BY rowtime ASC, price DESC` 的定义是有效的,但 `ORDER BY price, rowtime` 或者 `ORDER BY rowtime DESC, price ASC` 是无效的。 Define & Measures Review comment: Do we need to translate this heading? I believe `measure` has been translated somewhere above, and the same term should be rendered consistently throughout the document. ########## File path: docs/dev/table/streaming/match_recognize.zh.md ########## @@ -180,99 +158,68 @@ FROM Ticker ) MR; {% endhighlight %} -The query partitions the `Ticker` table by the `symbol` column and orders it by the `rowtime` -time attribute. +此查询将 `Ticker` 表按照 `symbol` 列进行分区并按照 `rowtime` 属性进行排序。 -The `PATTERN` clause specifies that we are interested in a pattern with a starting event `START_ROW` -that is followed by one or more `PRICE_DOWN` events and concluded with a `PRICE_UP` event. If such -a pattern can be found, the next pattern match will be seeked at the last `PRICE_UP` event as -indicated by the `AFTER MATCH SKIP TO LAST` clause. +`PATTERN` 子句指定我们对以下模式感兴趣:该模式具有开始事件 `START_ROW`,然后是一个或多个 `PRICE_DOWN` 事件,并以 `PRICE_UP` 事件结束。如果可以找到这样的模式,如 `AFTER MATCH SKIP TO LAST` 子句所示,则从最后一个 `PRICE_UP` 事件开始寻找下一个模式匹配。 -The `DEFINE` clause specifies the conditions that need to be met for a `PRICE_DOWN` and `PRICE_UP` -event. Although the `START_ROW` pattern variable is not present it has an implicit condition that -is evaluated always as `TRUE`. +`DEFINE` 子句指定 `PRICE_DOWN` 和 `PRICE_UP` 事件需要满足的条件。尽管不存在 `START_ROW` 模式变量,但它具有一个始终被评估为 `TRUE` 隐式条件。 -A pattern variable `PRICE_DOWN` is defined as a row with a price that is smaller than the price of -the last row that met the `PRICE_DOWN` condition. For the initial case or when there is no last row -that met the `PRICE_DOWN` condition, the price of the row should be smaller than the price of the -preceding row in the pattern (referenced by `START_ROW`).
+模式变量 `PRICE_DOWN` 定义为价格小于满足 `PRICE_DOWN` 条件的最后一行价格的行。对于初始情况或没有满足 `PRICE_DOWN` 条件的最后一行时,该行的价格应小于该模式中前一行(由 `START_ROW` 引用)的价格。 -A pattern variable `PRICE_UP` is defined as a row with a price that is larger than the price of the -last row that met the `PRICE_DOWN` condition. +模变变量 `PRICE_UP` 定义为价格大于满足 `PRICE_DOWN` 条件的最后一行价格的行。 -This query produces a summary row for each period in which the price of a stock was continuously -decreasing. +此查询为股票价格持续下跌的每个期间生成摘要行。 -The exact representation of the output rows is defined in the `MEASURES` part of the query. The -number of output rows is defined by the `ONE ROW PER MATCH` output mode. +输出行的确切表示在查询的 `MEASURES` 部分中定义。输出行数由 `ONE ROW PER MATCH` 输出方式定义。 Review comment: The sentence “输出行的确切表示在查询的 `MEASURES` 部分中定义” reads a bit awkwardly. Could you improve it? ########## File path: docs/dev/table/streaming/match_recognize.zh.md ########## @@ -435,30 +370,23 @@ DEFINE C AS NOT condB() {% endhighlight %} -<span class="label label-danger">Attention</span> The optional reluctant quantifier (`A??` or -`A{0,1}?`) is not supported right now. +<span class="label label-danger">注意</span> 目前不支持可选的勉强量词(`A??` 或者 `A{0,1}?`)。 + +<a name="time-constraint"></a> -### Time constraint +### 时间约束 -Especially for streaming use cases, it is often required that a pattern finishes within a given -period of time. This allows for limiting the overall state size that Flink has to maintain -internally, even in case of greedy quantifiers.
+特别是对于流的使用场景,通常需要在给定的时间内完成模式。这要求限制 Flink 必须在内部保持的总体状态大小,即使在贪婪的量词的情况下也是如此。 Review comment: The intended meaning here is that a time constraint makes it possible to bound the overall state size Flink has to maintain, because state that has already expired no longer needs to be kept. ########## File path: docs/dev/table/streaming/match_recognize.zh.md ########## @@ -498,36 +426,32 @@ symbol rowtime price tax 'ACME' '01-Apr-11 13:20:00' 19 1 {% endhighlight %} -The query will produce the following results: +查询将生成以下结果: {% highlight text %} symbol dropTime dropDiff ====== ==================== ============= 'ACME' '01-Apr-11 13:00:00' 14 {% endhighlight %} -The resulting row represents a price drop from `15` (at `01-Apr-11 12:00:00`) to `1` (at -`01-Apr-11 13:00:00`). The `dropDiff` column contains the price difference. +结果行代表价格从`15`(在`01-Apr-11 12:00:00`)下降到`1`(在`01-Apr-11 13:00:00`)。`dropDiff` 列包含价格差异。 + +请注意,即使价格也下降了较高的值,例如,下降了`11`(在`01-Apr-11 10:00:00`和`01-Apr-11 11:40:00`之间),这两个事件之间的时间差大于1小时。因此,它们不会产生匹配。 Review comment: ```suggestion 请注意,即使价格也下降了较高的值,例如,下降了 `11`(在 `01-Apr-11 10:00:00` 和 `01-Apr-11 11:40:00` 之间),这两个事件之间的时间差大于 1 小时。因此,它们不会产生匹配。 ``` ########## File path: docs/dev/table/streaming/match_recognize.zh.md ########## @@ -180,99 +158,68 @@ FROM Ticker ) MR; {% endhighlight %} -The query partitions the `Ticker` table by the `symbol` column and orders it by the `rowtime` -time attribute. +此查询将 `Ticker` 表按照 `symbol` 列进行分区并按照 `rowtime` 属性进行排序。 -The `PATTERN` clause specifies that we are interested in a pattern with a starting event `START_ROW` -that is followed by one or more `PRICE_DOWN` events and concluded with a `PRICE_UP` event. If such -a pattern can be found, the next pattern match will be seeked at the last `PRICE_UP` event as -indicated by the `AFTER MATCH SKIP TO LAST` clause. +`PATTERN` 子句指定我们对以下模式感兴趣:该模式具有开始事件 `START_ROW`,然后是一个或多个 `PRICE_DOWN` 事件,并以 `PRICE_UP` 事件结束。如果可以找到这样的模式,如 `AFTER MATCH SKIP TO LAST` 子句所示,则从最后一个 `PRICE_UP` 事件开始寻找下一个模式匹配。 -The `DEFINE` clause specifies the conditions that need to be met for a `PRICE_DOWN` and `PRICE_UP` -event.
Although the `START_ROW` pattern variable is not present it has an implicit condition that -is evaluated always as `TRUE`. +`DEFINE` 子句指定 `PRICE_DOWN` 和 `PRICE_UP` 事件需要满足的条件。尽管不存在 `START_ROW` 模式变量,但它具有一个始终被评估为 `TRUE` 隐式条件。 -A pattern variable `PRICE_DOWN` is defined as a row with a price that is smaller than the price of -the last row that met the `PRICE_DOWN` condition. For the initial case or when there is no last row -that met the `PRICE_DOWN` condition, the price of the row should be smaller than the price of the -preceding row in the pattern (referenced by `START_ROW`). +模式变量 `PRICE_DOWN` 定义为价格小于满足 `PRICE_DOWN` 条件的最后一行价格的行。对于初始情况或没有满足 `PRICE_DOWN` 条件的最后一行时,该行的价格应小于该模式中前一行(由 `START_ROW` 引用)的价格。 -A pattern variable `PRICE_UP` is defined as a row with a price that is larger than the price of the -last row that met the `PRICE_DOWN` condition. +模变变量 `PRICE_UP` 定义为价格大于满足 `PRICE_DOWN` 条件的最后一行价格的行。 -This query produces a summary row for each period in which the price of a stock was continuously -decreasing. +此查询为股票价格持续下跌的每个期间生成摘要行。 -The exact representation of the output rows is defined in the `MEASURES` part of the query. The -number of output rows is defined by the `ONE ROW PER MATCH` output mode. +输出行的确切表示在查询的 `MEASURES` 部分中定义。输出行数由 `ONE ROW PER MATCH` 输出方式定义。 {% highlight text %} symbol start_tstamp bottom_tstamp end_tstamp ========= ================== ================== ================== ACME 01-APR-11 10:00:04 01-APR-11 10:00:07 01-APR-11 10:00:08 {% endhighlight %} -The resulting row describes a period of falling prices that started at `01-APR-11 10:00:04` and -achieved the lowest price at `01-APR-11 10:00:07` that increased again at `01-APR-11 10:00:08`. +该行结果描述了从 `01-APR-11 10:00:04` 开始的价格下跌期,在 `01-APR-11 10:00:07` 达到最低价格,到 `01-APR-11 10:00:08` 再次上涨。 -Partitioning +<a name="partitioning"></a> + +分区 ------------ -It is possible to look for patterns in partitioned data, e.g., trends for a single ticker or a -particular user. 
This can be expressed using the `PARTITION BY` clause. The clause is similar to -using `GROUP BY` for aggregations. +可以在分区数据中寻找模式,例如单个股票行情或特定用户的趋势。这可以用 `PARTITION BY` 子句来表示。该子句类似于对聚合使用 `GROUP BY`。 + +<span class="label label-danger">注意</span> 强烈建议对传入的数据进行分区,否则 `MATCH_RECOGNIZE` 子句将被转换为非并行算子,以确保全局排序。 -<span class="label label-danger">Attention</span> It is highly advised to partition the incoming -data because otherwise the `MATCH_RECOGNIZE` clause will be translated into a non-parallel operator -to ensure global ordering. +<a name="order-of-events"></a> -Order of Events +事件顺序 --------------- -Apache Flink allows for searching for patterns based on time; either -[processing time or event time](time_attributes.html). +Apache Flink 可以根据时间([处理时间或者事件时间]({% link dev/table/streaming/time_attributes.zh.md %}))进行模式搜索。 -In case of event time, the events are sorted before they are passed to the internal pattern state -machine. As a consequence, the produced output will be correct regardless of the order in which -rows are appended to the table. Instead, the pattern is evaluated in the order specified by the -time contained in each row. +如果是事件时间,则在将事件传递到内部模式状态机之前对其进行排序。所以,无论行添加到表的顺序如何,生成的输出都是正确的。相反,模式是按照每行中包含的时间指定的顺序计算的。 -The `MATCH_RECOGNIZE` clause assumes a [time attribute](time_attributes.html) with ascending -ordering as the first argument to `ORDER BY` clause. +`MATCH_RECOGNIZE` 子句假定升序的 [时间属性]({% link dev/table/streaming/time_attributes.zh.md %}) 是 `ORDER BY` 子句的第一个参数。 -For the example `Ticker` table, a definition like `ORDER BY rowtime ASC, price DESC` is valid but -`ORDER BY price, rowtime` or `ORDER BY rowtime DESC, price ASC` is not. +对于示例 `Ticker` 表,诸如 `ORDER BY rowtime ASC, price DESC` 的定义是有效的,但 `ORDER BY price, rowtime` 或者 `ORDER BY rowtime DESC, price ASC` 是无效的。 Define & Measures ----------------- -The `DEFINE` and `MEASURES` keywords have similar meanings to the `WHERE` and `SELECT` clauses in a -simple SQL query. 
+`DEFINE` 和 `MEASURES` 关键字与简单 SQL 查询中的 `WHERE` 和 `SELECT` 子句具有相近的含义。 -The `MEASURES` clause defines what will be included in the output of a matching pattern. It can -project columns and define expressions for evaluation. The number of produced rows depends on the -[output mode](#output-mode) setting. +`MEASURES` 子句定义匹配模式的输出中要包含哪些内容。它可以投影列并定义表达式进行计算。产生的行数取决于 [output mode](#output-mode) 设置。 -The `DEFINE` clause specifies conditions that rows have to fulfill in order to be classified to a -corresponding [pattern variable](#defining-a-pattern). If a condition is not defined for a pattern -variable, a default condition will be used which evaluates to `true` for every row. +`DEFINE` 子句指定行必须满足的条件才能被分类到相应的 [pattern variable](#defining-a-pattern)。如果没有为模式变量定义条件,则将使用对每一行的计算结果为 `true` 的默认条件。 -For a more detailed explanation about expressions that can be used in those clauses, please have a -look at the [event stream navigation](#pattern-navigation) section. +有关在这些子句中可使用的表达式的更详细的说明,请查看 [event stream navigation](#pattern-navigation) 部分。 ### Aggregations -Aggregations can be used in `DEFINE` and `MEASURES` clauses. Both -[built-in]({{ site.baseurl }}/dev/table/functions/systemFunctions.html) and custom -[user defined]({{ site.baseurl }}/dev/table/functions/udfs.html) functions are supported. +Aggregations 可以在 `DEFINE` 和 `MEASURES` 子句中使用。支持[内置函数]({% link dev/table/functions/systemFunctions.zh.md %})和[用户自定义函数]({% link dev/table/functions/udfs.zh.md %})。 -Aggregate functions are applied to each subset of rows mapped to a match. In order to understand -how those subsets are evaluated have a look at the [event stream navigation](#pattern-navigation) -section. 
+Aggregate functions 应用于映射到匹配项的行的每个子集。为了了解如何评估这些子集,请查看 [event stream navigation](#pattern-navigation) 部分。 Review comment: Could the sentence `Aggregate functions 应用于映射到匹配项的行的每个子集` be polished? The meaning is fine, but the back-to-back “的” particles make it read a little awkwardly. ########## File path: docs/dev/table/streaming/match_recognize.zh.md ########## @@ -498,36 +426,32 @@ symbol rowtime price tax 'ACME' '01-Apr-11 13:20:00' 19 1 {% endhighlight %} -The query will produce the following results: +查询将生成以下结果: {% highlight text %} symbol dropTime dropDiff ====== ==================== ============= 'ACME' '01-Apr-11 13:00:00' 14 {% endhighlight %} -The resulting row represents a price drop from `15` (at `01-Apr-11 12:00:00`) to `1` (at -`01-Apr-11 13:00:00`). The `dropDiff` column contains the price difference. +结果行代表价格从`15`(在`01-Apr-11 12:00:00`)下降到`1`(在`01-Apr-11 13:00:00`)。`dropDiff` 列包含价格差异。 Review comment: ```suggestion 结果行代表价格从 `15`(在`01-Apr-11 12:00:00`)下降到 `1`(在`01-Apr-11 13:00:00`)。`dropDiff` 列包含价格差异。 ``` ########## File path: docs/dev/table/streaming/match_recognize.zh.md ########## @@ -435,30 +370,23 @@ DEFINE C AS NOT condB() {% endhighlight %} -<span class="label label-danger">Attention</span> The optional reluctant quantifier (`A??` or -`A{0,1}?`) is not supported right now. +<span class="label label-danger">注意</span> 目前不支持可选的勉强量词(`A??` 或者 `A{0,1}?`)。 + +<a name="time-constraint"></a> -### Time constraint +### 时间约束 -Especially for streaming use cases, it is often required that a pattern finishes within a given -period of time. This allows for limiting the overall state size that Flink has to maintain -internally, even in case of greedy quantifiers. +特别是对于流的使用场景,通常需要在给定的时间内完成模式。这要求限制 Flink 必须在内部保持的总体状态大小,即使在贪婪的量词的情况下也是如此。 -Therefore, Flink SQL supports the additional (non-standard SQL) `WITHIN` clause for defining a time -constraint for a pattern. The clause can be defined after the `PATTERN` clause and takes an -interval of millisecond resolution.
+因此,Flink SQL 支持附加的(非标准 SQL)`WITHIN` 子句来定义模式的时间约束。子句可以在 `PATTERN` 子句之后定义,并以毫秒为间隔进行解析。 -If the time between the first and last event of a potential match is longer than the given value, -such a match will not be appended to the result table. +如果潜在匹配的第一个和最后一个事件之间的时间长于给定值,则不会将这种匹配追加到结果表中。 -<span class="label label-info">Note</span> It is generally encouraged to use the `WITHIN` clause as -it helps Flink with efficient memory management. Underlying state can be pruned once the threshold -is reached. +<span class="label label-info">注意</span> 通常鼓励使用 `WITHIN` 子句,因为它有助于 Flink 进行有效的内存管理。一旦达到阈值,即可修剪基础状态。 -<span class="label label-danger">Attention</span> However, the `WITHIN` clause is not part of the -SQL standard. The recommended way of dealing with time constraints might change in the future. +<span class="label label-danger">注意</span> 然而,`WITHIN` 子句不是 SQL 标准的一部分。处理时间约束的方法已被提议将来可能会改变。 Review comment: Could `处理时间约束的方法已被提议将来可能会改变。` be reworded? As written, it is easy to misparse as starting with `处理时间` (processing time). ########## File path: docs/dev/table/streaming/match_recognize.zh.md ########## @@ -960,9 +861,9 @@ The query will produce different results based on which `AFTER MATCH` strategy w XYZ 17 2018-09-17 10:00:05 2018-09-17 10:00:06 {% endhighlight %} -The first result matched against the rows #1, #2, #3, #4. +第一个结果与#1,#2,#3,#4行匹配。 -The second result matched against the rows #5, #6. +第二个结果与#5,#6行匹配。 Review comment: ```suggestion 第二个结果与 #5,#6 行匹配。 ``` ########## File path: docs/dev/table/streaming/match_recognize.zh.md ########## @@ -558,43 +482,42 @@ For the following input rows: XYZ 2 11 2018-09-17 10:00:05 {% endhighlight %} -The query will produce the following output: +该查询将生成以下输出: {% highlight text %} symbol startPrice topPrice lastPrice ======== ============ ========== =========== XYZ 10 13 11 {% endhighlight %} -The pattern recognition is partitioned by the `symbol` column. Even though not explicitly mentioned -in the `MEASURES` clause, the partitioned column is added at the beginning of the result.
+该模式识别由 `symbol` 列分区。即使在 `MEASURES` 子句中未明确提及,分区列仍会添加到结果的开头。 -Pattern Navigation +<a name="pattern-navigation"></a> + +模式导航 ------------------ -The `DEFINE` and `MEASURES` clauses allow for navigating within the list of rows that (potentially) -match a pattern. +`DEFINE` 和 `MEASURES` 子句允许在(可能)匹配模式的行列表中进行导航。 + +本节讨论用于声明条件或产生输出结果的导航。 + +<a name="pattern-variable-referencing"></a> + +### 引用模式变量 -This section discusses this navigation for declaring conditions or producing output results. +_引用模式变量_ 允许引用一组映射到 `DEFINE` 或 `MEASURES` 子句中特定模式变量的行。 -### Pattern Variable Referencing +如果 `DEFINE`/`MEASURES` 子句中的表达式需要一行(例如 `a.price` 或 `a.price>10`),它将选择属于相应集合的最后一个值。 Review comment: ```suggestion 如果 `DEFINE`/`MEASURES` 子句中的表达式需要一行(例如 `a.price` 或 `a.price > 10`),它将选择属于相应集合的最后一个值。 ``` ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org