This is an automated email from the ASF dual-hosted git repository.

jark pushed a commit to branch release-1.11
in repository https://gitbox.apache.org/repos/asf/flink.git
commit 6b1cc8660d6a7adfdb50665af78c077a387bee26 Author: Roc Marshal <[email protected]> AuthorDate: Fri Jul 17 11:58:55 2020 +0800 [FLINK-16087][docs-zh] Translate "Detecting Patterns" page of "Streaming Concepts" into Chinese This closes #12798 --- docs/dev/table/streaming/match_recognize.zh.md | 619 ++++++++++--------------- 1 file changed, 249 insertions(+), 370 deletions(-) diff --git a/docs/dev/table/streaming/match_recognize.zh.md b/docs/dev/table/streaming/match_recognize.zh.md index 54091d4..24b62d0 100644 --- a/docs/dev/table/streaming/match_recognize.zh.md +++ b/docs/dev/table/streaming/match_recognize.zh.md @@ -1,7 +1,7 @@ --- -title: 'Detecting Patterns in Tables' +title: '表中的模式检测' nav-parent_id: streaming_tableapi -nav-title: 'Detecting Patterns' +nav-title: '模式检测' nav-pos: 5 --- <!-- @@ -23,28 +23,17 @@ specific language governing permissions and limitations under the License. --> -It is a common use case to search for a set of event patterns, especially in case of data streams. -Flink comes with a [complex event processing (CEP) library]({{ site.baseurl }}/dev/libs/cep.html) -which allows for pattern detection in event streams. Furthermore, Flink's SQL API provides a -relational way of expressing queries with a large set of built-in functions and rule-based -optimizations that can be used out of the box. - -In December 2016, the International Organization for Standardization (ISO) released a new version -of the SQL standard which includes _Row Pattern Recognition in SQL_ -([ISO/IEC TR 19075-5:2016](https://standards.iso.org/ittf/PubliclyAvailableStandards/c065143_ISO_IEC_TR_19075-5_2016.zip)). -It allows Flink to consolidate CEP and SQL API using the `MATCH_RECOGNIZE` clause for complex event -processing in SQL. - -A `MATCH_RECOGNIZE` clause enables the following tasks: -* Logically partition and order the data that is used with the `PARTITION BY` and `ORDER BY` - clauses. -* Define patterns of rows to seek using the `PATTERN` clause. 
These patterns use a syntax similar to - that of regular expressions. -* The logical components of the row pattern variables are specified in the `DEFINE` clause. -* Define measures, which are expressions usable in other parts of the SQL query, in the `MEASURES` - clause. - -The following example illustrates the syntax for basic pattern recognition: +搜索一组事件模式(event pattern)是一种常见的用例,尤其是在数据流情景中。Flink 提供[复杂事件处理(CEP)库]({% link dev/libs/cep.zh.md %}),该库允许在事件流中进行模式检测。此外,Flink 的 SQL API 提供了一种关系式的查询表达方式,其中包含大量内置函数和基于规则的优化,可以开箱即用。 + +2016 年 12 月,国际标准化组织(ISO)发布了新版本的 SQL 标准,其中包括在 _SQL 中的行模式识别(Row Pattern Recognition in SQL)_([ISO/IEC TR 19075-5:2016](https://standards.iso.org/ittf/PubliclyAvailableStandards/c065143_ISO_IEC_TR_19075-5_2016.zip))。它允许 Flink 使用 `MATCH_RECOGNIZE` 子句融合 CEP 和 SQL API,以便在 SQL 中进行复杂事件处理。 + +`MATCH_RECOGNIZE` 子句启用以下任务: +* 使用 `PARTITION BY` 和 `ORDER BY` 子句对数据进行逻辑分区和排序。 +* 使用 `PATTERN` 子句定义要查找的行模式。这些模式使用类似于正则表达式的语法。 +* 在 `DEFINE` 子句中指定行模式变量的逻辑组合。 +* measures 是指在 `MEASURES` 子句中定义的表达式,这些表达式可用于 SQL 查询中的其他部分。 + +下面的示例演示了基本模式识别的语法: {% highlight sql %} SELECT T.aid, T.bid, T.cid @@ -64,24 +53,23 @@ FROM MyTable ) AS T {% endhighlight %} -This page will explain each keyword in more detail and will illustrate more complex examples. +本页将更详细地解释每个关键字,并演示说明更复杂的示例。 -{% info Notice %} Flink's implementation of the `MATCH_RECOGNIZE` -clause is a subset of the full standard. Only those features documented in the following sections -are supported. Additional features may be supported based on community feeback, please also take a look at the -[known limitations](#known-limitations). 
+{% info 注意 %} Flink 的 `MATCH_RECOGNIZE` 子句实现是一个完整标准子集。仅支持以下部分中记录的功能。基于社区反馈,可能会支持其他功能,请查看[已知的局限](#known-limitations)。 * This will be replaced by the TOC {:toc} -Introduction and Examples +<a name="introduction-and-examples"></a> + +介绍和示例 ------------------------- -### Installation Guide +<a name="installation-guide"></a> -The pattern recognition feature uses the Apache Flink's CEP library internally. In order to be able -to use the `MATCH_RECOGNIZE` clause, the library needs to be added as a dependency to your Maven -project. +### 安装指南 + +模式识别特性使用 Apache Flink 内部的 CEP 库。为了能够使用 `MATCH_RECOGNIZE` 子句,需要将库作为依赖项添加到 Maven 项目中。 {% highlight xml %} <dependency> @@ -91,52 +79,43 @@ project. </dependency> {% endhighlight %} -Alternatively, you can also add the dependency to the cluster classpath (see the -[dependency section]({{ site.baseurl}}/dev/project-configuration.html) for more information). +或者,也可以将依赖项添加到集群的 classpath(查看 [dependency section]({% link dev/project-configuration.zh.md %}) 获取更多相关依赖信息)。 + +如果你想在 [SQL Client]({% link dev/table/sqlClient.zh.md %}) 中使用 `MATCH_RECOGNIZE` 子句,你无需执行任何操作,因为默认情况下包含所有依赖项。 + +<a name="sql-semantics"></a> -If you want to use the `MATCH_RECOGNIZE` clause in the -[SQL Client]({{ site.baseurl}}/dev/table/sqlClient.html), you don't have to do anything as all the -dependencies are included by default. 
+### SQL 语义 -### SQL Semantics +每个 `MATCH_RECOGNIZE` 查询都包含以下子句: -Every `MATCH_RECOGNIZE` query consists of the following clauses: +* [PARTITION BY](#partitioning) - 定义表的逻辑分区;类似于 `GROUP BY` 操作。 +* [ORDER BY](#order-of-events) - 指定传入行的排序方式;这是必须的,因为模式依赖于顺序。 +* [MEASURES](#define--measures) - 定义子句的输出;类似于 `SELECT` 子句。 +* [ONE ROW PER MATCH](#output-mode) - 输出方式,定义每个匹配项应产生多少行。 +* [AFTER MATCH SKIP](#after-match-strategy) - 指定下一个匹配的开始位置;这也是控制单个事件可以属于多少个不同匹配项的方法。 +* [PATTERN](#defining-a-pattern) - 允许使用类似于 _正则表达式_ 的语法构造搜索的模式。 +* [DEFINE](#define--measures) - 本部分定义了模式变量必须满足的条件。 -* [PARTITION BY](#partitioning) - defines the logical partitioning of the table; similar to a - `GROUP BY` operation. -* [ORDER BY](#order-of-events) - specifies how the incoming rows should be ordered; this is - essential as patterns depend on an order. -* [MEASURES](#define--measures) - defines output of the clause; similar to a `SELECT` clause. -* [ONE ROW PER MATCH](#output-mode) - output mode which defines how many rows per match should be - produced. -* [AFTER MATCH SKIP](#after-match-strategy) - specifies where the next match should start; this is - also a way to control how many distinct matches a single event can belong to. -* [PATTERN](#defining-a-pattern) - allows constructing patterns that will be searched for using a - _regular expression_-like syntax. -* [DEFINE](#define--measures) - this section defines the conditions that the pattern variables must - satisfy. +<span class="label label-danger">注意</span> 目前,`MATCH_RECOGNIZE` 子句只能应用于[追加表]({% link dev/table/streaming/dynamic_tables.zh.md %}#update-and-append-queries)。此外,它也总是生成一个追加表。 -<span class="label label-danger">Attention</span> Currently, the `MATCH_RECOGNIZE` clause can only -be applied to an [append table](dynamic_tables.html#update-and-append-queries). Furthermore, it -always produces an append table as well. +<a name="examples"></a> -### Examples +### 示例 -For our examples, we assume that a table `Ticker` has been registered. 
The table contains prices of -stocks at a particular point in time. +对于我们的示例,我们假设已经注册了一个表 `Ticker`。该表包含特定时间点的股票价格。 -The table has a following schema: +这张表的 schema 如下: {% highlight text %} Ticker - |-- symbol: String # symbol of the stock - |-- price: Long # price of the stock - |-- tax: Long # tax liability of the stock - |-- rowtime: TimeIndicatorTypeInfo(rowtime) # point in time when the change to those values happened + |-- symbol: String # 股票的代号 + |-- price: Long # 股票的价格 + |-- tax: Long # 股票应纳税额 + |-- rowtime: TimeIndicatorTypeInfo(rowtime) # 更改这些值的时间点 {% endhighlight %} -For simplification, we only consider the incoming data for a single stock `ACME`. A ticker could -look similar to the following table where rows are continuously appended. +为了简化,我们只考虑单个股票 `ACME` 的传入数据。Ticker 可以类似于下表,其中的行是连续追加的。 {% highlight text %} symbol rowtime price tax @@ -154,8 +133,7 @@ symbol rowtime price tax 'ACME' '01-Apr-11 10:00:10' 19 1 {% endhighlight %} -The task is now to find periods of a constantly decreasing price of a single ticker. For this, one -could write a query like: +现在的任务是找出一个单一股票价格不断下降的时期。为此,可以编写如下查询: {% highlight sql %} SELECT * @@ -179,31 +157,19 @@ FROM Ticker ) MR; {% endhighlight %} -The query partitions the `Ticker` table by the `symbol` column and orders it by the `rowtime` -time attribute. +此查询将 `Ticker` 表按照 `symbol` 列进行分区并按照 `rowtime` 属性进行排序。 -The `PATTERN` clause specifies that we are interested in a pattern with a starting event `START_ROW` -that is followed by one or more `PRICE_DOWN` events and concluded with a `PRICE_UP` event. If such -a pattern can be found, the next pattern match will be seeked at the last `PRICE_UP` event as -indicated by the `AFTER MATCH SKIP TO LAST` clause. 
+`PATTERN` 子句指定我们对以下模式感兴趣:该模式具有开始事件 `START_ROW`,然后是一个或多个 `PRICE_DOWN` 事件,并以 `PRICE_UP` 事件结束。如果可以找到这样的模式,如 `AFTER MATCH SKIP TO LAST` 子句所示,则从最后一个 `PRICE_UP` 事件开始寻找下一个模式匹配。 -The `DEFINE` clause specifies the conditions that need to be met for a `PRICE_DOWN` and `PRICE_UP` -event. Although the `START_ROW` pattern variable is not present it has an implicit condition that -is evaluated always as `TRUE`. +`DEFINE` 子句指定 `PRICE_DOWN` 和 `PRICE_UP` 事件需要满足的条件。尽管不存在 `START_ROW` 模式变量,但它具有一个始终被评估为 `TRUE` 隐式条件。 -A pattern variable `PRICE_DOWN` is defined as a row with a price that is smaller than the price of -the last row that met the `PRICE_DOWN` condition. For the initial case or when there is no last row -that met the `PRICE_DOWN` condition, the price of the row should be smaller than the price of the -preceding row in the pattern (referenced by `START_ROW`). +模式变量 `PRICE_DOWN` 定义为价格小于满足 `PRICE_DOWN` 条件的最后一行。对于初始情况或没有满足 `PRICE_DOWN` 条件的最后一行时,该行的价格应小于该模式中前一行(由 `START_ROW` 引用)的价格。 -A pattern variable `PRICE_UP` is defined as a row with a price that is larger than the price of the -last row that met the `PRICE_DOWN` condition. +模式变量 `PRICE_UP` 定义为价格大于满足 `PRICE_DOWN` 条件的最后一行。 -This query produces a summary row for each period in which the price of a stock was continuously -decreasing. +此查询为股票价格持续下跌的每个期间生成摘要行。 -The exact representation of the output rows is defined in the `MEASURES` part of the query. The -number of output rows is defined by the `ONE ROW PER MATCH` output mode. +在查询的 `MEASURES` 子句部分定义确切的输出行信息。输出行数由 `ONE ROW PER MATCH` 输出方式定义。 {% highlight text %} symbol start_tstamp bottom_tstamp end_tstamp @@ -211,67 +177,48 @@ number of output rows is defined by the `ONE ROW PER MATCH` output mode. ACME 01-APR-11 10:00:04 01-APR-11 10:00:07 01-APR-11 10:00:08 {% endhighlight %} -The resulting row describes a period of falling prices that started at `01-APR-11 10:00:04` and -achieved the lowest price at `01-APR-11 10:00:07` that increased again at `01-APR-11 10:00:08`. 
+该行结果描述了从 `01-APR-11 10:00:04` 开始的价格下跌期,在 `01-APR-11 10:00:07` 达到最低价格,到 `01-APR-11 10:00:08` 再次上涨。 -Partitioning +<a name="partitioning"></a> + +分区 ------------ -It is possible to look for patterns in partitioned data, e.g., trends for a single ticker or a -particular user. This can be expressed using the `PARTITION BY` clause. The clause is similar to -using `GROUP BY` for aggregations. +可以在分区数据中寻找模式,例如单个股票行情或特定用户的趋势。这可以用 `PARTITION BY` 子句来表示。该子句类似于对 aggregation 使用 `GROUP BY`。 + +<span class="label label-danger">注意</span> 强烈建议对传入的数据进行分区,否则 `MATCH_RECOGNIZE` 子句将被转换为非并行算子,以确保全局排序。 -<span class="label label-danger">Attention</span> It is highly advised to partition the incoming -data because otherwise the `MATCH_RECOGNIZE` clause will be translated into a non-parallel operator -to ensure global ordering. +<a name="order-of-events"></a> -Order of Events +事件顺序 --------------- -Apache Flink allows for searching for patterns based on time; either -[processing time or event time](time_attributes.html). +Apache Flink 可以根据时间([处理时间或者事件时间]({% link dev/table/streaming/time_attributes.zh.md %}))进行模式搜索。 -In case of event time, the events are sorted before they are passed to the internal pattern state -machine. As a consequence, the produced output will be correct regardless of the order in which -rows are appended to the table. Instead, the pattern is evaluated in the order specified by the -time contained in each row. +如果是事件时间,则在将事件传递到内部模式状态机之前对其进行排序。所以,无论行添加到表的顺序如何,生成的输出都是正确的。而模式是按照每行中所包含的时间指定顺序计算的。 -The `MATCH_RECOGNIZE` clause assumes a [time attribute](time_attributes.html) with ascending -ordering as the first argument to `ORDER BY` clause. +`MATCH_RECOGNIZE` 子句假定升序的 [时间属性]({% link dev/table/streaming/time_attributes.zh.md %}) 是 `ORDER BY` 子句的第一个参数。 -For the example `Ticker` table, a definition like `ORDER BY rowtime ASC, price DESC` is valid but -`ORDER BY price, rowtime` or `ORDER BY rowtime DESC, price ASC` is not. 
+对于示例 `Ticker` 表,诸如 `ORDER BY rowtime ASC, price DESC` 的定义是有效的,但 `ORDER BY price, rowtime` 或者 `ORDER BY rowtime DESC, price ASC` 是无效的。 Define & Measures ----------------- -The `DEFINE` and `MEASURES` keywords have similar meanings to the `WHERE` and `SELECT` clauses in a -simple SQL query. +`DEFINE` 和 `MEASURES` 关键字与简单 SQL 查询中的 `WHERE` 和 `SELECT` 子句具有相近的含义。 -The `MEASURES` clause defines what will be included in the output of a matching pattern. It can -project columns and define expressions for evaluation. The number of produced rows depends on the -[output mode](#output-mode) setting. +`MEASURES` 子句定义匹配模式的输出中要包含哪些内容。它可以投影列并定义表达式进行计算。产生的行数取决于[输出方式](#output-mode)设置。 -The `DEFINE` clause specifies conditions that rows have to fulfill in order to be classified to a -corresponding [pattern variable](#defining-a-pattern). If a condition is not defined for a pattern -variable, a default condition will be used which evaluates to `true` for every row. +`DEFINE` 子句指定行必须满足的条件才能被分类到相应的[模式变量](#defining-a-pattern)。如果没有为模式变量定义条件,则将对每一行使用计算结果为 `true` 的默认条件。 -For a more detailed explanation about expressions that can be used in those clauses, please have a -look at the [event stream navigation](#pattern-navigation) section. +有关在这些子句中可使用的表达式的更详细的说明,请查看[事件流导航](#pattern-navigation)部分。 ### Aggregations -Aggregations can be used in `DEFINE` and `MEASURES` clauses. Both -[built-in]({{ site.baseurl }}/dev/table/functions/systemFunctions.html) and custom -[user defined]({{ site.baseurl }}/dev/table/functions/udfs.html) functions are supported. +Aggregations 可以在 `DEFINE` 和 `MEASURES` 子句中使用。支持[内置函数]({% link dev/table/functions/systemFunctions.zh.md %})和[用户自定义函数]({% link dev/table/functions/udfs.zh.md %})。 -Aggregate functions are applied to each subset of rows mapped to a match. In order to understand -how those subsets are evaluated have a look at the [event stream navigation](#pattern-navigation) -section. 
+对相应匹配项的行子集可以使用 Aggregate functions。请查看[事件流导航](#pattern-navigation)部分以了解如何计算这些子集。 -The task of the following example is to find the longest period of time for which the average price -of a ticker did not go below certain threshold. It shows how expressible `MATCH_RECOGNIZE` can -become with aggregations. This task can be performed with the following query: +下面这个示例的任务是找出股票平均价格没有低于某个阈值的最长时间段。它展示了 `MATCH_RECOGNIZE` 在 aggregation 中的可表达性。可以使用以下查询执行此任务: {% highlight sql %} SELECT * @@ -291,7 +238,7 @@ FROM Ticker ) MR; {% endhighlight %} -Given this query and following input values: +给定此查询和以下输入值: {% highlight text %} symbol rowtime price tax @@ -310,10 +257,8 @@ symbol rowtime price tax 'ACME' '01-Apr-11 10:00:11' 30 1 {% endhighlight %} -The query will accumulate events as part of the pattern variable `A` as long as the average price -of them does not exceed `15`. For example, such a limit exceeding happens at `01-Apr-11 10:00:04`. -The following period exceeds the average price of `15` again at `01-Apr-11 10:00:11`. Thus the -results for said query will be: +只要事件的平均价格不超过 `15`,查询就会将事件作为模式变量 `A` 的一部分进行累积。 +例如,这种限制发生在 `01-Apr-11 10:00:04`。接下来的时间段在 `01-Apr-11 10:00:11` 再次超过平均价格 `15`。因此,所述查询的结果将是: {% highlight text %} symbol start_tstamp end_tstamp avgPrice @@ -322,52 +267,47 @@ ACME 01-APR-11 10:00:00 01-APR-11 10:00:03 14.5 ACME 01-APR-11 10:00:05 01-APR-11 10:00:10 13.5 {% endhighlight %} -<span class="label label-info">Note</span> Aggregations can be applied to expressions, but only if -they reference a single pattern variable. Thus `SUM(A.price * A.tax)` is a valid one, but -`AVG(A.price * B.tax)` is not. +<span class="label label-info">注意</span> Aggregation 可以应用于表达式,但前提是它们引用单个模式变量。因此,`SUM(A.price * A.tax)` 是有效的,而 `AVG(A.price * B.tax)` 则是无效的。 -<span class="label label-danger">Attention</span> `DISTINCT` aggregations are not supported. 
+<span class="label label-danger">注意</span> 不支持 `DISTINCT` aggregation。 -Defining a Pattern +<a name="defining-a-pattern"></a> + +定义模式 ------------------ -The `MATCH_RECOGNIZE` clause allows users to search for patterns in event streams using a powerful -and expressive syntax that is somewhat similar to the widespread regular expression syntax. +`MATCH_RECOGNIZE` 子句允许用户在事件流中使用功能强大、表达力强的语法搜索模式,这种语法与广泛使用的正则表达式语法有些相似。 -Every pattern is constructed from basic building blocks, called _pattern variables_, to which -operators (quantifiers and other modifiers) can be applied. The whole pattern must be enclosed in -brackets. +每个模式都是由基本的构建块构造的,称为 _模式变量_,可以应用算子(量词和其他修饰符)到这些模块中。整个模式必须用括号括起来。 -An example pattern could look like: +示例模式如下所示: {% highlight sql %} PATTERN (A B+ C* D) {% endhighlight %} -One may use the following operators: +可以使用以下算子: + +* _Concatenation_ - 像 `(A B)` 这样的模式意味着 `A` 和 `B` 之间的连接是严格的。因此,在它们之间不能存在没有映射到 `A` 或 `B` 的行。 +* _Quantifiers_ - 修改可以映射到模式变量的行数。 + * `*` — _0_ 或者多行 + * `+` — _1_ 或者多行 + * `?` — _0_ 或者 _1_ 行 + * `{ n }` — 严格 _n_ 行(_n > 0_) + * `{ n, }` — _n_ 或者更多行(_n ≥ 0_) + * `{ n, m }` — 在 _n_ 到 _m_(包含)行之间(_0 ≤ n ≤ m,0 < m_) + * `{ , m }` — 在 _0_ 到 _m_(包含)行之间(_m > 0_) -* _Concatenation_ - a pattern like `(A B)` means that the contiguity is strict between `A` and `B`. - Therefore, there can be no rows that were not mapped to `A` or `B` in between. -* _Quantifiers_ - modify the number of rows that can be mapped to the pattern variable. - * `*` — _0_ or more rows - * `+` — _1_ or more rows - * `?` — _0_ or _1_ rows - * `{ n }` — exactly _n_ rows (_n > 0_) - * `{ n, }` — _n_ or more rows (_n ≥ 0_) - * `{ n, m }` — between _n_ and _m_ (inclusive) rows (_0 ≤ n ≤ m, 0 < m_) - * `{ , m }` — between _0_ and _m_ (inclusive) rows (_m > 0_) -<span class="label label-danger">Attention</span> Patterns that can potentially produce an empty -match are not supported. Examples of such patterns are `PATTERN (A*)`, `PATTERN (A? B*)`, -`PATTERN (A{0,} B{0,} C*)`, etc. 
+<span class="label label-danger">注意</span> 不支持可能产生空匹配的模式。此类模式的示例如 `PATTERN (A*)`,`PATTERN (A? B*)`,`PATTERN (A{0,} B{0,} C*)` 等。 -### Greedy & Reluctant Quantifiers +<a name="greedy--reluctant-quantifiers"></a> -Each quantifier can be either _greedy_ (default behavior) or _reluctant_. Greedy quantifiers try to -match as many rows as possible while reluctant quantifiers try to match as few as possible. +### 贪婪量词和勉强量词 -In order to illustrate the difference, one can view the following example with a query where a -greedy quantifier is applied to the `B` variable: +每一个量词可以是 _贪婪_(默认行为)的或者 _勉强_ 的。贪婪的量词尝试匹配尽可能多的行,而勉强的量词则尝试匹配尽可能少的行。 + +为了说明区别,可以通过查询查看以下示例,其中贪婪量词应用于 `B` 变量: {% highlight sql %} SELECT * @@ -387,7 +327,7 @@ FROM Ticker ) {% endhighlight %} -Given we have the following input: +假设我们有以下输入: {% highlight text %} symbol tax price rowtime @@ -400,7 +340,7 @@ Given we have the following input: XYZ 2 16 2018-09-17 10:00:07 {% endhighlight %} -The pattern above will produce the following output: +上面的模式将产生以下输出: {% highlight text %} symbol lastPrice @@ -408,8 +348,7 @@ The pattern above will produce the following output: XYZ 16 {% endhighlight %} -The same query where `B*` is modified to `B*?`, which means that `B*` should be reluctant, will -produce: +将 `B*` 修改为 `B*?` 的同一查询,这意味着 `B*` 应该是勉强的,将产生: {% highlight text %} symbol lastPrice @@ -418,13 +357,9 @@ produce: XYZ 16 {% endhighlight %} -The pattern variable `B` matches only to the row with price `12` instead of swallowing the rows -with prices `12`, `13`, and `14`. +模式变量 `B` 只匹配价格为 `12` 的行,而不是包含价格为 `12`、`13` 和 `14` 的行。 -<span class="label label-danger">Attention</span> It is not possible to use a greedy quantifier for -the last variable of a pattern. Thus, a pattern like `(A B*)` is not allowed. This can be easily -worked around by introducing an artificial state (e.g. `C`) that has a negated condition of `B`. 
So -you could use a query like: +<span class="label label-danger">注意</span> 模式的最后一个变量不能使用贪婪量词。因此,不允许使用类似 `(A B*)` 的模式。通过引入条件为 `B` 的人工状态(例如 `C`),可以轻松解决此问题。因此,你可以使用类似以下的查询: {% highlight sql %} PATTERN (A B* C) @@ -434,30 +369,23 @@ DEFINE C AS NOT condB() {% endhighlight %} -<span class="label label-danger">Attention</span> The optional reluctant quantifier (`A??` or -`A{0,1}?`) is not supported right now. +<span class="label label-danger">注意</span> 目前不支持可选的勉强量词(`A??` 或者 `A{0,1}?`)。 + +<a name="time-constraint"></a> -### Time constraint +### 时间约束 -Especially for streaming use cases, it is often required that a pattern finishes within a given -period of time. This allows for limiting the overall state size that Flink has to maintain -internally, even in case of greedy quantifiers. +特别是对于流的使用场景,通常需要在给定的时间内完成模式。这要求限制住 Flink 在内部必须保持的状态总体大小(即已经过期的状态就不需要再维护了),即使在贪婪的量词的情况下也是如此。 -Therefore, Flink SQL supports the additional (non-standard SQL) `WITHIN` clause for defining a time -constraint for a pattern. The clause can be defined after the `PATTERN` clause and takes an -interval of millisecond resolution. +因此,Flink SQL 支持附加的(非标准 SQL)`WITHIN` 子句来定义模式的时间约束。子句可以在 `PATTERN` 子句之后定义,并以毫秒为间隔进行解析。 -If the time between the first and last event of a potential match is longer than the given value, -such a match will not be appended to the result table. +如果潜在匹配的第一个和最后一个事件之间的时间长于给定值,则不会将这种匹配追加到结果表中。 -<span class="label label-info">Note</span> It is generally encouraged to use the `WITHIN` clause as -it helps Flink with efficient memory management. Underlying state can be pruned once the threshold -is reached. +<span class="label label-info">注意</span> 通常鼓励使用 `WITHIN` 子句,因为它有助于 Flink 进行有效的内存管理。一旦达到阈值,即可修剪基础状态。 -<span class="label label-danger">Attention</span> However, the `WITHIN` clause is not part of the -SQL standard. The recommended way of dealing with time constraints might change in the future. 
+<span class="label label-danger">注意</span> 然而,`WITHIN` 子句不是 SQL 标准的一部分。时间约束处理的方法已被提议将来可能会改变。 -The use of the `WITHIN` clause is illustrated in the following example query: +下面的示例查询说明了 `WITHIN` 子句的用法: {% highlight sql %} SELECT * @@ -477,9 +405,9 @@ FROM Ticker ) {% endhighlight %} -The query detects a price drop of `10` that happens within an interval of 1 hour. +该查询检测到在 1 小时的间隔内价格下降了 `10`。 -Let's assume the query is used to analyze the following ticker data: +假设该查询用于分析以下股票数据: {% highlight text %} symbol rowtime price tax @@ -497,7 +425,7 @@ symbol rowtime price tax 'ACME' '01-Apr-11 13:20:00' 19 1 {% endhighlight %} -The query will produce the following results: +查询将生成以下结果: {% highlight text %} symbol dropTime dropDiff @@ -505,28 +433,24 @@ symbol dropTime dropDiff 'ACME' '01-Apr-11 13:00:00' 14 {% endhighlight %} -The resulting row represents a price drop from `15` (at `01-Apr-11 12:00:00`) to `1` (at -`01-Apr-11 13:00:00`). The `dropDiff` column contains the price difference. +结果行代表价格从 `15`(在`01-Apr-11 12:00:00`)下降到 `1`(在`01-Apr-11 13:00:00`)。`dropDiff` 列包含价格差异。 + +请注意,即使价格也下降了较高的值,例如,下降了 `11`(在 `01-Apr-11 10:00:00` 和 `01-Apr-11 11:40:00` 之间),这两个事件之间的时间差大于 1 小时。因此,它们不会产生匹配。 -Notice that even though prices also drop by higher values, for example, by `11` (between -`01-Apr-11 10:00:00` and `01-Apr-11 11:40:00`), the time difference between those two events is -larger than 1 hour. Thus, they don't produce a match. +<a name="output-mode"></a> -Output Mode +输出方式 ----------- -The _output mode_ describes how many rows should be emitted for every found match. The SQL standard -describes two modes: +_输出方式_ 描述每个找到的匹配项应该输出多少行。SQL 标准描述了两种方式: - `ALL ROWS PER MATCH` -- `ONE ROW PER MATCH`. +- `ONE ROW PER MATCH` -Currently, the only supported output mode is `ONE ROW PER MATCH` that will always produce one -output summary row for each found match. 
+目前,唯一支持的输出方式是 `ONE ROW PER MATCH`,它将始终为每个找到的匹配项生成一个输出摘要行。 -The schema of the output row will be a concatenation of -`[partitioning columns] + [measures columns]` in that particular order. +输出行的 schema 将是按特定顺序连接 `[partitioning columns] + [measures columns]`。 -The following example shows the output of a query defined as: +以下示例显示了所定义的查询的输出: {% highlight sql %} SELECT * @@ -546,7 +470,7 @@ FROM Ticker ) {% endhighlight %} -For the following input rows: +对于以下输入行: {% highlight text %} symbol tax price rowtime @@ -557,7 +481,7 @@ For the following input rows: XYZ 2 11 2018-09-17 10:00:05 {% endhighlight %} -The query will produce the following output: +该查询将生成以下输出: {% highlight text %} symbol startPrice topPrice lastPrice @@ -565,35 +489,32 @@ The query will produce the following output: XYZ 10 13 11 {% endhighlight %} -The pattern recognition is partitioned by the `symbol` column. Even though not explicitly mentioned -in the `MEASURES` clause, the partitioned column is added at the beginning of the result. +该模式识别由 `symbol` 列分区。即使在 `MEASURES` 子句中未明确提及,分区列仍会添加到结果的开头。 -Pattern Navigation +<a name="pattern-navigation"></a> + +模式导航 ------------------ -The `DEFINE` and `MEASURES` clauses allow for navigating within the list of rows that (potentially) -match a pattern. +`DEFINE` 和 `MEASURES` 子句允许在(可能)匹配模式的行列表中进行导航。 + +本节讨论用于声明条件或产生输出结果的导航。 -This section discusses this navigation for declaring conditions or producing output results. +<a name="pattern-variable-referencing"></a> -### Pattern Variable Referencing +### 引用模式变量 -A _pattern variable reference_ allows a set of rows mapped to a particular pattern variable in the -`DEFINE` or `MEASURES` clauses to be referenced. +_引用模式变量_ 允许引用一组映射到 `DEFINE` 或 `MEASURES` 子句中特定模式变量的行。 -For example, the expression `A.price` describes a set of rows mapped so far to `A` plus the current -row if we try to match the current row to `A`. If an expression in the `DEFINE`/`MEASURES` clause -requires a single row (e.g. 
`A.price` or `A.price > 10`), it selects the last value belonging to -the corresponding set. +例如,如果我们尝试将当前行与 `A` 进行匹配,则表达式 `A.price` 描述了目前为止已映射到 `A` 的一组行加上当前行。如果 `DEFINE`/`MEASURES` 子句中的表达式需要一行(例如 `a.price` 或 `a.price > 10`),它将选择属于相应集合的最后一个值。 -If no pattern variable is specified (e.g. `SUM(price)`), an expression references the default -pattern variable `*` which references all variables in the pattern. In other words, it creates a -list of all the rows mapped so far to any variable plus the current row. +如果没有指定模式变量(例如 `SUM(price)`),则表达式引用默认模式变量 `*`,该变量引用模式中的所有变量。换句话说,它创建了一个列表,其中列出了迄今为止映射到任何变量的所有行以及当前行。 -#### Example +<a name="example"></a> -For a more thorough example, one can take a look at the following pattern and corresponding -conditions: +#### 示例 + +对于更全面的示例,可以查看以下模式和相应的条件: {% highlight sql %} PATTERN (A B+) @@ -602,18 +523,14 @@ DEFINE B AS B.price > A.price AND SUM(price) < 100 AND SUM(B.price) < 80 {% endhighlight %} -The following table describes how those conditions are evaluated for each incoming event. +下表描述了如何为每个传入事件计算这些条件。 -The table consists of the following columns: - * `#` - the row identifier that uniquely identifies an incoming row in the lists - `[A.price]`/`[B.price]`/`[price]`. - * `price` - the price of the incoming row. - * `[A.price]`/`[B.price]`/`[price]` - describe lists of rows which are used in the `DEFINE` - clause to evaluate conditions. - * `Classifier` - the classifier of the current row which indicates the pattern variable the row - is mapped to. - * `A.price`/`B.price`/`SUM(price)`/`SUM(B.price)` - describes the result after those expressions - have been evaluated. 
+该表由以下列组成: + * `#` - 行标识符,用于唯一标识列表中的传入行 `[A.price]`/`[B.price]`/`[price]`。 + * `price` - 传入行的价格。 + * `[A.price]`/`[B.price]`/`[price]` - 描述 `DEFINE` 子句中用于计算条件的行列表。 + * `Classifier` - 当前行的分类器,指示该行映射到的模式变量。 + * `A.price`/`B.price`/`SUM(price)`/`SUM(B.price)` - 描述了这些表达式求值后的结果。 <table class="table table-bordered"> <thead> @@ -694,21 +611,17 @@ The table consists of the following columns: </tbody> </table> -As can be seen in the table, the first row is mapped to pattern variable `A` and subsequent rows -are mapped to pattern variable `B`. However, the last row does not fulfill the `B` condition -because the sum over all mapped rows `SUM(price)` and the sum over all rows in `B` exceed the -specified thresholds. +从表中可以看出,第一行映射到模式变量 `A`,随后的行映射到模式变量 `B`。但是,最后一行不满足 `B` 条件,因为所有映射行 `SUM(price)` 的总和与 `B` 中所有行的总和都超过了指定的阈值。 ### Logical Offsets -_Logical offsets_ enable navigation within the events that were mapped to a particular pattern -variable. This can be expressed with two corresponding functions: +_Logical offsets_ 在映射到指定模式变量的事件启用导航。这可以用两个相应的函数表示: <table class="table table-bordered"> <thead> <tr> <th class="text-left" style="width: 40%">Offset functions</th> - <th class="text-center">Description</th> + <th class="text-center">描述</th> </tr> </thead> <tbody> @@ -719,8 +632,7 @@ LAST(variable.field, n) {% endhighlight %} </td> <td> - <p>Returns the value of the field from the event that was mapped to the <i>n</i>-th - <i>last</i> element of the variable. The counting starts at the last element mapped.</p> + <p>返回映射到变量最后 n 个元素的事件中的字段值。计数从映射的最后一个元素开始。</p> </td> </tr> <tr> @@ -730,17 +642,17 @@ FIRST(variable.field, n) {% endhighlight %} </td> <td> - <p>Returns the value of the field from the event that was mapped to the <i>n</i>-th element - of the variable. 
The counting starts at the first element mapped.</p>
+        <p>返回映射到变量的第 <i>n</i> 个元素的事件中的字段值。计数从映射的第一个元素开始。</p>
       </td>
     </tr>
   </tbody>
 </table>
 
-#### Examples
+<a name="examples-1"></a>
+
+#### 示例
 
-For a more thorough example, one can take a look at the following pattern and corresponding
-conditions:
+对于更全面的示例,可以参考以下模式和相应的条件:
 
 {% highlight sql %}
 PATTERN (A B+)
@@ -750,14 +662,12 @@ DEFINE
         (LAST(B.price, 2) IS NULL OR B.price > 2 * LAST(B.price, 2))
 {% endhighlight %}
 
-The following table describes how those conditions are evaluated for each incoming event.
+下表描述了如何为每个传入事件计算这些条件。
 
-The table consists of the following columns:
-  * `price` - the price of the incoming row.
-  * `Classifier` - the classifier of the current row which indicates the pattern variable the row
-    is mapped to.
-  * `LAST(B.price, 1)`/`LAST(B.price, 2)` - describes the result after those expressions have been
-    evaluated.
+该表包括以下列:
+  * `price` - 传入行的价格。
+  * `Classifier` - 当前行的分类器,指示该行映射到的模式变量。
+  * `LAST(B.price, 1)`/`LAST(B.price, 2)` - 描述对这些表达式求值后的结果。
 
 <table class="table table-bordered">
   <thead>
@@ -782,8 +692,7 @@ The table consists of the following columns:
       <td>-> B</td>
       <td>null</td>
       <td>null</td>
-      <td>Notice that <code>LAST(A.price, 1)</code> is null because there is still nothing mapped
-      to <code>B</code>.</td>
+      <td>注意 <code>LAST(A.price, 1)</code> 为空,因为仍然没有映射到 <code>B</code>。</td>
     </tr>
     <tr>
       <td>20</td>
@@ -804,14 +713,14 @@ The table consists of the following columns:
       <td></td>
       <td>31</td>
       <td>20</td>
-      <td>Not mapped because <code>35 < 2 * 20</code>.</td>
+      <td>因为 <code>35 < 2 * 20</code> 没有映射。</td>
     </tr>
   </tbody>
 </table>
 
-It might also make sense to use the default pattern variable with logical offsets.
+将默认模式变量与 logical offsets 一起使用也可能很有意义。
 
-In this case, an offset considers all the rows mapped so far:
+在这种情况下,offset 会包含到目前为止映射的所有行:
 
 {% highlight sql %}
 PATTERN (A B? C)
@@ -846,13 +755,12 @@ DEFINE
       <td>20</td>
       <td>-> C</td>
       <td>15</td>
-      <td><code>LAST(price, 1)</code> is evaluated as the price of the row mapped to the
-      <code>B</code> variable.</td>
+      <td><code>LAST(price, 1)</code> 被计算为映射到 <code>B</code> 变量的行的价格。</td>
     </tr>
   </tbody>
 </table>
 
-If the second row did not map to the `B` variable, we would have the following results:
+如果第二行没有映射到 `B` 变量,则会得到以下结果:
 
 <table class="table table-bordered">
   <thead>
@@ -874,45 +782,37 @@ If the second row did not map to the `B` variable, we would have the following r
       <td>20</td>
       <td>-> C</td>
       <td>10</td>
-      <td><code>LAST(price, 1)</code> is evaluated as the price of the row mapped to the
-      <code>A</code> variable.</td>
+      <td><code>LAST(price, 1)</code> 被计算为映射到 <code>A</code> 变量的行的价格。</td>
     </tr>
   </tbody>
 </table>
 
-It is also possible to use multiple pattern variable references in the first argument of the
-`FIRST/LAST` functions. This way, one can write an expression that accesses multiple columns.
-However, all of them must use the same pattern variable. In other words, the value of the
-`LAST`/`FIRST` function must be computed in a single row.
+也可以在 `FIRST/LAST` 函数的第一个参数中使用多个模式变量引用。这样,可以编写访问多个列的表达式。但是,它们都必须使用相同的模式变量。换句话说,必须在一行中计算 `LAST`/`FIRST` 函数的值。
 
-Thus, it is possible to use `LAST(A.price * A.tax)`, but an expression like `LAST(A.price * B.tax)`
-is not allowed.
+因此,可以使用 `LAST(A.price * A.tax)`,但不允许使用类似 `LAST(A.price * B.tax)` 的表达式。
 
-After Match Strategy
+<a name="after-match-strategy"></a>
+
+匹配后的策略
 --------------------
 
-The `AFTER MATCH SKIP` clause specifies where to start a new matching procedure after a complete
-match was found.
+`AFTER MATCH SKIP` 子句指定在找到完全匹配后从何处开始新的匹配过程。
+
+有四种不同的策略:
+* `SKIP PAST LAST ROW` - 在当前匹配的最后一行之后的下一行继续模式匹配。
+* `SKIP TO NEXT ROW` - 继续从匹配项开始行后的下一行开始搜索新匹配项。
+* `SKIP TO LAST variable` - 恢复映射到指定模式变量的最后一行的模式匹配。
+* `SKIP TO FIRST variable` - 在映射到指定模式变量的第一行继续模式匹配。
 
-There are four different strategies:
-* `SKIP PAST LAST ROW` - resumes the pattern matching at the next row after the last row of the
-  current match.
-* `SKIP TO NEXT ROW` - continues searching for a new match starting at the next row after the
-  starting row of the match.
-* `SKIP TO LAST variable` - resumes the pattern matching at the last row that is mapped to the
-  specified pattern variable.
-* `SKIP TO FIRST variable` - resumes the pattern matching at the first row that is mapped to the
-  specified pattern variable.
+这也是一种指定单个事件可以属于多少个匹配项的方法。例如,使用 `SKIP PAST LAST ROW` 策略,每个事件最多只能属于一个匹配项。
 
-This is also a way to specify how many matches a single event can belong to. For example, with the
-`SKIP PAST LAST ROW` strategy every event can belong to at most one match.
+<a name="examples-2"></a>
 
-#### Examples
+#### 示例
 
-In order to better understand the differences between those strategies one can take a look at the
-following example.
+为了更好地理解这些策略之间的差异,我们可以看看下面的例子。
 
-For the following input rows:
+对于以下输入行:
 
 {% highlight text %}
  symbol tax price rowtime
@@ -925,7 +825,7 @@ For the following input rows:
  XYZ 2 14 2018-09-17 10:00:06
 {% endhighlight %}
 
-We evaluate the following query with different strategies:
+我们使用不同的策略评估以下查询:
 
 {% highlight sql %}
 SELECT *
@@ -945,10 +845,9 @@ FROM Ticker
     )
 {% endhighlight %}
 
-The query returns the sum of the prices of all rows mapped to `A` and the first and last timestamp
-of the overall match.
+该查询返回映射到 `A` 的总体匹配的第一个和最后一个时间戳所有行的价格之和。
 
-The query will produce different results based on which `AFTER MATCH` strategy was used:
+查询将根据使用的 `AFTER MATCH` 策略产生不同的结果:
 
 ##### `AFTER MATCH SKIP PAST LAST ROW`
 
@@ -959,9 +858,9 @@ The query will produce different results based on which `AFTER MATCH` strategy w
  XYZ 17 2018-09-17 10:00:05 2018-09-17 10:00:06
 {% endhighlight %}
 
-The first result matched against the rows #1, #2, #3, #4.
+第一个结果与 #1,#2,#3,#4 行匹配。
 
-The second result matched against the rows #5, #6.
+第二个结果与 #5,#6 行匹配。
 
 ##### `AFTER MATCH SKIP TO NEXT ROW`
 
@@ -975,16 +874,15 @@ The second result matched against the rows #5, #6.
  XYZ 17 2018-09-17 10:00:05 2018-09-17 10:00:06
 {% endhighlight %}
 
-Again, the first result matched against the rows #1, #2, #3, #4.
+同样,第一个结果与 #1,#2,#3,#4 行匹配。
 
-Compared to the previous strategy, the next match includes row #2 again for the next matching.
-Therefore, the second result matched against the rows #2, #3, #4, #5.
+与上一个策略相比,下一个匹配再次包含 #2 行匹配。因此,第二个结果与 #2,#3,#4,#5 行匹配。
 
-The third result matched against the rows #3, #4, #5.
+第三个结果与 #3,#4,#5 行匹配。
 
-The forth result matched against the rows #4, #5, #6.
+第四个结果与 #4,#5,#6 行匹配。
 
-The last result matched against the rows #5, #6.
+最后一个结果与 #5,#6 行匹配。
 
 ##### `AFTER MATCH SKIP TO LAST A`
 
@@ -997,29 +895,26 @@ The last result matched against the rows #5, #6.
  XYZ 17 2018-09-17 10:00:05 2018-09-17 10:00:06
 {% endhighlight %}
 
-Again, the first result matched against the rows #1, #2, #3, #4.
+同样,第一个结果与 #1,#2,#3,#4 行匹配。
 
-Compared to the previous strategy, the next match includes only row #3 (mapped to `A`) again for
-the next matching. Therefore, the second result matched against the rows #3, #4, #5.
+与前一个策略相比,下一个匹配只包含 #3 行(对应 `A`)用于下一个匹配。因此,第二个结果与 #3,#4,#5 行匹配。
 
-The third result matched against the rows #4, #5, #6.
+第三个结果与 #4,#5,#6 行匹配。
 
-The last result matched against the rows #5, #6.
+最后一个结果与 #5,#6 行匹配。
 
 ##### `AFTER MATCH SKIP TO FIRST A`
 
-This combination will produce a runtime exception because one would always try to start a new match
-where the last one started. This would produce an infinite loop and, thus, is prohibited.
+这种组合将产生一个运行时异常,因为人们总是试图在上一个开始的地方开始一个新的匹配。这将产生一个无限循环,因此是禁止的。
+
+必须记住,在 `SKIP TO FIRST/LAST variable` 策略的场景下,可能没有映射到该变量的行(例如,对于模式 `A*`)。在这种情况下,将抛出一个运行时异常,因为标准要求一个有效的行来继续匹配。
 
-One has to keep in mind that in case of the `SKIP TO FIRST/LAST variable` strategy it might be
-possible that there are no rows mapped to that variable (e.g. for pattern `A*`). In such cases, a
-runtime exception will be thrown as the standard requires a valid row to continue the matching.
+<a name="time-attributes"></a>
 
-Time attributes
+时间属性
 ---------------
 
-In order to apply some subsequent queries on top of the `MATCH_RECOGNIZE` it might be required to
-use [time attributes](time_attributes.html). To select those there are available two functions:
+为了在 `MATCH_RECOGNIZE` 之上应用一些后续查询,可能需要使用[时间属性]({% link dev/table/streaming/time_attributes.zh.md %})。有两个函数可供选择:
 
 <table class="table table-bordered">
   <thead>
@@ -1034,34 +929,30 @@ use [time attributes](time_attributes.html). To select those there are available
       <td>
         <code>MATCH_ROWTIME()</code><br/>
       </td>
-      <td><p>Returns the timestamp of the last row that was mapped to the given pattern.</p>
-      <p>The resulting attribute is a <a href="time_attributes.html">rowtime attribute</a>
-         that can be used in subsequent time-based operations such as
-         <a href="#joins">interval joins</a> and <a href="#aggregations">group window or over
-         window aggregations</a>.</p></td>
+      <td>
+        <p>返回映射到给定模式的最后一行的时间戳。</p>
+        <p>结果属性是<a href="{% link dev/table/streaming/time_attributes.zh.md %}">行时间属性</a>,可用于后续基于时间的操作,例如 <a href="{% link dev/table/streaming/joins.zh.md %}#interval-joins">interval joins</a> 和 <a href="#aggregations">group window or over window aggregations</a>。</p>
+      </td>
     </tr>
     <tr>
       <td>
         <code>MATCH_PROCTIME()</code><br/>
       </td>
-      <td><p>Returns a <a href="time_attributes.html#processing-time">proctime attribute</a>
-          that can be used in subsequent time-based operations such as
-          <a href="#joins">interval joins</a> and <a href="#aggregations">group window or over
-          window aggregations</a>.</p></td>
+      <td>
+        <p>返回<a href="{% link dev/table/streaming/time_attributes.zh.md %}#processing-time">处理时间属性</a>,该属性可用于随后的基于时间的操作,例如 <a href="{% link dev/table/streaming/joins.zh.md %}#interval-joins">interval joins</a> 和 <a href="#aggregations">group window or over window aggregations</a>。</p>
+      </td>
     </tr>
   </tbody>
 </table>
 
-Controlling Memory Consumption
+<a name="controlling-memory-consumption"></a>
+
+控制内存消耗
 ------------------------------
 
-Memory consumption is an important consideration when writing `MATCH_RECOGNIZE` queries, as the
-space of potential matches is built in a breadth-first-like manner. Having that in mind, one must
-make sure that the pattern can finish. Preferably with a reasonable number of rows mapped to the
-match as they have to fit into memory.
+在编写 `MATCH_RECOGNIZE` 查询时,内存消耗是一个重要的考虑因素,因为潜在匹配的空间是以宽度优先的方式构建的。鉴于此,我们必须确保模式能够完成。最好使用映射到匹配项的合理数量的行,因为它们必须内存相适。
 
-For example, the pattern must not have a quantifier without an upper limit that accepts every
-single row. Such a pattern could look like this:
+例如,该模式不能有没有接受每一行上限的量词。这种模式可以是这样的:
 
 {% highlight sql %}
 PATTERN (A B+ C)
@@ -1070,8 +961,7 @@ DEFINE
   C as C.price > 20
 {% endhighlight %}
 
-The query will map every incoming row to the `B` variable and thus will never finish. This query
-could be fixed, e.g., by negating the condition for `C`:
+查询将每个传入行映射到 `B` 变量,因此永远不会完成。可以纠正此查询,例如,通过否定 `C` 的条件:
 
 {% highlight sql %}
 PATTERN (A B+ C)
@@ -1081,7 +971,7 @@ DEFINE
   C as C.price > 20
 {% endhighlight %}
 
-Or by using the [reluctant quantifier](#greedy--reluctant-quantifiers):
+或者使用 [reluctant quantifier](#greedy--reluctant-quantifiers):
 
 {% highlight sql %}
 PATTERN (A B+? C)
@@ -1090,42 +980,31 @@ DEFINE
   C as C.price > 20
 {% endhighlight %}
 
-<span class="label label-danger">Attention</span> Please note that the `MATCH_RECOGNIZE` clause
-does not use a configured [state retention time](query_configuration.html#idle-state-retention-time).
-One may want to use the `WITHIN` [clause](#time-constraint) for this purpose.
+<span class="label label-danger">注意</span> 请注意,`MATCH_RECOGNIZE` 子句未使用配置的 [state retention time]({% link dev/table/streaming/query_configuration.zh.md %}#idle-state-retention-time)。为此,可能需要使用 `WITHIN` [子句](#time-constraint)。
+
+<a name="known-limitations"></a>
 
-Known Limitations
+已知的局限
 -----------------
 
-Flink's implementation of the `MATCH_RECOGNIZE` clause is an ongoing effort, and some features of
-the SQL standard are not yet supported.
-
-Unsupported features include:
-* Pattern expressions:
-  * Pattern groups - this means that e.g. quantifiers can not be applied to a subsequence of the
-    pattern. Thus, `(A (B C)+)` is not a valid pattern.
-  * Alterations - patterns like `PATTERN((A B | C D) E)`, which means that either a subsequence
-    `A B` or `C D` has to be found before looking for the `E` row.
-  * `PERMUTE` operator - which is equivalent to all permutations of variables that it was applied
-    to e.g. `PATTERN (PERMUTE (A, B, C))` = `PATTERN (A B C | A C B | B A C | B C A | C A B | C B A)`.
-  * Anchors - `^, $`, which denote beginning/end of a partition, those do not make sense in the
-    streaming context and will not be supported.
-  * Exclusion - `PATTERN ({- A -} B)` meaning that `A` will be looked for but will not participate
-    in the output. This works only for the `ALL ROWS PER MATCH` mode.
-  * Reluctant optional quantifier - `PATTERN A??` only the greedy optional quantifier is supported.
-* `ALL ROWS PER MATCH` output mode - which produces an output row for every row that participated
-  in the creation of a found match. This also means:
-  * that the only supported semantic for the `MEASURES` clause is `FINAL`
-  * `CLASSIFIER` function, which returns the pattern variable that a row was mapped to, is not yet
-    supported.
-* `SUBSET` - which allows creating logical groups of pattern variables and using those groups in
-  the `DEFINE` and `MEASURES` clauses.
-* Physical offsets - `PREV/NEXT`, which indexes all events seen rather than only those that were
-  mapped to a pattern variable (as in [logical offsets](#logical-offsets) case).
-* Extracting time attributes - there is currently no possibility to get a time attribute for
-  subsequent time-based operations.
-* `MATCH_RECOGNIZE` is supported only for SQL. There is no equivalent in the Table API.
+Flink 对 `MATCH_RECOGNIZE` 子句实现是一项长期持续的工作,目前尚不支持 SQL 标准的某些功能。
+
+不支持的功能包括:
+* 模式表达式:
+  * Pattern groups - 这意味着量词不能应用于模式的子序列。因此,`(A (B C)+)` 不是有效的模式。
+  * Alterations - 像 `PATTERN((A B | C D) E)`这样的模式,这意味着在寻找 `E` 行之前必须先找到子序列 `A B` 或者 `C D`。
+  * `PERMUTE` operator - 这等同于它应用于所示的所有变量的排列 `PATTERN (PERMUTE (A, B, C))` = `PATTERN (A B C | A C B | B A C | B C A | C A B | C B A)`。
+  * Anchors - `^, $`,表示分区的开始/结束,在流上下文中没有意义,将不被支持。
+  * Exclusion - `PATTERN ({- A -} B)` 表示将查找 `A`,但是不会参与输出。这只适用于 `ALL ROWS PER MATCH` 方式。
+  * Reluctant optional quantifier - `PATTERN A??` 只支持贪婪的可选量词。
+* `ALL ROWS PER MATCH` 输出方式 - 为参与创建匹配项的每一行产生一个输出行。这也意味着:
+  * `MEASURES` 子句唯一支持的语义是 `FINAL`
+  * `CLASSIFIER` 函数,尚不支持返回行映射到的模式变量。
+* `SUBSET` - 它允许创建模式变量的逻辑组,并在 `DEFINE` 和 `MEASURES` 子句中使用这些组。
+* Physical offsets - `PREV/NEXT`,它为所有可见事件建立索引,而不是仅将那些映射到模式变量的事件编入索引(如 [logical offsets](#logical-offsets) 的情况)。
+* 提取时间属性 - 目前无法为后续基于时间的操作提取时间属性。
+* `MATCH_RECOGNIZE` 仅 SQL 支持。Table API 中没有等效项。
 * Aggregations:
-  * distinct aggregations are not supported.
+  * 不支持 distinct aggregations。
 
 {% top %}
