[jira] [Commented] (CALCITE-1641) Base functions support for MATCH_RECOGNIZE

Julian Hyde (JIRA) Tue, 28 Feb 2017 17:13:09 -0800

    [ 
https://issues.apache.org/jira/browse/CALCITE-1641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15889259#comment-15889259
 ]


Julian Hyde commented on CALCITE-1641:
--------------------------------------

I have rebased your changes onto the latest, done some fix up (mainly cosmetic) 
and pushed to https://github.com/julianhyde/calcite/tree/1641-match. Here are 
the things I think still need to be done:
* In validator, don't use "Preconditions.checkArgument" for user
      errors. Use "newValidatorError(..., RESOURCE.messageName)". Add at
      least one test for each error, and make sure that the error occurs
      at the right position in the query text.
* Add javadoc comments for the functions added to SqlStdOperatorTable
* Move PatternValidator and other inner classes further up
      SqlValidatorImpl.java, out of the "enums" section
* Fix SqlValidatorTest.testMatchRecognizeInternals (operator "FIRST" is being 
seen in regular SQL)
* Should we rename MatchRecognize and LogicalMatchRecognize to Match?
* In Parser.jj, move the rules further down the file
* In reference.md, describe measureColumn and condition
* Run "mvn site" under JDK 1.7 and 9 and fix javadoc errors

It seems like a long list, but I think we're close. Your work is very high 
quality.

Do later:
* Use session's case-sensitivity for matching names

> Base functions support for MATCH_RECOGNIZE
> ------------------------------------------
>
>                 Key: CALCITE-1641
>                 URL: https://issues.apache.org/jira/browse/CALCITE-1641
>             Project: Calcite
>          Issue Type: Sub-task
>          Components: core
>    Affects Versions: 1.11.0
>            Reporter: Zhiqiang He
>            Assignee: Zhiqiang He
>              Labels: features
>
> MATCH_RECOGNIZE syntax like 
> this:https://docs.oracle.com/database/121/DWHSG/pattern.htm#DWHSG8980
> Only pattern and define is supported in first step.
> h1. PATTERN: Defining the Row Pattern to Be Matched
> The PATTERN keyword specifies the pattern to be recognized in the ordered 
> sequence of rows in a partition. Each variable name in a pattern corresponds 
> to a Boolean condition, which is specified later using the DEFINE component 
> of the syntax.
> The PATTERN clause is used to specify a regular expression. It is outside the 
> scope of this material to explain regular expression concepts and details. If 
> you are not familiar with regular expressions, you are encouraged to 
> familiarize yourself with the topic using other sources.
> The regular expression in a PATTERN clause is enclosed in parentheses. 
> PATTERN may use the following operators:
> * Concatenation
> Concatenation is used to list two or more items in a pattern to be matched in 
> that order. Items are concatenated when there is no operator sign between two 
> successive items. For example: PATTERN (A B C).
> * Quantifiers
> Quantifiers define the number of iterations accepted for a match. Quantifiers 
> are postfix operators with the following choices:
> ** \* — 0 or more iterations
> ** + — 1 or more iterations
> ** ? — 0 or 1 iterations
> ** {n} — n iterations (n > 0)
> ** {n,} — n or more iterations (n >= 0)
> ** {n,m} — between n and m (inclusive) iterations (0 <= n <= m, 0 < m)
> ** {,m} — between 0 and m (inclusive) iterations (m > 0)
> ** reluctant quantifiers — indicated by an additional question mark following 
> a quantifier (*?, +?, ??, {n,}?, { n, m }?, {,m}?). See "Reluctant Versus 
> Greedy Quantifier" for the difference between reluctant and non-reluctant 
> quantifiers.
> The following are examples of using quantifier operators:
> ** A* matches 0 or more iterations of A
> ** A{3,6} matches 3 to 6 iterations of A
> ** A{,4} matches 0 to 4 iterations of A
> * Alternation
> Alternation matches a single regular expression from a list of several 
> possible regular expressions. The alternation list is created by placing a 
> vertical bar (|) between each regular expression. Alternatives are preferred 
> in the order they are specified. As an example, PATTERN (A | B | C) attempts 
> to match A first. If A is not matched, it attempts to match B. If B is not 
> matched, it attempts to match C.
> * Grouping
> Grouping treats a portion of the regular expression as a single unit, 
> enabling you to apply regular expression operators such as quantifiers to 
> that group. Grouping is created with parentheses. As an example, PATTERN ((A 
> B){3} C) attempts to match the group (A B) three times and then seeks one 
> occurrence of C.
> * PERMUTE
> See "How to Express All Permutations" for more information.
> * Exclusion
> Parts of the pattern to be excluded from the output of ALL ROWS PER MATCH are 
> enclosed between {- and -}. See "How to Exclude Portions of the Pattern from 
> the Output".
> * Anchors
> Anchors work in terms of positions rather than rows. They match a position 
> either at the start or end of a partition.
> ** ^ matches the position before the first row in the partition.
> ** $ matches the position after the last row in the partition.
> As an example, PATTERN (^A+$) will match only if all rows in a partition 
> satisfy the condition for A. The resulting match spans the entire partition.
> * Empty pattern (), matches an empty set of rows
> This section contains the following topics:
>     Reluctant Versus Greedy Quantifier
>     Operator Precedence
> h2. Reluctant Versus Greedy Quantifier
> Pattern quantifiers are referred to as greedy; they will attempt to match as 
> many instances of the regular expression on which they are applied as 
> possible. The exception is pattern quantifiers that have a question mark ? as 
> a suffix, and those are referred to as reluctant. They will attempt to match 
> as few instances as possible of the regular expression on which they are 
> applied.
> The difference between greedy and reluctant quantifiers appended to a single 
> pattern variable is illustrated as follows: A* tries to map as many rows as 
> possible to A, whereas A*? tries to map as few rows as possible to A. For 
> example:
> {code:sql}
> PATTERN (X Y* Z)
> {code}
> The pattern consists of three variable names, X, Y, and Z, with Y quantified 
> with *. This means a pattern match will be recognized and reported when the 
> following condition is met by consecutive incoming input rows:
> ** A row satisfies the condition that defines variable X followed by zero or 
> more rows that satisfy the condition that defines the variable Y followed by 
> a row that satisfies the condition that defines the variable Z.
> During the pattern matching process, after a row was mapped to X and 0 or 
> more rows were mapped to Y, if the following row can be mapped to both 
> variables Y and Z (which satisfies the defining condition of both Y and Z), 
> then, because the quantifier * for Y is greedy, the row is preferentially 
> mapped to Y instead of Z. Due to this greedy property, Y gets preference over 
> Z and a greater number of rows to Y are mapped. If the pattern expression was 
> PATTERN (X Y*? Z), which uses a reluctant quantifier *? over Y, then Z gets 
> preference over Y.
> h2. Operator Precedence
> The precedence of the elements in a regular expression, in decreasing order, 
> is as follows:
> * row_pattern_primary
> These elements include primary pattern variables (pattern variables not 
> created with the SUBSET clause described in "SUBSET: Defining Union Row 
> Pattern Variables"), anchors, PERMUTE, parenthetic expressions, exclusion 
> syntax, and empty pattern
> * Quantifier
> A row_pattern_primary may have zero or one quantifier.
> * Concatenation
> * Alternation
> Precedence of alternation is illustrated by PATTERN(A B | C D), which is 
> equivalent to PATTERN ((A B) | (C D)). It is not, however, equivalent to 
> PATTERN (A (B | C) D).
> Precedence of quantifiers is illustrated by PATTERN (A B *), which is 
> equivalent to PATTERN (A (B*)). It is not, however, PATTERN ((A B)*).
> A quantifier may not immediately follow another quantifier. For example, 
> PATTERN(A**) is prohibited.
> It is permitted for a primary pattern variable to occur more than once in a 
> pattern, for example, PATTERN (X Y X).
> h1. **DEFINE**: Defining Primary Pattern Variables
> DEFINE is a mandatory clause, used to specify the conditions that define 
> primary pattern variables. In the example:
> {code:sql}
> DEFINE UP AS UP.Price > PREV(UP.Price),
> DOWN AS DOWN.Price < PREV(DOWN.Price)
> {code}
> UP is defined by the condition UP.Price > PREV (UP.Price), and DOWN is 
> defined by the condition DOWN.Price < PREV (DOWN.Price). (PREV is a row 
> pattern navigation operation which evaluates an expression in the previous 
> row; see "Row Pattern Navigation Operations" regarding the complete set of 
> row pattern navigation operations.)
> A pattern variable does not require a definition; if there is no definition, 
> any row can be mapped to the pattern variable.
> A union row pattern variable (see discussion of SUBSET in "SUBSET: Defining 
> Union Row Pattern Variables") cannot be defined by DEFINE, but can be 
> referenced in the definition of a pattern variable.
> The definition of a pattern variable can reference another pattern variable, 
> which is illustrated in Example 20-6.
> Example 20-6 Defining Pattern Variables
> {code:sql}
> SELECT *
> FROM Ticker MATCH_RECOGNIZE (
>      PARTITION BY Symbol
>      FROM Ticker
>      MATCH_RECOGNIZE (
>      PARTITION BY Symbol
>      ORDER BY tstamp
>      MEASURES FIRST (A.tstamp) AS A_Firstday,
>               LAST (D.tstamp) AS D_Lastday,
>               AVG (B.Price) AS B_Avgprice,
>               AVG (D.Price) AS D_Avgprice
>      PATTERN (A B+ C+ D)
>      SUBSET BC = (B,C)
>      DEFINE A AS Price > 100,
>             B AS B.Price > A.Price,
>             C AS C.Price < AVG (B.Price),
>             D AS D.Price > MAX (BC.Price)
> ) M
> {code}
> In this example:
> The definition of A implicitly references the universal row pattern variable 
> (because of the unqualified column reference Price).
> The definition of B references the pattern variable A.
> The definition of C references the pattern variable B.
> The definition of D references the union row pattern variable BC.
> The conditions are evaluated on successive rows of a partition in a trial 
> match, with the current row being tentatively mapped to a pattern variable as 
> permitted by the pattern. To be successfully mapped, the condition must 
> evaluate to true.
> In the previous example:
> {code:sql}
> A AS Price > 100
> {code}
> Price refers to the Price in the current row, because the last row mapped to 
> any primary row pattern variable is the current row, which is tentatively 
> mapped to A. Alternatively, in this example, using A.Price would have led to 
> the same results.
> {code:sql}
> B AS B.Price > A.Price
> {code}
> B.Price refers to the Price in the current row (because B is being defined), 
> whereas A.Price refers to the last row mapped to A. In view of the pattern, 
> the only row mapped to A is the first row to be mapped.
> {code:sql}
> C AS C.Price < AVG(B.Price)
> {code}
> Here C.Price refers to the Price in the current row, because C is being 
> defined. The aggregate AVG (that is, insert Price) is computed as the average 
> of all rows that are already mapped to B.
> {code:sql}
> D AS D.Price > MAX(BC.Price)
> {code}
> The pattern variable D is similar to pattern variable C, though it 
> illustrates the use of a union row pattern variable in the Boolean condition. 
> In this case, MAX(BC.Price) returns the maximum price value of the rows 
> matched to variable B or variable C. The semantics of Boolean conditions are 
> discussed in more detail in "Expressions in MEASURES and DEFINE".
> h1. MATCH_RECOGNIZE syntax :
> {code:sql}
> table_reference ::=
>   {only (query_table_expression) | query_table_expression 
> }[flashback_query_clause]
>    [pivot_clause|unpivot_clause|row_pattern_recognition_clause] [t_alias]
> row_pattern_recognition_clause ::=
>   MATCH_RECOGNIZE (
>    PATTERN (row_pattern)
>    DEFINE row_pattern_definition_list
>    )
> row_pattern ::=
>    row_pattern_term
>   | row_pattern "|" row_pattern_term
> row_pattern_term ::=
>    row_pattern_factor
>   | row_pattern_term row_pattern_factor
> row_pattern_factor ::=
>    row_pattern_primary [row_pattern_quantifier]
> row_pattern_quantifier ::=
>     *[?]
>    |+[?]
>    |?[?]
>    |"{"[unsigned_integer ],[unsigned_integer]"}"[?]
>    |"{"unsigned_integer "}"
> row_pattern_primary ::=
>    variable_name
>    |$
>    |^
>    |([row_pattern])
>    |"{-" row_pattern"-}"
>    | row_pattern_permute
> row_pattern_permute ::=
>    PERMUTE (row_pattern [, row_pattern] ...)
> row_pattern_subset_clause ::=
>    SUBSET row_pattern_subset_item [, row_pattern_subset_item] ...
> row_pattern_subset_item ::=
>    variable_name = (variable_name[ , variable_name]...)
> row_pattern_definition_list ::=
>    row_pattern_definition[, row_pattern_definition]...
> row_pattern_definition ::=
>    variable_name AS condition
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Commented] (CALCITE-1641) Base functions support for MATCH_RECOGNIZE

Reply via email to