[jira] [Comment Edited] (CALCITE-1641) Base functions support for MATCH_RECOGNIZE

Zhiqiang He (JIRA) Thu, 02 Mar 2017 22:22:55 -0800

    [ 
https://issues.apache.org/jira/browse/CALCITE-1641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15893803#comment-15893803
 ]


Zhiqiang He edited comment on CALCITE-1641 at 3/3/17 6:20 AM:
--------------------------------------------------------------

[~julianhyde] I add a commit for this request. please review it.
1,2,3,5,6 already finished.
4:Fix SqlValidatorTest.testMatchRecognizeInternals (operator "FIRST" is being 
seen in regular SQL)
I am not understand your mean . I add some test case for FIRST in 
sqlvalidatorTest.
you can give me more details is this test case it not right.
7.In reference.md, describe measureColumn and condition
I already add measure column define in refernce.md. and more details infomation 
or match recognize I will write it after all functions finished.
8:Run "mvn site" under JDK 1.7 and 9 and fix javadoc errors
I am already execute mvn sit in JDK 7 , but not found some error. can you give 
me some details or example?


was (Author: ransom):
I add a commit for this request. please review it.
1,2,3,5,6 already finished.
4:Fix SqlValidatorTest.testMatchRecognizeInternals (operator "FIRST" is being 
seen in regular SQL)
I am not understand your mean . I add some test case for FIRST in 
sqlvalidatorTest.
you can give me more details is this test case it not right.
7.In reference.md, describe measureColumn and condition
I already add measure column define in refernce.md. and more details infomation 
or match recognize I will write it after all functions finished.
8:Run "mvn site" under JDK 1.7 and 9 and fix javadoc errors
I am already execute mvn sit in JDK 7 , but not found some error. can you give 
me some details or example?

> Base functions support for MATCH_RECOGNIZE
> ------------------------------------------
>
>                 Key: CALCITE-1641
>                 URL: https://issues.apache.org/jira/browse/CALCITE-1641
>             Project: Calcite
>          Issue Type: Sub-task
>          Components: core
>    Affects Versions: 1.11.0
>            Reporter: Zhiqiang He
>            Assignee: Zhiqiang He
>              Labels: features
>
> MATCH_RECOGNIZE syntax like 
> this:https://docs.oracle.com/database/121/DWHSG/pattern.htm#DWHSG8980
> Only pattern and define is supported in first step.
> h1. PATTERN: Defining the Row Pattern to Be Matched
> The PATTERN keyword specifies the pattern to be recognized in the ordered 
> sequence of rows in a partition. Each variable name in a pattern corresponds 
> to a Boolean condition, which is specified later using the DEFINE component 
> of the syntax.
> The PATTERN clause is used to specify a regular expression. It is outside the 
> scope of this material to explain regular expression concepts and details. If 
> you are not familiar with regular expressions, you are encouraged to 
> familiarize yourself with the topic using other sources.
> The regular expression in a PATTERN clause is enclosed in parentheses. 
> PATTERN may use the following operators:
> * Concatenation
> Concatenation is used to list two or more items in a pattern to be matched in 
> that order. Items are concatenated when there is no operator sign between two 
> successive items. For example: PATTERN (A B C).
> * Quantifiers
> Quantifiers define the number of iterations accepted for a match. Quantifiers 
> are postfix operators with the following choices:
> ** \* — 0 or more iterations
> ** + — 1 or more iterations
> ** ? — 0 or 1 iterations
> ** {n} — n iterations (n > 0)
> ** {n,} — n or more iterations (n >= 0)
> ** {n,m} — between n and m (inclusive) iterations (0 <= n <= m, 0 < m)
> ** {,m} — between 0 and m (inclusive) iterations (m > 0)
> ** reluctant quantifiers — indicated by an additional question mark following 
> a quantifier (*?, +?, ??, {n,}?, { n, m }?, {,m}?). See "Reluctant Versus 
> Greedy Quantifier" for the difference between reluctant and non-reluctant 
> quantifiers.
> The following are examples of using quantifier operators:
> ** A* matches 0 or more iterations of A
> ** A{3,6} matches 3 to 6 iterations of A
> ** A{,4} matches 0 to 4 iterations of A
> * Alternation
> Alternation matches a single regular expression from a list of several 
> possible regular expressions. The alternation list is created by placing a 
> vertical bar (|) between each regular expression. Alternatives are preferred 
> in the order they are specified. As an example, PATTERN (A | B | C) attempts 
> to match A first. If A is not matched, it attempts to match B. If B is not 
> matched, it attempts to match C.
> * Grouping
> Grouping treats a portion of the regular expression as a single unit, 
> enabling you to apply regular expression operators such as quantifiers to 
> that group. Grouping is created with parentheses. As an example, PATTERN ((A 
> B){3} C) attempts to match the group (A B) three times and then seeks one 
> occurrence of C.
> * PERMUTE
> See "How to Express All Permutations" for more information.
> * Exclusion
> Parts of the pattern to be excluded from the output of ALL ROWS PER MATCH are 
> enclosed between {- and -}. See "How to Exclude Portions of the Pattern from 
> the Output".
> * Anchors
> Anchors work in terms of positions rather than rows. They match a position 
> either at the start or end of a partition.
> ** ^ matches the position before the first row in the partition.
> ** $ matches the position after the last row in the partition.
> As an example, PATTERN (^A+$) will match only if all rows in a partition 
> satisfy the condition for A. The resulting match spans the entire partition.
> * Empty pattern (), matches an empty set of rows
> This section contains the following topics:
>     Reluctant Versus Greedy Quantifier
>     Operator Precedence
> h2. Reluctant Versus Greedy Quantifier
> Pattern quantifiers are referred to as greedy; they will attempt to match as 
> many instances of the regular expression on which they are applied as 
> possible. The exception is pattern quantifiers that have a question mark ? as 
> a suffix, and those are referred to as reluctant. They will attempt to match 
> as few instances as possible of the regular expression on which they are 
> applied.
> The difference between greedy and reluctant quantifiers appended to a single 
> pattern variable is illustrated as follows: A* tries to map as many rows as 
> possible to A, whereas A*? tries to map as few rows as possible to A. For 
> example:
> {code:sql}
> PATTERN (X Y* Z)
> {code}
> The pattern consists of three variable names, X, Y, and Z, with Y quantified 
> with *. This means a pattern match will be recognized and reported when the 
> following condition is met by consecutive incoming input rows:
> ** A row satisfies the condition that defines variable X followed by zero or 
> more rows that satisfy the condition that defines the variable Y followed by 
> a row that satisfies the condition that defines the variable Z.
> During the pattern matching process, after a row was mapped to X and 0 or 
> more rows were mapped to Y, if the following row can be mapped to both 
> variables Y and Z (which satisfies the defining condition of both Y and Z), 
> then, because the quantifier * for Y is greedy, the row is preferentially 
> mapped to Y instead of Z. Due to this greedy property, Y gets preference over 
> Z and a greater number of rows to Y are mapped. If the pattern expression was 
> PATTERN (X Y*? Z), which uses a reluctant quantifier *? over Y, then Z gets 
> preference over Y.
> h2. Operator Precedence
> The precedence of the elements in a regular expression, in decreasing order, 
> is as follows:
> * row_pattern_primary
> These elements include primary pattern variables (pattern variables not 
> created with the SUBSET clause described in "SUBSET: Defining Union Row 
> Pattern Variables"), anchors, PERMUTE, parenthetic expressions, exclusion 
> syntax, and empty pattern
> * Quantifier
> A row_pattern_primary may have zero or one quantifier.
> * Concatenation
> * Alternation
> Precedence of alternation is illustrated by PATTERN(A B | C D), which is 
> equivalent to PATTERN ((A B) | (C D)). It is not, however, equivalent to 
> PATTERN (A (B | C) D).
> Precedence of quantifiers is illustrated by PATTERN (A B *), which is 
> equivalent to PATTERN (A (B*)). It is not, however, PATTERN ((A B)*).
> A quantifier may not immediately follow another quantifier. For example, 
> PATTERN(A**) is prohibited.
> It is permitted for a primary pattern variable to occur more than once in a 
> pattern, for example, PATTERN (X Y X).
> h1. **DEFINE**: Defining Primary Pattern Variables
> DEFINE is a mandatory clause, used to specify the conditions that define 
> primary pattern variables. In the example:
> {code:sql}
> DEFINE UP AS UP.Price > PREV(UP.Price),
> DOWN AS DOWN.Price < PREV(DOWN.Price)
> {code}
> UP is defined by the condition UP.Price > PREV (UP.Price), and DOWN is 
> defined by the condition DOWN.Price < PREV (DOWN.Price). (PREV is a row 
> pattern navigation operation which evaluates an expression in the previous 
> row; see "Row Pattern Navigation Operations" regarding the complete set of 
> row pattern navigation operations.)
> A pattern variable does not require a definition; if there is no definition, 
> any row can be mapped to the pattern variable.
> A union row pattern variable (see discussion of SUBSET in "SUBSET: Defining 
> Union Row Pattern Variables") cannot be defined by DEFINE, but can be 
> referenced in the definition of a pattern variable.
> The definition of a pattern variable can reference another pattern variable, 
> which is illustrated in Example 20-6.
> Example 20-6 Defining Pattern Variables
> {code:sql}
> SELECT *
> FROM Ticker MATCH_RECOGNIZE (
>      PARTITION BY Symbol
>      FROM Ticker
>      MATCH_RECOGNIZE (
>      PARTITION BY Symbol
>      ORDER BY tstamp
>      MEASURES FIRST (A.tstamp) AS A_Firstday,
>               LAST (D.tstamp) AS D_Lastday,
>               AVG (B.Price) AS B_Avgprice,
>               AVG (D.Price) AS D_Avgprice
>      PATTERN (A B+ C+ D)
>      SUBSET BC = (B,C)
>      DEFINE A AS Price > 100,
>             B AS B.Price > A.Price,
>             C AS C.Price < AVG (B.Price),
>             D AS D.Price > MAX (BC.Price)
> ) M
> {code}
> In this example:
> The definition of A implicitly references the universal row pattern variable 
> (because of the unqualified column reference Price).
> The definition of B references the pattern variable A.
> The definition of C references the pattern variable B.
> The definition of D references the union row pattern variable BC.
> The conditions are evaluated on successive rows of a partition in a trial 
> match, with the current row being tentatively mapped to a pattern variable as 
> permitted by the pattern. To be successfully mapped, the condition must 
> evaluate to true.
> In the previous example:
> {code:sql}
> A AS Price > 100
> {code}
> Price refers to the Price in the current row, because the last row mapped to 
> any primary row pattern variable is the current row, which is tentatively 
> mapped to A. Alternatively, in this example, using A.Price would have led to 
> the same results.
> {code:sql}
> B AS B.Price > A.Price
> {code}
> B.Price refers to the Price in the current row (because B is being defined), 
> whereas A.Price refers to the last row mapped to A. In view of the pattern, 
> the only row mapped to A is the first row to be mapped.
> {code:sql}
> C AS C.Price < AVG(B.Price)
> {code}
> Here C.Price refers to the Price in the current row, because C is being 
> defined. The aggregate AVG (that is, insert Price) is computed as the average 
> of all rows that are already mapped to B.
> {code:sql}
> D AS D.Price > MAX(BC.Price)
> {code}
> The pattern variable D is similar to pattern variable C, though it 
> illustrates the use of a union row pattern variable in the Boolean condition. 
> In this case, MAX(BC.Price) returns the maximum price value of the rows 
> matched to variable B or variable C. The semantics of Boolean conditions are 
> discussed in more detail in "Expressions in MEASURES and DEFINE".
> h1. MATCH_RECOGNIZE syntax :
> {code:sql}
> table_reference ::=
>   {only (query_table_expression) | query_table_expression 
> }[flashback_query_clause]
>    [pivot_clause|unpivot_clause|row_pattern_recognition_clause] [t_alias]
> row_pattern_recognition_clause ::=
>   MATCH_RECOGNIZE (
>    PATTERN (row_pattern)
>    DEFINE row_pattern_definition_list
>    )
> row_pattern ::=
>    row_pattern_term
>   | row_pattern "|" row_pattern_term
> row_pattern_term ::=
>    row_pattern_factor
>   | row_pattern_term row_pattern_factor
> row_pattern_factor ::=
>    row_pattern_primary [row_pattern_quantifier]
> row_pattern_quantifier ::=
>     *[?]
>    |+[?]
>    |?[?]
>    |"{"[unsigned_integer ],[unsigned_integer]"}"[?]
>    |"{"unsigned_integer "}"
> row_pattern_primary ::=
>    variable_name
>    |$
>    |^
>    |([row_pattern])
>    |"{-" row_pattern"-}"
>    | row_pattern_permute
> row_pattern_permute ::=
>    PERMUTE (row_pattern [, row_pattern] ...)
> row_pattern_subset_clause ::=
>    SUBSET row_pattern_subset_item [, row_pattern_subset_item] ...
> row_pattern_subset_item ::=
>    variable_name = (variable_name[ , variable_name]...)
> row_pattern_definition_list ::=
>    row_pattern_definition[, row_pattern_definition]...
> row_pattern_definition ::=
>    variable_name AS condition
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Comment Edited] (CALCITE-1641) Base functions support for MATCH_RECOGNIZE

Reply via email to