Hi everyone,



After investigating the usage of MATCH_RECOGNIZE, I have created a JIRA ticket 
'[CALCITE-5202] Support for MATCH_RECOGNIZE functionality enhancement'.




A MATCH_RECOGNIZE clause enables the following tasks:




- Logically partition and order the input data using the PARTITION BY and 
ORDER BY clauses.




- Define patterns of rows to seek using the PATTERN clause. These patterns use 
a syntax similar to that of regular expressions.




- Specify the logical conditions that define each row pattern variable in the 
DEFINE clause.




- Define measures, which are expressions usable in other parts of the SQL 
query, in the MEASURES clause.
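
To make the four clauses above concrete, here is a minimal sketch of a query 
that uses only the existing (non-enhanced) syntax. The table Ticker and its 
columns symbol, tstamp and price are hypothetical names chosen for 
illustration, and some engines place extra requirements on the ORDER BY 
column (for example, that it be a time attribute).

```sql
-- Hypothetical table: Ticker(symbol, tstamp, price). Names are illustrative only.
-- Finds V-shaped price movements: a starting row, a fall, then a rise.
SELECT *
FROM Ticker
MATCH_RECOGNIZE (
    PARTITION BY symbol                  -- match each symbol independently
    ORDER BY tstamp                      -- evaluate rows in timestamp order
    MEASURES
        STRT.tstamp       AS start_tstamp,   -- first row of the match
        LAST(DOWN.tstamp) AS bottom_tstamp,  -- last row of the falling run
        LAST(UP.tstamp)   AS end_tstamp      -- last row of the rising run
    ONE ROW PER MATCH
    AFTER MATCH SKIP PAST LAST ROW
    PATTERN (STRT DOWN+ UP+)
    DEFINE
        -- STRT has no condition, so it matches any row
        DOWN AS DOWN.price < PREV(DOWN.price),
        UP   AS UP.price > PREV(UP.price)
)
```

PARTITION BY and ORDER BY shape the input, PATTERN gives the regular-expression 
style sequence, DEFINE attaches a condition to each pattern variable, and 
MEASURES exposes values from the match to the outer query.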




At present, MATCH_RECOGNIZE cannot output timed-out matches, which is a common 
requirement in CEP scenarios. It also lacks support for the notNext, opposite 
of consecutive, and until semantics:




- notNext appends a new pattern which enforces that no event matching it 
occurs directly after the preceding matched event.




- consecutive works in conjunction with patterns that match multiple times and 
specifies that any non-matching element breaks the loop.




- until applies a stop condition for a looping state that allows cleaning the 
underlying state.




The syntax of enhanced MATCH_RECOGNIZE is proposed as follows:




MATCH_RECOGNIZE (
    [ PARTITION BY <expr> [, ... ] ]
    [ ORDER BY <expr> [, ... ] ]
    [ MEASURES <expr> [AS] <alias> [, ... ] ]
    [ ONE ROW PER MATCH [ { SHOW TIMEOUT MATCHES } ] |
      ALL ROWS PER MATCH [ { SHOW TIMEOUT MATCHES } ]
    ]
    [ AFTER MATCH SKIP
          {
          PAST LAST ROW   |
          TO NEXT ROW   |
          TO [ { FIRST | LAST} ] <symbol>
          }
    ]
    PATTERN ( <pattern> )
    DEFINE <symbol> AS <expr> [, ... ]
)




- SHOW TIMEOUT MATCHES is introduced to add timeout matches to the output.




- [^ <symbol>] is proposed in <pattern> to express the notNext semantics. For 
example, A [^B] is translated to A.notNext(B).




Usage Example:




MEASURES
    A.id as aid
ONE ROW PER MATCH
PATTERN (A [^B])
DEFINE
    A as A.id = 'a',
    B as B.id = 'b'




- ?? is introduced in <pattern> to support the opposite of the consecutive 
semantics. For example, A B+?? is translated to A.next(B).oneOrMore(). By 
contrast, A B+ is translated to A.next(B).oneOrMore().consecutive().




Usage Example:




MEASURES
    SUM(B.price) as amount
ONE ROW PER MATCH
PATTERN (A B+??)
DEFINE
    A as A.id = 'a',
    B as B.id = 'b'




- {<symbol>} is proposed in <pattern> to represent the until semantics. For 
example, A {- B*? -} C+ {D} is translated to 
A.followedBy(C).oneOrMore().until(D).




Usage Example:




MEASURES
    A.id as aid,
    SUM(C.price) as amount
ONE ROW PER MATCH
PATTERN (A {- B*? -} C+ {D})
DEFINE
    A as A.id = 'a',
    C as C.id = 'c',
    D as SUM(C.price) > 100




The above is the proposed syntax for the MATCH_RECOGNIZE enhancements. I look 
forward to any feedback on it.




 Best Regards,

 Nicholas Jiang
