Hi everyone,
After investigating the usage of MATCH_RECOGNIZE, I have created a JIRA ticket
'[CALCITE-5202] Support for MATCH_RECOGNIZE functionality enhancement'.
A MATCH_RECOGNIZE clause enables the following tasks:
- Logically partition and order the data using the PARTITION BY and ORDER BY
clauses.
- Define patterns of rows to seek using the PATTERN clause. These patterns use
a syntax similar to that of regular expressions.
- Specify, in the DEFINE clause, the conditions a row must satisfy to be mapped
to each row pattern variable.
- Define measures, which are expressions usable in other parts of the SQL
query, in the MEASURES clause.
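To make these four clauses concrete, here is a toy Python model of ONE ROW PER MATCH evaluation with AFTER MATCH SKIP PAST LAST ROW. It is a sketch under simplifying assumptions, not Calcite's implementation: pattern symbols are single letters, each row is labeled with the first symbol whose DEFINE condition it satisfies (a real engine may consider several labelings), and Python's `re` module stands in for the row-pattern matcher. The function name `match_recognize` and the sample data are hypothetical.

```python
import re
from collections import defaultdict
from operator import itemgetter


def match_recognize(rows, partition_by, order_by, pattern, define, measures):
    """Toy ONE ROW PER MATCH / AFTER MATCH SKIP PAST LAST ROW evaluation."""
    results = []
    # PARTITION BY: group rows by the partition key (dicts keep insertion order).
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[partition_by]].append(row)
    for key, part in partitions.items():
        # ORDER BY: sort each partition independently.
        part.sort(key=itemgetter(order_by))
        # DEFINE: label each row with the first symbol whose condition holds, else "_".
        labels = "".join(
            next((sym for sym, cond in define.items() if cond(row)), "_") for row in part
        )
        # PATTERN: re.finditer returns non-overlapping matches (skip past last row).
        for m in re.finditer(pattern, labels):
            matched = defaultdict(list)
            for i in range(m.start(), m.end()):
                matched[labels[i]].append(part[i])
            # MEASURES: one output row per match.
            out = {partition_by: key}
            out.update({alias: fn(matched) for alias, fn in measures.items()})
            results.append(out)
    return results


rows = [
    {"sym": "X", "ts": 3, "id": "b", "price": 30},
    {"sym": "X", "ts": 1, "id": "a", "price": 10},
    {"sym": "Y", "ts": 1, "id": "a", "price": 5},
    {"sym": "X", "ts": 2, "id": "b", "price": 20},
    {"sym": "Y", "ts": 2, "id": "c", "price": 7},
]
result = match_recognize(
    rows,
    partition_by="sym",
    order_by="ts",
    pattern="AB+",
    define={"A": lambda r: r["id"] == "a", "B": lambda r: r["id"] == "b"},
    measures={"amount": lambda m: sum(r["price"] for r in m["B"])},
)
print(result)  # [{'sym': 'X', 'amount': 50}]
```

Partition X is reordered to a, b, b and yields one match; partition Y never sees a `b`, so it produces no output row.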
At present, MATCH_RECOGNIZE does not support outputting timeout matches (partial
matches that time out before the pattern completes), which is a common
requirement in CEP scenarios. MATCH_RECOGNIZE also does not support the notNext,
non-consecutive (the opposite of consecutive), and until semantics:
- notNext enforces that the event immediately following the preceding matched
event does not match the given pattern.
- consecutive works in conjunction with patterns that match multiple times
(looping patterns) and specifies strict contiguity: any non-matching event
breaks the loop.
- until applies a stop condition to a looping state, which allows the
underlying state to be cleaned up.
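The three semantics can be sketched over a finite list of event ids with a few lines of Python. The function names mirror the pattern-API calls used in this proposal (which appear to correspond to Apache Flink's CEP Pattern API), but they are toy stand-ins, not Flink or Calcite code. Simplifying assumptions: an `a` at the very end of the input is left undecided by `not_next`, and every extension of a loop is reported as a separate match.

```python
from typing import Callable, List, Sequence


def not_next(events: Sequence[str], a: str, b: str) -> List[int]:
    """A.notNext(B): indices of `a` events whose immediate successor is not `b`.

    An `a` at the very end is skipped: its next event has not arrived yet.
    """
    return [i for i in range(len(events) - 1) if events[i] == a and events[i + 1] != b]


def one_or_more(events: Sequence[str], a: str, b: str, consecutive: bool) -> List[List[int]]:
    """A.next(B).oneOrMore(), optionally followed by .consecutive().

    Each match is a list of indices; every additional `b` extends the previous match.
    """
    matches = []
    for i in range(len(events) - 1):
        if events[i] != a or events[i + 1] != b:
            continue  # next(B): the first b must immediately follow the a
        taken = [i]
        for j in range(i + 1, len(events)):
            if events[j] == b:
                taken = taken + [j]
                matches.append(taken)
            elif consecutive:
                break  # strict contiguity: any non-b event breaks the loop
            # relaxed contiguity (no consecutive()): non-b events are skipped
    return matches


def until(events: Sequence[str], a: str, c: str, stop: Callable[[str], bool]) -> List[List[int]]:
    """A.followedBy(C).oneOrMore().until(stop)."""
    matches = []
    for i, e in enumerate(events):
        if e != a:
            continue
        taken = [i]
        for j in range(i + 1, len(events)):
            if stop(events[j]):
                break  # stop condition: loop ends, this event is excluded, state is dropped
            if events[j] == c:
                taken = taken + [j]
                matches.append(taken)
    return matches


print(not_next(["a", "b", "a", "c", "a"], "a", "b"))                     # [2]
print(one_or_more(["a", "b", "x", "b"], "a", "b", consecutive=True))     # [[0, 1]]
print(one_or_more(["a", "b", "x", "b"], "a", "b", consecutive=False))    # [[0, 1], [0, 1, 3]]
print(until(["a", "c", "b", "c", "d", "c"], "a", "c", lambda e: e == "d"))  # [[0, 1], [0, 1, 3]]
```

Without the `until` stop condition, the last call would also report `[0, 1, 3, 5]`, and the loop for the first `a` would stay open until the end of the input.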
The proposed syntax of the enhanced MATCH_RECOGNIZE is as follows:
MATCH_RECOGNIZE (
[ PARTITION BY <expr> [, ... ] ]
[ ORDER BY <expr> [, ... ] ]
[ MEASURES <expr> [AS] <alias> [, ... ] ]
[ ONE ROW PER MATCH [ { SHOW TIMEOUT MATCHES } ] |
ALL ROWS PER MATCH [ { SHOW TIMEOUT MATCHES } ]
]
[ AFTER MATCH SKIP
{
PAST LAST ROW |
TO NEXT ROW |
TO [ { FIRST | LAST} ] <symbol>
}
]
PATTERN ( <pattern> )
DEFINE <symbol> AS <expr> [, ... ]
)
- SHOW TIMEOUT MATCHES is introduced to add timeout matches to the output.
- [^ <symbol>] is proposed in <pattern> to express the notNext semantics. For
example, A [^B] is translated to A.notNext(B).
Usage Example:
MEASURES
A.id as aid
ONE ROW PER MATCH
PATTERN (A [^B])
DEFINE
A as A.id = 'a',
B as B.id = 'b'
- ?? is introduced in <pattern> to support the opposite of the consecutive
semantics. For example, A B+?? is translated to A.next(B).oneOrMore(). In
contrast, A B+ is translated to A.next(B).oneOrMore().consecutive().
Usage Example:
MEASURES
SUM(B.price) as amount
ONE ROW PER MATCH
PATTERN (A B+??)
DEFINE
A as A.id = 'a',
B as B.id = 'b'
- {<symbol>} is proposed in <pattern> to represent the until semantics. For
example, A {- B*? -} C+ {D} is translated to
A.followedBy(C).oneOrMore().until(D).
Usage Example:
MEASURES
A.id as aid
SUM(C.price) as amount
ONE ROW PER MATCH
PATTERN (A {- B*? -} C+{D})
DEFINE
A as A.id = 'a',
C as C.id = 'c',
D as SUM(C.price) > 100
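For the SHOW TIMEOUT MATCHES option, a small Python simulation shows the difference between dropping and emitting partial matches that expire. It assumes a pattern "A followed by B" that must complete within a time bound; the proposed syntax above does not show how such a bound is declared, so `window` and the function `a_followed_by_b` are hypothetical, and event time is assumed to advance only with arriving events.

```python
from typing import List, Tuple

Event = Tuple[int, str]  # (timestamp, event id)


def a_followed_by_b(events: List[Event], window: int, show_timeout: bool) -> List[Tuple[str, List[Event]]]:
    """Toy A-followed-by-B matcher with a hypothetical time bound `window`.

    Returns ("match", events) for completed matches and, only when show_timeout
    is True, ("timeout", events) for partial matches whose window expired.
    """
    out: List[Tuple[str, List[Event]]] = []
    pending: List[Event] = []  # A events still waiting for a B
    for ts, eid in events:
        # Event time advances with each arrival: expire partial matches that are too old.
        still_open = []
        for a in pending:
            if ts - a[0] > window:
                if show_timeout:
                    out.append(("timeout", [a]))  # SHOW TIMEOUT MATCHES: emit instead of drop
            else:
                still_open.append(a)
        pending = still_open
        if eid == "b":
            # Every still-open partial match completes with this B.
            out.extend(("match", [a, (ts, eid)]) for a in pending)
            pending = []
        elif eid == "a":
            pending.append((ts, eid))
    # Partial matches still open at the end of input are neither completed nor timed out.
    return out


events = [(1, "a"), (2, "x"), (3, "b"), (10, "a"), (20, "c"), (21, "b")]
print(a_followed_by_b(events, window=5, show_timeout=True))
# [('match', [(1, 'a'), (3, 'b')]), ('timeout', [(10, 'a')])]
print(a_followed_by_b(events, window=5, show_timeout=False))
# [('match', [(1, 'a'), (3, 'b')])]
```

The A at time 10 never completes because the only later B arrives at time 21, outside the window; it reaches the output only when timeout matches are requested.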
The above is the proposed syntax for the functional enhancement of
MATCH_RECOGNIZE. I look forward to any feedback on the enhanced
MATCH_RECOGNIZE syntax.
Best Regards,
Nicholas Jiang