Hi Julian Hyde,
Thanks for your feedback on the MATCH_RECOGNIZE enhancement proposal.
Here are some explanations of the proposal:
- In MATCH_RECOGNIZE, WITHIN is an optional clause that outputs a
pattern_clause match if and only if the match completes within the specified
time duration. A match that runs beyond the specified time currently produces
no output, so I propose a further optional clause that outputs these timed-out
pattern_clause matches.
- [^ <symbol>], ?? and {<symbol>} are proposed as enhancements to the pattern
expression to support the notNext, opposite-of-consecutive and until semantics
for CEP scenarios. PTAL.
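
To make the timeout point concrete, here is a rough sketch of how a query might
look. The table Orders and its columns (user_id, id, rowtime) are hypothetical,
the placement of WITHIN after PATTERN follows my understanding of the current
Calcite grammar, and SHOW TIMEOUT MATCHES is only the proposed syntax, which no
parser accepts today:

```sql
-- Sketch only: Orders, user_id, id and rowtime are assumed names.
SELECT *
FROM Orders
MATCH_RECOGNIZE (
  PARTITION BY user_id
  ORDER BY rowtime
  MEASURES
    A.id AS aid,
    B.id AS bid
  -- Proposed addition: also emit matches that ran past the WITHIN interval.
  ONE ROW PER MATCH SHOW TIMEOUT MATCHES
  AFTER MATCH SKIP PAST LAST ROW
  -- A followed by B, with the whole match required to finish within 1 hour.
  PATTERN (A B) WITHIN INTERVAL '1' HOUR
  DEFINE
    A AS A.id = 'a',
    B AS B.id = 'b'
)
```

Without SHOW TIMEOUT MATCHES, an A that is not followed by a B within the hour
would produce no row. With it, one plausible reading is that the partial match
is emitted with bid left NULL, but the exact output shape is still open.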
Regards,
Nicholas Jiang
On 2022/07/13 18:17:59 Julian Hyde wrote:
> I couldn’t tell whether timeout is the only enhancement proposed. If there
> are others let me know.
>
> Timeout is controversial. Some streaming systems use timeout, whereas others
> have more declarative ways of making progress, such as watermarks. In my
> experience, timeout-based logic in distributed systems tends to accumulate
> like duct tape. Therefore I would like to see evidence that the timeout-based
> approach is the right one for a significant fraction of Calcite projects.
>
> Julian
>
>
> > On Jul 12, 2022, at 7:09 PM, Nicholas <[email protected]> wrote:
> >
> > Hi everyone,
> >
> > After investigating the usage of MATCH_RECOGNIZE, I have created a JIRA
> > ticket '[CALCITE-5202] Support for MATCH_RECOGNIZE functionality
> > enhancement'.
> >
> > A MATCH_RECOGNIZE clause enables the following tasks:
> >
> > - Logically partition and order the data using the PARTITION BY and
> > ORDER BY clauses.
> >
> > - Define patterns of rows to seek using the PATTERN clause. These patterns
> > use a syntax similar to that of regular expressions.
> >
> > - Specify the logical conditions that define each row pattern variable in
> > the DEFINE clause.
> >
> > - Define measures, which are expressions usable in other parts of the SQL
> > query, in the MEASURES clause.
> >
> > MATCH_RECOGNIZE cannot currently output timed-out matches, which is a
> > common requirement in CEP scenarios. It also does not support the notNext,
> > opposite-of-consecutive and until semantics:
> >
> > - notNext requires that no event matching this pattern occurs directly
> > after the preceding matched event.
> >
> > - consecutive works in conjunction with looping (multiple-times) matching
> > and specifies that any non-matching element breaks the loop.
> >
> > - until applies a stop condition to a looping pattern, which allows the
> > underlying state to be cleaned up.
> >
> > The syntax of enhanced MATCH_RECOGNIZE is proposed as follows:
> >
> > MATCH_RECOGNIZE (
> >   [ PARTITION BY <expr> [, ... ] ]
> >   [ ORDER BY <expr> [, ... ] ]
> >   [ MEASURES <expr> [AS] <alias> [, ... ] ]
> >   [ ONE ROW PER MATCH [ { SHOW TIMEOUT MATCHES } ] |
> >     ALL ROWS PER MATCH [ { SHOW TIMEOUT MATCHES } ]
> >   ]
> >   [ AFTER MATCH SKIP
> >     {
> >       PAST LAST ROW |
> >       TO NEXT ROW |
> >       TO [ { FIRST | LAST } ] <symbol>
> >     }
> >   ]
> >   PATTERN ( <pattern> )
> >   DEFINE <symbol> AS <expr> [, ... ]
> > )
> >
> > - SHOW TIMEOUT MATCHES is introduced to add timeout matches to the output.
> >
> > - [^ <symbol>] is proposed in <pattern> to express the notNext semantics.
> > For example, A [^B] is translated to A.notNext(B).
> >
> > Usage Example:
> >
> > MEASURES
> >   A.id as aid
> > ONE ROW PER MATCH
> > PATTERN (A [^B])
> > DEFINE
> >   A as A.id = 'a',
> >   B as B.id = 'b'
> >
> > - ?? is introduced in <pattern> to support the opposite-of-consecutive
> > semantics. For example, A B+?? is translated to A.next(B).oneOrMore(). By
> > contrast, A B+ is translated to A.next(B).oneOrMore().consecutive().
> >
> > Usage Example:
> >
> > MEASURES
> >   SUM(B.price) as amount
> > ONE ROW PER MATCH
> > PATTERN (A B+??)
> > DEFINE
> >   A as A.id = 'a',
> >   B as B.id = 'b'
> >
> > - {<symbol>} is proposed in <pattern> to represent the until semantics.
> > For example, A {- B*? -} C+ {D} is translated to
> > A.followedBy(C).oneOrMore().until(D).
> >
> > Usage Example:
> >
> > MEASURES
> >   A.id as aid,
> >   SUM(C.price) as amount
> > ONE ROW PER MATCH
> > PATTERN (A {- B*? -} C+{D})
> > DEFINE
> >   A as A.id = 'a',
> >   C as C.id = 'c',
> >   D as SUM(C.price) > 100
> >
> > The above is the proposed syntax for the MATCH_RECOGNIZE enhancements. I
> > look forward to any feedback on it.
> >
> > Best Regards,
> >
> > Nicholas Jiang
>
>