Hi Julian Hyde,

Thanks for your feedback on the proposed MATCH_RECOGNIZE enhancements. Some
clarifications:

- In MATCH_RECOGNIZE, the optional WITHIN clause outputs a pattern match if
and only if the match completes within the specified time duration. A match
that does not complete within that duration is currently not output at all,
so I propose an optional clause that outputs such timed-out matches.

- Timeout is not the only enhancement. [^ <symbol>], ?? and {<symbol>} are
proposed as extensions to the pattern expression to support the notNext,
opposite-of-consecutive and until semantics for CEP scenarios. PTAL.
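
To make the timeout semantic in the first point concrete, here is a minimal
Python sketch. It is not Calcite or Flink code: Event and match_a_then_b are
hypothetical names, the pattern is fixed to PATTERN (A B) with strict
contiguity, and I assume that the deadline is inclusive and that time advances
only through event timestamps (a real engine would more likely rely on
watermarks).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Event:
    id: str
    ts: int  # event time in seconds


def match_a_then_b(
    events: List[Event], within: int
) -> Tuple[List[Tuple[Event, Event]], List[Event]]:
    """Simulate PATTERN (A B) WITHIN `within` seconds.

    Returns (complete, timed_out): complete holds (A, B) pairs that matched
    in time; timed_out holds A rows whose deadline passed before a B arrived.
    """
    complete: List[Tuple[Event, Event]] = []
    timed_out: List[Event] = []
    pending: Optional[Event] = None  # the A row of a partial match, if any

    for e in sorted(events, key=lambda ev: ev.ts):
        if pending is not None:
            if e.ts - pending.ts > within:
                # Assumption: the deadline is inclusive, so only a strictly
                # later event proves that the partial match has timed out.
                timed_out.append(pending)
            elif e.id == "b":
                complete.append((pending, e))
                pending = None
                continue
            # Timed out or mismatched: the partial match is dropped either way.
            pending = None
        if e.id == "a":
            pending = e

    # A trailing partial match is left undecided: no later event has shown
    # that its deadline passed.
    return complete, timed_out
```

Without SHOW TIMEOUT MATCHES only the rows in complete would be output; with
it, the rows in timed_out would be added to the output as well.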

Regards,
Nicholas Jiang

On 2022/07/13 18:17:59 Julian Hyde wrote:
> I couldn’t tell whether timeout is the only enhancement proposed. If there 
> are others let me know.
> 
> Timeout is controversial. Some streaming systems use timeout, whereas others 
> have more declarative ways of making progress, such as watermarks. In my 
> experience, timeout-based logic in distributed systems tends to accumulate 
> like duct tape. Therefore I would like to see evidence that the timeout-based 
> approach is the right one for a significant fraction of Calcite projects.
> 
> Julian
> 
> 
> > On Jul 12, 2022, at 7:09 PM, Nicholas <[email protected]> wrote:
> > 
> > Hi everyone,
> >
> > After investigating the usage of MATCH_RECOGNIZE, I have created a JIRA
> > ticket '[CALCITE-5202] Support for MATCH_RECOGNIZE functionality
> > enhancement'.
> >
> > A MATCH_RECOGNIZE clause enables the following tasks:
> >
> > - Logically partition and order the data with the PARTITION BY and
> > ORDER BY clauses.
> >
> > - Define patterns of rows to seek using the PATTERN clause. These patterns
> > use a syntax similar to that of regular expressions.
> >
> > - Specify the logical conditions for the row pattern variables in the
> > DEFINE clause.
> >
> > - Define measures, which are expressions usable in other parts of the SQL
> > query, in the MEASURES clause.
> >
> > MATCH_RECOGNIZE currently cannot output timed-out matches, which is a
> > common requirement in CEP scenarios. It also does not support the notNext,
> > opposite-of-consecutive and until semantics:
> >
> > - notNext requires that no event matching this pattern occurs directly
> > after the preceding matched event.
> >
> > - consecutive works in conjunction with a looping (multiple-times) pattern
> > and specifies that any non-matching element breaks the loop.
> >
> > - until applies a stop condition to a looping pattern, which allows the
> > underlying state to be cleaned up.
> >
> > The proposed syntax of the enhanced MATCH_RECOGNIZE is as follows:
> >
> > MATCH_RECOGNIZE (
> >    [ PARTITION BY <expr> [, ... ] ]
> >    [ ORDER BY <expr> [, ... ] ]
> >    [ MEASURES <expr> [AS] <alias> [, ... ] ]
> >    [ ONE ROW PER MATCH [ { SHOW TIMEOUT MATCHES } ] |
> >      ALL ROWS PER MATCH [ { SHOW TIMEOUT MATCHES } ]
> >    ]
> >    [ AFTER MATCH SKIP
> >          {
> >          PAST LAST ROW   |
> >          TO NEXT ROW   |
> >          TO [ { FIRST | LAST } ] <symbol>
> >          }
> >    ]
> >    PATTERN ( <pattern> )
> >    DEFINE <symbol> AS <expr> [, ... ]
> > )
> >
> > - SHOW TIMEOUT MATCHES is introduced to add timeout matches to the output.
> >
> > - [^ <symbol>] is proposed in <pattern> to express the notNext semantic.
> > For example, A [^B] is translated to A.notNext(B).
> >
> > Usage Example:
> >
> > MEASURES
> >    A.id AS aid
> > ONE ROW PER MATCH
> > PATTERN (A [^B])
> > DEFINE
> >    A AS A.id = 'a',
> >    B AS B.id = 'b'
> >
> > - ?? is introduced in <pattern> to express the opposite of the consecutive
> > semantic. For example, A B+?? is translated to A.next(B).oneOrMore(),
> > whereas A B+ is translated to A.next(B).oneOrMore().consecutive().
> >
> > Usage Example:
> >
> > MEASURES
> >    SUM(B.price) AS amount
> > ONE ROW PER MATCH
> > PATTERN (A B+??)
> > DEFINE
> >    A AS A.id = 'a',
> >    B AS B.id = 'b'
> >
> > - {<symbol>} is proposed in <pattern> to represent the until semantic. For
> > example, A {- B*? -} C+ {D} is translated to
> > A.followedBy(C).oneOrMore().until(D).
> >
> > Usage Example:
> >
> > MEASURES
> >    A.id AS aid,
> >    SUM(C.price) AS amount
> > ONE ROW PER MATCH
> > PATTERN (A {- B*? -} C+ {D})
> > DEFINE
> >    A AS A.id = 'a',
> >    C AS C.id = 'c',
> >    D AS SUM(C.price) > 100
> >
> > The above is the proposed syntax for the MATCH_RECOGNIZE enhancements. I
> > look forward to any feedback on it.
> >
> > Best Regards,
> > Nicholas Jiang
> 
> 
