Hi everyone,

I would very much value these improvements; thanks for bringing this up for discussion!
I recalled that timeouts had been discussed once before, so I looked up the thread. See https://lists.apache.org/thread/tw0q4cno3og7hjo5ok1o5w4ytogh5o7l for details.

Does anyone know whether timeouts and their syntax were discussed during the ISO/IEC review (https://www.iso.org/standard/84485.html), or perhaps in the one currently under development (https://www.iso.org/standard/76583.html)?

Thanks,

Martijn

On Mon, 18 Jul 2022 at 07:53, Nicholas Jiang wrote:
> Hi Julian Hyde,
>
> Thanks for your feedback on the MATCH_RECOGNIZE functionality
> enhancement. Here are some explanations of the enhancement:
>
> - In MATCH_RECOGNIZE, the WITHIN clause is an optional clause that
>   outputs a pattern_clause match if and only if the match occurs within
>   the specified time duration. Hence, when a match does not complete
>   within the specified time, an optional clause should be introduced
>   to output it as a timeout pattern_clause match.
>
> - [^ <symbol>], ?? and {<symbol>} are proposed as enhancements of the
>   pattern expression to support the notNext, non-consecutive (the
>   opposite of consecutive) and until semantics in CEP scenarios. PTAL.
>
> Regards,
> Nicholas Jiang
>
> On 2022/07/13 18:17:59 Julian Hyde wrote:
> > I couldn't tell whether timeout is the only enhancement proposed. If
> > there are others, let me know.
> >
> > Timeout is controversial. Some streaming systems use timeout, whereas
> > others have more declarative ways of making progress, such as
> > watermarks. In my experience, timeout-based logic in distributed
> > systems tends to accumulate like duct tape. Therefore I would like to
> > see evidence that the timeout-based approach is the right one for a
> > significant fraction of Calcite projects.
> >
> > Julian
> >
> > On Jul 12, 2022, at 7:09 PM, Nicholas wrote:
> > >
> > > Hi everyone,
> > >
> > > After investigating the usage of MATCH_RECOGNIZE, I have created a
> > > JIRA ticket '[CALCITE-5202] Support for MATCH_RECOGNIZE
> > > functionality enhancement'.
> > >
> > > A MATCH_RECOGNIZE clause enables the following tasks:
> > >
> > > - Logically partition and order the data, using the PARTITION BY
> > >   and ORDER BY clauses.
> > > - Define patterns of rows to seek using the PATTERN clause. These
> > >   patterns use a syntax similar to that of regular expressions.
> > > - Specify the logical conditions of the row pattern variables in
> > >   the DEFINE clause.
> > > - Define measures, which are expressions usable in other parts of
> > >   the SQL query, in the MEASURES clause.
> > >
> > > At present, MATCH_RECOGNIZE cannot output timeout matches, which is
> > > a common requirement in CEP scenarios. MATCH_RECOGNIZE also does not
> > > support the notNext, non-consecutive (the opposite of consecutive)
> > > and until semantics:
> > >
> > > - notNext enforces that no event matching the new pattern occurs
> > >   right after the preceding matched event.
> > > - consecutive works in conjunction with looping (multiple-times)
> > >   matching and specifies that any non-matching element breaks the
> > >   loop.
> > > - until applies a stop condition to a looping state, which allows
> > >   the underlying state to be cleaned up.
> > >
> > > The proposed syntax of the enhanced MATCH_RECOGNIZE is as follows:
> > >
> > >   MATCH_RECOGNIZE (
> > >     [ PARTITION BY <expr> [, ... ] ]
> > >     [ ORDER BY <expr> [, ... ] ]
> > >     [ MEASURES <expr> [AS] <alias> [, ... ] ]
> > >     [ ONE ROW PER MATCH [ { SHOW TIMEOUT MATCHES } ] |
> > >       ALL ROWS PER MATCH [ { SHOW TIMEOUT MATCHES } ]
> > >     ]
> > >     [ AFTER MATCH SKIP
> > >       {
> > >         PAST LAST ROW |
> > >         TO NEXT ROW |
> > >         TO [ { FIRST | LAST } ] <symbol>
> > >       }
> > >     ]
> > >     PATTERN ( <pattern> )
> > >     DEFINE <symbol> AS <expr> [, ... ]
> > >   )
> > >
> > > - SHOW TIMEOUT MATCHES is introduced to add timeout matches to the
> > >   output.
> > >
> > > - [^ <symbol>] is proposed in <pattern> to express the notNext
> > >   semantic. For example, A [^B] is translated to A.notNext(B).
> > >
> > >   Usage example:
> > >
> > >     MEASURES
> > >       A.id as aid
> > >     ONE ROW PER MATCH
> > >     PATTERN (A [^B])
> > >     DEFINE
> > >       A as A.id = 'a',
> > >       B as B.id = 'b'
> > >
> > > - ?? is introduced in <pattern> to express the opposite of the
> > >   consecutive semantic. For example, A B+?? is translated to
> > >   A.next(B).oneOrMore(). By contrast, A B+ is translated to
> > >   A.next(B).oneOrMore().consecutive().
> > >
> > >   Usage example:
> > >
> > >     MEASURES
> > >       SUM(B.price) as amount
> > >     ONE ROW PER MATCH
> > >     PATTERN (A B+??)
> > >     DEFINE
> > >       A as A.id = 'a',
> > >       B as B.id = 'b'
> > >
> > > - {<symbol>} is proposed in <pattern> to express the until
> > >   semantic. For example, A {- B*? -} C+ {D} is translated to
> > >   A.followedBy(C).oneOrMore().until(D).
> > >
> > >   Usage example:
> > >
> > >     MEASURES
> > >       A.id as aid,
> > >       SUM(C.price) as amount
> > >     ONE ROW PER MATCH
> > >     PATTERN (A {- B*? -} C+{D})
> > >     DEFINE
> > >       A as A.id = 'a',
> > >       C as C.id = 'c',
> > >       D as SUM(C.price) > 100
> > >
> > > The above is the proposed syntax for the MATCH_RECOGNIZE
> > > functionality enhancement. I look forward to your feedback on it.
> > >
> > > Best Regards,
> > > Nicholas Jiang
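The following toy Python model illustrates how the proposed pattern operators map to the translations quoted above (notNext, next/followedBy, consecutive, until). It runs over an in-memory list of rows. It is a sketch under stated assumptions, not Calcite's or Flink's implementation:

- The names not_next, loop_after, is_id and over_100, and the (id, price) row shape, are hypothetical.
- The model reports only the longest loop for a given starting A row. A real CEP engine may also emit shorter runs as separate matches.
- It ignores AFTER MATCH SKIP and timeouts.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical row shape for this sketch: (id, price).
Event = Tuple[str, int]


def not_next(events: Sequence[Event], start: int,
             forbidden: Callable[[Event], bool]) -> Optional[bool]:
    """A [^B] ~ A.notNext(B): the row right after events[start] must not be B.

    Returns None while no following row exists yet; the sketch treats the
    match as undecided instead of guessing end-of-stream behavior.
    """
    if start + 1 >= len(events):
        return None
    return not forbidden(events[start + 1])


def loop_after(events: Sequence[Event], start: int,
               accept: Callable[[Event], bool],
               first_strict: bool, loop_strict: bool,
               until: Optional[Callable[[Event, List[Event]], bool]] = None,
               ) -> List[Event]:
    """Longest run of rows taken by a looping variable after events[start].

    first_strict: the first loop row must come right after events[start]
                  (next); otherwise non-matching rows before it are skipped
                  (followedBy).
    loop_strict:  later loop rows must be contiguous (consecutive(), the
                  proposed A B+); otherwise non-matching rows in between
                  are skipped (the proposed A B+??).
    until:        stop condition, checked on each row once the loop has
                  started and given the rows taken so far (assumption).
    """
    taken: List[Event] = []
    for ev in events[start + 1:]:
        if until is not None and taken and until(ev, taken):
            break  # until(): accept no more rows into the loop
        if accept(ev):
            taken.append(ev)
        elif first_strict if not taken else loop_strict:
            break  # strict contiguity: a non-matching row ends the loop
        # otherwise relaxed contiguity: ignore this row and keep looking
    return taken


def is_id(x: str) -> Callable[[Event], bool]:
    """Condition of the form `<symbol> as <symbol>.id = 'x'`."""
    return lambda ev: ev[0] == x


def over_100(ev: Event, taken: List[Event]) -> bool:
    """D as SUM(C.price) > 100, evaluated over the C rows taken so far."""
    return sum(price for _, price in taken) > 100


events = [("a", 0), ("b", 10), ("x", 0), ("b", 20)]
# A B+   -> A.next(B).oneOrMore().consecutive(): the "x" row breaks the loop
strict = loop_after(events, 0, is_id("b"), first_strict=True, loop_strict=True)
# A B+?? -> A.next(B).oneOrMore(): the "x" row is skipped
relaxed = loop_after(events, 0, is_id("b"), first_strict=True, loop_strict=False)

# A {- B*? -} C+{D} -> A.followedBy(C).oneOrMore().until(D)
rows = [("a", 0), ("x", 0), ("c", 60), ("c", 50), ("c", 30)]
until_run = loop_after(rows, 0, is_id("c"), first_strict=False,
                       loop_strict=False, until=over_100)
```

In this model, only a non-matching row separates A B+ from A B+??: the first stops at it, the second skips it. The until condition stops the C+ loop before the row that arrives once the accumulated price exceeds 100.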
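The timeout behavior is described only in prose in the thread. One possible reading is sketched below in Python. An open partial match whose WITHIN window has elapsed is discarded by default. With SHOW TIMEOUT MATCHES, it is emitted as a timeout match instead.

Several parts of this sketch are assumptions:

- Partial, expire and output are hypothetical names.
- Time is a plain integer, and `now` could equally be an event-time watermark rather than a wall-clock timer.
- The `>=` expiry boundary is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Partial:
    """An open (not yet completed) match; hypothetical shape for this sketch."""
    start_ts: int                                  # time of the first matched row
    rows: List[str] = field(default_factory=list)  # ids of rows matched so far


def expire(partials: List[Partial], now: int,
           within: int) -> Tuple[List[Partial], List[Partial]]:
    """Split open partial matches into (still_open, timed_out).

    A partial is treated as timed out once now - start_ts >= within
    (assumed boundary), i.e. it can no longer complete inside WITHIN.
    """
    still_open = [p for p in partials if now - p.start_ts < within]
    timed_out = [p for p in partials if now - p.start_ts >= within]
    return still_open, timed_out


def output(completed: List[Partial], timed_out: List[Partial],
           show_timeout_matches: bool) -> List[Tuple[str, List[str]]]:
    """ONE ROW PER MATCH output: completed matches always appear; timed-out
    partials appear only when SHOW TIMEOUT MATCHES is requested."""
    result = [("MATCH", p.rows) for p in completed]
    if show_timeout_matches:
        result += [("TIMEOUT", p.rows) for p in timed_out]
    return result


open_matches, timed_out = expire([Partial(0, ["a"]), Partial(8, ["a"])],
                                 now=10, within=5)
```

Whether `now` is driven by a timer or by a watermark is exactly the design question raised in the thread. The sketch leaves it to the caller.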
