Is there a better way?

I'm an idealist with regard to streaming SQL semantics, and I'm going
to make the 'slippery slope' argument: if we add a TIMEOUT parameter
to MATCH_RECOGNIZE, won't we also need to add it to GROUP BY and JOIN?
(Those are also "blocking" operators.)

Maybe JOIN and GROUP BY are simpler because (absent retractions) they
are monotonic: if more data arrives, it will not cause rows to
disappear from your result. So maybe anti-join, which is not
monotonic, is the best comparison. How does Flink deal with, say,
"show me all orders from customers who have not made a product return
in the last 3 months"? You'd need a timeout on the PRODUCT_RETURNS
stream, right?
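
To make the monotonicity point concrete, here is a minimal Python
sketch. It is not Flink code; the function names and the toy data are
made up for illustration. When one more row arrives on the returns
side, the inner join's result can only grow, while the anti-join's
result can shrink.

```python
def inner_join(orders, returns):
    """Ids of orders from customers who have made at least one return."""
    returned = {r["customer"] for r in returns}
    return {o["id"] for o in orders if o["customer"] in returned}


def anti_join(orders, returns):
    """Ids of orders from customers who have made no return."""
    returned = {r["customer"] for r in returns}
    return {o["id"] for o in orders if o["customer"] not in returned}


# Hypothetical toy data, not taken from the thread.
orders = [{"id": 1, "customer": "alice"}, {"id": 2, "customer": "bob"}]
returns_before = []
returns_after = [{"customer": "bob"}]  # one more row arrives

# Monotonic: the earlier result is a subset of the later one.
assert inner_join(orders, returns_before) <= inner_join(orders, returns_after)

# Non-monotonic: order 2 was in the result, then disappears.
assert anti_join(orders, returns_before) == {1, 2}
assert anti_join(orders, returns_after) == {1}
```

An engine that emits anti-join rows eagerly would therefore have to
retract them later, or else wait until it knows that no matching row
can still arrive.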

My hunch is that Flink can express these semantics without extending
the syntax of JOIN, and if so, we could use the same approach to make
MATCH_RECOGNIZE work with late data.
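
As a rough picture of what "without extending the syntax" could mean,
the Python sketch below applies the idea to the taxi-ride example
quoted further down: a "START without END" row is emitted only once a
watermark has passed the ride's deadline. This is an assumption about
one possible semantics, not a description of how Flink or Calcite
behave; timed_out_rides and its event tuples are hypothetical.

```python
TIMEOUT = 2 * 60 * 60  # two hours, in seconds


def timed_out_rides(events, watermark):
    """Return ids of rides whose START has no END within TIMEOUT.

    events: iterable of (ride_id, kind, timestamp), kind "START" or "END".
    watermark: promise that no event with timestamp <= watermark is
        still to arrive (an assumed definition for this sketch).
    """
    starts, ends = {}, {}
    for ride_id, kind, ts in events:
        (starts if kind == "START" else ends)[ride_id] = ts

    result = []
    for ride_id, start_ts in starts.items():
        deadline = start_ts + TIMEOUT
        end_ts = ends.get(ride_id)
        ended_in_time = end_ts is not None and end_ts <= deadline
        # Emit only once the watermark has reached the deadline: after
        # that, no late END can invalidate the row, so the output never
        # needs a retraction.
        if not ended_in_time and watermark >= deadline:
            result.append(ride_id)
    return sorted(result)


events = [(1, "START", 0), (1, "END", 3600), (2, "START", 0), (3, "START", 5000)]
print(timed_out_rides(events, watermark=3600))   # nothing is final yet
print(timed_out_rides(events, watermark=7200))   # ride 2 has timed out
print(timed_out_rides(events, watermark=12200))  # rides 2 and 3 have timed out
```

In this sketch the timeout lives in the query's predicate (END within
two hours of START) and the watermark decides when a row is final, so
the operator itself takes no TIMEOUT parameter.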

Julian

On Mon, Sep 21, 2020 at 12:05 AM Kosma Grochowski
<kosma.grochow...@getindata.com> wrote:
>
> Hi Jark,
>
> Thank you for your e-mail. I agree, let's engage all interested parties in 
> this discussion - I'm writing this e-mail to both Flink and Calcite dev 
> mailing lists.
>
> I'll repeat myself to present the proposal to the Calcite community.
>
> I would like to propose an enrichment of existing Flink SQL MATCH_RECOGNIZE 
> syntax to cover for the case of the absence of an event. Such an enrichment 
> would help our company solve a business case containing timed-out patterns 
> handling. An example of usage of such a clause from Flink training exercises 
> could be a task of identification of taxi rides with a START event that is 
> not followed by an END event within two hours. Currently, a solution to such 
> a task could be achieved with the use of CEP and a timeout handler. However, 
> as far as I know, it is impossible to take advantage of Flink SQL syntax for 
> this task.
>
> I can think of two ways for such a feature to be incorporated into existing 
> MATCH_RECOGNIZE syntax:
> - In analogy to CEP, a keyword could be added to determine whether timed-out
> matches should be dropped altogether or made available through either a side
> output or the main output. SQL usage could be similar to the current WITHIN
> clause, e.g. "PATTERN (A B C) TIMEOUT INTERVAL '30' SECOND" would output
> partially matched patterns 30 seconds after the A event appears.
>
> - Add the possibility to define the absence of an event inside the pattern
> definition. For example, "PATTERN (A B !C) WITHIN INTERVAL '30' SECOND" would
> output partially matched patterns containing the A and B events 30 seconds
> after the A event appears.
>
> In our company we did some basic testing of this concept - we modified 
> existing MatchCodeGenerator to add processTimedOutMatch function based on a 
> boolean trigger and tested it against the aforementioned business case 
> containing timed-out patterns handling.
>
> I'm interested to hear your thoughts about how we could help Flink SQL be 
> able to express these kinds of cases.
>
> With regards,
> Kosma Grochowski
>
>
>
> > On 21 Sep 2020, at 05:12, Jark Wu <imj...@gmail.com> wrote:
> >
> > Hi Kosma,
> >
> > Thanks for the proposal. I like it and we also have supported similar
> > syntax in our company.
> > The problem is that Flink SQL leverages Calcite as the query parser, so if
> > we want to support this syntax, we may have to push this syntax back to the
> > Calcite community.
> > Besides, the SQL standard doesn't define a timeout syntax for
> > MATCH_RECOGNIZE. So we would have to extend the standard, and this is
> > usually not trivial.
> >
> > So I think it would be better to have a joint discussion with the Calcite
> > and Flink community together. What do you think?
> >
> > Best,
> > Jark
> >
> >
> >
> >
> >
> > On Fri, 18 Sep 2020 at 22:48, Kosma Grochowski <
> > kosma.grochow...@getindata.com> wrote:
> >
> >> Hello,
> >>
> >> I would like to propose an enrichment of existing Flink SQL
> >> MATCH_RECOGNIZE syntax to cover for the case of the absence of an event.
> >> Such an enrichment would help our company solve a business case containing
> >> timed-out patterns handling. An example of usage of such a clause from
> >> Flink training exercises could be a task of identification of taxi rides
> >> with a START event that is not followed by an END event within two hours.
> >> Currently, a solution to such a task could be achieved with the use of CEP
> >> and a timeout handler. However, as far as I know, it is impossible to take
> >> advantage of Flink SQL syntax for this task.
> >>
> >> I can think of two ways for such a feature to be incorporated into
> >> existing MATCH_RECOGNIZE syntax:
> >> - In analogy to CEP, a keyword could be added to determine whether
> >> timed-out matches should be dropped altogether or made available through
> >> either a side output or the main output. SQL usage could be similar to the
> >> current WITHIN clause, e.g. "PATTERN (A B C) TIMEOUT INTERVAL '30' SECOND"
> >> would output partially matched patterns 30 seconds after the A event
> >> appears.
> >>
> >> - Add the possibility to define the absence of an event inside the pattern
> >> definition. For example, "PATTERN (A B !C) WITHIN INTERVAL '30' SECOND"
> >> would output partially matched patterns containing the A and B events
> >> 30 seconds after the A event appears.
> >>
> >> In our company we did some basic testing of this concept - we modified
> >> existing MatchCodeGenerator to add processTimedOutMatch function based on a
> >> boolean trigger and tested it against the aforementioned business case
> >> containing timed-out patterns handling.
> >>
> >>
> >> I'm interested to hear your thoughts about how we could help Flink SQL be
> >> able to express these kinds of cases.
> >>
> >> With regards,
> >> Kosma Grochowski
> >>
> >>
> >>
> >>
>
