[
https://issues.apache.org/jira/browse/FLINK-7169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16103166#comment-16103166
]
ASF GitHub Bot commented on FLINK-7169:
---------------------------------------
Github user dawidwys commented on the issue:
https://github.com/apache/flink/pull/4331
Hi @yestinchen ,
Thanks for the update.
After second round of review. I found many problems with current approach.
It returns only the first match in a stream in most cases.
1. Let's analyze a pattern `A B C` with `SKIP_TO_FIRST C` and a sequence
`a1 b1 c1 a2 b2 c2`. It will return only `a1 b1 c1` and will left the NFA
without any valid `ComputationalStates` which results in stopping processing.
2. Another problem is we do not handle a matches that can potentially
finish before previously started. E.g. for Pattern
```
Pattern<Event, ?> pattern = Pattern.<Event>begin("ab").where(new
SimpleCondition<Event>() {
@Override
public boolean filter(Event value) throws Exception {
return value.getName().equals("a") ||
value.getName().equals("b");
}
}).followedBy("c").where(new IterativeCondition<Event>() {
@Override
public boolean filter(Event value, Context<Event> ctx) throws Exception
{
return value.getName().equals("c") &&
ctx.getEventsForPattern("ab").iterator().next().getPrice() == value.getPrice();
}
}).setAfterMatchSkipStrategy(new
AfterMatchSkipStrategy(AfterMatchSkipStrategy.SkipStrategy.SKIP_PAST_LAST_EVENT));
```
and a sequence `a(price = 1) b(price = 2) c(price = 2)`. I think a desired
behaviour would be to start new matching after `c` event, but it won't as the
matching started at `a` and will not start at `b`.
---
Some general notes:
1. I think the SQL's specification does not suits well into CEP's library
as we do not operate on a partition/bounded collection of events. The
specification on the other hand assumes such bounded data. I think we would
benefit from some additional documentation how the AFTER_MATCH clause works in
case of unbounded data. E.g. what does **_empty match_** mean:
> Note that the AFTER MATCH SKIP syntax only determines the point to resume
scanning for a match after a non-empty match. When an empty match is found, one
row is skipped (as if SKIP TO NEXT ROW had been speci ed). Thus an empty match
never causes one of these exceptions.
etc.
2. I really don't like the idea of so many cases when `RuntimeException`
can be thrown. I feel the reason for using CEP is a constantly running jobs
that search for patterns in a stream rather than ad-hoc queries.
E.g in case of a Pattern like `A B? C` with `SKIP_TO_LAST B` a sequence
like `a c` results in an exception and the job being killed. In my opinion it
does not suits well into constantly running job. From operational side running
such Patterns would be at least interesting ;), as they depend so much on the
arriving data.
3. I don't know the reasoning, but Esper, that was mentioned as the
other(besides Oracle) library that supports `MATCH_RECOGNIZE` clause does not
support `AFTER MATCH` at all.
4. I found out there was already an ongoing work to introduce part of the
`AFTER MATCH` (the `SKIP_PAST_LAST`). The corresponding jira:
https://issues.apache.org/jira/browse/FLINK-3703 and closed PR: #2367 .
To sum up thanks @yestinchen for the work. Unfortunately I think the clause
needs a little bit more conceptual discussion before we can introduce this
change. I think the `SKIP_PAST_LAST` behaviour would be very helpful (in fact
there were alread requests for it in the mailing list) and the most straight
forward to implement. I would love to here your opinions @yestinchen as well as
@kl0u and @dianfu.
> Support AFTER MATCH SKIP function in CEP library API
> ----------------------------------------------------
>
> Key: FLINK-7169
> URL: https://issues.apache.org/jira/browse/FLINK-7169
> Project: Flink
> Issue Type: Sub-task
> Components: CEP
> Reporter: Yueting Chen
> Assignee: Yueting Chen
>
> In order to support Oracle's MATCH_RECOGNIZE on top of the CEP library, we
> need to support AFTER MATCH SKIP function in CEP API.
> There're four options in AFTER MATCH SKIP, listed as follows:
> 1. AFTER MATCH SKIP TO NEXT ROW: resume pattern matching at the row after the
> first row of the current match.
> 2. AFTER MATCH SKIP PAST LAST ROW: resume pattern matching at the next row
> after the last row of the current match.
> 3. AFTER MATCH SKIP TO FIST *RPV*: resume pattern matching at the first row
> that is mapped to the row pattern variable RPV.
> 4. AFTER MATCH SKIP TO LAST *RPV*: resume pattern matching at the last row
> that is mapped to the row pattern variable RPV.
> I think we can introduce a new function to `CEP` class, which takes a new
> parameter as AfterMatchSKipStrategy.
> The new API may looks like this
> {code}
> public static <T> PatternStream<T> pattern(DataStream<T> input, Pattern<T, ?>
> pattern, AfterMatchSkipStrategy afterMatchSkipStrategy)
> {code}
> We can also make `SKIP TO NEXT ROW` as the default option, because that's
> what CEP library behaves currently.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)