I’m delighted that Flink is getting full SQL support for MATCH_RECOGNIZE.
Sounds like it might be challenging to share the implementation, but could we perhaps share the test suite? (I.e. a set of SQL queries and their expected results.) I added a simple test in https://github.com/julianhyde/calcite/commit/ee460847643ec17544f310088affd99be4028bb6 that could be extended.

Julian

> On Jul 31, 2018, at 8:07 AM, Fabian Hueske <[email protected]> wrote:
>
> Hi everyone,
>
> I'd like to share the plans for MATCH_RECOGNIZE support in Flink.
>
> Flink has featured a so-called CEP library for quite some time [1]. The CEP
> library is a popular feature and is frequently used.
> In a nutshell, the library provides a domain-specific API to define event
> patterns. The patterns are translated into a state machine and evaluated in
> a streaming program.
>
> Even before we learned about MATCH_RECOGNIZE, Till (another Flink
> committer) and I gave a few talks about unifying SQL and CEP [2].
> Hence, we were quite excited when we learned about MATCH_RECOGNIZE and even
> more when it was added to Calcite.
> Shortly after that, we got a PR [3] which translated the parsed
> MATCH_RECOGNIZE clause into patterns of our CEP library.
> However, we never really got to the point of merging that contribution,
> mainly because there were some inconsistencies between the semantics of
> MATCH_RECOGNIZE and Flink's CEP library.
>
> Recently, a Flink committer picked up this feature again, validated the
> semantics, and made a few corrections [4].
> The CEP library is now ready to support a subset of the MATCH_RECOGNIZE
> features.
> Unfortunately, MATCH_RECOGNIZE support won't make it into the upcoming
> 1.6.0 release, but the plan is to add it for the 1.7.0 release.
>
> Regarding the idea of sharing parts of the evaluation logic:
> Flink has runtime support for a subset of the MATCH_RECOGNIZE clause.
> Unfortunately, I am not familiar with the internals of Flink's CEP library
> and don't know how portable it is.
>
> Best, Fabian
>
> [1] https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/libs/cep.html
> [2] https://www.slideshare.net/tillrohrmann/streaming-analytics-cep-two-sides-of-the-same-coin
> [3] https://github.com/apache/flink/pull/4502
> [4] https://issues.apache.org/jira/browse/FLINK-9593
>
> 2018-07-23 21:03 GMT+02:00 Sergey Nuyanzin <[email protected]>:
>
>> Looks exciting.
>> If possible, I would like to take part in it; however, I'm not sure
>> about this week (I could start in August).
>>
>> On Mon, Jul 23, 2018 at 9:10 PM, Michael Mior <[email protected]> wrote:
>>
>>> This does sound like my idea of fun, but unfortunately I won't have
>>> the time to contribute in the near future. I'll keep this on my radar
>>> though. I also shared this message with all the students in our
>>> research group and I wouldn't be surprised if there was someone
>>> willing to jump in. Thanks for keeping this moving, Julian!
>>>
>>> --
>>> Michael Mior
>>> [email protected]
>>>
>>> On Mon, Jul 23, 2018 at 13:54, Julian Hyde <[email protected]> wrote:
>>>>
>>>> For quite a while we have had partial support for MATCH_RECOGNIZE. We
>>>> support it in the parser and validator, but there is no runtime
>>>> implementation. It’s a shame, because MATCH_RECOGNIZE is an incredibly
>>>> powerful SQL feature for both traditional SQL (it’s in Oracle 12c) and
>>>> for continuous query (aka complex event processing - CEP).
>>>>
>>>> I figure it’s time to change that.
>>>> My plan is to implement it incrementally, getting simple queries
>>>> working to start with, then allow people to add more complex queries.
>>>>
>>>> In a dev branch [1], I’ve added a method Enumerables.match [2]. The
>>>> idea is that you supply an Enumerable of input data, a finite state
>>>> machine to figure out when a sequence of rows makes a match
>>>> (represented by a transition function: (state, row) -> state), and a
>>>> function to convert a matched set of rows to a set of output rows.
>>>> The match method is fairly straightforward, and I almost have it
>>>> finished.
>>>>
>>>> The complexity is in generating the finite state machine, emitter
>>>> function, and so forth.
>>>>
>>>> Can someone help me with this task? If your idea of fun is
>>>> implementing database algorithms, this is about as much fun as it
>>>> gets. You learned about finite state machines in college - this is
>>>> your chance to actually write one!
>>>>
>>>> This might be a good joint project with the Flink community. I know
>>>> Flink are thinking of implementing CEP, and the algorithm we write
>>>> here could be shared with Flink (for use via Flink SQL or via the
>>>> Flink API).
>>>>
>>>> Julian
>>>>
>>>> [1] https://github.com/julianhyde/calcite/commits/1935-match-recognize
>>>>
>>>> [2] https://github.com/julianhyde/calcite/commit/4dfaf1bbee718aa6694a8ce67d829c32d04c7e87#diff-8a97a64204db631471c563df7551f408R73
>>>
>>
>> --
>> Best regards,
>> Sergey
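
For readers following the thread, the shape of the match method described in the original message (input rows, a transition function (state, row) -> state, and a function from matched rows to output rows) could be sketched roughly as below. This is only an illustration under stated assumptions, not the code in the dev branch and not the real Enumerables.match signature: the class MatchSketch, the START/DEAD state constants, the isFinal predicate, and the toy pattern "A followed by B" are all hypothetical, and the sketch uses plain java.util types rather than Calcite's Enumerable. It also does no backtracking, which a full MATCH_RECOGNIZE implementation would need.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Function;
import java.util.function.IntPredicate;

// Hypothetical sketch; names and signature are assumptions, not Calcite's API.
class MatchSketch {
  static final int START = 0;  // initial state (assumed convention)
  static final int DEAD = -1;  // "no transition possible" (assumed convention)

  /** Runs a state machine over the rows and emits output for each match. */
  static <R, O> List<O> match(Iterable<R> input,
      BiFunction<Integer, R, Integer> transition,
      IntPredicate isFinal,
      Function<List<R>, List<O>> emitter) {
    List<O> output = new ArrayList<>();
    List<R> buffer = new ArrayList<>();  // rows of the current partial match
    int state = START;
    for (R row : input) {
      int next = transition.apply(state, row);
      if (next == DEAD && state != START) {
        // The partial match failed; drop it and retry this row from START.
        buffer.clear();
        next = transition.apply(START, row);
      }
      if (next == DEAD) {
        // This row cannot begin a match either; skip it.
        state = START;
        buffer.clear();
        continue;
      }
      buffer.add(row);
      state = next;
      if (isFinal.test(state)) {
        // A complete match: convert the matched rows to output rows.
        output.addAll(emitter.apply(new ArrayList<>(buffer)));
        buffer.clear();
        state = START;
      }
    }
    return output;
  }

  /** Toy pattern: a row starting with 'A' directly followed by one starting with 'B'. */
  static List<String> demo(List<String> rows) {
    return match(rows,
        (state, row) -> {
          char c = row.charAt(0);
          if (state == 0 && c == 'A') return 1;
          if (state == 1 && c == 'B') return 2;
          return DEAD;
        },
        s -> s == 2,
        matched -> Collections.singletonList(String.join("", matched)));
  }

  public static void main(String[] args) {
    // Prints [A2B3, A5B6]
    System.out.println(demo(Arrays.asList("A1", "A2", "B3", "C4", "A5", "B6")));
  }
}
```

In this sketch the hard part named in the message, generating the state machine and emitter from a MATCH_RECOGNIZE clause, is simply written by hand in demo.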
