Hey,
it's me again, JulianF.
I started work on the Automaton / Matcher and implemented OR and OPTIONAL ("?")
to get started with the code.
I would highly appreciate it if you (Julian H) could check this code (I made a PR
against your branch).
Also, what else do you consider necessary for the implementation?
I thought about anchors ("^", "$"), but these would need a few extra
changes in the PartitionStates, as far as I can see (to check when we "enter" a
partition and when we "leave" it).
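
To make the OR / OPTIONAL semantics concrete, here is a minimal, self-contained sketch. It is not the code from the PR and it does not use the Automaton or Matcher classes; PatternSketch, Pattern, ends and matchesAll are names made up for this example. A pattern is modelled as a function from a start offset to the set of offsets where a match may end, which is only one possible way to express this.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch; not the Automaton/Matcher API from the branch. */
final class PatternSketch {
  /** Maps a start offset to every offset at which a match may end. */
  interface Pattern {
    Set<Integer> ends(List<String> symbols, int start);
  }

  /** Matches exactly one occurrence of the given symbol. */
  static Pattern symbol(String name) {
    return (symbols, start) -> {
      Set<Integer> ends = new TreeSet<>();
      if (start < symbols.size() && symbols.get(start).equals(name)) {
        ends.add(start + 1);
      }
      return ends;
    };
  }

  /** Matches "first" immediately followed by "second". */
  static Pattern seq(Pattern first, Pattern second) {
    return (symbols, start) -> {
      Set<Integer> ends = new TreeSet<>();
      for (int mid : first.ends(symbols, start)) {
        ends.addAll(second.ends(symbols, mid));
      }
      return ends;
    };
  }

  /** OR ("|"): the union of the end offsets of both alternatives. */
  static Pattern or(Pattern left, Pattern right) {
    return (symbols, start) -> {
      Set<Integer> ends = new TreeSet<>(left.ends(symbols, start));
      ends.addAll(right.ends(symbols, start));
      return ends;
    };
  }

  /** OPTIONAL ("?"): either skip the operand or match it once. */
  static Pattern optional(Pattern operand) {
    return (symbols, start) -> {
      Set<Integer> ends = new TreeSet<>(operand.ends(symbols, start));
      ends.add(start); // the empty match
      return ends;
    };
  }

  /** True if the pattern consumes the whole list, as if anchored by "^" and "$". */
  static boolean matchesAll(Pattern pattern, String... symbols) {
    List<String> list = Arrays.asList(symbols);
    return pattern.ends(list, 0).contains(list.size());
  }
}
```

With these helpers, the pattern A (B | C) D? can be written as seq(symbol("A"), seq(or(symbol("B"), symbol("C")), optional(symbol("D")))). It accepts the sequences A B and A C D and rejects A D. The matchesAll helper treats the whole list as one partition, which is roughly the "enter" / "leave" check that anchors would need.
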
Best
JulianF
On 25.12.18, 20:38, "Julian Feinauer" <[email protected]> wrote:
Hi Julian,
as I already declared my interest in MATCH_RECOGNIZE and offered my help, I
plan to do some things in the next one or two weeks.
Thus, I wanted to start based on your branch (“1935-match-recognize”).
I have some problems getting it to run.
Is it possible that there are some files missing in the commit or are there
some things to consider?
Thanks!
Julian (F)
On 2018/11/26 20:09:00, Julian Hyde
<[email protected]> wrote:
> Over thanksgiving, I started working on MATCH_RECOGNIZE again. I wrote a
standalone class called Automaton that allows you to build patterns (basically
regular expressions, but sufficient for the PATTERN sub-clause of
MATCH_RECOGNIZE), and execute them in a unit test.
>
> Would someone like to help me develop this? We have support for “*” (zero
or more repeats), “+” (1 or more repeats) and “{m,n}” (bounded repeats) but
need “|” (or) and several others. It should be fairly straightforward
test-driven development: add tests to AutomatonTest.java [1], then change
Automaton, AutomatonBuilder, Pattern or Matcher until they pass.
>
> We also need lots of SQL tests. Could someone write queries against
Oracle’s “ticker” table and paste the queries and results into match.iq?
>
> See CALCITE-1935 [2], and my branch [3].
>
> I have cherry-picked commits from Zhiqiang He’s branch [4] into my
branch, so this will be a joint effort when it is finished.
>
> Julian
>
> [1]
https://github.com/julianhyde/calcite/blob/1935-match-recognize/core/src/test/java/org/apache/calcite/runtime/AutomatonTest.java
>
> [2] https://issues.apache.org/jira/browse/CALCITE-1935
>
> [3] https://github.com/julianhyde/calcite/tree/1935-match-recognize/
>
> [4]
https://github.com/Zhiqiang-He/calcite/tree/calcite-1935-MR-Implementation3
>
>
> > On Nov 21, 2018, at 8:45 AM, Julian Feinauer
<[email protected]> wrote:
> >
> > Sorry, this is an old mail which got sent again accidentally by my mail
program.
> > Please ignore it, and excuse the noise.
> >
> > Julian
> >
> > On 21.11.18, 16:34, "Julian Feinauer"
<[email protected]> wrote:
> >
> > Hi Julian,
> >
> > I decided to reply to this (old) email because some facts are
noted here.
> > Funnily, Apache Flink released their MATCH_RECOGNIZE implementation
yesterday.
> >
> > So I recall that you and Zhiqiang He did something on this.
> > I would like to have such a feature in Calcite (as stated in the
other mail) and could try to go into this a bit with a colleague of mine and
give a bit of support on this topic (in fact, it sounds like fun to us…).
> > Perhaps there's also the chance to learn something from Flink's
implementation, as you already had some contact with them, I think?
> >
> > Best
> > Julian
> >
> > On 2018/07/23 17:53:57, Julian Hyde
<[email protected]> wrote:
> >> For quite a while we have had partial support for MATCH_RECOGNIZE. We
support it in the parser and validator, but there is no runtime implementation.
It’s a shame, because MATCH_RECOGNIZE is an incredibly powerful SQL feature for
both traditional SQL (it’s in Oracle 12c) and for continuous query (aka complex
event processing - CEP).
> >>
> >> I figure it’s time to change that. My plan is to implement it
incrementally, getting simple queries working to start with, then allow people
to add more complex queries.
> >>
> >> In a dev branch [1], I’ve added a method Enumerables.match [2]. The
idea is that you supply an Enumerable of input data, a finite state machine
that figures out when a sequence of rows makes a match (represented by a
transition function: (state, row) -> state), and a function to convert a
matched set of rows to a set of output rows. The match method is fairly
straightforward, and I almost have it finished.
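
As a rough illustration of the three inputs described above (input rows, a transition function (state, row) -> state, and an emitter from matched rows to output rows), a minimal sketch could look like the following. This is not the signature of Enumerables.match, which is not reproduced here; MatchSketch, match, abTransition and demo are hypothetical names. The sketch also makes simplifying assumptions that the real method may not share: a null next state means "no transition", a match is emitted as soon as an accepting state is reached, and there is no backtracking.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Function;
import java.util.function.Predicate;

/** Hypothetical sketch; not the actual Enumerables.match method. */
final class MatchSketch {
  /**
   * Feeds rows through a state machine and emits output for each match.
   * Assumptions: null means "no transition"; the shortest match wins.
   */
  static <R, S, O> List<O> match(Iterable<R> rows, S start,
      BiFunction<S, R, S> transition, Predicate<S> accepting,
      Function<List<R>, List<O>> emitter) {
    List<O> out = new ArrayList<>();
    List<R> buffer = new ArrayList<>(); // rows of the match in progress
    S state = start;
    for (R row : rows) {
      S next = transition.apply(state, row);
      if (next == null) {
        // Dead end: drop the partial match and retry this row from the start.
        buffer.clear();
        next = transition.apply(start, row);
        if (next == null) {
          state = start;
          continue;
        }
      }
      buffer.add(row);
      state = next;
      if (accepting.test(state)) {
        out.addAll(emitter.apply(new ArrayList<>(buffer)));
        buffer.clear();
        state = start;
      }
    }
    return out;
  }

  /** Example machine for the pattern "a b": 0 --a--> 1 --b--> 2 (accepting). */
  static Integer abTransition(Integer state, String row) {
    if (state == 0 && row.equals("a")) {
      return 1;
    }
    if (state == 1 && row.equals("b")) {
      return 2;
    }
    return null;
  }

  /** Runs the "a b" machine and concatenates each matched run of rows. */
  static List<String> demo() {
    return match(Arrays.asList("a", "b", "c", "a", "a", "b"), 0,
        MatchSketch::abTransition, state -> state == 2,
        rows -> Collections.singletonList(String.join("", rows)));
  }
}
```

For the input a, b, c, a, a, b, the demo method finds two matches and returns the list ["ab", "ab"]. Generating the transition function from a PATTERN clause is the part this sketch leaves out.
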
> >>
> >> The complexity is in generating the finite state machine, emitter
function, and so forth.
> >>
> >> Can someone help me with this task? If your idea of fun is
implementing database algorithms, this is about as much fun as it gets. You
learned about finite state machines in college - this is your chance to
actually write one!
> >>
> >> This might be a good joint project with the Flink community. I know
Flink are thinking of implementing CEP, and the algorithm we write here could
be shared with Flink (for use via Flink SQL or via the Flink API).
> >>
> >> Julian
> >>
> >> [1] https://github.com/julianhyde/calcite/commits/1935-match-recognize
> >>
> >> [2]
https://github.com/julianhyde/calcite/commit/4dfaf1bbee718aa6694a8ce67d829c32d04c7e87#diff-8a97a64204db631471c563df7551f408R73
> >
> >
>
>