Re: Current State of MATCH_RECOGNIZE Implementation

Julian Hyde Wed, 09 Jan 2019 10:24:11 -0800

I saw your PR https://github.com/julianhyde/calcite/pull/16 
<https://github.com/julianhyde/calcite/pull/16>. Can you please create a PR 
against Apache, then I’ll rebase it onto my 1935 branch.


Are we able to run a SQL query yet? My plan was to get a very basic query 
working end-to-end, then start adding features to the engine. I still think 
that’s a good plan.

Julian


> On Jan 9, 2019, at 9:13 AM, Julian Feinauer <[email protected]> 
> wrote:
> 
> Hi all,
> 
> as discussed in earlier exchanges (see [1]), I started to work on 
> implementing MATCH_RECOGNIZE based on Julian (Hydes) work [2].
> I think I made some progress and resolved some of the problems but I’m at a 
> stage now where I’d need some advice from more seasoned Calcite dev’s on how 
> to continue.
> 
> Basically, I improved the Matching (based on a DFA now) and ensured that we 
> have the full path of symbols (not just rows) as base for the “Emitter”. The 
> Matcher code should also be working now EXCEPT for the PREV / NEXT Commands.
> I think we could handle them pretty easily as I introduced a 
> “MemoryEnumerable”, i.e., an Enumerable which keeps all records in a window 
> around the current record (history and future). I think this should work 
> here, as we have NO unbounded windows like for regular window functions.
> 
> So, basically the Matcher gets for each step a “Memory” Object which has a 
> “get(n)” method to get the current (n = 0), present (n < 0) or future (n > 0) 
> row.
> So, an expression of the Form “PREV(UP.$4, $5)” should be converted to 
> something like “row.get($5).$4”.
> I have no real clue how to do this the “right” way, perhaps by a custom 
> InputGetter which automatically introduces the “.get()”?
> When this is implemented the Matcher should be finished (?).
> 
> Another thing the current implementation is missing is the ordering inside 
> the partitions (which is similar to window functions). Do you think we can 
> simply reuse the code from there?
> Generally, MATCH_RECOGNIZE could be implemented as regular WINDOW Function in 
> the situation where one output row is generated for each input row, but I 
> think this does not help us much as there  also is the other “MODE” where it 
> outputs a single row for each (possibly arbitrary long) match.
> 
> Parallel to this mail I submitted a PR to merge my branch back to Julians 
> work to have a common “checkpoint” for the next steps.
> I would really value if someone could step in (Julian?) with either implement 
> parts of the problems stated above or give me some hints on how to address 
> this properly so that I can try to go further.
> 
> I don’t know what the usual way is but if it helps perhaps we can arrange a 
> Screen sharing session or something to walk through the new code, if 
> necessary.
> 
> Best
> JulianF
> 
> [1] 
> https://lists.apache.org/thread.html/98f67c4534c32b544e48d54abca19f0e89fe8a163e5d5b822d80e6f0@%3Cdev.calcite.apache.org%3E
> [2] https://github.com/julianhyde/calcite/tree/1935-match-recognize

Re: Current State of MATCH_RECOGNIZE Implementation

Reply via email to