Hi all,

as discussed in earlier exchanges (see [1]), I started to work on implementing 
MATCH_RECOGNIZE based on Julian (Hydes) work [2].
I think I made some progress and resolved some of the problems but I’m at a 
stage now where I’d need some advice from more seasoned Calcite dev’s on how to 
continue.

Basically, I improved the Matching (based on a DFA now) and ensured that we 
have the full path of symbols (not just rows) as base for the “Emitter”. The 
Matcher code should also be working now EXCEPT for the PREV / NEXT Commands.
I think we could handle them pretty easily as I introduced a 
“MemoryEnumerable”, i.e., an Enumerable which keeps all records in a window 
around the current record (history and future). I think this should work here, 
as we have NO unbounded windows like for regular window functions.

So, basically the Matcher gets for each step a “Memory” Object which has a 
“get(n)” method to get the current (n = 0), present (n < 0) or future (n > 0) 
row.
So, an expression of the Form “PREV(UP.$4, $5)” should be converted to 
something like “row.get($5).$4”.
I have no real clue how to do this the “right” way, perhaps by a custom 
InputGetter which automatically introduces the “.get()”?
When this is implemented the Matcher should be finished (?).

Another thing the current implementation is missing is the ordering inside the 
partitions (which is similar to window functions). Do you think we can simply 
reuse the code from there?
Generally, MATCH_RECOGNIZE could be implemented as regular WINDOW Function in 
the situation where one output row is generated for each input row, but I think 
this does not help us much as there  also is the other “MODE” where it outputs 
a single row for each (possibly arbitrary long) match.

Parallel to this mail I submitted a PR to merge my branch back to Julians work 
to have a common “checkpoint” for the next steps.
I would really value if someone could step in (Julian?) with either implement 
parts of the problems stated above or give me some hints on how to address this 
properly so that I can try to go further.

I don’t know what the usual way is but if it helps perhaps we can arrange a 
Screen sharing session or something to walk through the new code, if necessary.

Best
JulianF

[1] 
https://lists.apache.org/thread.html/98f67c4534c32b544e48d54abca19f0e89fe8a163e5d5b822d80e6f0@%3Cdev.calcite.apache.org%3E
[2] https://github.com/julianhyde/calcite/tree/1935-match-recognize

Reply via email to