[
https://issues.apache.org/jira/browse/CALCITE-1935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16699512#comment-16699512
]
Julian Hyde commented on CALCITE-1935:
--------------------------------------
Over Thanksgiving, I started working on {{MATCH_RECOGNIZE}} again. I wrote a
standalone class called {{Automaton}} that allows you to build patterns
(basically regular expressions, but sufficient for the {{PATTERN}} sub-clause
of {{MATCH_RECOGNIZE}}), and execute them in a unit test.
Would someone like to help me develop this? We have support for {{*}} (zero or
more repeats), {{+}} (one or more repeats) and \{m,n\} (bounded repeats) but
still need {{|}} (alternation) and several others. It should be fairly straightforward
test-driven development: add tests to
[AutomatonTest|https://github.com/julianhyde/calcite/blob/1935-match-recognize/core/src/test/java/org/apache/calcite/runtime/AutomatonTest.java],
then change {{Automaton}}, {{AutomatonBuilder}}, {{Pattern}} or {{Matcher}}
until they pass.
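To make the quantifiers concrete, here is a minimal sketch of a matcher for {{*}}, {{+}} and bounded repeats over a list of symbols. It is not the {{Automaton}} API on the branch: the class {{PatternSketch}}, its {{Item}} type and the helper names are all hypothetical, and it uses simple greedy backtracking rather than a compiled automaton.

```java
import java.util.List;

/** Hypothetical sketch of pattern quantifiers; not Calcite's Automaton API. */
final class PatternSketch {
  /** One pattern element: a symbol repeated between min and max times. */
  static final class Item {
    final String symbol;
    final int min;
    final int max;

    Item(String symbol, int min, int max) {
      this.symbol = symbol;
      this.min = min;
      this.max = max;
    }
  }

  /** Exactly one occurrence of the symbol. */
  static Item one(String s) {
    return new Item(s, 1, 1);
  }

  /** Zero or more repeats, like '*'. */
  static Item star(String s) {
    return new Item(s, 0, Integer.MAX_VALUE);
  }

  /** One or more repeats, like '+'. */
  static Item plus(String s) {
    return new Item(s, 1, Integer.MAX_VALUE);
  }

  /** Between m and n repeats, like '{m,n}'. */
  static Item repeat(String s, int m, int n) {
    return new Item(s, m, n);
  }

  /** Returns whether the sequence of items matches the whole input. */
  static boolean matches(List<Item> pattern, List<String> input) {
    return matches(pattern, 0, input, 0);
  }

  private static boolean matches(List<Item> pattern, int p,
      List<String> input, int i) {
    if (p == pattern.size()) {
      return i == input.size();
    }
    Item item = pattern.get(p);
    // Count consecutive copies of the symbol starting at i, capped at max.
    int available = 0;
    while (available < item.max
        && i + available < input.size()
        && input.get(i + available).equals(item.symbol)) {
      available++;
    }
    // Greedy with backtracking: try the longest run first, then shorter ones.
    for (int n = available; n >= item.min; n--) {
      if (matches(pattern, p + 1, input, i + n)) {
        return true;
      }
    }
    return false;
  }
}
```

A test in this style would build a pattern such as {{plus("A")}} followed by {{star("B")}} and assert that it accepts A, A, B and rejects a lone B. Adding {{|}} would need a richer pattern tree than this flat list of items.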
We also need lots of SQL tests. Could someone write queries against Oracle’s
“ticker” table and paste the queries and results into {{match.iq}}?
Some trickier integration work is needed to make {{JdbcTest.testMatch}} work
end-to-end; I am working on that.
See [my dev
branch|https://github.com/julianhyde/calcite/tree/1935-match-recognize/].
I have cherry-picked commits from [Zhiqiang He’s
branch|https://github.com/Zhiqiang-He/calcite/tree/calcite-1935-MR-Implementation3]
into my branch, so this will be a joint effort when it is finished.
> Reference implementation for MATCH_RECOGNIZE
> --------------------------------------------
>
> Key: CALCITE-1935
> URL: https://issues.apache.org/jira/browse/CALCITE-1935
> Project: Calcite
> Issue Type: Bug
> Reporter: Julian Hyde
> Assignee: Julian Hyde
> Priority: Major
> Labels: match
>
> We now have comprehensive support for parsing and validating MATCH_RECOGNIZE
> queries (see CALCITE-1570 and sub-tasks) but we cannot execute them. I know
> the purpose of this work is to do CEP within Flink, but a reference
> implementation that works on non-streaming data would be valuable.
> I propose that we add a class EnumerableMatch that can generate Java code to
> evaluate MATCH_RECOGNIZE queries on Enumerable data. It does not need to be
> efficient. I don't mind if it (say) buffers all the data in memory and makes
> O(n ^ 3) passes over it. People can make it more efficient over time.
> When we have a reference implementation, people can start playing with this
> feature. And we can start building a corpus of data sets, queries, and their
> expected result. The Flink implementation will be able to test against those
> same queries, and should give the same results, even though Flink will be
> reading streaming data.
> Let's create {{match.iq}} with the following query based on
> https://oracle-base.com/articles/12c/pattern-matching-in-oracle-database-12cr1:
> {code}
> !set outputformat mysql
> !use match
> SELECT *
> FROM sales_history MATCH_RECOGNIZE (
>   PARTITION BY product
>   ORDER BY tstamp
>   MEASURES STRT.tstamp AS start_tstamp,
>            LAST(UP.tstamp) AS peak_tstamp,
>            LAST(DOWN.tstamp) AS end_tstamp,
>            MATCH_NUMBER() AS mno
>   ONE ROW PER MATCH
>   AFTER MATCH SKIP TO LAST DOWN
>   PATTERN (STRT UP+ FLAT* DOWN+)
>   DEFINE
>     UP AS UP.units_sold > PREV(UP.units_sold),
>     FLAT AS FLAT.units_sold = PREV(FLAT.units_sold),
>     DOWN AS DOWN.units_sold < PREV(DOWN.units_sold)
> ) MR
> ORDER BY MR.product, MR.start_tstamp;
> PRODUCT    START_TSTAM PEAK_TSTAMP END_TSTAMP         MNO
> ---------- ----------- ----------- ----------- ----------
> TWINKIES   01-OCT-2014 03-OCT-2014 06-OCT-2014          1
> TWINKIES   06-OCT-2014 08-OCT-2014 09-OCT-2014          2
> TWINKIES   09-OCT-2014 13-OCT-2014 16-OCT-2014          3
> TWINKIES   16-OCT-2014 18-OCT-2014 20-OCT-2014          4
> 4 rows selected.
> !ok
> {code}
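As a rough illustration of what evaluating the query above over buffered, in-memory data could look like, here is a sketch that hard-codes the pattern {{STRT UP+ FLAT* DOWN+}} for a single partition that is already sorted. It is not the proposed {{EnumerableMatch}}: the class {{MatchSketch}} and its method are hypothetical, rows are reduced to an array of {{units_sold}} values, and the result rows hold 0-based row indexes rather than timestamps. It relies on the three {{DEFINE}} conditions being mutually exclusive, so greedy scanning needs no backtracking.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of STRT UP+ FLAT* DOWN+ over one sorted partition. */
final class MatchSketch {
  /**
   * Returns one {start, peak, end, matchNumber} row per match, where
   * start, peak and end are 0-based indexes into units.
   */
  static List<int[]> findMatches(int[] units) {
    List<int[]> rows = new ArrayList<>();
    int start = 0;
    int matchNumber = 0;
    while (start < units.length) {
      // STRT has no DEFINE condition, so it matches any row.
      int i = start + 1;
      // UP+: each row is greater than the previous row.
      while (i < units.length && units[i] > units[i - 1]) {
        i++;
      }
      int peak = i - 1;
      if (peak == start) {
        start++; // UP+ needs at least one row; retry from the next row
        continue;
      }
      // FLAT*: each row equals the previous row.
      while (i < units.length && units[i] == units[i - 1]) {
        i++;
      }
      int firstDown = i;
      // DOWN+: each row is less than the previous row.
      while (i < units.length && units[i] < units[i - 1]) {
        i++;
      }
      if (i == firstDown) {
        start++; // DOWN+ needs at least one row; retry from the next row
        continue;
      }
      int end = i - 1;
      rows.add(new int[] {start, peak, end, ++matchNumber});
      // AFTER MATCH SKIP TO LAST DOWN: resume at the last DOWN row.
      start = end;
    }
    return rows;
  }
}
```

Because the next search resumes at the last {{DOWN}} row, that row becomes the {{STRT}} of the following match, which is why each end timestamp in the expected output above equals the next start timestamp.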