[jira] [Commented] (CALCITE-1935) Reference implementation for MATCH_RECOGNIZE

Julian Feinauer (JIRA) Tue, 30 Jul 2019 07:46:52 -0700


    [ 
https://issues.apache.org/jira/browse/CALCITE-1935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16896198#comment-16896198
 ]


Julian Feinauer commented on CALCITE-1935:
------------------------------------------

Hey [~julianhyde], I submitted a PR which needs a bit of polishing but I think 
I need some help for the final steps. See my mail on the list 
(https://lists.apache.org/thread.html/475eec9bc95fc05ddabe3721aa0fffd545f77bf4fc5f7a9ea79c690e@%3Cdev.calcite.apache.org%3E).
 The PR can be found here https://github.com/apache/calcite/pull/1343.

I hope that I can manage to get these first steps done with your (or others) 
help
Thanks already!

> Reference implementation for MATCH_RECOGNIZE
> --------------------------------------------
>
>                 Key: CALCITE-1935
>                 URL: https://issues.apache.org/jira/browse/CALCITE-1935
>             Project: Calcite
>          Issue Type: Bug
>            Reporter: Julian Hyde
>            Assignee: Julian Hyde
>            Priority: Major
>              Labels: match
>
> We now have comprehensive support for parsing and validating MATCH_RECOGNIZE 
> queries (see CALCITE-1570 and sub-tasks) but we cannot execute them. I know 
> the purpose of this work is to do CEP within Flink, but a reference 
> implementation that works on non-streaming data would be valuable.
> I propose that we add a class EnumerableMatch that can generate Java code to 
> evaluate MATCH_RECOGNIZE queries on Enumerable data. It does not need to be 
> efficient. I don't mind if it (say) buffers all the data in memory and makes 
> O(n ^ 3) passes over it. People can make it more efficient over time.
> When we have a reference implementation, people can start playing with this 
> feature. And we can start building a corpus of data sets, queries, and their 
> expected result. The Flink implementation will be able to test against those 
> same queries, and should give the same results, even though Flink will be 
> reading streaming data.
> Let's create {{match.iq}} with the following query based on 
> https://oracle-base.com/articles/12c/pattern-matching-in-oracle-database-12cr1:
> {code}
> !set outputformat mysql
> !use match
> SELECT *
> FROM sales_history MATCH_RECOGNIZE (
>          PARTITION BY product
>          ORDER BY tstamp
>          MEASURES  STRT.tstamp AS start_tstamp,
>                    LAST(UP.tstamp) AS peak_tstamp,
>                    LAST(DOWN.tstamp) AS end_tstamp,
>                    MATCH_NUMBER() AS mno
>          ONE ROW PER MATCH
>          AFTER MATCH SKIP TO LAST DOWN
>          PATTERN (STRT UP+ FLAT* DOWN+)
>          DEFINE
>            UP AS UP.units_sold > PREV(UP.units_sold),
>            FLAT AS FLAT.units_sold = PREV(FLAT.units_sold),
>            DOWN AS DOWN.units_sold < PREV(DOWN.units_sold)
>        ) MR
> ORDER BY MR.product, MR.start_tstamp;
> PRODUCT    START_TSTAM PEAK_TSTAMP END_TSTAMP         MNO
> ---------- ----------- ----------- ----------- ----------
> TWINKIES   01-OCT-2014 03-OCT-2014 06-OCT-2014          1
> TWINKIES   06-OCT-2014 08-OCT-2014 09-OCT-2014          2
> TWINKIES   09-OCT-2014 13-OCT-2014 16-OCT-2014          3
> TWINKIES   16-OCT-2014 18-OCT-2014 20-OCT-2014          4
> 4 rows selected.
> !ok
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

[jira] [Commented] (CALCITE-1935) Reference implementation for MATCH_RECOGNIZE

Reply via email to