[jira] [Comment Edited] (CALCITE-1935) Reference implementation for MATCH_RECOGNIZE

Julian Feinauer (JIRA) Tue, 16 Apr 2019 04:21:47 -0700


    [ 
https://issues.apache.org/jira/browse/CALCITE-1935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16816658#comment-16816658
 ]


Julian Feinauer edited comment on CALCITE-1935 at 4/16/19 11:20 AM:
--------------------------------------------------------------------

Hi [~julianhyde], thanks for your offer. I think I have to take you up on it.
I'm trying to prepare a PR for master, but the branch is pretty huge, as it 
builds on a lot of previous work from you and others.
For example, some tests are failing and I do not know how to fix them 
properly.
Overall, I would be grateful if you could look into the code and help me 
bring this effort to a state where it is ready for a PR.

Some issues I currently have are:
- 8 or 9 MR (MATCH_RECOGNIZE) related tests in SqlToRelConverterTest (which 
were implemented by [~ransom]) fail, but only because of a different string 
representation, so this should be easy to fix for someone who knows how this 
test setup works.
- It would be good to check whether these branches contain "leftovers" of 
non-MR-related code that I or previous authors introduced while "playing 
around".

The branch I'm currently working on is 
https://github.com/JulianFeinauer/calcite/tree/1935-mr-prepare-pr.

I think if we manage to get everything "green", the next step is to move to 
the PR phase and check whether my solutions are fine or whether things should 
be solved differently. But first things first, I guess.


was (Author: julian.feinauer):
Hi [~julianhyde], thanks for your offer. I think I have to take you up on it.
I'm trying to prepare a PR for master, but the branch is pretty huge, as it 
builds on a lot of previous work from you and others.
For example, some tests are failing and I do not know how to fix them 
properly.
Overall, I would be grateful if you could look into the code and help me 
bring this effort to a state where it is ready for a PR.

Some issues I currently have are:
- 8 or 9 MR (MATCH_RECOGNIZE) related tests in SqlToRelConverterTest (which 
were implemented by [~ransom]) fail, but only because of a different string 
representation, so this should be easy to fix for someone who knows how this 
test setup works.
- The largest issue I currently have is related to CALCITE-2966. Judging from 
the code generation I do, there seems to be a bug in RexImpTable. But when I 
"fix" it so that the JdbcTest.testMatch test becomes green, a whole lot of 
other tests fail.
I have been investigating this behavior for two or three days, but I do not 
completely understand why this happens. I suspect some kind of edge case: 
perhaps the other tests rely on Boolean fields or primitives (or something 
similar) and my code does not, or does something different.
- It would be good to check whether these branches contain "leftovers" of 
non-MR-related code that I or previous authors introduced while "playing 
around".

The branch I'm currently working on is 
https://github.com/JulianFeinauer/calcite/tree/1935-mr-prepare-pr.

I think if we manage to get everything "green", the next step is to move to 
the PR phase and check whether my solutions are fine or whether things should 
be solved differently. But first things first, I guess.

> Reference implementation for MATCH_RECOGNIZE
> --------------------------------------------
>
>                 Key: CALCITE-1935
>                 URL: https://issues.apache.org/jira/browse/CALCITE-1935
>             Project: Calcite
>          Issue Type: Bug
>            Reporter: Julian Hyde
>            Priority: Major
>              Labels: match
>
> We now have comprehensive support for parsing and validating MATCH_RECOGNIZE 
> queries (see CALCITE-1570 and sub-tasks) but we cannot execute them. I know 
> the purpose of this work is to do CEP within Flink, but a reference 
> implementation that works on non-streaming data would be valuable.
> I propose that we add a class EnumerableMatch that can generate Java code to 
> evaluate MATCH_RECOGNIZE queries on Enumerable data. It does not need to be 
> efficient. I don't mind if it (say) buffers all the data in memory and makes 
> O(n ^ 3) passes over it. People can make it more efficient over time.
> When we have a reference implementation, people can start playing with this 
> feature. And we can start building a corpus of data sets, queries, and their 
> expected results. The Flink implementation will be able to test against those 
> same queries, and should give the same results, even though Flink will be 
> reading streaming data.
> Let's create {{match.iq}} with the following query based on 
> https://oracle-base.com/articles/12c/pattern-matching-in-oracle-database-12cr1:
> {code}
> !set outputformat mysql
> !use match
> SELECT *
> FROM sales_history MATCH_RECOGNIZE (
>          PARTITION BY product
>          ORDER BY tstamp
>          MEASURES  STRT.tstamp AS start_tstamp,
>                    LAST(UP.tstamp) AS peak_tstamp,
>                    LAST(DOWN.tstamp) AS end_tstamp,
>                    MATCH_NUMBER() AS mno
>          ONE ROW PER MATCH
>          AFTER MATCH SKIP TO LAST DOWN
>          PATTERN (STRT UP+ FLAT* DOWN+)
>          DEFINE
>            UP AS UP.units_sold > PREV(UP.units_sold),
>            FLAT AS FLAT.units_sold = PREV(FLAT.units_sold),
>            DOWN AS DOWN.units_sold < PREV(DOWN.units_sold)
>        ) MR
> ORDER BY MR.product, MR.start_tstamp;
> PRODUCT    START_TSTAM PEAK_TSTAMP END_TSTAMP         MNO
> ---------- ----------- ----------- ----------- ----------
> TWINKIES   01-OCT-2014 03-OCT-2014 06-OCT-2014          1
> TWINKIES   06-OCT-2014 08-OCT-2014 09-OCT-2014          2
> TWINKIES   09-OCT-2014 13-OCT-2014 16-OCT-2014          3
> TWINKIES   16-OCT-2014 18-OCT-2014 20-OCT-2014          4
> 4 rows selected.
> !ok
> {code}
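The quoted proposal above asks for a reference implementation that "does not need to be efficient" and may buffer all the data in memory. The following minimal Java sketch illustrates that brute-force style. It is not Calcite's EnumerableMatch and not the code on the branch. The class NaiveMatcher and all of its members are hypothetical.

The sketch rests on several assumptions:
- The input is one partition that has already been buffered and sorted, so PARTITION BY and ORDER BY are handled beforehand.
- Each DEFINE condition looks only at the current row and at physically preceding rows, as PREV does.
- The pattern is a flat sequence of variables, each quantified as exactly once, `+`, or `*`.

The matcher tries every start row. For each quantifier it backtracks over run lengths, trying the longest run first. It applies the default AFTER MATCH SKIP PAST LAST ROW.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

// Hypothetical sketch of a naive, fully buffered MATCH_RECOGNIZE matcher.
// Not Calcite's EnumerableMatch; all names here are illustrative only.
public class NaiveMatcher {
  public enum Quant { ONE, PLUS, STAR }

  /** One pattern element such as UP+; the predicate plays the role of DEFINE. */
  public static final class Element<R> {
    final String var;
    final Quant quant;
    // Gets all buffered rows plus the index of the candidate row,
    // so PREV(x) is simply rows.get(i - 1).
    final BiPredicate<List<R>, Integer> pred;

    public Element(String var, Quant quant, BiPredicate<List<R>, Integer> pred) {
      this.var = var;
      this.quant = quant;
      this.pred = pred;
    }
  }

  /** A match covers rows [start, end); labels.get(k) is the variable of row start + k. */
  public static final class Match {
    public final int start;
    public final int end;
    public final List<String> labels;

    Match(int start, int end, List<String> labels) {
      this.start = start;
      this.end = end;
      this.labels = labels;
    }
  }

  /** Tries to match pattern[ei..] at row pos; returns the exclusive end row, or -1. */
  static <R> int matchFrom(List<Element<R>> pattern, int ei, List<R> rows, int pos,
      List<String> labels) {
    if (ei == pattern.size()) {
      return pos; // every element matched
    }
    Element<R> e = pattern.get(ei);
    int limit = e.quant == Quant.ONE ? 1 : rows.size();
    // Count how many consecutive rows from pos satisfy this element's condition.
    int max = 0;
    while (max < limit && pos + max < rows.size() && e.pred.test(rows, pos + max)) {
      max++;
    }
    int min = e.quant == Quant.STAR ? 0 : 1;
    // Greedy: try the longest run first, then give rows back (backtracking).
    for (int n = max; n >= min; n--) {
      for (int k = 0; k < n; k++) {
        labels.add(e.var);
      }
      int end = matchFrom(pattern, ei + 1, rows, pos + n, labels);
      if (end >= 0) {
        return end;
      }
      for (int k = 0; k < n; k++) {
        labels.remove(labels.size() - 1); // undo this attempt's labels
      }
    }
    return -1;
  }

  /** Scans one buffered, ordered partition and returns all non-empty matches. */
  public static <R> List<Match> findMatches(List<Element<R>> pattern, List<R> rows) {
    List<Match> result = new ArrayList<>();
    int start = 0;
    while (start < rows.size()) {
      List<String> labels = new ArrayList<>();
      int end = matchFrom(pattern, 0, rows, start, labels);
      if (end > start) {
        result.add(new Match(start, end, labels));
        start = end; // AFTER MATCH SKIP PAST LAST ROW (the default)
      } else {
        start++; // no non-empty match starting here; try the next row
      }
    }
    return result;
  }
}
```

A variable without a DEFINE clause, such as STRT, is always true, so it maps to a predicate that returns true. With k quantified variables the search may try roughly n^k run lengths per start row. That cost is acceptable for a reference implementation in the spirit of "people can make it more efficient over time".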

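For the specific query in the proposed match.iq, the search is simpler. The UP, FLAT and DOWN conditions (greater than, equal to, and less than the previous row) are mutually exclusive. A greedy left-to-right scan therefore finds the same match that a backtracking matcher would.

The Java sketch below assumes that one partition's units_sold values are already ordered by tstamp and held in an int array. It reports row indexes in place of the timestamp measures. The class and field names are hypothetical.

The sketch also illustrates AFTER MATCH SKIP TO LAST DOWN. The next search starts at the last DOWN row, which is why each END_TSTAMP in the expected output reappears as the next match's START_TSTAMP.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch for PATTERN (STRT UP+ FLAT* DOWN+) with
// AFTER MATCH SKIP TO LAST DOWN and ONE ROW PER MATCH, on one ordered partition.
public class StrtUpFlatDown {
  /** One output row; indexes stand in for the tstamp measures. */
  public static final class Row {
    public final int start; // STRT.tstamp
    public final int peak;  // LAST(UP.tstamp)
    public final int end;   // LAST(DOWN.tstamp)
    public final int mno;   // MATCH_NUMBER()

    Row(int start, int peak, int end, int mno) {
      this.start = start;
      this.peak = peak;
      this.end = end;
      this.mno = mno;
    }
  }

  public static List<Row> match(int[] units) {
    List<Row> out = new ArrayList<>();
    int n = units.length;
    int mno = 0;
    int s = 0; // candidate STRT row (STRT has no DEFINE, so any row qualifies)
    while (s < n) {
      int i = s + 1;
      // UP+: rows strictly greater than their predecessor.
      while (i < n && units[i] > units[i - 1]) {
        i++;
      }
      int lastUp = i - 1;
      if (lastUp == s) { // need at least one UP row
        s++;
        continue;
      }
      // FLAT*: rows equal to their predecessor.
      while (i < n && units[i] == units[i - 1]) {
        i++;
      }
      // DOWN+: rows strictly smaller than their predecessor.
      int firstDown = i;
      while (i < n && units[i] < units[i - 1]) {
        i++;
      }
      int lastDown = i - 1;
      if (lastDown < firstDown) { // need at least one DOWN row
        s++;
        continue;
      }
      out.add(new Row(s, lastUp, lastDown, ++mno));
      // AFTER MATCH SKIP TO LAST DOWN: the last DOWN row may be the next STRT.
      s = lastDown;
    }
    return out;
  }
}
```

For example, the hypothetical input {10, 12, 15, 15, 11, 9, 13, 8} gives two matches: rows 0..5 (peak at row 2) and rows 5..7 (peak at row 6). Row 5 ends the first match and starts the second.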


--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
