I investigated why SESSION_END() is not working:
https://issues.apache.org/jira/browse/BEAM-5799
https://issues.apache.org/jira/browse/CALCITE-2645

According to the reply in the Calcite JIRA, there might be some other way to
implement SESSION_END. I haven't looked into it though.

-Rui

On Thu, Nov 15, 2018 at 11:56 AM Mingmin Xu <[email protected]> wrote:

> 1. Window start/end: Actually this is already provided in other ways and
> the window in the SQL environment is unused and just waiting to be deleted.
> So you can still access TUMBLE_START, etc. This is well-defined as a part
> of the row so there's no semantic problem, but I think it should already
> work.
> *MM: Others work except SESSION_END();*
>
> 2. Pane information: I don't think access to pane info is enough for
> correct results for a SQL join that triggers more than once. The pane info
> is part of a Beam element, but these records just represent a kind of
> changelog of the aggregation/join. The general solution is retractions.
> Until we finish that, you need to follow the Join/CoGBK with custom logic,
> often a stateful DoFn to get the join results right. For example, if both
> inputs are append-only relations and it is an equijoin, then you can do
> this with a dedupe when you unpack the CoGbkResult. I am guessing this is
> the main use case for BEAM-5204. Is it your use case?
> *MM: my case is a self-join with SQL-only, written as [DISCARD_Pane JOIN
> ACCU_Pane];*
> *These UDFs are not a blocker; the limitation in BEAM-5204 should be removed
> directly IMO. With multiple triggers assigned, developers need to handle the
> output, which is not complex with the Java SDK but very hard for SQL-only
> cases.*
>
>
> On Thu, Nov 15, 2018 at 10:54 AM Kenneth Knowles <[email protected]> wrote:
>
>> From https://issues.apache.org/jira/browse/BEAM-5204 it seems like what
>> you most care about is to have joins that trigger more than once per
>> window. To accomplish it you hope to build an "escape hatch" from
>> SQL/relational semantics to specialized Beam SQL semantics. It could make
>> sense with extreme care.
>>
>> Separating the two parts:
>>
>> 1. Window start/end: Actually this is already provided in other ways and
>> the window in the SQL environment is unused and just waiting to be deleted.
>> So you can still access TUMBLE_START, etc. This is well-defined as a part
>> of the row so there's no semantic problem, but I think it should already
>> work.
>>
>> 2. Pane information: I don't think access to pane info is enough for
>> correct results for a SQL join that triggers more than once. The pane info
>> is part of a Beam element, but these records just represent a kind of
>> changelog of the aggregation/join. The general solution is retractions.
>> Until we finish that, you need to follow the Join/CoGBK with custom logic,
>> often a stateful DoFn to get the join results right. For example, if both
>> inputs are append-only relations and it is an equijoin, then you can do
>> this with a dedupe when you unpack the CoGbkResult. I am guessing this is
>> the main use case for BEAM-5204. Is it your use case?
>>
>> Kenn
>>
>> On Thu, Nov 15, 2018 at 10:08 AM Mingmin Xu <[email protected]> wrote:
>>
>>> Raising this thread.
>>> It seems there are more changes in how a FUNCTION is executed in the
>>> backend, as noticed in #6996
>>> <https://github.com/apache/beam/pull/6996>:
>>> 1. BeamSqlExpression and BeamSqlExpressionExecutor are removed;
>>> 2. BeamSqlExpressionEnvironment is removed;
>>>
>>> Then,
>>> 1. for Calcite defined FUNCTIONS, it uses Calcite generated code (which
>>> is great and duplicate work is worthless);
>>> *2. no way to access Beam context now;*
>>>
>>> For *#2*, I think we need to find a way to expose it; at least our
>>> UDF/UDAF should be able to access it to leverage the advantages of Beam
>>> module.
>>>
>>> Any comments?
>>>
>>>
>>> On Wed, Sep 19, 2018 at 2:55 PM Rui Wang <[email protected]> wrote:
>>>
>>>> This is such an exciting change!
>>>>
>>>> Since we cannot mix the current implementation with Calcite code
>>>> generation, is there any case that Calcite code generation does not
>>>> support but our current implementation supports, so that switching to
>>>> Calcite code generation will have some impact on existing usage?
>>>>
>>>> -Rui
>>>>
>>>> On Wed, Sep 19, 2018 at 11:53 AM Andrew Pilloud <[email protected]>
>>>> wrote:
>>>>
>>>>> To follow up on this, the PR is now in a reviewable state and I've
>>>>> added more tests for FLOOR and CEIL. Both work with a more extensive
>>>>> set of arguments after this change. There are now 4 outstanding
>>>>> Calcite PRs that get all the tests passing.
>>>>>
>>>>> Unfortunately there is no easy way to mix our current implementation
>>>>> and using Calcite's code generator.
>>>>>
>>>>> Andrew
>>>>>
>>>>> On Mon, Sep 17, 2018 at 3:22 PM Mingmin Xu <[email protected]> wrote:
>>>>>
>>>>>> Awesome work, we should call Calcite operator functions if available.
>>>>>>
>>>>>> I haven't had time to read the PR yet; for those impacted we would
>>>>>> keep the existing implementation. One example: I recently noticed
>>>>>> that FLOOR/CEIL only supports months/years, which was quite a
>>>>>> surprise to me.
>>>>>>
>>>>>> Mingmin
>>>>>>
>>>>>> On Mon, Sep 17, 2018 at 3:03 PM Anton Kedin <[email protected]> wrote:
>>>>>>
>>>>>>> This is pretty amazing! Thank you for doing this!
>>>>>>>
>>>>>>> Regards,
>>>>>>> Anton
>>>>>>>
>>>>>>> On Mon, Sep 17, 2018 at 2:27 PM Andrew Pilloud <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I've adapted Calcite's EnumerableCalc code generation to generate
>>>>>>>> the BeamCalc DoFn. The primary purpose behind this change is so we
>>>>>>>> can take advantage of Calcite's extensive SQL operator
>>>>>>>> implementation. This deletes ~11000 lines of code from Beam (with
>>>>>>>> ~350 added), significantly increases the set of supported SQL
>>>>>>>> operators, and improves performance and correctness of currently
>>>>>>>> supported operators. Here is my work in progress:
>>>>>>>> https://github.com/apache/beam/pull/6417
>>>>>>>>
>>>>>>>> There are a few bugs in Calcite that this has exposed:
>>>>>>>>
>>>>>>>> Fixed in Calcite master:
>>>>>>>>
>>>>>>>>    - CALCITE-2321
>>>>>>>>    <https://issues.apache.org/jira/browse/CALCITE-2321> - The type
>>>>>>>>    of a union of CHAR columns of different lengths should be VARCHAR
>>>>>>>>    - CALCITE-2447
>>>>>>>>    <https://issues.apache.org/jira/browse/CALCITE-2447> - Some
>>>>>>>>    POWER, ATAN2 functions fail with NoSuchMethodException
>>>>>>>>
>>>>>>>> Pending PRs:
>>>>>>>>
>>>>>>>>    - CALCITE-2529
>>>>>>>>    <https://issues.apache.org/jira/browse/CALCITE-2529> - linq4j
>>>>>>>>    should promote integer to floating point when generating function
>>>>>>>>    calls
>>>>>>>>    - CALCITE-2530
>>>>>>>>    <https://issues.apache.org/jira/browse/CALCITE-2530> - TRIM
>>>>>>>>    function does not throw exception when the length of trim
>>>>>>>>    character is not 1(one)
>>>>>>>>
>>>>>>>> More work:
>>>>>>>>
>>>>>>>>    - CALCITE-2404
>>>>>>>>    <https://issues.apache.org/jira/browse/CALCITE-2404> -
>>>>>>>>    Accessing structured-types is not implemented by the runtime
>>>>>>>>    - (none yet) - Support multi character TRIM extension in Calcite
>>>>>>>>
>>>>>>>> I would like to push these changes in with these minor regressions.
>>>>>>>> Do any of these Calcite bugs block this functionality being added
>>>>>>>> to Beam?
>>>>>>>>
>>>>>>>> Andrew
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> ----
>>>>>> Mingmin
>>>>>>
>>>>>
>>>
>>> --
>>> ----
>>> Mingmin
>>>
>>
>
> --
> ----
> Mingmin
>
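
As a rough illustration of the dedupe Kenn describes for an append-only equijoin that fires more than once, here is a minimal sketch in plain Java. It does not use the Beam API: the class `JoinDedupeSketch` and the method `onFiring` are hypothetical names, the `seen` set only stands in for the per-key state a stateful DoFn would hold, and rows are assumed to be distinct strings. Each call models one trigger firing that hands over everything grouped so far for a single join key (accumulating mode), and only pairs not emitted earlier are returned.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class JoinDedupeSketch {
    // Hypothetical stand-in for per-key state in a stateful DoFn:
    // remembers every joined pair that has already been emitted.
    private final Set<String> seen = new HashSet<>();

    // One call per trigger firing. 'left' and 'right' hold all rows grouped
    // so far for one join key. Returns only the pairs not emitted before.
    public List<String> onFiring(List<String> left, List<String> right) {
        List<String> fresh = new ArrayList<>();
        for (String l : left) {
            for (String r : right) {
                String pair = l + "|" + r;
                // Set.add returns false when the pair was already emitted.
                if (seen.add(pair)) {
                    fresh.add(pair);
                }
            }
        }
        return fresh;
    }

    public static void main(String[] args) {
        JoinDedupeSketch dedupe = new JoinDedupeSketch();
        // First firing: one row on each side.
        System.out.println(dedupe.onFiring(List.of("a1"), List.of("b1")));
        // Second firing: the left side has grown; a1|b1 is suppressed.
        System.out.println(dedupe.onFiring(List.of("a1", "a2"), List.of("b1")));
    }
}
```

Deduplicating by value only works because both inputs are assumed append-only with distinct rows; inputs that can update or delete rows would need retractions, as noted in the thread.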
