1. Window start/end: Actually this is already provided in other ways and the window in the SQL environment is unused and just waiting to be deleted. So you can still access TUMBLE_START, etc. This is well-defined as a part of the row so there's no semantic problem, but I think it should already work.
*MM: All the others work, except SESSION_END().*
2. Pane information: I don't think access to pane info is enough for correct results for a SQL join that triggers more than once. The pane info is part of a Beam element, but these records just represent a kind of changelog of the aggregation/join. The general solution is retractions. Until we finish that, you need to follow the Join/CoGBK with custom logic, often a stateful DoFn, to get the join results right. For example, if both inputs are append-only relations and it is an equijoin, then you can do this with a dedupe when you unpack the CoGbkResult. I am guessing this is the main use case for BEAM-5204. Is it your use case?
*MM: My case is a SQL-only self-join, written as [DISCARD_Pane JOIN ACCU_Pane].*
*These UDFs are not a blocker; IMO the limitation in BEAM-5204 should simply be removed. With multiple triggers assigned, developers need to handle the output themselves, which is not complex with the Java SDK but very hard for SQL-only cases.*

On Thu, Nov 15, 2018 at 10:54 AM Kenneth Knowles wrote:
> From https://issues.apache.org/jira/browse/BEAM-5204 it seems like what you most care about is to have joins that trigger more than once per window. To accomplish it you hope to build an "escape hatch" from SQL/relational semantics to specialized Beam SQL semantics. It could make sense with extreme care.
>
> Separating the two parts:
>
> 1. Window start/end: Actually this is already provided in other ways and the window in the SQL environment is unused and just waiting to be deleted. So you can still access TUMBLE_START, etc. This is well-defined as a part of the row so there's no semantic problem, but I think it should already work.
>
> 2. Pane information: I don't think access to pane info is enough for correct results for a SQL join that triggers more than once. The pane info is part of a Beam element, but these records just represent a kind of changelog of the aggregation/join. The general solution is retractions.
> Until we finish that, you need to follow the Join/CoGBK with custom logic, often a stateful DoFn, to get the join results right. For example, if both inputs are append-only relations and it is an equijoin, then you can do this with a dedupe when you unpack the CoGbkResult. I am guessing this is the main use case for BEAM-5204. Is it your use case?
>
> Kenn
>
> On Thu, Nov 15, 2018 at 10:08 AM Mingmin Xu wrote:
>
>> Raising this thread again.
>> It seems there are more changes in how a FUNCTION is executed in the backend, as noticed in #6996 <https://github.com/apache/beam/pull/6996>:
>> 1. BeamSqlExpression and BeamSqlExpressionExecutor are removed;
>> 2. BeamSqlExpressionEnvironment is removed;
>>
>> Then,
>> 1. for Calcite-defined FUNCTIONS, it uses Calcite-generated code (which is great, since duplicating that work is worthless);
>> *2. there is no way to access the Beam context now;*
>>
>> For *#2*, I think we need to find a way to expose it; at the least, our UDF/UDAF should be able to access it to leverage the advantages of the Beam module.
>>
>> Any comments?
>>
>>
>> On Wed, Sep 19, 2018 at 2:55 PM Rui Wang wrote:
>>
>>> This is such an exciting change!
>>>
>>> Since we cannot mix the current implementation with Calcite code generation, is there any case that our current implementation supports but Calcite code generation does not, so that switching to Calcite code generation would affect existing usage?
>>>
>>> -Rui
>>>
>>> On Wed, Sep 19, 2018 at 11:53 AM Andrew Pilloud wrote:
>>>
>>>> To follow up on this, the PR is now in a reviewable state and I've added more tests for FLOOR and CEIL. Both work with a more extensive set of arguments after this change. There are now 4 outstanding Calcite PRs that get all the tests passing.
>>>>
>>>> Unfortunately there is no easy way to mix our current implementation with Calcite's code generator.
>>>>
>>>> Andrew
>>>>
>>>> On Mon, Sep 17, 2018 at 3:22 PM Mingmin Xu wrote:
>>>>
>>>>> Awesome work; we should call Calcite operator functions where available.
>>>>>
>>>>> I haven't had time to read the PR yet; for the functions that would be impacted, we could keep the existing implementation. One example: I recently noticed that FLOOR/CEIL supports only months/years, which was quite a surprise to me.
>>>>>
>>>>> Mingmin
>>>>>
>>>>> On Mon, Sep 17, 2018 at 3:03 PM Anton Kedin wrote:
>>>>>
>>>>>> This is pretty amazing! Thank you for doing this!
>>>>>>
>>>>>> Regards,
>>>>>> Anton
>>>>>>
>>>>>> On Mon, Sep 17, 2018 at 2:27 PM Andrew Pilloud wrote:
>>>>>>
>>>>>>> I've adapted Calcite's EnumerableCalc code generation to generate the BeamCalc DoFn. The primary purpose behind this change is so we can take advantage of Calcite's extensive SQL operator implementation. This deletes ~11000 lines of code from Beam (with ~350 added), significantly increases the set of supported SQL operators, and improves performance and correctness of currently supported operators.
>>>>>>> Here is my work in progress:
>>>>>>> https://github.com/apache/beam/pull/6417
>>>>>>>
>>>>>>> There are a few bugs in Calcite that this has exposed:
>>>>>>>
>>>>>>> Fixed in Calcite master:
>>>>>>>
>>>>>>> - CALCITE-2321 <https://issues.apache.org/jira/browse/CALCITE-2321> - The type of a union of CHAR columns of different lengths should be VARCHAR
>>>>>>> - CALCITE-2447 <https://issues.apache.org/jira/browse/CALCITE-2447> - Some POWER, ATAN2 functions fail with NoSuchMethodException
>>>>>>>
>>>>>>> Pending PRs:
>>>>>>>
>>>>>>> - CALCITE-2529 <https://issues.apache.org/jira/browse/CALCITE-2529> - linq4j should promote integer to floating point when generating function calls
>>>>>>> - CALCITE-2530 <https://issues.apache.org/jira/browse/CALCITE-2530> - TRIM function does not throw exception when the length of trim character is not 1(one)
>>>>>>>
>>>>>>> More work:
>>>>>>>
>>>>>>> - CALCITE-2404 <https://issues.apache.org/jira/browse/CALCITE-2404> - Accessing structured-types is not implemented by the runtime
>>>>>>> - (none yet) - Support multi-character TRIM extension in Calcite
>>>>>>>
>>>>>>> I would like to push these changes in with these minor regressions. Do any of these Calcite bugs block this functionality from being added to Beam?
>>>>>>>
>>>>>>> Andrew
>>>>>>
>>>>>
>>>>> --
>>>>> ----
>>>>> Mingmin
>>>>
>>
>> --
>> ----
>> Mingmin
>
--
----
Mingmin
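Point 1 calls the window start/end "well-defined as a part of the row." For a tumbling (fixed) window, this is easy to see: the bounds are a pure function of the row's event timestamp plus the window size and offset. The sketch below is a hypothetical plain-Java model of the values TUMBLE_START and TUMBLE_END expose. It is not Beam's implementation. It assumes epoch-millisecond timestamps and an exclusive end, matching Beam's IntervalWindow. Session windows behave differently: their bounds depend on which other elements get merged into the session, so they cannot be computed from a single timestamp this way.

```java
// Hypothetical model of tumbling-window bounds (not Beam's implementation).
// Timestamps are epoch milliseconds; the window is [start, end), i.e. the end is exclusive.
final class TumbleBounds {
  /** Start of the tumbling window of the given size and offset that contains the timestamp. */
  static long tumbleStart(long timestampMillis, long sizeMillis, long offsetMillis) {
    // floorMod (rather than %) keeps the result correct for timestamps before the offset or epoch.
    return timestampMillis - Math.floorMod(timestampMillis - offsetMillis, sizeMillis);
  }

  /** Exclusive end of the same window. */
  static long tumbleEnd(long timestampMillis, long sizeMillis, long offsetMillis) {
    return tumbleStart(timestampMillis, sizeMillis, offsetMillis) + sizeMillis;
  }

  public static void main(String[] args) {
    long hour = 3_600_000L;
    long ts = 5_400_000L; // 01:30:00 UTC on 1970-01-01
    System.out.println(tumbleStart(ts, hour, 0)); // 3600000 (01:00)
    System.out.println(tumbleEnd(ts, hour, 0));   // 7200000 (02:00)
  }
}
```

For the 01:30 timestamp in main, the hour-long window is [01:00, 02:00). Every row in that window yields the same pair of values, which is why exposing them as columns raises no semantic problem.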
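Kenn's workaround for point 2 is to follow the Join/CoGBK with a stateful DoFn that dedupes when unpacking the CoGbkResult. The idea can be sketched without any Beam dependency. In this hypothetical model:

- Each call to onFiring stands in for one accumulating pane for a key, carrying every left and right value seen so far.
- The per-key map stands in for the state a stateful DoFn would keep.

It assumes what Kenn's example assumes: append-only inputs and an equijoin on the grouping key. The class, record, and method names are illustrative, not Beam APIs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical plain-Java model of "dedupe when you unpack the CoGbkResult".
// Assumes accumulating panes, append-only inputs, and an equijoin on the grouping key.
final class PaneJoinDedupe {
  /** One joined row: a left value and a right value that share the same key. */
  record Pair(String left, String right) {}

  // Stands in for per-key state in a stateful DoFn: copies of each pair emitted so far.
  private final Map<String, Map<Pair, Integer>> emittedByKey = new HashMap<>();

  /**
   * Handles one firing for a key. With accumulating panes, lefts and rights hold every value
   * seen so far, so the full cross product repeats rows from earlier firings; only rows not
   * yet emitted are returned.
   */
  List<Pair> onFiring(String key, List<String> lefts, List<String> rights) {
    // Multiplicity of each pair in this firing's full cross product (keeps duplicate rows).
    Map<Pair, Integer> current = new HashMap<>();
    for (String l : lefts) {
      for (String r : rights) {
        current.merge(new Pair(l, r), 1, Integer::sum);
      }
    }
    Map<Pair, Integer> emitted = emittedByKey.computeIfAbsent(key, k -> new HashMap<>());
    List<Pair> fresh = new ArrayList<>();
    for (Map.Entry<Pair, Integer> e : current.entrySet()) {
      int already = emitted.getOrDefault(e.getKey(), 0);
      // Append-only inputs mean a pair's count can only grow between firings.
      for (int i = already; i < e.getValue(); i++) {
        fresh.add(e.getKey());
      }
      emitted.put(e.getKey(), Math.max(already, e.getValue()));
    }
    return fresh;
  }

  public static void main(String[] args) {
    PaneJoinDedupe join = new PaneJoinDedupe();
    // Three accumulating firings for key "k"; each carries everything seen so far.
    System.out.println(join.onFiring("k", List.of("a1"), List.of("b1")).size());             // 1
    System.out.println(join.onFiring("k", List.of("a1", "a2"), List.of("b1")).size());       // 1
    System.out.println(join.onFiring("k", List.of("a1", "a2"), List.of("b1", "b2")).size()); // 2
  }
}
```

Without the dedupe, the three firings would emit 1 + 2 + 4 = 7 rows. With it, they emit 4, exactly the final cross product. The sketch counts pair multiplicities rather than keeping a plain set, so legitimately duplicated input rows still appear in the output.

Two limitations apply:

- In a real pipeline, the per-key state would also need clearing when the window expires (for example, with an event-time timer). This model never clears it.
- It does not cover inputs that can shrink, which is the case retractions are meant to handle.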
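Two items in Andrew's Calcite list concern how a generated call is matched to a Java method:

- CALCITE-2447: NoSuchMethodException for some POWER and ATAN2 calls.
- CALCITE-2529: promoting integer to floating point when generating function calls.

The sketch below does not reproduce Calcite's or linq4j's code path. It assumes a call is resolved by exact parameter types through reflection. Under that assumption, it shows why an (int, int) lookup of java.lang.Math.pow fails, and why promoting the argument types first succeeds. The class and helper names are hypothetical.

```java
import java.lang.reflect.Method;

// Hypothetical illustration: resolving a function by exact parameter types via reflection.
// Math.pow is declared only as pow(double, double), so an (int, int) lookup fails unless the
// argument types are promoted to floating point first.
final class PromoteBeforeLookup {
  /** Widens int and long to double; other types are left unchanged. */
  static Class<?> promote(Class<?> type) {
    return (type == int.class || type == long.class) ? double.class : type;
  }

  /** Promotes each argument type, then looks up a public method with exactly those types. */
  static Method lookup(Class<?> owner, String name, Class<?>... argTypes)
      throws NoSuchMethodException {
    Class<?>[] promoted = new Class<?>[argTypes.length];
    for (int i = 0; i < argTypes.length; i++) {
      promoted[i] = promote(argTypes[i]);
    }
    return owner.getMethod(name, promoted);
  }

  public static void main(String[] args) throws Exception {
    try {
      Math.class.getMethod("pow", int.class, int.class); // exact-type lookup, no promotion
    } catch (NoSuchMethodException e) {
      System.out.println("exact lookup failed: " + e.getClass().getSimpleName());
    }
    Method pow = lookup(Math.class, "pow", int.class, int.class);
    System.out.println(pow.invoke(null, 2.0, 10.0)); // 1024.0
  }
}
```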
