I investigated why SESSION_END() is not working:

https://issues.apache.org/jira/browse/BEAM-5799
https://issues.apache.org/jira/browse/CALCITE-2645

According to the reply in the Calcite JIRA, there might be some other way
to implement SESSION_END. I haven't looked into it though.
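
For context, here is a rough Python sketch of the value SESSION_END would be
expected to return, assuming gap-based session windows in which each row opens
a window [t, t + gap) and overlapping windows are merged. The function name and
the integer timestamps are only illustrative; this is not Beam or Calcite code.

```python
from typing import List, Tuple


def session_windows(timestamps: List[int], gap: int) -> List[Tuple[int, int]]:
    """Merge per-element windows [t, t + gap) into sessions.

    Returns (session_start, session_end) pairs with an exclusive end.
    Assumption: two windows merge only if they overlap, so an element
    arriving exactly at the current session end starts a new session.
    """
    sessions: List[Tuple[int, int]] = []
    for t in sorted(timestamps):
        if sessions and t < sessions[-1][1]:
            # Element falls inside the open session: extend the session end.
            start, end = sessions[-1]
            sessions[-1] = (start, max(end, t + gap))
        else:
            # No overlap with the previous session: open a new one.
            sessions.append((t, t + gap))
    return sessions


if __name__ == "__main__":
    print(session_windows([0, 3, 4, 20], gap=5))
```

With timestamps 0, 3, 4 and 20 and a gap of 5, this gives the sessions (0, 9)
and (20, 25), so under these assumptions SESSION_END would be 9 for the first
three rows and 25 for the last.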

-Rui

On Thu, Nov 15, 2018 at 11:56 AM Mingmin Xu <[email protected]> wrote:

> 1. Window start/end: Actually this is already provided in other ways and
> the window in the SQL environment is unused and just waiting to be deleted.
> So you can still access TUMBLE_START, etc. This is well-defined as a part
> of the row so there's no semantic problem, but I think it should already
> work.
> *MM: The others work; only SESSION_END() does not.*
>
> 2. Pane information: I don't think access to pane info is enough for
> correct results for a SQL join that triggers more than once. The pane info
> is part of a Beam element, but these records just represent a kind of
> changelog of the aggregation/join. The general solution is retractions.
> Until we finish that, you need to follow the Join/CoGBK with custom logic,
> often a stateful DoFn to get the join results right. For example, if both
> inputs are append-only relations and it is an equijoin, then you can do
> this with a dedupe when you unpack the CoGbkResult. I am guessing this is
> the main use case for BEAM-5204. Is it your use case?
> *MM: my case is a SQL-only self-join, written as [DISCARD_Pane JOIN
> ACCU_Pane];*
> *These UDFs are not a blocker; IMO the limitation in BEAM-5204 should simply
> be removed. With multiple triggers assigned, developers need to handle the
> output themselves, which is not complex with the Java SDK but very hard in
> SQL-only cases.*
>
>
> On Thu, Nov 15, 2018 at 10:54 AM Kenneth Knowles <[email protected]> wrote:
>
>> From https://issues.apache.org/jira/browse/BEAM-5204 it seems like what
>> you most care about is to have joins that trigger more than once per
>> window. To accomplish it you hope to build an "escape hatch" from
>> SQL/relational semantics to specialized Beam SQL semantics. It could make
>> sense with extreme care.
>>
>> Separating the two parts:
>>
>> 1. Window start/end: Actually this is already provided in other ways and
>> the window in the SQL environment is unused and just waiting to be deleted.
>> So you can still access TUMBLE_START, etc. This is well-defined as a part
>> of the row so there's no semantic problem, but I think it should already
>> work.
>>
>> 2. Pane information: I don't think access to pane info is enough for
>> correct results for a SQL join that triggers more than once. The pane info
>> is part of a Beam element, but these records just represent a kind of
>> changelog of the aggregation/join. The general solution is retractions.
>> Until we finish that, you need to follow the Join/CoGBK with custom logic,
>> often a stateful DoFn to get the join results right. For example, if both
>> inputs are append-only relations and it is an equijoin, then you can do
>> this with a dedupe when you unpack the CoGbkResult. I am guessing this is
>> the main use case for BEAM-5204. Is it your use case?
>>
>> Kenn
>>
>> On Thu, Nov 15, 2018 at 10:08 AM Mingmin Xu <[email protected]> wrote:
>>
>>> Reviving this thread.
>>> It seems there are more changes to how a FUNCTION is executed in the
>>> backend, as noticed in #6996
>>> <https://github.com/apache/beam/pull/6996>:
>>> 1. BeamSqlExpression and BeamSqlExpressionExecutor are removed;
>>> 2. BeamSqlExpressionEnvironment is removed;
>>>
>>> Then,
>>> 1. for Calcite-defined FUNCTIONS, it uses Calcite-generated code (which
>>> is great, since duplicating that work is pointless);
>>> *2. there is no way to access the Beam context now;*
>>>
>>> For *#2*, I think we need to find a way to expose it; at least our
>>> UDFs/UDAFs should be able to access it to leverage the advantages of the
>>> Beam module.
>>>
>>> Any comments?
>>>
>>>
>>> On Wed, Sep 19, 2018 at 2:55 PM Rui Wang <[email protected]> wrote:
>>>
>>>> This is such an exciting change!
>>>>
>>>> Since we cannot mix the current implementation with Calcite code
>>>> generation, is there any case that our current implementation supports
>>>> but Calcite code generation does not, so that switching to Calcite code
>>>> generation would affect existing usage?
>>>>
>>>> -Rui
>>>>
>>>> On Wed, Sep 19, 2018 at 11:53 AM Andrew Pilloud <[email protected]>
>>>> wrote:
>>>>
>>>>> To follow up on this, the PR is now in a reviewable state and I've
>>>>> added more tests for FLOOR and CEIL. Both work with a more extensive set
>>>>> of arguments after this change. There are now 4 outstanding calcite PRs
>>>>> that get all the tests passing.
>>>>>
>>>>> Unfortunately there is no easy way to mix our current implementation
>>>>> with Calcite's code generator.
>>>>>
>>>>> Andrew
>>>>>
>>>>> On Mon, Sep 17, 2018 at 3:22 PM Mingmin Xu <[email protected]> wrote:
>>>>>
>>>>>> Awesome work, we should call Calcite operator functions if available.
>>>>>>
>>>>>> I haven't had time to read the PR yet; for the functions that are
>>>>>> impacted, I would keep the existing implementation. One example: I
>>>>>> recently noticed that FLOOR/CEIL only supports months/years, which was
>>>>>> quite a surprise to me.
>>>>>>
>>>>>> Mingmin
>>>>>>
>>>>>> On Mon, Sep 17, 2018 at 3:03 PM Anton Kedin <[email protected]> wrote:
>>>>>>
>>>>>>> This is pretty amazing! Thank you for doing this!
>>>>>>>
>>>>>>> Regards,
>>>>>>> Anton
>>>>>>>
>>>>>>> On Mon, Sep 17, 2018 at 2:27 PM Andrew Pilloud <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I've adapted Calcite's EnumerableCalc code generation to generate
>>>>>>>> the BeamCalc DoFn. The primary purpose behind this change is so we can
>>>>>>>> take advantage of Calcite's extensive SQL operator implementation. This
>>>>>>>> deletes ~11000 lines of code from Beam (with ~350 added), significantly
>>>>>>>> increases the set of supported SQL operators, and improves performance
>>>>>>>> and correctness of currently supported operators. Here is my work in
>>>>>>>> progress:
>>>>>>>> https://github.com/apache/beam/pull/6417
>>>>>>>>
>>>>>>>> There are a few bugs in Calcite that this has exposed:
>>>>>>>>
>>>>>>>> Fixed in Calcite master:
>>>>>>>>
>>>>>>>>    - CALCITE-2321
>>>>>>>>    <https://issues.apache.org/jira/browse/CALCITE-2321> - The type
>>>>>>>>    of a union of CHAR columns of different lengths should be VARCHAR
>>>>>>>>    - CALCITE-2447
>>>>>>>>    <https://issues.apache.org/jira/browse/CALCITE-2447> - Some
>>>>>>>>    POWER, ATAN2 functions fail with NoSuchMethodException
>>>>>>>>
>>>>>>>> Pending PRs:
>>>>>>>>
>>>>>>>>    - CALCITE-2529
>>>>>>>>    <https://issues.apache.org/jira/browse/CALCITE-2529> - linq4j
>>>>>>>>    should promote integer to floating point when generating function
>>>>>>>>    calls
>>>>>>>>    - CALCITE-2530
>>>>>>>>    <https://issues.apache.org/jira/browse/CALCITE-2530> - TRIM
>>>>>>>>    function does not throw exception when the length of trim character
>>>>>>>>    is not 1(one)
>>>>>>>>
>>>>>>>> More work:
>>>>>>>>
>>>>>>>>    - CALCITE-2404
>>>>>>>>    <https://issues.apache.org/jira/browse/CALCITE-2404> -
>>>>>>>>    Accessing structured-types is not implemented by the runtime
>>>>>>>>    - (none yet) - Support multi character TRIM extension in Calcite
>>>>>>>>
>>>>>>>> I would like to push these changes in with these minor regressions.
>>>>>>>> Do any of these Calcite bugs block this functionality from being added
>>>>>>>> to Beam?
>>>>>>>>
>>>>>>>> Andrew
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> ----
>>>>>> Mingmin
>>>>>>
>>>>>
>>>
>>> --
>>> ----
>>> Mingmin
>>>
>>
>
> --
> ----
> Mingmin
>
