I've spent most of the day getting beam working on Calcite 1.18 and now I'm
just starting to work on making your Joda UDF changes compatible with
Calcite's code generation, so I haven't gotten very far into investigating
how we might handle this yet.

Calcite has multiple aggregate function formats, including ones that have
access the window, but it assumes that scalar project functions never need
to get at the window.  I'm still struggling to understand what the right
semantics are here. I'm not convinced this belongs in the scalar UDF path.
Might this really be something that is an extension on join?

As for making immediate progress: When we switch to Calcite's code
generation, it looks like we can implement our own generator for windowed
UDFs. We could put the window into Calcite's DataContext
<https://github.com/apache/beam/blob/f52e5dd82ebf080bd04d216c60f3842c51140410/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamCalcRel.java#L389>,
and then pull it out for the function. I'm a little hesitant to do this way
because it will force us to run the Calc operator (all the scalar
operations) in a windowed doFn. I think we will need to push this
distinction into the rel layer (as a WindowedCalc operation) so the planner
can take it into consideration. I'm going to keep moving forward on this
tomorrow, hope to have a proof of concept showing we can generate our own
UDF calls.

Andrew

On Thu, Nov 15, 2018 at 12:11 PM Rui Wang <ruw...@google.com> wrote:

> For the not working SESSION_END(), I had an investigation on it:
>
> https://issues.apache.org/jira/browse/BEAM-5799
> https://issues.apache.org/jira/browse/CALCITE-2645
>
> According to the reply in the Calcite JIRA, there might be some other way
> to implement SESSION_END. I haven't looked into it though.
>
> -Rui
>
> On Thu, Nov 15, 2018 at 11:56 AM Mingmin Xu <mingm...@gmail.com> wrote:
>
>> 1. Window start/end: Actually this is already provided in other ways and
>> the window in the SQL environment is unused and just waiting to be deleted.
>> So you can still access TUMBLE_START, etc. This is well-defined as a part
>> of the row so there's no semantic problem, but I think it should already
>> work.
>> *MM: Others work except SESSION_END();*
>>
>> 2. Pane information: I don't think access to pane info is enough for
>> correct results for a SQL join that triggers more than once. The pane info
>> is part of a Beam element, but these records just represent a kind of
>> changelog of the aggregation/join. The general solution is retractions.
>> Until we finish that, you need to follow the Join/CoGBK with custom logic ,
>> often a stateful DoFn to get the join results right. For example, if both
>> inputs are append-only relations and it is an equijoin, then you can do
>> this with a dedupe when you unpack the CoGbkResult. I am guessing this is
>> the main use case for BEAM-5204. Is it your use case?
>> *MM: my case is a self-join with SQL-only, written as [DISCARD_Pane JOIN
>> ACCU_Pane];*
>> *These UDFs is not a blocker, limitation in BEAM-5204 should be removed
>> directly IMO. With multiple-trigger assigned, developers need to handle the
>> output which is not complex with Java SDK, but very hard for SQL only
>> cases. *
>>
>>
>> On Thu, Nov 15, 2018 at 10:54 AM Kenneth Knowles <k...@apache.org> wrote:
>>
>>> From https://issues.apache.org/jira/browse/BEAM-5204 it seems like what
>>> you most care about is to have joins that trigger more than once per
>>> window. To accomplish it you hope to build an "escape hatch" from
>>> SQL/relational semantics to specialized Beam SQL semantics. It could make
>>> sense with extreme care.
>>>
>>> Separating the two parts:
>>>
>>> 1. Window start/end: Actually this is already provided in other ways and
>>> the window in the SQL environment is unused and just waiting to be deleted.
>>> So you can still access TUMBLE_START, etc. This is well-defined as a part
>>> of the row so there's no semantic problem, but I think it should already
>>> work.
>>>
>>> 2. Pane information: I don't think access to pane info is enough for
>>> correct results for a SQL join that triggers more than once. The pane info
>>> is part of a Beam element, but these records just represent a kind of
>>> changelog of the aggregation/join. The general solution is retractions.
>>> Until we finish that, you need to follow the Join/CoGBK with custom logic ,
>>> often a stateful DoFn to get the join results right. For example, if both
>>> inputs are append-only relations and it is an equijoin, then you can do
>>> this with a dedupe when you unpack the CoGbkResult. I am guessing this is
>>> the main use case for BEAM-5204. Is it your use case?
>>>
>>> Kenn
>>>
>>> On Thu, Nov 15, 2018 at 10:08 AM Mingmin Xu <mingm...@gmail.com> wrote:
>>>
>>>> Raise this thread.
>>>> Seems there're more changes in the backend on how a FUNCTION is
>>>> executed in the backend, as noticed in #6996
>>>> <https://github.com/apache/beam/pull/6996>:
>>>> 1. BeamSqlExpression and BeamSqlExpressionExecutor are removed;
>>>> 2. BeamSqlExpressionEnvironment are removed;
>>>>
>>>> Then,
>>>> 1. for Calcite defined FUNCTIONS, it uses Calcite generated code (which
>>>> is great and duplicate work is worthless);
>>>> *2. no way to access Beam context now;*
>>>>
>>>> For *#2*, I think we need to find a way to expose it, at least our
>>>> UDF/UDAF should be able to access it to leverage the advantages of Beam
>>>> module.
>>>>
>>>> Any comments?
>>>>
>>>>
>>>> On Wed, Sep 19, 2018 at 2:55 PM Rui Wang <ruw...@google.com> wrote:
>>>>
>>>>> This is a so exciting change!
>>>>>
>>>>> Since we cannot mix current implementation with Calcite code
>>>>> generation, is there any case that Calcite code generation does not 
>>>>> support
>>>>> but our current implementation supports, so switching to Calcite code
>>>>> generation will have some impact to existing usage?
>>>>>
>>>>> -Rui
>>>>>
>>>>> On Wed, Sep 19, 2018 at 11:53 AM Andrew Pilloud <apill...@google.com>
>>>>> wrote:
>>>>>
>>>>>> To follow up on this, the PR is now in a reviewable state and I've
>>>>>> added more tests for FLOOR and CEIL. Both work with a more extensive set 
>>>>>> of
>>>>>> arguments after this change. There are now 4 outstanding calcite PRs that
>>>>>> get all the tests passing.
>>>>>>
>>>>>> Unfortunately there is no easy way to mix our current implementation
>>>>>> and using Calcite's code generator.
>>>>>>
>>>>>> Andrew
>>>>>>
>>>>>> On Mon, Sep 17, 2018 at 3:22 PM Mingmin Xu <mingm...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Awesome work, we should call Calcite operator functions if
>>>>>>> available.
>>>>>>>
>>>>>>> I haven't get time to read the PR yet, for those impacted would keep
>>>>>>> existing implementation. One example is, I notice FLOOR/CEIL only 
>>>>>>> supports
>>>>>>> months/years recently which is quite a surprise to me.
>>>>>>>
>>>>>>> Mingmin
>>>>>>>
>>>>>>> On Mon, Sep 17, 2018 at 3:03 PM Anton Kedin <ke...@google.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> This is pretty amazing! Thank you for doing this!
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Anton
>>>>>>>>
>>>>>>>> On Mon, Sep 17, 2018 at 2:27 PM Andrew Pilloud <apill...@google.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I've adapted Calcite's EnumerableCalc code generation to generate
>>>>>>>>> the BeamCalc DoFn. The primary purpose behind this change is so we 
>>>>>>>>> can take
>>>>>>>>> advantage of Calcite's extensive SQL operator implementation. This 
>>>>>>>>> deletes
>>>>>>>>> ~11000 lines of code from Beam (with ~350 added), significantly 
>>>>>>>>> increases
>>>>>>>>> the set of supported SQL operators, and improves performance and
>>>>>>>>> correctness of currently supported operators. Here is my work in 
>>>>>>>>> progress:
>>>>>>>>> https://github.com/apache/beam/pull/6417
>>>>>>>>>
>>>>>>>>> There are a few bugs in Calcite that this has exposed:
>>>>>>>>>
>>>>>>>>> Fixed in Calcite master:
>>>>>>>>>
>>>>>>>>>    - CALCITE-2321
>>>>>>>>>    <https://issues.apache.org/jira/browse/CALCITE-2321> - The
>>>>>>>>>    type of a union of CHAR columns of different lengths should be 
>>>>>>>>> VARCHAR
>>>>>>>>>    - CALCITE-2447
>>>>>>>>>    <https://issues.apache.org/jira/browse/CALCITE-2447> - Some
>>>>>>>>>    POWER, ATAN2 functions fail with NoSuchMethodException
>>>>>>>>>
>>>>>>>>> Pending PRs:
>>>>>>>>>
>>>>>>>>>    - CALCITE-2529
>>>>>>>>>    <https://issues.apache.org/jira/browse/CALCITE-2529> - linq4j
>>>>>>>>>    should promote integer to floating point when generating function 
>>>>>>>>> calls
>>>>>>>>>    - CALCITE-2530
>>>>>>>>>    <https://issues.apache.org/jira/browse/CALCITE-2530> - TRIM
>>>>>>>>>    function does not throw exception when the length of trim 
>>>>>>>>> character is not
>>>>>>>>>    1(one)
>>>>>>>>>
>>>>>>>>> More work:
>>>>>>>>>
>>>>>>>>>    - CALCITE-2404
>>>>>>>>>    <https://issues.apache.org/jira/browse/CALCITE-2404> -
>>>>>>>>>    Accessing structured-types is not implemented by the runtime
>>>>>>>>>    - (none yet) - Support multi character TRIM extension in
>>>>>>>>>    Calcite
>>>>>>>>>
>>>>>>>>> I would like to push these changes in with these minor
>>>>>>>>> regressions. Do any of these Calcite bugs block this functionality 
>>>>>>>>> being
>>>>>>>>> adding to Beam?
>>>>>>>>>
>>>>>>>>> Andrew
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> ----
>>>>>>> Mingmin
>>>>>>>
>>>>>>
>>>>
>>>> --
>>>> ----
>>>> Mingmin
>>>>
>>>
>>
>> --
>> ----
>> Mingmin
>>
>

Reply via email to