1. Window start/end: Actually this is already provided in other ways and
the window in the SQL environment is unused and just waiting to be deleted.
So you can still access TUMBLE_START, etc. This is well-defined as a part
of the row so there's no semantic problem, but I think it should already
work.
*MM: The others work; only SESSION_END() does not.*
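
To illustrate why window start/end is "well-defined as a part of the row": for a tumbling (fixed) window, both bounds are a pure function of the row's event timestamp. The sketch below is plain Python, not Beam SQL code. The helper name `tumble_bounds` and the alignment of windows to the Unix epoch with zero offset are assumptions made for illustration only.

```python
from datetime import datetime, timedelta, timezone

# Assumption: windows are aligned to the Unix epoch with no extra offset.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def tumble_bounds(event_time, size):
    """Return (start, end) of the fixed window of length `size`
    that contains `event_time`. Hypothetical helper, not a Beam API."""
    # How far the timestamp sits past the start of its window.
    offset = (event_time - EPOCH) % size
    start = event_time - offset
    return start, start + size


ts = datetime(2018, 11, 15, 10, 54, 30, tzinfo=timezone.utc)
start, end = tumble_bounds(ts, timedelta(hours=1))
print(start.isoformat(), end.isoformat())
```

Because the bounds depend only on the timestamp and the window size, they can be carried as ordinary columns of the row, with no separate window object in the SQL environment.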

2. Pane information: I don't think access to pane info is enough for
correct results for a SQL join that triggers more than once. The pane info
is part of a Beam element, but these records just represent a kind of
changelog of the aggregation/join. The general solution is retractions.
Until we finish that, you need to follow the Join/CoGBK with custom logic,
often a stateful DoFn to get the join results right. For example, if both
inputs are append-only relations and it is an equijoin, then you can do
this with a dedupe when you unpack the CoGbkResult. I am guessing this is
the main use case for BEAM-5204. Is it your use case?
*MM: My case is a SQL-only self-join, written as [DISCARD_Pane JOIN
ACCU_Pane].*
*These UDFs are not a blocker; IMO the limitation in BEAM-5204 should simply
be removed. When a trigger that fires more than once is assigned, developers
need to handle the output themselves, which is not complex with the Java SDK
but very hard in SQL-only cases.*
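
The dedupe idea in point 2 can be sketched outside of Beam. The plain-Python simulation below assumes both inputs are append-only, the join is an equijoin, and each trigger firing delivers the accumulated values for a key (accumulating mode). The function name `dedupe_join_panes` and the pane representation are hypothetical; in a real pipeline the `seen` dictionary would correspond to per-key state in a stateful DoFn.

```python
from itertools import product


def dedupe_join_panes(panes):
    """Yield each joined row (key, left, right) exactly once.

    `panes` is an iterable of (key, left_values, right_values) tuples,
    one per trigger firing, where each firing repeats everything seen
    so far for that key (accumulating mode). Hypothetical helper.
    """
    seen = {}  # key -> set of (left, right) pairs already emitted
    for key, lefts, rights in panes:
        emitted = seen.setdefault(key, set())
        for pair in product(lefts, rights):
            if pair not in emitted:
                emitted.add(pair)
                yield (key, *pair)


firings = [
    ("k", ["a1"], ["b1"]),
    ("k", ["a1", "a2"], ["b1"]),
    ("k", ["a1", "a2"], ["b1", "b2"]),
]
for row in dedupe_join_panes(firings):
    print(row)
```

Note that this treats the inputs as sets: two genuinely identical input rows would collapse into one joined row. It also does nothing for inputs that are not append-only, which is the case retractions are meant to cover.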


On Thu, Nov 15, 2018 at 10:54 AM Kenneth Knowles <[email protected]> wrote:

> From https://issues.apache.org/jira/browse/BEAM-5204 it seems like what
> you most care about is to have joins that trigger more than once per
> window. To accomplish it you hope to build an "escape hatch" from
> SQL/relational semantics to specialized Beam SQL semantics. It could make
> sense with extreme care.
>
> Separating the two parts:
>
> 1. Window start/end: Actually this is already provided in other ways and
> the window in the SQL environment is unused and just waiting to be deleted.
> So you can still access TUMBLE_START, etc. This is well-defined as a part
> of the row so there's no semantic problem, but I think it should already
> work.
>
> 2. Pane information: I don't think access to pane info is enough for
> correct results for a SQL join that triggers more than once. The pane info
> is part of a Beam element, but these records just represent a kind of
> changelog of the aggregation/join. The general solution is retractions.
> Until we finish that, you need to follow the Join/CoGBK with custom logic,
> often a stateful DoFn to get the join results right. For example, if both
> inputs are append-only relations and it is an equijoin, then you can do
> this with a dedupe when you unpack the CoGbkResult. I am guessing this is
> the main use case for BEAM-5204. Is it your use case?
>
> Kenn
>
> On Thu, Nov 15, 2018 at 10:08 AM Mingmin Xu <[email protected]> wrote:
>
>> Reviving this thread.
>> It seems there are more changes in how a FUNCTION is executed in the
>> backend, as noticed in #6996
>> <https://github.com/apache/beam/pull/6996>:
>> 1. BeamSqlExpression and BeamSqlExpressionExecutor are removed;
>> 2. BeamSqlExpressionEnvironment is removed;
>>
>> Then,
>> 1. for Calcite-defined FUNCTIONS, it uses Calcite-generated code (which
>> is great, since duplicating that work would be pointless);
>> *2. there is no way to access the Beam context now;*
>>
>> For *#2*, I think we need to find a way to expose it; at least our
>> UDFs/UDAFs should be able to access it to leverage the advantages of the
>> Beam module.
>>
>> Any comments?
>>
>>
>> On Wed, Sep 19, 2018 at 2:55 PM Rui Wang <[email protected]> wrote:
>>
>>> This is such an exciting change!
>>>
>>> Since we cannot mix the current implementation with Calcite code
>>> generation, is there any case that our current implementation supports
>>> but Calcite code generation does not, so that switching to Calcite code
>>> generation would affect existing usage?
>>>
>>> -Rui
>>>
>>> On Wed, Sep 19, 2018 at 11:53 AM Andrew Pilloud <[email protected]>
>>> wrote:
>>>
>>>> To follow up on this, the PR is now in a reviewable state and I've
>>>> added more tests for FLOOR and CEIL. Both work with a more extensive
>>>> set of arguments after this change. There are now 4 outstanding Calcite
>>>> PRs that, once merged, get all the tests passing.
>>>>
>>>> Unfortunately there is no easy way to mix our current implementation
>>>> with Calcite's code generator.
>>>>
>>>> Andrew
>>>>
>>>> On Mon, Sep 17, 2018 at 3:22 PM Mingmin Xu <[email protected]> wrote:
>>>>
>>>>> Awesome work; we should call Calcite operator functions where available.
>>>>>
>>>>> I haven't had time to read the PR yet; for the functions that are
>>>>> impacted, I would keep the existing implementation. One example: I
>>>>> recently noticed that FLOOR/CEIL only supports months/years, which was
>>>>> quite a surprise to me.
>>>>>
>>>>> Mingmin
>>>>>
>>>>> On Mon, Sep 17, 2018 at 3:03 PM Anton Kedin <[email protected]> wrote:
>>>>>
>>>>>> This is pretty amazing! Thank you for doing this!
>>>>>>
>>>>>> Regards,
>>>>>> Anton
>>>>>>
>>>>>> On Mon, Sep 17, 2018 at 2:27 PM Andrew Pilloud <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> I've adapted Calcite's EnumerableCalc code generation to generate
>>>>>>> the BeamCalc DoFn. The primary purpose behind this change is so we
>>>>>>> can take advantage of Calcite's extensive SQL operator
>>>>>>> implementation. This deletes ~11000 lines of code from Beam (with
>>>>>>> ~350 added), significantly increases the set of supported SQL
>>>>>>> operators, and improves performance and correctness of currently
>>>>>>> supported operators. Here is my work in progress:
>>>>>>> https://github.com/apache/beam/pull/6417
>>>>>>>
>>>>>>> There are a few bugs in Calcite that this has exposed:
>>>>>>>
>>>>>>> Fixed in Calcite master:
>>>>>>>
>>>>>>>    - CALCITE-2321
>>>>>>>    <https://issues.apache.org/jira/browse/CALCITE-2321> - The type
>>>>>>>    of a union of CHAR columns of different lengths should be VARCHAR
>>>>>>>    - CALCITE-2447
>>>>>>>    <https://issues.apache.org/jira/browse/CALCITE-2447> - Some
>>>>>>>    POWER, ATAN2 functions fail with NoSuchMethodException
>>>>>>>
>>>>>>> Pending PRs:
>>>>>>>
>>>>>>>    - CALCITE-2529
>>>>>>>    <https://issues.apache.org/jira/browse/CALCITE-2529> - linq4j
>>>>>>>    should promote integer to floating point when generating
>>>>>>>    function calls
>>>>>>>    - CALCITE-2530
>>>>>>>    <https://issues.apache.org/jira/browse/CALCITE-2530> - TRIM
>>>>>>>    function does not throw exception when the length of trim
>>>>>>>    character is not 1(one)
>>>>>>>
>>>>>>> More work:
>>>>>>>
>>>>>>>    - CALCITE-2404
>>>>>>>    <https://issues.apache.org/jira/browse/CALCITE-2404> - Accessing
>>>>>>>    structured-types is not implemented by the runtime
>>>>>>>    - (none yet) - Support multi character TRIM extension in Calcite
>>>>>>>
>>>>>>> I would like to push these changes in with these minor regressions.
>>>>>>> Do any of these Calcite bugs block this functionality being added
>>>>>>> to Beam?
>>>>>>>
>>>>>>> Andrew
>>>>>>>
>>>>>>
>>>>>
>>>>> --
>>>>> ----
>>>>> Mingmin
>>>>>
>>>>
>>
>> --
>> ----
>> Mingmin
>>
>

-- 
----
Mingmin
