Re: Supporting named row construction in Calcite SQL

Julian Hyde Sun, 15 Apr 2018 23:27:03 -0700

Beam and HerdDB are now on “powered by”: 
https://calcite.apache.org/docs/powered_by.html 


> On Apr 12, 2018, at 5:33 AM, Michael Mior <[email protected]> wrote:
> 
> FYI, I talked to Julian earlier this week and he will be adding Beam to the
> powered by page since he has a doc for generating the image with the logos.
> 
> --
> Michael Mior
> [email protected]
> 
> 2018-04-12 4:06 GMT-04:00 Shuyi Chen <[email protected]>:
> 
>> @Andrew, great to see that Beam is also using Calcite streaming SQL. Maybe
>> you can help add an entry for Beam to the Calcite powered_by page
>> <https://calcite.apache.org/docs/powered_by.html> by editing
>> site/_docs/powered_by.md
>> <https://github.com/apache/calcite/pull/657#diff-1aa810bc92051b46555ee4caf24bd6c3>.
>> Also, can you explain a bit more about how you plan to use streaming SQL to
>> transform arbitrary JSON objects with and without the AS STRUCT syntax?
>> For adding DDL in Beam, please take a look at the server module and the
>> TYPE DDL (https://issues.apache.org/jira/browse/CALCITE-2045) I am adding.
>> Let me know if you have any comments or need any help.
>> 
>> @Rong, I think it's better to conform with the ROW SQL standard, and add
>> new grammar to handle named struct construction.
>> This should work with the TYPE DDL, and we should be able to CAST the
>> created STRUCT as a custom type defined using DDL.
>> 
>> @Julian, thanks for the suggestions. I think the AS STRUCT addition should
>> do it.
>> 
>> Shuyi
>> 
>> On Mon, Apr 9, 2018 at 4:36 AM, Julian Hyde <[email protected]>
>> wrote:
>> 
>>> For what it’s worth, ROW is standard SQL. If it does what you need, we
>>> should use it.
>>> 
>>> Reading your case quickly, I perceived that you needed a concise way to
>>> assign field names, and AS STRUCT seemed to do that.
>>> 
>>> But staying within the standard is always preferred. BigQuery isn’t always
>>> good at that.
>>> 
>>> Julian
>>> 
>>>> On Apr 5, 2018, at 10:18, Rong Rong <[email protected]> wrote:
>>>> 
>>>> Thanks for the fantastic proposal @shuyi. I think the STRUCT idea is great
>>>> considering ROW is not standard SQL either. As a user of Calcite I have a
>>>> couple of questions.
>>>> 
>>>> Since the ROW constructor is so similar to STRUCT, would it be a good idea
>>>> to consolidate the two syntaxes, or to draw a clear distinction between
>>>> them?
>>>> 
>>>> What's the relationship going forward with DDL, for example CALCITE-2045?
>>>> DDL seems more flexible in terms of defining the structure, not just
>>>> field names but also field types. Maybe @andrew can share more about the
>>>> use cases for the Calcite on Beam streaming integration?
>>>> 
>>>> Thanks,
>>>> Rong
>>>> 
>>>> 
>>>> On Thu, Apr 5, 2018, 10:05 AM Andrew Pilloud <[email protected]>
>>>> wrote:
>>>> 
>>>>> As a user of Calcite working on adding streaming SQL to Apache Beam, this
>>>>> sounds like a fantastic proposal. Our initial goal is to be able to run
>>>>> SQL queries that transform arbitrary JSON objects. Without this syntax,
>>>>> objects must be flattened when they pass through the transform. Is this
>>>>> something that might make it into 1.17?
>>>>> 
>>>>> We have also had some discussion about adding DDL to Beam so a user can
>>>>> describe the schema of a stream of JSON in pure SQL. Our current thought
>>>>> is to use BigQuery-compatible STRUCT and ARRAY syntax. BigQuery is a
>>>>> popular sink for our users. Syntax compatible with BigQuery would be a
>>>>> big plus for us.
>>>>> 
>>>>> Andrew
>>>>> 
>>>>>> On Thu, Apr 5, 2018 at 12:43 AM Shuyi Chen <[email protected]>
>>>>>> wrote:
>>>>>> 
>>>>>> @Michael, @Albert, yes, I don't think it is standard SQL. But I think
>>>>>> it's very useful in the context of streaming SQL, e.g. Flink SQL, where
>>>>>> the sinks can be a database or endpoints with a defined protobuf/thrift
>>>>>> schema. They usually have complex structure. Supporting complex
>>>>>> structure in SQL output will make it much easier to write to different
>>>>>> sinks with predefined schemas in a unified way.
>>>>>> 
>>>>>> @julian, that's a great suggestion. I think instead of extending the
>>>>>> ROW constructor, which is not standard SQL, adding a new extension might
>>>>>> be the right way to go. Looking at the BigQuery STRUCT syntax, we can
>>>>>> implement something like the following:
>>>>>> 
>>>>>> SELECT STRUCT(a as first_name, b as last_name,
>>>>>>               STRUCT(c as zip_code, d as street, e as state) as address)
>>>>>>        as record
>>>>>> FROM example_table
>>>>>> 
>>>>>> On Wed, Apr 4, 2018 at 5:51 PM, Julian Hyde <[email protected]>
>>>>>> wrote:
>>>>>> 
>>>>>>> If I recall correctly, Google BigQuery has SELECT AS STRUCT. It’s not
>>>>>>> standard, but if it does what you need we could consider adopting that
>>>>>>> syntax.
>>>>>>> 
>>>>>>> Julian
>>>>>>> 
>>>>>>>> On Apr 4, 2018, at 10:23 AM, Albert <[email protected]> wrote:
>>>>>>>> 
>>>>>>>> If it is not standard SQL, isn't it just a matter of assigning it to
>>>>>>>> some dialect?
>>>>>>>> 
>>>>>>>>> On Wed, Apr 4, 2018 at 10:19 AM, Michael Mior <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>> Apologies for my silence. I don't really have thoughts on the matter
>>>>>>>>> at this point. It might be helpful if you can give an example of what
>>>>>>>>> you're proposing. Unless I'm missing something (very possible), it's
>>>>>>>>> not part of the SQL standard.
>>>>>>>>> 
>>>>>>>>> --
>>>>>>>>> Michael Mior
>>>>>>>>> [email protected]
>>>>>>>>> 
>>>>>>>>> 2018-04-03 18:48 GMT-04:00 Shuyi Chen <[email protected]>:
>>>>>>>>> 
>>>>>>>>>> Friendly ping, any thoughts? Much appreciated.
>>>>>>>>>> 
>>>>>>>>>> Shuyi
>>>>>>>>>> 
>>>>>>>>>>> On Tue, Mar 27, 2018 at 11:59 PM, Shuyi Chen <[email protected]>
>>>>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Hi community,
>>>>>>>>>>> 
>>>>>>>>>>> I am thinking of adding support in Calcite for named row
>>>>>>>>>>> construction, e.g.
>>>>>>>>>>> 
>>>>>>>>>>> SELECT (a as first_name, b as last_name,
>>>>>>>>>>>         (c as zip_code, d as street, e as state) as address)
>>>>>>>>>>>        as record
>>>>>>>>>>> FROM example_table
>>>>>>>>>>> 
>>>>>>>>>>> The output will be a struct with the field names specified in the
>>>>>>>>>>> SQL. The usage scenario is that, say, in streaming SQL, the
>>>>>>>>>>> downstream sink's schema cannot be changed, so we need to use SQL
>>>>>>>>>>> to construct a struct whose field names match that schema in order
>>>>>>>>>>> to write to the downstream sinks. Thanks a lot.
>>>>>>>>>>> 
>>>>>>>>>>> Shuyi
>>>>>>>>>>> 
>>>>>>>>>>> --
>>>>>>>>>>> "So you have to trust that the dots will somehow connect in your future."
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> --
>>>>>>>>>> "So you have to trust that the dots will somehow connect in your future."
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> --
>>>>>>>> ~~~~~~~~~~~~~~~
>>>>>>>> no mistakes
>>>>>>>> ~~~~~~~~~~~~~~~~~~
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> "So you have to trust that the dots will somehow connect in your future."
>>>>>> 
>>>>> 
>>> 
>> 
>> 
>> 
>> --
>> "So you have to trust that the dots will somehow connect in your future."
>>
