Re: Complex Types Support in DDL

Julian Hyde Wed, 02 May 2018 12:58:40 -0700

Agreed.

Test re-use = specification re-use.


Code re-use = much harder.

> On May 2, 2018, at 12:38 PM, Michael Mior <[email protected]> wrote:
> 
> That makes sense to me. I agree that it's probably not very useful to try
> to share anything in the parser between calcite-server and calcite-babel
> since calcite-babel will always be a moving target. However, given that
> calcite-babel is intended to be particularly permissive, it would be great
> to have a way to run calcite-server DDL tests against calcite-babel.
> 
> --
> Michael Mior
> [email protected]
> 
> 
> On Wed, May 2, 2018 at 2:34 PM, Shuyi Chen <[email protected]> wrote:
> 
>> Yes, that's what I have in mind as well. The server module is essentially
>> Calcite's own DDL; people who use Calcite directly can just use the server
>> module for their DDL needs. Other SQL dialects have their own DDL, and for
>> them to leverage Calcite's relational algebra and query planning, the Babel
>> parser needs to be able to parse both the DML and the DDL of their dialect.
>> Is that clear?
>> 
>> On Wed, May 2, 2018 at 11:23 AM, Julian Hyde <[email protected]> wrote:
>> 
>>> The principles are as follows:
>>> * Server should expose, as DDL, the concepts in Calcite’s framework, no
>>> more, no less. This includes the ability to define a type if supported by
>>> Calcite’s type system (RelDataTypeFactory), and the ability to define
>>> materialized views and lattices.
>>> * Babel should expose anything in a supported SQL dialect (or rather,
>>> anything that someone has found time to support).
>>> 
>>> Server’s specification is relatively fixed, whereas Babel’s specification
>>> is growing and changing all the time.
>>> 
>>> Julian
>>> 
>>> 
>>>> On May 2, 2018, at 10:06 AM, Michael Mior <[email protected]> wrote:
>>>> 
>>>> Seems logical to me, although I wonder if there's any way we could easily
>>>> make the DDL part of the parser modular. At least before going too far down
>>>> the road of implementing DDL in Babel, it would be good to set a clear
>>>> scope of what will exist in calcite-babel vs. calcite-server.
>>>> 
>>>> --
>>>> Michael Mior
>>>> [email protected]
>>>> 
>>>> 2018-05-02 12:57 GMT-04:00 Julian Hyde <[email protected]>:
>>>> 
>>>>> By the way. We should also figure out how this fits with the project to
>>>>> create a lenient parser that can handle any dialect of SQL. I am calling
>>>>> that parser “Babel”[1]. That parser will be able to handle BigQuery
>>>>> dialect, among others.
>>>>> 
>>>>> Here’s my current thinking.
>>>>> 
>>>>> I think that Babel should be a new module (a sibling to calcite-server,
>>>>> calcite-druid etc.) and its parser will extend the core parser. That means
>>>>> that calcite-babel will not inherit from the DDL parser in the
>>>>> calcite-server module, nor vice versa. We will probably end up with two
>>>>> parsers that are capable of handling DDL, and two sets of AST classes. But
>>>>> I think that is OK, or at least, better than the chaos of trying to reuse
>>>>> too much. At least, the parsers will share 99% of their DNA with the core
>>>>> parser. And we can easily share tests.
>>>>> 
>>>>> Julian
>>>>> 
>>>>> [1] https://issues.apache.org/jira/browse/CALCITE-2280
>>>>> 
>>>>>> On May 1, 2018, at 11:16 PM, Shuyi Chen <[email protected]> wrote:
>>>>>> 
>>>>>> Hi Anton, thanks a lot for the great questions.
>>>>>> 
>>>>>> Yes, SqlDataTypeSpec currently only supports creating simple SQL types;
>>>>>> row, array, and map types are not supported.
>>>>>> 
>>>>>> CALCITE-2045 adds support for defining custom types, either simple or row
>>>>>> types, through the type DDL, and you should be able to use the UDT in your
>>>>>> table DDL for complex row types. I think this should be close to what you
>>>>>> want.
>>>>>> 
>>>>>> You can extend the type DDL in its current form in the BEAM parser and add
>>>>>> support for map and array types, or modify the grammar to suit your needs
>>>>>> and make it BigQuery compatible. All the changes required to support UDTs
>>>>>> in calcite-core should already be done by CALCITE-2045.
>>>>>> 
>>>>>> As for the BigQuery syntax, I am not sure it's a good idea to adopt it
>>>>>> in the core parser unless there is no SQL equivalent, but if you implement
>>>>>> it in your extended BEAM parser, it's up to you, and that's by design of
>>>>>> Calcite DDL.
>>>>>> 
>>>>>> Let me know if it helps.
>>>>>> 
>>>>>> Thanks
>>>>>> Shuyi
>>>>>> 
>>>>>> On Tue, May 1, 2018 at 3:21 PM, Anton Kedin <[email protected]> wrote:
>>>>>> 
>>>>>>> Hi,
>>>>>>> 
>>>>>>> We want to add support for non-primitive types (ROW, ARRAY, MAP) to Apache
>>>>>>> Beam SQL DDL (based on Calcite DDL extensions). What would be the best way
>>>>>>> to approach this?
>>>>>>> 
>>>>>>> *Our Use Case:*
>>>>>>> We want to be able to use DDL to define data sources and sinks for Beam
>>>>>>> pipelines, so that users don't have to wrap SQL into custom code which
>>>>>>> configures sources/sinks.
>>>>>>> 
>>>>>>> *What we have already:*
>>>>>>> We have a customized CREATE TABLE statement which allows users to specify
>>>>>>> the type of the data source, its schema, and data location. The
>>>>>>> implementation is based on Calcite DDL extensions.
>>>>>>> 
>>>>>>> *What we're missing:*
>>>>>>> We need to be able to define schemas with non-primitive types, e.g.
>>>>>>> arrays or rows, so that we can correctly describe data sources and sinks
>>>>>>> which support such types. For example, if we want to manipulate data in a
>>>>>>> stream of JSON objects, we want to be able to describe the JSON contents
>>>>>>> somehow, including arrays or nested objects. Or we would need similar types
>>>>>>> to interact with BigQuery, which supports arrays and nested struct types.
>>>>>>> 
>>>>>>> *Problem:*
>>>>>>> I tried to check if it is possible to extend the parser using the
>>>>>>> config.fmpp approach, so that we can hook into the Parser.TypeName()
>>>>>>> <https://github.com/apache/calcite/blob/a5d520df76602d25ed66627f08f5e0db4d048a77/core/src/main/codegen/templates/Parser.jj#L4439>
>>>>>>> method and parse the complex types ourselves. But Parser.DataType()
>>>>>>> <https://github.com/apache/calcite/blob/a5d520df76602d25ed66627f08f5e0db4d048a77/core/src/main/codegen/templates/Parser.jj#L4377>
>>>>>>> creates SqlDataTypeSpec only in two specific ways, without the ability to
>>>>>>> extend it, so even if we parse the typename ourselves, we would not be able
>>>>>>> to construct the SqlDataTypeSpec in a way that supports arrays/rows. But
>>>>>>> even if we could, looking at SqlDataTypeSpec
>>>>>>> <https://github.com/apache/calcite/blob/09be7e74a6a4d1b1c4f640c8e69b5ebdd467d811/core/src/main/java/org/apache/calcite/sql/SqlDataTypeSpec.java#L327>
>>>>>>> it seems that it does not support creating arrays or rows either: it calls
>>>>>>> typeFactory.createSqlType(typename)
>>>>>>> <https://github.com/apache/calcite/blob/09be7e74a6a4d1b1c4f640c8e69b5ebdd467d811/core/src/main/java/org/apache/calcite/sql/SqlDataTypeSpec.java#L350>
>>>>>>> which only
>>>>>>> <https://github.com/apache/calcite/blob/f47465236b7650f2280092b708fa39062fe79ffd/core/src/main/java/org/apache/calcite/sql/type/SqlTypeFactoryImpl.java#L49>
>>>>>>> creates basic types in this call.
>>>>>>> 
>>>>>>> *Path forward:*
>>>>>>> If the above is correct, then it appears that we would need to patch
>>>>>>> Calcite in a couple of places to support arrays, rows, and maps in DDL:
>>>>>>>  - update Parser.jj to support parsing the type definitions for the
>>>>>>> required types and constructing SqlDataTypeSpec correctly for those cases;
>>>>>>>  - update SqlDataTypeSpec.java to handle complex types and invoke the
>>>>>>> correct typeFactory interfaces.
>>>>>>> 
>>>>>>> *Questions:*
>>>>>>> - does the above sound sane/correct?
>>>>>>> - is there similar work already tracked in Calcite somewhere? I saw
>>>>>>> something mentioned in CALCITE-2045
>>>>>>> <https://issues.apache.org/jira/browse/CALCITE-2045?focusedCommentId=16351203&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16351203>,
>>>>>>> but didn't see any tracking Jiras specifically for this work yet;
>>>>>>> - is there a known/recommended/working syntax for such DDL? If there is
>>>>>>> none, then would it make sense to adopt something similar to the BigQuery
>>>>>>> STRUCT/ARRAY definition
>>>>>>> <https://cloud.google.com/bigquery/docs/data-definition-language>?
>>>>>>> 
>>>>>>> Thank you,
>>>>>>> Anton
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> "So you have to trust that the dots will somehow connect in your future."
>>> 
>>> 
>> 
>> 
>> --
>> "So you have to trust that the dots will somehow connect in your future."
>>
