Re: Complex Types Support in DDL

Julian Hyde Wed, 02 May 2018 11:24:12 -0700

The principles are as follows:
 * Server should expose, as DDL, the concepts in Calcite’s framework, no more, 
no less. This includes the ability to define a type if supported by Calcite’s 
type system (RelDataTypeFactory), and the ability to define materialized views 
and lattices.
 * Babel should expose anything in a supported SQL dialect (or rather, anything 
that someone has found time to support).


Server’s specification is relatively fixed, whereas Babel’s specification is 
growing and changing all the time. 

Julian


> On May 2, 2018, at 10:06 AM, Michael Mior wrote:
> 
> Seems logical to me, although I wonder if there's any way we could easily
> make the DDL part of the parser modular. At least before going too far down
> the road of implementing DDL in Babel, it would be good to set a clear
> scope of what will exist in calcite-babel vs. calcite-server.
> 
> --
> Michael Mior
> 
> 2018-05-02 12:57 GMT-04:00 Julian Hyde:
> 
>> By the way, we should also figure out how this fits with the project to
>> create a lenient parser that can handle any dialect of SQL. I am calling
>> that parser “Babel”[1]. That parser will be able to handle the BigQuery
>> dialect, among others.
>> 
>> Here’s my current thinking.
>> 
>> I think that Babel should be a new module (a sibling to calcite-server,
>> calcite-druid etc.) and its parser will extend the core parser. That means
>> that calcite-babel will not inherit from the DDL parser in the
>> calcite-server module, nor vice versa. We will probably end up with two
>> parsers that are capable of handling DDL, and two sets of AST classes. But
>> I think that is OK, or at least, better than the chaos of trying to reuse
>> too much. At least, the parsers will share 99% of their DNA with the core
>> parser. And we can easily share tests.
>> 
>> Julian
>> 
>> [1] https://issues.apache.org/jira/browse/CALCITE-2280
>> 
>>> On May 1, 2018, at 11:16 PM, Shuyi Chen wrote:
>>> 
>>> Hi Anton, thanks a lot for the great questions.
>>> 
>>> Yes, SqlDataTypeSpec currently supports only simple SQL types; ROW,
>>> ARRAY and MAP types are not supported.
>>> 
>>> CALCITE-2045 adds support for defining custom types, either simple or row
>>> types, through the type DDL, and you should be able to use the UDT in your
>>> table DDL for a complex row type. I think this should be close to what you want.
>>> 
>>> You can extend the type DDL in its current form in the Beam parser and add
>>> support for map and array types, or modify the grammar to suit your needs
>>> and make it BigQuery-compatible. All the changes required for supporting
>>> UDTs in calcite-core should already be done by CALCITE-2045.
>>> 
>>> As for the BigQuery syntax, I am not sure it's a good idea to adopt it
>>> in the core parser unless there is no SQL equivalent, but if you implement
>>> it in your extended Beam parser, it's up to you; that's by design in
>>> Calcite DDL.
>>> 
>>> Let me know if it helps.
>>> 
>>> Thanks
>>> Shuyi
>>> 
>>> On Tue, May 1, 2018 at 3:21 PM, Anton Kedin wrote:
>>> 
>>>> Hi,
>>>> 
>>>> We want to add support for non-primitive types (ROW, ARRAY, MAP) to Apache
>>>> Beam SQL DDL (based on Calcite DDL extensions). What would be the best way
>>>> to approach this?
>>>> 
>>>> *Our Use Case:*
>>>> We want to be able to use DDL to define data sources and sinks for Beam
>>>> pipelines, so that users don't have to wrap SQL into custom code which
>>>> configures sources/sinks.
>>>> 
>>>> *What we have already:*
>>>> We have a customized CREATE TABLE statement which allows users to specify
>>>> the type of the data source, its schema, and the data location. The
>>>> implementation is based on Calcite DDL extensions.
>>>> 
>>>> *What we're missing:*
>>>> We need to be able to define schemas with non-primitive types, e.g.
>>>> arrays or rows, so that we can correctly describe data sources and sinks
>>>> that support such types. For example, if we want to manipulate data in a
>>>> stream of JSON objects, we want to be able to describe the JSON contents
>>>> somehow, including arrays or nested objects. We would also need similar
>>>> types to interact with BigQuery, which supports arrays and nested struct
>>>> types.
>>>> 
>>>> *Problem:*
>>>> I tried to check whether it is possible to extend the parser using the
>>>> config.fmpp approach, so that we can hook into the Parser.TypeName()
>>>> <https://github.com/apache/calcite/blob/a5d520df76602d25ed66627f08f5e0db4d048a77/core/src/main/codegen/templates/Parser.jj#L4439>
>>>> method and parse the complex types ourselves. But Parser.DataType()
>>>> <https://github.com/apache/calcite/blob/a5d520df76602d25ed66627f08f5e0db4d048a77/core/src/main/codegen/templates/Parser.jj#L4377>
>>>> creates SqlDataTypeSpec in only two specific ways, with no ability to
>>>> extend it, so even if we parse the type name ourselves, we would not be
>>>> able to construct the SqlDataTypeSpec in a way that supports arrays/rows.
>>>> And even if we could, looking at SqlDataTypeSpec
>>>> <https://github.com/apache/calcite/blob/09be7e74a6a4d1b1c4f640c8e69b5ebdd467d811/core/src/main/java/org/apache/calcite/sql/SqlDataTypeSpec.java#L327>
>>>> it seems that it does not support creating arrays or rows either: it calls
>>>> typeFactory.createSqlType(typename)
>>>> <https://github.com/apache/calcite/blob/09be7e74a6a4d1b1c4f640c8e69b5ebdd467d811/core/src/main/java/org/apache/calcite/sql/SqlDataTypeSpec.java#L350>,
>>>> which only creates basic types in this call
>>>> <https://github.com/apache/calcite/blob/f47465236b7650f2280092b708fa39062fe79ffd/core/src/main/java/org/apache/calcite/sql/type/SqlTypeFactoryImpl.java#L49>.
>>>> 
>>>> *Path forward:*
>>>> If the above is correct, then it appears that we would need to patch
>>>> Calcite in a couple of places to support arrays, rows, and maps in DDL:
>>>>   - update Parser.jj to support parsing the type definitions for the
>>>> required types and to construct SqlDataTypeSpec correctly for those cases;
>>>>   - update SqlDataTypeSpec.java to handle complex types and invoke the
>>>> correct typeFactory interfaces.
>>>> 
>>>> *Questions:*
>>>> - does the above sound sane/correct?
>>>> - is there similar work already tracked in Calcite somewhere? I saw
>>>> something mentioned in CALCITE-2045
>>>> <https://issues.apache.org/jira/browse/CALCITE-2045?focusedCommentId=16351203&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16351203>,
>>>> but didn't see any JIRAs tracking this work specifically yet;
>>>> - is there a known/recommended/working syntax for such DDL? If there is
>>>> none, would it make sense to adopt something similar to the BigQuery
>>>> STRUCT/ARRAY definition
>>>> <https://cloud.google.com/bigquery/docs/data-definition-language>?
>>>> 
>>>> Thank you,
>>>> Anton
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> "So you have to trust that the dots will somehow connect in your future."
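
Anton's proposed first step, teaching the parser to read nested type definitions, can be illustrated with a short standalone Java sketch. It is hypothetical and not Calcite code: it does not touch Parser.jj, SqlDataTypeSpec or any Calcite class, the names ComplexTypeParser and TypeNode are invented for illustration, and the angle-bracket syntax is only loosely modeled on BigQuery's STRUCT/ARRAY definitions, with a MAP<K, V> form added as an assumption for the Beam use case. Type parameters such as VARCHAR(20) are rejected rather than parsed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
/** Hypothetical sketch, not Calcite code: parses types like ARRAY<STRUCT<a INT64>>. */
public class ComplexTypeParser {

  /** A parsed type: ARRAY, MAP or STRUCT with child types, or a basic type name. */
  public static final class TypeNode {
    final String name;             // "ARRAY", "MAP", "STRUCT" or a basic type such as "INT64"
    final List<String> fieldNames; // STRUCT field names; empty for other types
    final List<TypeNode> children; // element, key and value, or field types; empty if basic
    TypeNode(String name, List<String> fieldNames, List<TypeNode> children) {
      this.name = name;
      this.fieldNames = fieldNames;
      this.children = children;
    }

    @Override
    public String toString() {
      if (children.isEmpty()) return name; // basic type
      List<String> parts = new ArrayList<>();
      for (int i = 0; i < children.size(); i++) {
        String field = fieldNames.isEmpty() ? "" : fieldNames.get(i) + " ";
        parts.add(field + children.get(i));
      }
      return name + "<" + String.join(", ", parts) + ">";
    }
  }

  private final List<String> tokens;
  private int pos = 0;

  private ComplexTypeParser(String text) {
    // Pad '<', '>' and ',' with spaces, then split on whitespace into tokens.
    tokens = List.of(text.replaceAll("([<>,])", " $1 ").trim().split("\\s+"));
  }

  /** Parses one complete type definition; throws IllegalArgumentException on bad input. */
  public static TypeNode parse(String text) {
    ComplexTypeParser parser = new ComplexTypeParser(text);
    TypeNode result = parser.type();
    if (parser.pos != parser.tokens.size()) {
      throw new IllegalArgumentException("Unexpected token: " + parser.tokens.get(parser.pos));
    }
    return result;
  }

  // type := ARRAY<type> | MAP<type, type> | STRUCT<name type [, name type]*> | name
  private TypeNode type() {
    String name = identifier().toUpperCase(Locale.ROOT);
    List<String> fieldNames = new ArrayList<>();
    List<TypeNode> children = new ArrayList<>();
    switch (name) {
      case "ARRAY":
        expect("<");
        children.add(type());
        expect(">");
        break;
      case "MAP":
        expect("<");
        children.add(type());
        expect(",");
        children.add(type());
        expect(">");
        break;
      case "STRUCT":
        expect("<");
        do {
          fieldNames.add(identifier()); // field names keep their original case
          children.add(type());
        } while (accept(","));
        expect(">");
        break;
    }
    return new TypeNode(name, fieldNames, children); // other names: basic type, no children
  }

  private String next() {
    if (pos == tokens.size()) throw new IllegalArgumentException("Unexpected end of input");
    return tokens.get(pos++);
  }

  private String identifier() {
    String token = next();
    if (!token.matches("[A-Za-z_][A-Za-z0-9_]*")) throw new IllegalArgumentException("Expected a name but got: " + token);
    return token;
  }

  private void expect(String expected) {
    if (!next().equals(expected)) throw new IllegalArgumentException("Expected " + expected);
  }

  private boolean accept(String candidate) {
    boolean matched = pos < tokens.size() && tokens.get(pos).equals(candidate);
    if (matched) pos++;
    return matched;
  }
}
```

For example, ComplexTypeParser.parse("array<struct<a int64, b string>>").toString() returns ARRAY<STRUCT<a INT64, b STRING>>: type names are upper-cased while field names keep their case. In Calcite itself, such a tree would still have to become a RelDataType, presumably via RelDataTypeFactory methods such as createArrayType, createMapType and createStructType; that step, which corresponds to the SqlDataTypeSpec change discussed in the thread, is not shown here.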
