The principles are as follows:

* Server should expose, as DDL, the concepts in Calcite’s framework, no more, no less. This includes the ability to define a type if supported by Calcite’s type system (RelDataTypeFactory), and the ability to define materialized views and lattices.
* Babel should expose anything in a supported SQL dialect (or rather, anything that someone has found time to support).

Server’s specification is relatively fixed, whereas Babel’s specification is growing and changing all the time.

Julian

> On May 2, 2018, at 10:06 AM, Michael Mior <[email protected]> wrote:
>
> Seems logical to me, although I wonder if there's any way we could easily
> make the DDL part of the parser modular. At least before going too far down
> the road of implementing DDL in Babel, it would be good to set a clear
> scope of what will exist in calcite-babel vs. calcite-server.
>
> --
> Michael Mior
> [email protected]
>
> 2018-05-02 12:57 GMT-04:00 Julian Hyde <[email protected]>:
>
>> By the way. We should also figure out how this fits with the project to
>> create a lenient parser that can handle any dialect of SQL. I am calling
>> that parser “Babel”[1]. That parser will be able to handle BigQuery
>> dialect, among others.
>>
>> Here’s my current thinking.
>>
>> I think that Babel should be a new module (a sibling to calcite-server,
>> calcite-druid etc.) and its parser will extend the core parser. That means
>> that calcite-babel will not inherit from the DDL parser in the
>> calcite-server module, nor vice versa. We will probably end up with two
>> parsers that are capable of handling DDL, and two sets of AST classes. But
>> I think that is OK, or at least, better than the chaos of trying to reuse
>> too much. At least, the parsers will share 99% of their DNA with the core
>> parser. And we can easily share tests.
>>
>> Julian
>>
>> [1] https://issues.apache.org/jira/browse/CALCITE-2280
>>
>>> On May 1, 2018, at 11:16 PM, Shuyi Chen <[email protected]> wrote:
>>>
>>> Hi Anton, thanks a lot for the great questions.
>>>
>>> Yes, SqlDataTypeSpec currently only supports creating simple SQL types; no
>>> row/array/map is supported.
>>>
>>> CALCITE-2045 adds support for defining custom types, either simple or row
>>> types, through the type DDL, and you should be able to use the UDT in your
>>> Table DDL for a complex row type. I think this should be close to what you want.
>>>
>>> You can extend the type DDL in its current form in the BEAM parser and add
>>> support for map and array types, or modify the grammar to tailor it to your
>>> needs and make it BigQuery compatible. All the required changes for supporting
>>> UDTs in calcite-core should already be done by CALCITE-2045.
>>>
>>> As for the BigQuery syntax, I am not sure if it's a good idea to adopt it
>>> in the core parser unless there is no SQL equivalent, but if you implement it
>>> in your extended BEAM parser, it's up to you, and that's by design of
>>> Calcite DDL.
>>>
>>> Let me know if it helps.
>>>
>>> Thanks
>>> Shuyi
>>>
>>> On Tue, May 1, 2018 at 3:21 PM, Anton Kedin <[email protected]>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> We want to add support for non-primitive types (ROW, ARRAY, MAP) to Apache
>>>> Beam SQL DDL (based on Calcite DDL extensions). What would be the best way
>>>> to approach this?
>>>>
>>>> *Our Use Case:*
>>>> We want to be able to use DDL to define data sources and sinks for Beam
>>>> pipelines, so that users don't have to wrap SQL into custom code which
>>>> configures sources/sinks.
>>>>
>>>> *What we have already:*
>>>> We have a customized CREATE TABLE statement which allows users to specify
>>>> the type of the data source, its schema, and data location. The
>>>> implementation is based on Calcite DDL extensions.
>>>>
>>>> *What we're missing:*
>>>> We need to be able to define schemas with non-primitive types, e.g.
>>>> arrays or rows, so that we can correctly describe data sources and sinks
>>>> which support such types. For example, if we want to manipulate data in a
>>>> stream of JSON objects, we want to be able to describe the JSON contents
>>>> somehow, including arrays or nested objects.
>>>> Or we would need similar types
>>>> to interact with BigQuery, which supports arrays and nested struct types.
>>>>
>>>> *Problem:*
>>>> I tried to check if it is possible to extend the parser using the
>>>> config.fmpp approach, so that we can hook into the Parser.TypeName()
>>>> <https://github.com/apache/calcite/blob/a5d520df76602d25ed66627f08f5e0db4d048a77/core/src/main/codegen/templates/Parser.jj#L4439>
>>>> method and parse the complex types ourselves. But Parser.DataType()
>>>> <https://github.com/apache/calcite/blob/a5d520df76602d25ed66627f08f5e0db4d048a77/core/src/main/codegen/templates/Parser.jj#L4377>
>>>> creates SqlDataTypeSpec only in two specific ways, without the ability to
>>>> extend it, so even if we parse the typename ourselves, we would not be able
>>>> to construct the SqlDataTypeSpec in a way that supports arrays/rows. But even
>>>> if we could, looking at SqlDataTypeSpec
>>>> <https://github.com/apache/calcite/blob/09be7e74a6a4d1b1c4f640c8e69b5ebdd467d811/core/src/main/java/org/apache/calcite/sql/SqlDataTypeSpec.java#L327>
>>>> it seems that it does not support creating arrays or rows either: it calls
>>>> typeFactory.createSqlType(typename)
>>>> <https://github.com/apache/calcite/blob/09be7e74a6a4d1b1c4f640c8e69b5ebdd467d811/core/src/main/java/org/apache/calcite/sql/SqlDataTypeSpec.java#L350>
>>>> which only
>>>> <https://github.com/apache/calcite/blob/f47465236b7650f2280092b708fa39062fe79ffd/core/src/main/java/org/apache/calcite/sql/type/SqlTypeFactoryImpl.java#L49>
>>>> creates basic types in this call.
>>>>
>>>> *Path forward:*
>>>> If the above is correct, then it appears that we would need to patch
>>>> Calcite in a couple of places to support arrays, rows, and maps in DDL:
>>>> - update Parser.jj to support parsing the type definitions for the
>>>> required types and constructing SqlDataTypeSpec correctly for those cases;
>>>> - update SqlDataTypeSpec.java to handle complex types and invoke the
>>>> correct typeFactory interfaces;
>>>>
>>>> *Questions:*
>>>> - does the above sound sane/correct?
>>>> - is there similar work already tracked in Calcite somewhere? I saw
>>>> something mentioned in CALCITE-2045
>>>> <https://issues.apache.org/jira/browse/CALCITE-2045?focusedCommentId=16351203&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16351203>,
>>>> but didn't see any tracking Jiras specifically for this work yet;
>>>> - is there a known/recommended/working syntax for such DDL? If there is
>>>> none, then would it make sense to adopt something similar to the BigQuery
>>>> STRUCT/ARRAY definition
>>>> <https://cloud.google.com/bigquery/docs/data-definition-language>?
>>>>
>>>> Thank you,
>>>> Anton
>>>>
>>>
>>>
>>>
>>> --
>>> "So you have to trust that the dots will somehow connect in your future."
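
The "Path forward" items in Anton's message amount to parsing type definitions recursively instead of by a single simple type name. As a rough illustration of that idea only, here is a minimal, self-contained Java sketch. Everything in it is an assumption made for illustration: the class names TypeSpecSketch and TypeSpec are hypothetical and are not Calcite or Beam classes, and the angle-bracket syntax (ARRAY<...>, MAP<k, v>, ROW<name type, ...>) is just one possible notation, loosely in the spirit of the BigQuery-style definitions mentioned in the thread. It does not touch Parser.jj or SqlDataTypeSpec.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Toy recursive type-spec parser. Hypothetical names; not part of Calcite or Beam. */
class TypeSpecSketch {

    /** A parsed type: a type name plus nested component types (empty for simple types). */
    public static final class TypeSpec {
        public final String name;                                   // e.g. INTEGER, ARRAY, MAP, ROW
        public final List<String> fieldNames = new ArrayList<>();   // filled only for ROW
        public final List<TypeSpec> components = new ArrayList<>();

        TypeSpec(String name) { this.name = name; }

        @Override public String toString() {
            if (components.isEmpty()) {
                return name;
            }
            StringBuilder sb = new StringBuilder(name).append('<');
            for (int i = 0; i < components.size(); i++) {
                if (i > 0) sb.append(", ");
                if (name.equals("ROW")) sb.append(fieldNames.get(i)).append(' ');
                sb.append(components.get(i));
            }
            return sb.append('>').toString();
        }
    }

    private final String src;
    private int pos;

    private TypeSpecSketch(String src) { this.src = src; }

    /** Parses one complete type definition; rejects trailing input. */
    public static TypeSpec parse(String text) {
        TypeSpecSketch p = new TypeSpecSketch(text);
        TypeSpec spec = p.type();
        p.skipSpaces();
        if (p.pos != text.length()) {
            throw new IllegalArgumentException("unexpected input at position " + p.pos);
        }
        return spec;
    }

    /** type := NAME [ '<' component { ',' component } '>' ]; ROW components carry a field name. */
    private TypeSpec type() {
        TypeSpec spec = new TypeSpec(identifier().toUpperCase(Locale.ROOT));
        if (consume('<')) {
            do {
                if (spec.name.equals("ROW")) spec.fieldNames.add(identifier());
                spec.components.add(type());   // recursion is what allows arbitrary nesting
            } while (consume(','));
            expect('>');
        }
        int n = spec.components.size();
        if ((spec.name.equals("ARRAY") && n != 1) || (spec.name.equals("MAP") && n != 2)) {
            throw new IllegalArgumentException(spec.name + " has wrong number of component types: " + n);
        }
        return spec;
    }

    private String identifier() {
        skipSpaces();
        int start = pos;
        while (pos < src.length()
                && (Character.isLetterOrDigit(src.charAt(pos)) || src.charAt(pos) == '_')) {
            pos++;
        }
        if (pos == start) throw new IllegalArgumentException("expected a name at position " + pos);
        return src.substring(start, pos);
    }

    private boolean consume(char c) {
        skipSpaces();
        if (pos < src.length() && src.charAt(pos) == c) { pos++; return true; }
        return false;
    }

    private void expect(char c) {
        if (!consume(c)) throw new IllegalArgumentException("expected '" + c + "' at position " + pos);
    }

    private void skipSpaces() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++;
    }

    public static void main(String[] args) {
        System.out.println(parse("array<row<id integer, tags array<varchar>>>"));
    }
}
```

In a real patch, the resulting tree would presumably be walked to call the matching type-factory method for each node (array, map, struct, or simple type) rather than a single simple-type call; that mapping is left out here because it depends on the Calcite version in use.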
