Agreed. Test re-use = specification re-use.
Code re-use = much harder.

> On May 2, 2018, at 12:38 PM, Michael Mior <[email protected]> wrote:
>
> That makes sense to me. I agree that it's probably not very useful to try
> to share anything in the parser between calcite-server and calcite-babel
> since calcite-babel will always be a moving target. However, given that
> calcite-babel is intended to be particularly permissive, it would be great
> to have a way to run calcite-server DDL tests against calcite-babel.
>
> --
> Michael Mior
> [email protected]
>
> On Wed, May 2, 2018 at 2:34 PM, Shuyi Chen <[email protected]> wrote:
>
>> Yes, that's what I have in mind as well. The server module is kind of
>> Calcite's own DDL; people who use Calcite directly can just use the
>> server module for their DDL purposes. Other SQL dialects have their own
>> DDL, and in order for them to leverage Calcite's relational algebra and
>> query planning, the Babel parser needs to be able to parse both the DML
>> and DDL of their own dialect. Is that clear?
>>
>> On Wed, May 2, 2018 at 11:23 AM, Julian Hyde <[email protected]> wrote:
>>
>>> The principles are as follows:
>>> * Server should expose, as DDL, the concepts in Calcite’s framework, no
>>>   more, no less. This includes the ability to define a type if supported
>>>   by Calcite’s type system (RelDataTypeFactory), and the ability to
>>>   define materialized views and lattices.
>>> * Babel should expose anything in a supported SQL dialect (or rather,
>>>   anything that someone has found time to support).
>>>
>>> Server’s specification is relatively fixed, whereas Babel’s specification
>>> is growing and changing all the time.
>>>
>>> Julian
>>>
>>>> On May 2, 2018, at 10:06 AM, Michael Mior <[email protected]> wrote:
>>>>
>>>> Seems logical to me, although I wonder if there's any way we could
>>>> easily make the DDL part of the parser modular.
>>>> At least before going too far down the road of implementing DDL in
>>>> Babel, it would be good to set a clear scope of what will exist in
>>>> calcite-babel vs. calcite-server.
>>>>
>>>> --
>>>> Michael Mior
>>>> [email protected]
>>>>
>>>> 2018-05-02 12:57 GMT-04:00 Julian Hyde <[email protected]>:
>>>>
>>>>> By the way, we should also figure out how this fits with the project
>>>>> to create a lenient parser that can handle any dialect of SQL. I am
>>>>> calling that parser “Babel”[1]. That parser will be able to handle
>>>>> BigQuery dialect, among others.
>>>>>
>>>>> Here’s my current thinking.
>>>>>
>>>>> I think that Babel should be a new module (a sibling to
>>>>> calcite-server, calcite-druid etc.) and its parser will extend the
>>>>> core parser. That means that calcite-babel will not inherit from the
>>>>> DDL parser in the calcite-server module, nor vice versa. We will
>>>>> probably end up with two parsers that are capable of handling DDL,
>>>>> and two sets of AST classes. But I think that is OK, or at least,
>>>>> better than the chaos of trying to reuse too much. At least, the
>>>>> parsers will share 99% of their DNA with the core parser. And we can
>>>>> easily share tests.
>>>>>
>>>>> Julian
>>>>>
>>>>> [1] https://issues.apache.org/jira/browse/CALCITE-2280
>>>>>
>>>>>> On May 1, 2018, at 11:16 PM, Shuyi Chen <[email protected]> wrote:
>>>>>>
>>>>>> Hi Anton, thanks a lot for the great questions.
>>>>>>
>>>>>> Yes, SqlDataTypeSpec currently only supports creating simple SQL
>>>>>> types; no row/array/map is supported.
>>>>>>
>>>>>> CALCITE-2045 adds support for defining custom types, either simple
>>>>>> or row types, through the type DDL, and you should be able to use
>>>>>> the UDT in your table DDL for complex row types.
>>>>>> I think this should be close to what you want.
>>>>>>
>>>>>> You can extend the type DDL in its current form in the BEAM parser
>>>>>> and add support for map and array types, or modify the grammar to
>>>>>> suit your needs and make it BigQuery compatible. All the changes
>>>>>> required for supporting UDTs in calcite-core should already be done
>>>>>> by CALCITE-2045.
>>>>>>
>>>>>> As for the BigQuery syntax, I am not sure it's a good idea to adopt
>>>>>> it in the core parser unless there is no SQL equivalent, but if you
>>>>>> implement it in your extended BEAM parser, it's up to you, and that
>>>>>> is by design of Calcite DDL.
>>>>>>
>>>>>> Let me know if it helps.
>>>>>>
>>>>>> Thanks
>>>>>> Shuyi
>>>>>>
>>>>>> On Tue, May 1, 2018 at 3:21 PM, Anton Kedin <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> We want to add support for non-primitive types (ROW, ARRAY, MAP) to
>>>>>>> Apache Beam SQL DDL (based on Calcite DDL extensions). What would
>>>>>>> be the best way to approach this?
>>>>>>>
>>>>>>> *Our Use Case:*
>>>>>>> We want to be able to use DDL to define data sources and sinks for
>>>>>>> Beam pipelines, so that users don't have to wrap SQL into custom
>>>>>>> code which configures sources/sinks.
>>>>>>>
>>>>>>> *What we have already:*
>>>>>>> We have a customized CREATE TABLE statement which allows users to
>>>>>>> specify the type of the data source, its schema, and data location.
>>>>>>> The implementation is based on Calcite DDL extensions.
>>>>>>>
>>>>>>> *What we're missing:*
>>>>>>> We need to be able to define schemas with non-primitive types, e.g.
>>>>>>> arrays or rows, so that we can correctly describe data sources and
>>>>>>> sinks that support such types. For example, if we want to
>>>>>>> manipulate data in a stream of JSON objects, we want to be able to
>>>>>>> describe the JSON contents somehow, including arrays or nested
>>>>>>> objects.
>>>>>>> Or we would need similar types to interact with BigQuery, which
>>>>>>> supports arrays and nested struct types.
>>>>>>>
>>>>>>> *Problem:*
>>>>>>> I tried to check if it is possible to extend the parser using the
>>>>>>> config.fmpp approach, so that we can hook into the Parser.TypeName()
>>>>>>> <https://github.com/apache/calcite/blob/a5d520df76602d25ed66627f08f5e0db4d048a77/core/src/main/codegen/templates/Parser.jj#L4439>
>>>>>>> method and parse the complex types ourselves. But Parser.DataType()
>>>>>>> <https://github.com/apache/calcite/blob/a5d520df76602d25ed66627f08f5e0db4d048a77/core/src/main/codegen/templates/Parser.jj#L4377>
>>>>>>> creates SqlDataTypeSpec only in two specific ways, without any
>>>>>>> ability to extend it, so even if we parse the typename ourselves,
>>>>>>> we would not be able to construct the SqlDataTypeSpec in a way that
>>>>>>> supports arrays/rows. But even if we could, looking at
>>>>>>> SqlDataTypeSpec
>>>>>>> <https://github.com/apache/calcite/blob/09be7e74a6a4d1b1c4f640c8e69b5ebdd467d811/core/src/main/java/org/apache/calcite/sql/SqlDataTypeSpec.java#L327>
>>>>>>> it seems that it does not support creating arrays or rows either:
>>>>>>> it calls typeFactory.createSqlType(typename)
>>>>>>> <https://github.com/apache/calcite/blob/09be7e74a6a4d1b1c4f640c8e69b5ebdd467d811/core/src/main/java/org/apache/calcite/sql/SqlDataTypeSpec.java#L350>
>>>>>>> which only
>>>>>>> <https://github.com/apache/calcite/blob/f47465236b7650f2280092b708fa39062fe79ffd/core/src/main/java/org/apache/calcite/sql/type/SqlTypeFactoryImpl.java#L49>
>>>>>>> creates basic types in this call.
>>>>>>>
>>>>>>> *Path forward:*
>>>>>>> If the above is correct, then it appears that we would need to
>>>>>>> patch Calcite in a couple of places to support arrays, rows, and
>>>>>>> maps in DDL:
>>>>>>> - update Parser.jj to support parsing the type definitions for the
>>>>>>>   required types and constructing SqlDataTypeSpec correctly for
>>>>>>>   those cases;
>>>>>>> - update SqlDataTypeSpec.java to handle complex types and invoke
>>>>>>>   the correct typeFactory interfaces.
>>>>>>>
>>>>>>> *Questions:*
>>>>>>> - does the above sound sane/correct?
>>>>>>> - is there similar work already tracked in Calcite somewhere? I saw
>>>>>>>   something mentioned in CALCITE-2045
>>>>>>>   <https://issues.apache.org/jira/browse/CALCITE-2045?focusedCommentId=16351203&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16351203>,
>>>>>>>   but didn't see any tracking Jiras specifically for this work yet;
>>>>>>> - is there a known/recommended/working syntax for such DDL? If
>>>>>>>   there is none, then would it make sense to adopt something
>>>>>>>   similar to the BigQuery STRUCT/ARRAY definition
>>>>>>>   <https://cloud.google.com/bigquery/docs/data-definition-language>?
>>>>>>>
>>>>>>> Thank you,
>>>>>>> Anton
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> "So you have to trust that the dots will somehow connect in your
>>>>>> future."
>>
>> --
>> "So you have to trust that the dots will somehow connect in your future."
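
To make the complex-type question in Anton's message concrete, here is a minimal sketch, in plain Java, of the recursive structure that a type DDL would have to produce for ARRAY, MAP, and ROW instead of a single simple type name. It is not Calcite code and does not use any Calcite class. The names TypeDdlSketch and TypeNode are hypothetical, and the angle-bracket syntax is only an assumption loosely modelled on the BigQuery style mentioned above; the thread does not settle on a syntax, and a real change would live in Parser.jj and SqlDataTypeSpec.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy recursive-descent parser for nested type expressions (illustration only). */
final class TypeDdlSketch {
  /** A parsed type: a name plus optional field names and component types. */
  static final class TypeNode {
    final String name;               // e.g. INTEGER, ARRAY, MAP, ROW
    final List<String> fieldNames;   // populated only for ROW
    final List<TypeNode> components; // element, key/value, or field types

    TypeNode(String name, List<String> fieldNames, List<TypeNode> components) {
      this.name = name;
      this.fieldNames = fieldNames;
      this.components = components;
    }

    @Override public String toString() {
      if (components.isEmpty()) {
        return name;
      }
      StringBuilder sb = new StringBuilder(name).append('<');
      for (int i = 0; i < components.size(); i++) {
        if (i > 0) {
          sb.append(", ");
        }
        if (!fieldNames.isEmpty()) {
          sb.append(fieldNames.get(i)).append(' ');
        }
        sb.append(components.get(i));
      }
      return sb.append('>').toString();
    }
  }

  private final String src;
  private int pos;

  private TypeDdlSketch(String src) {
    this.src = src;
  }

  /** Parses a whole type expression and rejects trailing input. */
  static TypeNode parse(String text) {
    TypeDdlSketch p = new TypeDdlSketch(text);
    TypeNode t = p.type();
    p.skipSpaces();
    if (p.pos != p.src.length()) {
      throw new IllegalArgumentException("trailing input at " + p.pos);
    }
    return t;
  }

  /** type := ARRAY<type> | MAP<type, type> | ROW<name type, ...> | simpleName */
  private TypeNode type() {
    String name = identifier().toUpperCase();
    List<String> fields = new ArrayList<>();
    List<TypeNode> parts = new ArrayList<>();
    if (name.equals("ARRAY")) {
      expect('<');
      parts.add(type());
      expect('>');
    } else if (name.equals("MAP")) {
      expect('<');
      parts.add(type());
      expect(',');
      parts.add(type());
      expect('>');
    } else if (name.equals("ROW")) {
      expect('<');
      do {
        fields.add(identifier());
        parts.add(type());
      } while (tryConsume(','));
      expect('>');
    } // anything else is treated as a simple type name
    return new TypeNode(name, fields, parts);
  }

  private String identifier() {
    skipSpaces();
    int start = pos;
    while (pos < src.length()
        && (Character.isLetterOrDigit(src.charAt(pos)) || src.charAt(pos) == '_')) {
      pos++;
    }
    if (start == pos) {
      throw new IllegalArgumentException("identifier expected at " + pos);
    }
    return src.substring(start, pos);
  }

  private boolean tryConsume(char c) {
    skipSpaces();
    if (pos < src.length() && src.charAt(pos) == c) {
      pos++;
      return true;
    }
    return false;
  }

  private void expect(char c) {
    if (!tryConsume(c)) {
      throw new IllegalArgumentException("'" + c + "' expected at " + pos);
    }
  }

  private void skipSpaces() {
    while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) {
      pos++;
    }
  }

  public static void main(String[] args) {
    System.out.println(
        parse("ROW<id INTEGER, tags ARRAY<VARCHAR>, attrs MAP<VARCHAR, INTEGER>>"));
  }
}
```

The point of the sketch is only that each complex type carries component types of its own, so whatever replaces the single createSqlType(typename) call would need to recurse over those components. Precision and scale on simple types, nullability, and the actual mapping onto a type factory are deliberately left out.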
