That is a really nice comparison. Thanks! It makes sense for Beam to take more responsibility for the specifics of DDL parsing so we can implement the support we need. I wonder if - once we have something working well - we could push it back into Calcite or some subproject. I know their position on DDL is to avoid the issue, but it seems there's some new interest in looking at ways to support a collection of dialects.
Kenn

On Fri, May 4, 2018 at 11:23 AM Anton Kedin <[email protected]> wrote:

> Hi,
>
> I am working on adding support for non-primitive types in Beam SQL DDL.
>
> *Goal*
> Allow users to define tables with Rows, Arrays, Maps as field types in
> DDL. This enables defining schemas for complex sources, e.g. describing
> JSON sources or other sources which support complex field types (BQ, etc).
>
> *Solution*
> Extend the parser we have in Beam SQL to accept the following DDL statement:
> "CREATE TABLE tableName (field_name <COMPLEX_FIELD_TYPE>)" where
> "<COMPLEX_FIELD_TYPE>" can be any of the following:
>
> - "primitiveType ARRAY", for example, "field_int_arr INTEGER ARRAY".
>   Thoughts:
>   - this is how the SQL standard defines ARRAY field declaration;
>   - the existing parser supports similar syntax for collections;
>   - hard to read for nested collections;
>   - similar syntax is supported in Postgres
>     <https://www.postgresql.org/docs/9.1/static/arrays.html>;
> - "ARRAY<type>", for example "field_matrix ARRAY<ARRAY<INTEGER>>".
>   Thoughts:
>   - easy to read and supports arbitrary nesting;
>   - similar syntax is implemented in:
>     - BigQuery
>       <https://cloud.google.com/bigquery/docs/data-definition-language#column_name_and_column_schema>;
>     - Spanner
>       <https://cloud.google.com/spanner/docs/data-definition-language#arrays>;
>     - KSQL
>       <https://docs.confluent.io/current/ksql/docs/syntax-reference.html#create-table>;
>     - Spark/Hive
>       <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTable>;
> - "MAP<primitiveType, type>", for example
>   "MAP<VARCHAR, MAP<INTEGER, VARCHAR>>".
>   Thoughts:
>   - there doesn't seem to be SQL standard support for maps;
>   - looks similar to the "ARRAY<type>" definition;
>   - similar syntax is implemented in:
>     - KSQL
>       <https://docs.confluent.io/current/ksql/docs/syntax-reference.html#create-table>;
>     - Spark/Hive
>       <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTable>;
> - "ROW(fieldList)", for example "row_field ROW(f_int INTEGER, f_str
>   VARCHAR)".
>   Thoughts:
>   - the SQL standard defines the syntax this way;
>   - don't know where similar syntax is implemented;
> - "ROW<fieldList>", for example "row_field ROW<f_int INTEGER, f_str
>   VARCHAR>".
>   Thoughts:
>   - ROW is not supported in a lot of dialects, but STRUCT is similar and
>     supported in a few dialects;
>   - similar syntax for STRUCT is implemented in:
>     - BigQuery
>       <https://cloud.google.com/bigquery/docs/data-definition-language>;
>     - Spark/Hive
>       <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTable>;
>
> Questions/comments?
> Pull Request <https://github.com/apache/beam/pull/5276>
>
> Thank you,
> Anton
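
For reference, here is a rough sketch of how the forms proposed in the message might look combined in one statement. The table and column names are made up for illustration, and the statement follows only the "CREATE TABLE tableName (...)" shape quoted in the proposal; which forms the parser ends up accepting depends on the pull request, so this is not confirmed syntax.

```sql
-- Hypothetical table; all names are illustrative only.
CREATE TABLE example_events (
  event_id    INTEGER,
  -- SQL-standard style array declaration ("primitiveType ARRAY")
  scores      INTEGER ARRAY,
  -- angle-bracket style, which nests readably ("ARRAY<type>")
  matrix      ARRAY<ARRAY<INTEGER>>,
  -- map with a primitive key and a nested map value ("MAP<primitiveType, type>")
  attributes  MAP<VARCHAR, MAP<INTEGER, VARCHAR>>,
  -- SQL-standard style row ("ROW(fieldList)"); the proposal also lists
  -- "ROW<fieldList>" as an alternative spelling
  details     ROW(f_int INTEGER, f_str VARCHAR)
)
```

The nested `matrix` and `attributes` columns show the readability argument for the angle-bracket forms: the same nesting is hard to express with the postfix "INTEGER ARRAY" style.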
