That is a really nice comparison. Thanks! It makes sense for Beam to take
more responsibility for the specifics of DDL parsing so we can implement
the support we need. I wonder if - once we have something working well - we
could push it back into Calcite or some subproject. I know their position
on DDL is to avoid the issue, but it seems there's some new interest in
looking at ways to support a collection of dialects.

Kenn

On Fri, May 4, 2018 at 11:23 AM Anton Kedin <[email protected]> wrote:

> Hi,
>
> I am working on adding support for non-primitive types in Beam SQL DDL.
>
> *Goal*
> Allow users to define tables with Rows, Arrays, Maps as field types in
> DDL. This enables defining schemas for complex sources, e.g. describing
> JSON sources or other sources which support complex field types (BQ, etc).
>
> *Solution*
> Extend the parser we have in Beam SQLto accept the following DDL statement:
> "CREATE TABLE tableName (field_name <COMPLEX_FIELD_TYPE>)" where
> "<COMPLEX_FIELD_TYPE>" can be any the following:
>
>    - "primitiveType ARRAY", for example, "field_int_arr" INTEGER ARRAY".
>    Thoughts:
>    - this is how SQL standard defines ARRAY field declaration;
>       - existing parser supports similar syntax for collections;
>       - hard to read for nested collections;
>       - similar syntax is supported in Postgres
>       <https://www.postgresql.org/docs/9.1/static/arrays.html>;
>    - "ARRAY<type>", for example "field_matrix ARRAY<ARRAY<INTEGER>>".
>    Thoughts:
>    - easy to read and support arbitrary nesting;
>       - similar syntax is implemented in:
>          - BigQuery
>          
> <https://cloud.google.com/bigquery/docs/data-definition-language#column_name_and_column_schema>
>          ;
>          - Spanner
>          
> <https://cloud.google.com/spanner/docs/data-definition-language#arrays>
>          ;
>          - KSQL
>          
> <https://docs.confluent.io/current/ksql/docs/syntax-reference.html#create-table>
>          ;
>          - Spark/Hive
>          
> <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTable>
>          ;
>       - "MAP<primitiveType, type>", for example "MAP<VARCHAR,
>    MAP<INTEGER, VARCHAR>>". Thoughts:
>    - there doesn't seem to be a SQL standard support for maps;
>       - looks similar to the "ARRAY<type>" definition;
>       - similar syntax is implemented in:
>          - KSQL
>          
> <https://docs.confluent.io/current/ksql/docs/syntax-reference.html#create-table>
>          ;
>          - Spark/Hive
>          
> <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTable>
>          ;
>       - "ROW(fieldList)", for example "row_field ROW(f_int INTEGER, f_str
>    VARCHAR)". Thoughts:
>    - SQL standard defines the syntax this way;
>       - don't know where similar syntax is implemented;
>    - "ROW<fieldList>", for example "row_field ROW<f_int INTEGER, f_str
>    VARCHAR>". Thoughts:
>    - ROW is not supported in a lot of dialects, but STRUCT is similar and
>       supported in few dialects;
>       - similar syntax for STRUCT is implemented in:
>          - BigQuery
>          <https://cloud.google.com/bigquery/docs/data-definition-language>
>          ;
>          - Spark/Hive
>          
> <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTable>
>          ;
>
> Questions/comments?
> Pull Request <https://github.com/apache/beam/pull/5276>
>
> Thank you,
> Anton
>
>
>

Reply via email to