[jira] [Updated] (IMPALA-10732) Use consistent DDL for specifying Iceberg partitions

Jira Thu, 10 Jun 2021 09:53:04 -0700


     [ https://issues.apache.org/jira/browse/IMPALA-10732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Zoltán Borók-Nagy updated IMPALA-10732:
---------------------------------------
    Description: 
Currently we have a DDL syntax for defining Iceberg partitions that differs
from SparkSQL's:
[https://iceberg.apache.org/spark-ddl/#partitioned-by]

For example, Impala uses the following syntax:

CREATE TABLE ice_t (i int, s string, ts timestamp, d date)
*PARTITION BY SPEC (i BUCKET 5, ts MONTH, d YEAR)*
STORED AS ICEBERG;

The same table in Spark is:

CREATE TABLE ice_t (i int, s string, ts timestamp, d date)
USING ICEBERG
*PARTITIONED BY (bucket(5, i), months(ts), years(d))*

Impala's syntax is older but has not been released yet, whereas Spark's
syntax has already been released, so it cannot be changed.

Hive is also working on DDL support for Iceberg partitions, and they are
favoring the released SparkSQL syntax. See HIVE-25179.

After discussing it on dev@impala, we decided to use SparkSQL's syntax.
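
The difference between the two partition clauses is mechanical: each Impala field of the form "col TRANSFORM [param]" becomes a Spark transform call in which the parameter (if any) comes first and the column last. The sketch below illustrates that mapping in Python. It is not Impala code; the function name to_spark_partition_clause is hypothetical, and it only covers the three transforms (BUCKET, MONTH, YEAR) used in the example above.

```python
# Hypothetical helper, not part of Impala: maps the body of an old-style
# Impala "PARTITION BY SPEC (...)" clause to a SparkSQL "PARTITIONED BY (...)"
# clause. Only the transforms from the example are handled.
_UNARY = {"MONTH": "months", "YEAR": "years"}


def to_spark_partition_clause(spec: str) -> str:
    """Translate e.g. "i BUCKET 5, ts MONTH, d YEAR" into Spark syntax."""
    fields = []
    for item in spec.split(","):
        tokens = item.split()
        if len(tokens) == 3 and tokens[1].upper() == "BUCKET":
            # Impala: "col BUCKET n"  ->  Spark: "bucket(n, col)"
            col, _, n = tokens
            fields.append(f"bucket({int(n)}, {col})")
        elif len(tokens) == 2 and tokens[1].upper() in _UNARY:
            # Impala: "col MONTH" / "col YEAR"  ->  Spark: "months(col)" / "years(col)"
            col, transform = tokens
            fields.append(f"{_UNARY[transform.upper()]}({col})")
        else:
            # Any other transform is outside the scope of this sketch.
            raise ValueError(f"unsupported partition field: {item.strip()!r}")
    return "PARTITIONED BY (" + ", ".join(fields) + ")"
```

Applied to the Impala spec, to_spark_partition_clause("i BUCKET 5, ts MONTH, d YEAR") returns PARTITIONED BY (bucket(5, i), months(ts), years(d)), which is the clause used in the Spark statement.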

  was:
Currently we have a DDL syntax for defining Iceberg partitions that differs
from SparkSQL's:
[https://iceberg.apache.org/spark-ddl/#partitioned-by]

For example, Impala uses the following syntax:

CREATE TABLE ice_t (i int, s string, ts timestamp, d date)
*PARTITION BY SPEC (i BUCKET 5, ts MONTH, d YEAR)*
STORED AS ICEBERG;

The same table in Spark is:

CREATE TABLE ice_t (i int, s string, ts timestamp, d date)
USING ICEBERG
*PARTITIONED BY (bucket(5, i), months(ts), years(d))*

Impala's syntax is older but has not been released yet, whereas Spark's
syntax has already been released, so it cannot be changed.

Hive is also working on DDL support for Iceberg partitions, and they are
favoring the released SparkSQL syntax.

After discussing it on dev@impala, we decided to use SparkSQL's syntax.


> Use consistent DDL for specifying Iceberg partitions
> ----------------------------------------------------
>
>                 Key: IMPALA-10732
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10732
>             Project: IMPALA
>          Issue Type: Bug
>            Reporter: Zoltán Borók-Nagy
>            Assignee: Zoltán Borók-Nagy
>            Priority: Major
>              Labels: impala-iceberg



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

