Thanks, Zoltan, for bringing this up. The current Impala syntax for Iceberg, 'partition by spec ..', does seem a little odd, although looking at the JIRA it seems it was inspired by Kudu's 'partition by' syntax.
I think it makes sense to align with the 'partitioned by ..' syntax that Spark already supports, since these get baked into user workflows. Also, 'month(ts)' seems more intuitive than 'ts month' since the former indicates a function is being applied to the timestamp column.

In terms of options, I would vote for #3, considering that the 4.0 release has already been in the works and IMO this particular change is not really a blocker. However, it is up to the release manager to decide :)

Aman

On Wed, Jun 2, 2021 at 8:31 AM Zoltán Borók-Nagy <borokna...@apache.org> wrote:

> Hi Team,
>
> We've been working on Iceberg support in Impala for quite some time.
> The status is quite good, Impala master is able to read/write/alter Iceberg
> tables (there's still some work to make it production-ready).
>
> The problem is that currently we have a DDL syntax for defining Iceberg
> partitions that differs from SparkSQL:
> https://iceberg.apache.org/spark-ddl/#partitioned-by
>
> E.g. Impala is using the following syntax:
>
> CREATE TABLE ice_t (i int, s string, ts timestamp, d date)
> *PARTITION BY SPEC (i BUCKET 5, ts MONTH, d YEAR)*
> STORED AS ICEBERG;
>
> The same in Spark is:
>
> CREATE TABLE ice_t (i int, s string, ts timestamp, d date)
> USING ICEBERG
> *PARTITIONED BY (bucket(5, i), months(ts), years(d))*
>
> Impala's syntax is older but hasn't been released yet. Spark's syntax is
> released so it cannot be changed.
>
> Hive is also working on DDL support for Iceberg partitions, and they are
> favoring the released SparkSQL syntax.
>
> I think it would be favorable if Impala used the same syntax that the other
> engines use.
> The DDLs won't match exactly as Spark has USING while Impala has STORED AS,
> and Hive will have STORED BY ICEBERG. But I think it would be nice if we
> could converge as much as we can.
>
> Given that we want to have a 4.0 release soon I think we have the following
> options:
>
> 1. Keep the current syntax, being inconsistent with other engines
> 2. Hold on the 4.0 release and fix the syntax
> 3. Marking Iceberg support as experimental and change syntax for 4.1
>
> What do you think? If you agree to change the syntax, I will volunteer to
> implement it.
>
> Cheers,
> Zoltan
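
As a rough illustration of how mechanical the mapping between the two spellings is, here is a small Python sketch that rewrites the Impala-style clause from the example into the Spark-style one. It only covers the three transforms shown in the thread (BUCKET, MONTH, YEAR); the function name and the keyword table are made up for illustration and are not part of Impala, Spark, or Iceberg.

```python
import re

# Hypothetical keyword table: Impala-style transform keyword -> Spark-style
# function name. Only the transforms that appear in the thread are listed.
SPARK_NAMES = {"BUCKET": "bucket", "MONTH": "months", "YEAR": "years"}


def to_spark_partitioning(impala_clause: str) -> str:
    """Rewrite 'PARTITION BY SPEC (col TRANSFORM [arg], ...)' into
    'PARTITIONED BY (transform([arg, ]col), ...)'."""
    match = re.fullmatch(
        r"\s*PARTITION\s+BY\s+SPEC\s*\((.*)\)\s*",
        impala_clause,
        re.IGNORECASE | re.DOTALL,
    )
    if match is None:
        raise ValueError("not a PARTITION BY SPEC clause")

    fields = []
    for item in match.group(1).split(","):
        parts = item.split()
        if len(parts) == 3:
            column, transform, arg = parts  # e.g. "i BUCKET 5"
        elif len(parts) == 2:
            column, transform = parts  # e.g. "ts MONTH"
            arg = None
        else:
            raise ValueError(f"cannot parse partition field: {item!r}")

        name = SPARK_NAMES.get(transform.upper())
        if name is None:
            raise ValueError(f"unsupported transform: {transform!r}")
        # In this sketch only BUCKET takes an argument (the bucket count).
        if (transform.upper() == "BUCKET") != (arg is not None):
            raise ValueError(f"wrong argument count for {transform!r}")

        if arg is not None:
            fields.append(f"{name}({arg}, {column})")
        else:
            fields.append(f"{name}({column})")

    return "PARTITIONED BY (" + ", ".join(fields) + ")"


if __name__ == "__main__":
    print(to_spark_partitioning("PARTITION BY SPEC (i BUCKET 5, ts MONTH, d YEAR)"))
```

Run on the clause from the example, this yields the Spark-style clause quoted above. A real change would of course live in Impala's SQL parser rather than in a string rewrite like this.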