rdblue commented on a change in pull request #2560:
URL: https://github.com/apache/iceberg/pull/2560#discussion_r653856249
##########
File path:
spark3-extensions/src/main/antlr/org.apache.spark.sql.catalyst.parser.extensions/IcebergSqlExtensions.g4
##########
@@ -71,6 +71,8 @@ statement
| ALTER TABLE multipartIdentifier DROP PARTITION FIELD transform
#dropPartitionField
| ALTER TABLE multipartIdentifier REPLACE PARTITION FIELD transform WITH
transform (AS name=identifier)? #replacePartitionField
| ALTER TABLE multipartIdentifier WRITE writeSpec
#setWriteDistributionAndOrdering
+ | ALTER TABLE table=multipartIdentifier SET IDENTIFIER_KW FIELDS '('
fields+=multipartIdentifier (',' fields+=multipartIdentifier)* ')'
#setIdentifierFields
Review comment:
My comment was based on similar reasoning for other DDL. For example, it
is better to support both `ADD COLUMNS` and `ADD COLUMN` interchangeably so
that users don't get needless syntax errors when they use a plural or singular
term with the "wrong" number of columns.
I think that the problem extending that to this is that it isn't clear that
`SET IDENTIFIER FIELD x` actually replaces all existing identifier fields,
which I think is why you're mentioning not wanting to use the `ADD IDENTIFIER
FIELD` syntax. I think that if there is confusion now, then there would be more
confusion for users, so maybe it is better to only support `SET IDENTIFIER
FIELDS (...)`. Though I would make the parens optional.
> . . . given that all Hive DDLs are in this batch format
I don't think that we want to take inspiration from Hive very much. Better
to use Postgres or Trino as reference points.
> If we want to have a way to only add new fields without the need to know
current fields, I can also add a ADD IDENTIFIER FIELDS.
This wasn't my intent and I would not want to have confusion, so let's not
allow using `FIELD` interchangeably.
##########
File path:
spark3-extensions/src/main/antlr/org.apache.spark.sql.catalyst.parser.extensions/IcebergSqlExtensions.g4
##########
@@ -71,6 +71,8 @@ statement
| ALTER TABLE multipartIdentifier DROP PARTITION FIELD transform
#dropPartitionField
| ALTER TABLE multipartIdentifier REPLACE PARTITION FIELD transform WITH
transform (AS name=identifier)? #replacePartitionField
| ALTER TABLE multipartIdentifier WRITE writeSpec
#setWriteDistributionAndOrdering
+ | ALTER TABLE table=multipartIdentifier SET IDENTIFIER_KW FIELDS '('
fields+=multipartIdentifier (',' fields+=multipartIdentifier)* ')'
#setIdentifierFields
Review comment:
Yeah, I would make it possible to use that syntax. No need for users to
remember that they need parens when we don't actually need them to parse
correctly.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]