[GitHub] [iceberg] rdblue commented on a change in pull request #2560: Spark: add spark extension for identifier fields

GitBox Fri, 18 Jun 2021 14:12:01 -0700


rdblue commented on a change in pull request #2560:
URL: https://github.com/apache/iceberg/pull/2560#discussion_r653856249




##########
File path: 
spark3-extensions/src/main/antlr/org.apache.spark.sql.catalyst.parser.extensions/IcebergSqlExtensions.g4
##########
@@ -71,6 +71,8 @@ statement
     | ALTER TABLE multipartIdentifier DROP PARTITION FIELD transform           
             #dropPartitionField
     | ALTER TABLE multipartIdentifier REPLACE PARTITION FIELD transform WITH 
transform (AS name=identifier)? #replacePartitionField
     | ALTER TABLE multipartIdentifier WRITE writeSpec                          
             #setWriteDistributionAndOrdering
+    | ALTER TABLE table=multipartIdentifier SET IDENTIFIER_KW FIELDS '(' 
fields+=multipartIdentifier (',' fields+=multipartIdentifier)* ')' 
#setIdentifierFields

Review comment:
       My comment was based on similar reasoning for other DDL. For example, it 
is better to support both `ADD COLUMNS` and `ADD COLUMN` interchangeably so 
that users don't get needless syntax errors when they use a plural or singular 
term with the "wrong" number of columns.
   
   I think that the problem extending that to this is that it isn't clear that 
`SET IDENTIFIER FIELD x` actually replaces all existing identifier fields, 
which I think is why you're mentioning not wanting to use the `ADD IDENTIFIER 
FIELD` syntax. I think that if there is confusion now, then there would be more 
confusion for users, so maybe it is better to only support `SET IDENTIFIER 
FIELDS (...)`. Though I would make the parens optional.
   
   > . . . given that all Hive DDLs are in this batch format
   
   I don't think that we want to take inspiration from Hive very much. Better 
to use Postgres or Trino as reference points.
   
   > If we want to have a way to only add new fields without the need to know 
current fields, I can also add a ADD IDENTIFIER FIELDS.
   
   This wasn't my intent and I would not want to have confusion, so let's not 
allow using `FIELD` interchangeably.

##########
File path: 
spark3-extensions/src/main/antlr/org.apache.spark.sql.catalyst.parser.extensions/IcebergSqlExtensions.g4
##########
@@ -71,6 +71,8 @@ statement
     | ALTER TABLE multipartIdentifier DROP PARTITION FIELD transform           
             #dropPartitionField
     | ALTER TABLE multipartIdentifier REPLACE PARTITION FIELD transform WITH 
transform (AS name=identifier)? #replacePartitionField
     | ALTER TABLE multipartIdentifier WRITE writeSpec                          
             #setWriteDistributionAndOrdering
+    | ALTER TABLE table=multipartIdentifier SET IDENTIFIER_KW FIELDS '(' 
fields+=multipartIdentifier (',' fields+=multipartIdentifier)* ')' 
#setIdentifierFields

Review comment:
       Yeah, I would make it possible to use that syntax. No need for users to 
remember that they need parens when we don't actually need them to parse 
correctly.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [iceberg] rdblue commented on a change in pull request #2560: Spark: add spark extension for identifier fields

Reply via email to