[GitHub] [iceberg] jackye1995 opened a new pull request #2365: Spark: SQL extension to update partition field atomically

GitBox Tue, 23 Mar 2021 20:44:01 -0700


jackye1995 opened a new pull request #2365:
URL: https://github.com/apache/iceberg/pull/2365



   I received feedback from users that the current Spark SQL extension cannot 
update a partition field directly. Users currently have to drop the old field 
and then add the new one, which (1) is not straightforward for the common use 
case of changing the granularity of a timestamp or bucket transform, and (2) 
leaves a window between the 2 commits that is not locked, during which a writer 
might write data with the wrong partition spec.
   
   This PR introduces the syntax `ALTER TABLE table CHANGE PARTITION FIELD 
transform TO transform`, which drops the old transform and adds the new 
transform in a single commit to solve both issues above.
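   
   As a rough illustration of the difference, here is a sketch that changes a 
daily timestamp partition to an hourly one. The table name `prod.db.sample` and 
the column `ts` are hypothetical, and the last statement assumes the syntax 
proposed in this PR:
   
   ```sql
   -- Current approach: two statements, two separate commits.
   -- After the first commit the spec has no field on ts, so a concurrent
   -- writer may pick up that interim spec before the second commit lands.
   ALTER TABLE prod.db.sample DROP PARTITION FIELD days(ts);
   ALTER TABLE prod.db.sample ADD PARTITION FIELD hours(ts);

   -- Proposed in this PR (hypothetical table and column): the same change
   -- expressed as one statement and applied in a single commit.
   ALTER TABLE prod.db.sample CHANGE PARTITION FIELD days(ts) TO hours(ts);
   ```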
   
   There is no similar syntax in other systems to use as a reference. Delta 
Lake took the route of directly adding or dropping the entire partition spec, 
so I could not use that as a basis. I chose the current syntax for the 
following reasons:
   1. The keyword `CHANGE` is based on the Hive syntax `CHANGE COLUMN col ...`. 
I think we might be able to reuse this keyword in the future for column DDL 
extensions.
   2. The keyword `TO` is chosen to be consistent with the similar syntax 
`RENAME col TO col`.
   
   
   
   


