aokolnychyi commented on pull request #2354:
URL: https://github.com/apache/iceberg/pull/2354#issuecomment-811571178


   I support the idea of a row identifier as long as Iceberg does not enforce 
it. I see its primary usage in `UPSERT` statements where we don't know the 
upsert columns unless they are provided in the command.
   
   I also think it is important not to limit equality deletes to the row 
identifier alone. The spec already allows this, as each delete file is 
associated with arbitrary column ids. We plan to leverage this in some 
`MERGE INTO` use cases, where we can derive the delete columns from the `ON` 
clause and the merge columns can vary from operation to operation.
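
   To illustrate the point about per-file delete columns, here is a toy sketch 
in Python. It is not Iceberg code: `EqualityDeleteFile` and 
`apply_equality_deletes` are hypothetical names, rows are plain dicts keyed by 
field id, and sequence-number ordering is ignored. It only shows that two 
delete files can match rows on different column sets.

```python
from dataclasses import dataclass
from typing import Any, Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class EqualityDeleteFile:
    """Hypothetical stand-in for an equality delete file (not an Iceberg class)."""
    # Field ids this particular delete file matches on.
    equality_field_ids: Tuple[int, ...]
    # Deleted keys; each key is ordered like equality_field_ids.
    deleted_keys: FrozenSet[Tuple[Any, ...]]


def apply_equality_deletes(
    rows: List[Dict[int, Any]], delete_files: List[EqualityDeleteFile]
) -> List[Dict[int, Any]]:
    """Return the rows that are not matched by any delete file."""
    live = []
    for row in rows:
        deleted = any(
            tuple(row[fid] for fid in df.equality_field_ids) in df.deleted_keys
            for df in delete_files
        )
        if not deleted:
            live.append(row)
    return live


# Rows keyed by field id: 1 = id, 2 = region, 3 = name (assumed schema).
rows = [
    {1: 100, 2: "us", 3: "a"},
    {1: 101, 2: "eu", 3: "b"},
    {1: 102, 2: "us", 3: "c"},
]

# One delete file keyed on the id column, another on (region, name),
# e.g. derived from two different ON clauses.
by_id = EqualityDeleteFile((1,), frozenset({(100,)}))
by_region_and_name = EqualityDeleteFile((2, 3), frozenset({("us", "c")}))

remaining = apply_equality_deletes(rows, [by_id, by_region_and_name])
```

   Neither delete file here depends on a table-level row identifier, which is 
why tying equality deletes to it would be a restriction.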
   
   W.r.t. versioning, I'd go simple. I think the current rollback semantics 
apply only to snapshots. We don't revert table properties or sort order. I 
believe we should treat row identifiers in the same way.
   
   That said, @openinx's use case is also valid. I have seen scenarios where 
users want to roll back the table state completely rather than just the 
current snapshot. I think that should be done by pointing the current pointer 
in the catalog at an older metadata JSON file rather than by calling the table 
rollback API. Do we want to expose ways of rolling back table state to the 
users? I think that may be handy and should cover the use case that @openinx 
brought up.
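
   The difference between the two kinds of rollback can be sketched with a toy 
model in Python. This is not the Iceberg API: `ToyCatalog`, `TableMetadata`, 
`rollback_snapshot`, `reset_pointer`, and the `vN.metadata.json` names are all 
hypothetical and only mirror the semantics described above.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class TableMetadata:
    """Hypothetical, heavily simplified table metadata file."""
    current_snapshot_id: int
    properties: Dict[str, str]
    row_identifier: Tuple[str, ...]


class ToyCatalog:
    """Toy catalog that tracks one table's current metadata location."""

    def __init__(self) -> None:
        self.metadata_files: Dict[str, TableMetadata] = {}
        self.current: Optional[str] = None

    def commit(self, metadata: TableMetadata) -> str:
        # Every change writes a new metadata file and moves the pointer.
        location = f"v{len(self.metadata_files) + 1}.metadata.json"
        self.metadata_files[location] = metadata
        self.current = location
        return location

    def load(self) -> TableMetadata:
        return self.metadata_files[self.current]

    def rollback_snapshot(self, snapshot_id: int) -> str:
        # Snapshot rollback: only the current snapshot changes; properties
        # and the row identifier keep their latest values.
        return self.commit(replace(self.load(), current_snapshot_id=snapshot_id))

    def reset_pointer(self, location: str) -> None:
        # Full state rollback: point the catalog back at an older file.
        if location not in self.metadata_files:
            raise KeyError(location)
        self.current = location


catalog = ToyCatalog()
v1 = catalog.commit(TableMetadata(1, {"format": "parquet"}, ("id",)))
v2 = catalog.commit(TableMetadata(2, {"format": "orc"}, ("id", "region")))

catalog.rollback_snapshot(1)
after_rollback = catalog.load()  # snapshot 1, but newer properties and row id

catalog.reset_pointer(v1)
after_reset = catalog.load()     # everything as it was in v1
```

   In this sketch, `rollback_snapshot` leaves the row identifier untouched, 
while `reset_pointer` restores it together with the rest of the table state.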
   
   

