openinx commented on pull request #2354: URL: https://github.com/apache/iceberg/pull/2354#issuecomment-811609723
Okay, I think everyone has reached a consensus on this issue: `Keeping table metadata and data separate (and only versioning data) is the right behavior`. Let's keep that consensus.

@aokolnychyi's suggestion about `replacing the current pointer in the catalog to an old JSON file rather than by calling the table rollback API.` looks good to me. I think this way we can also achieve rollback of the table metadata (for now, the priority does not sound that high, because people can change the table state as they want by calling the table API).

> I support the idea of a row identifier as long as Iceberg does not enforce it

As a common Iceberg table specification, the row identifier doesn't have to be enforced. (I've left a comment [here](https://github.com/apache/iceberg/pull/2010#issuecomment-800769586).)

> We plan to leverage it in some MERGE INTO use cases, where the we can derive the delete column from the ON clause and merge columns can vary from operation to operation.

I don't know much about this point. I guess you may want to use the row identifier to achieve some optimizations at the Spark engine level. Can you provide more information?

@jackye1995, I think we could update this PR now. Thanks for the great work.
