HeartSaVioR edited a comment on issue #1286: URL: https://github.com/apache/iceberg/issues/1286#issuecomment-670816890
From a quick look at the codebase, it seems that applying the changes onto the current base snapshot is re-executed on every retry. Manifests and the snapshot always appear to be rebuilt from the operation itself. Iceberg could instead use the delta between the previous base snapshot and the new base snapshot when retrying.

We could handle the retry differently when every snapshot committed after the base snapshot came from a "fast append" operation. (I'm assuming we can trust the "operation" recorded for each snapshot. If we have to read through manifest list files and manifest files instead, that would probably add more latency.)

Probably the cheapest approach would be to allow reordering snapshots: insert the new snapshot between the base snapshot and the snapshot that has the base snapshot as its parent. Since we currently only ever add a new snapshot at the tail (append), this is viable only if we are OK with breaking that policy.

Alternatively, we could list the manifests written by the snapshots after the base snapshot and add only those files to the manifest list file of the snapshot created in the previous attempt. It would be nice if we could simply append to that file; if that isn't possible, we would read it and merge everything into a new file. We would then write metadata for the modified snapshot and commit.

Does this make sense? If so, I'll try it out with a POC.
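A minimal sketch of the second alternative (re-parenting the previously built snapshot instead of re-applying the operation) follows. It rests on heavy assumptions. The `Snapshot` record, the `"fast-append"` operation label, and the helpers `snapshots_after` and `rebase_pending` are all hypothetical and are not Iceberg's Java API. Manifests are modeled as plain strings, and actually writing the manifest list and table metadata is left out. The sketch walks from the new current snapshot back to the old base. If any intervening snapshot was not a fast append, it returns `None` so the caller can fall back to re-applying the operation. Otherwise it adds the intervening snapshots' manifests to the pending snapshot's manifest list and points the snapshot at the new parent.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Snapshot:
    # Hypothetical, simplified snapshot model (NOT Iceberg's API).
    snapshot_id: int
    parent_id: Optional[int]
    operation: str               # hypothetical label, e.g. "fast-append"
    manifests: List[str]         # full manifest list of this snapshot
    added_manifests: List[str]   # manifests newly written by this snapshot


def snapshots_after(snapshots: Dict[int, Snapshot], current_id: int,
                    base_id: Optional[int]) -> List[Snapshot]:
    """Follow parent pointers from current_id back to base_id; return oldest first."""
    chain: List[Snapshot] = []
    snap_id: Optional[int] = current_id
    while snap_id != base_id:
        if snap_id is None:
            raise ValueError("base snapshot is not an ancestor of the current snapshot")
        snap = snapshots[snap_id]
        chain.append(snap)
        snap_id = snap.parent_id
    chain.reverse()
    return chain


def rebase_pending(pending: Snapshot, snapshots: Dict[int, Snapshot],
                   current_id: int) -> Optional[Snapshot]:
    """Re-parent the snapshot built in the previous attempt onto current_id.

    Returns None when some intervening snapshot was not a fast append, meaning
    the caller should re-apply the operation as it does today.
    """
    intervening = snapshots_after(snapshots, current_id, pending.parent_id)
    if any(s.operation != "fast-append" for s in intervening):
        return None
    # Fast appends only add manifests, so the new manifest list is the pending
    # list plus whatever the intervening snapshots wrote ("read and merge").
    extra = [m for s in intervening for m in s.added_manifests]
    return Snapshot(
        snapshot_id=pending.snapshot_id,
        parent_id=current_id,
        operation=pending.operation,
        manifests=pending.manifests + extra,
        added_manifests=pending.added_manifests,
    )


if __name__ == "__main__":
    table = {
        1: Snapshot(1, None, "fast-append", ["m1"], ["m1"]),
        2: Snapshot(2, 1, "fast-append", ["m1", "m2"], ["m2"]),  # concurrent commit
    }
    pending = Snapshot(3, 1, "fast-append", ["m1", "m3"], ["m3"])  # our failed attempt
    rebased = rebase_pending(pending, table, current_id=2)
    print(rebased.parent_id, rebased.manifests)  # 2 ['m1', 'm3', 'm2']
```

The sketch deliberately returns `None` rather than guessing when the precondition fails. Any operation that removes or rewrites manifests (for example, merging them) would make "pending list plus added manifests" incorrect. A real implementation would also need a reliable way to tell that a snapshot only added manifests.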
