findepi commented on issue #5591: URL: https://github.com/apache/iceberg/issues/5591#issuecomment-1236858872
From an end-user perspective, there is a difference between:

1. querying table state at a given snapshot -- at least in Trino, this uses the schema that was current at that time, so it includes columns that have been dropped since then
2. querying table state after `rollback_to_snapshot` -- if this uses the current schema, it doesn't include columns that have been dropped since the snapshot

Now consider this example:

```
-- add new column
ALTER TABLE orders ADD COLUMN order_timestamp timestamp(6) with time zone;

-- fill in data for new column
UPDATE orders SET order_timestamp = CAST(json_value(order_data, '$.timestamp') AS timestamp(6) with time zone);

-- drop the now-redundant column
ALTER TABLE orders DROP COLUMN order_data;

-- imagine now that comparing this uncovered that `order_data` was encoded in a bad way, so we need to roll this all back
CALL rollback_to_snapshot(.....)
```

As a user, I would expect to see the `order_data` column back in my table. Per this issue, I understand this wouldn't be the case. As a user, I would call it data loss (and so a bug).

cc @alexjo2144 @electrum

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
