findepi commented on issue #5591:
URL: https://github.com/apache/iceberg/issues/5591#issuecomment-1236858872

   From an end-user perspective, there is a difference between:
   
   1. querying table state at a given snapshot. At least in Trino, this uses the "schema current at that time", so it includes columns that have been dropped since then
   
   2. querying table state after rollback_to_snapshot. If this uses the current schema, the result doesn't include columns that have been dropped since the snapshot
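   
   To make the two cases concrete, here is a rough sketch in Trino SQL. The catalog `iceberg`, the schema `sales`, and the snapshot id are placeholders made up for illustration, not values from this issue:
   
   ```sql
   -- case 1: time travel; as described above, Trino uses the schema current at that snapshot
   SELECT * FROM iceberg.sales.orders FOR VERSION AS OF 1234567890123456789;
   
   -- case 2: roll back to the same (hypothetical) snapshot id, then read the table normally
   CALL iceberg.system.rollback_to_snapshot('sales', 'orders', 1234567890123456789);
   SELECT * FROM iceberg.sales.orders;
   ```
   
   If the second SELECT uses the current schema, it can return a different set of columns than the first one, even though both read the same snapshot.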
   
   Now consider this example:
   
   ```
   -- add new column
   ALTER TABLE orders ADD COLUMN order_timestamp timestamp(6) with time zone;
   -- fill in data for new column
   UPDATE orders SET order_timestamp = CAST(json_value(order_data, '$.timestamp') AS timestamp(6) with time zone);
   -- drop the now-redundant column
   ALTER TABLE orders DROP COLUMN order_data;
   
   -- imagine now that comparing this uncovered that `order_data` was encoded
   -- in a bad way, so we need to roll this all back
   CALL rollback_to_snapshot(.....)
   ```
   
   As a user, I would expect to see the `order_data` column back in my table.
   Per this issue, I understand this wouldn't be the case. As a user, I would call that data loss (and so a bug).
   
   cc @alexjo2144 @electrum 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

