Fokko commented on issue #2604:
URL: 
https://github.com/apache/iceberg-python/issues/2604#issuecomment-3434101307

   Hey @Anton-Tarazi Thanks for raising this!
   
   I do think cleaning up the snapshots makes sense, but it can be pretty 
expensive, because the files an expired snapshot references may still be used 
by other snapshots. Of course, we could clean up the manifest-lists since they 
are unique per snapshot. We can do this on a best-effort basis: do the commit, 
and then delete the files.
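   
   A rough sketch of what "best-effort" could mean here, assuming the commit 
has already succeeded. The helper name and the `delete` callable are made up 
for illustration and are not PyIceberg API; the point is only that a failed 
delete is logged and skipped rather than raised:

```python
import logging
from typing import Callable, Iterable, List

logger = logging.getLogger(__name__)


def delete_best_effort(paths: Iterable[str], delete: Callable[[str], None]) -> List[str]:
    """Try to delete every path and return the ones that could not be deleted.

    `delete` is a hypothetical stand-in for whatever removes a single file
    from storage. A failure never propagates, because the table commit is
    already done and leftover files can be cleaned up later.
    """
    failed: List[str] = []
    for path in paths:
        try:
            delete(path)
        except Exception as exc:  # deliberately broad: cleanup must not fail the caller
            logger.warning("Could not delete %s: %s", path, exc)
            failed.append(path)
    return failed
```

   Anything returned as failed is just left behind as an orphan file.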
   
   > (Once https://github.com/apache/iceberg-python/pull/1958 is merged one 
could just call remove_orphan_files after the expire_snapshots and the result 
would be the same, but I think it's valuable to have expire_snapshots be 
consistent with the java version).
   
   I think there is also a difference here. If we expire a snapshot, we can 
easily list all the files that are related to that snapshot using the metadata. 
`remove_orphan_files`, on the other hand, does a `list` operation on the 
object store, which can be _pretty_ slow. If we want to clean up the data 
files, we could also collect a `Set` of the files that are in the expired 
snapshots, and compare that with the full metadata tree (we can use the 
metadata tables here).
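   
   As a sketch of that comparison, assuming we already have a mapping from 
snapshot ID to the set of file paths it references (how that mapping is built 
from the metadata is left out, and the function name is hypothetical), the 
files that are safe to delete are a plain set difference:

```python
from typing import Dict, Iterable, Set


def files_safe_to_delete(
    files_by_snapshot: Dict[int, Set[str]],
    expired_ids: Iterable[int],
) -> Set[str]:
    """Return files referenced only by expired snapshots.

    `files_by_snapshot` is an assumed input: snapshot ID -> referenced paths.
    """
    expired = set(expired_ids)
    candidates: Set[str] = set()
    still_referenced: Set[str] = set()
    for snapshot_id, files in files_by_snapshot.items():
        if snapshot_id in expired:
            candidates |= files
        else:
            still_referenced |= files
    # A file shared with any retained snapshot must be kept.
    return candidates - still_referenced
```

   This only touches metadata, so no `list` call on the object store is 
needed.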


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

