szehon-ho commented on PR #4674:
URL: https://github.com/apache/iceberg/pull/4674#issuecomment-1121825416

   > c) I have some more plans for improving expire_snapshots (a different 
idea than https://github.com/apache/iceberg/pull/3457); I will work on them in 
a subsequent PR.
   
   > The idea is to add a new column "from_snapshot_id" while preparing the 
actual files, then filter the rows for expired snapshot ids out of the 
persisted output with a NOT IN filter (without scanning again), and then apply 
the same df.except() logic to find the expired files.
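
   The quoted idea can be sketched roughly as follows. This is only an 
illustration in plain Python, with sets standing in for the Spark DataFrames; 
the function name, the (file_path, from_snapshot_id) row shape, and the sample 
data are hypothetical and are not taken from either PR.

```python
# Hypothetical sketch: each row is (file_path, from_snapshot_id), standing in
# for the persisted output that was computed before expiration.

def find_expired_files(persisted_rows, expired_snapshot_ids):
    """Return files referenced only by expired snapshots."""
    expired = set(expired_snapshot_ids)

    # Every file that was reachable before expiration.
    files_before = {path for path, _ in persisted_rows}

    # "NOT IN" filter: keep rows whose snapshot is not expired, reusing the
    # persisted rows instead of scanning the table metadata again.
    files_after = {
        path for path, snapshot_id in persisted_rows
        if snapshot_id not in expired
    }

    # Same role as df.except(): files present before but not after.
    return files_before - files_after


rows = [
    ("a.parquet", 1),
    ("b.parquet", 1),
    ("b.parquet", 2),
    ("c.parquet", 2),
]
expired_files = find_expired_files(rows, expired_snapshot_ids=[1])
```

   Here only a.parquet is returned, because b.parquet is still referenced by 
snapshot 2. The sketch assumes one row per (file, snapshot) reference; if each 
file were recorded only once, the NOT IN filter would wrongly drop files that a 
live snapshot still uses.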
   
   The problem, I think, is that there aren't many utilities for applying any 
filter other than a partition filter. I spent some time looking again and tried 
to use time travel, which is effectively snapshot filtering, in 
https://github.com/apache/iceberg/pull/4736, but unfortunately it didn't work 
because the manifest table does not support it. You can take a look to see 
whether that also makes sense.
   
   Anyway, I look forward to working together on this.

