szehon-ho commented on PR #4674: URL: https://github.com/apache/iceberg/pull/4674#issuecomment-1121825416
> c) I have some more plans for improving expire_snapshots (a different idea from https://github.com/apache/iceberg/pull/3457); I will work on them in a subsequent PR.
>
> The idea is to add a new column "from_snapshot_id" while preparing the actual files, then filter out (NOT IN filter) the rows for expired snapshot ids from the persisted output (without scanning again), and then use the same df.except() logic to find the expired files.

The problem, I think, is that there aren't many utilities to project anything other than a partition filter to do the filtering. I spent some time looking again and tried to use time travel, which is effectively snapshot filtering, in https://github.com/apache/iceberg/pull/4736, but unfortunately it didn't work because the manifest table does not support it. You can take a look to see whether that also makes sense. Anyway, I look forward to working together on this.
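
For reference, the quoted idea can be sketched roughly as follows. This is only an illustration under stated assumptions: plain Python sets stand in for the Spark DataFrames, and the function name `find_expired_files` and the `(file_path, from_snapshot_id)` row layout are hypothetical, not taken from the Iceberg code base.

```python
def find_expired_files(rows, expired_snapshot_ids):
    """Return file paths referenced only by expired snapshots.

    rows: iterable of (file_path, from_snapshot_id) pairs, standing in for
          the persisted output that carries the extra "from_snapshot_id" column.
    expired_snapshot_ids: set of snapshot ids being expired.
    """
    rows = list(rows)

    # Every file reachable before expiration.
    all_files = {path for path, _ in rows}

    # NOT IN filter: keep only rows whose snapshot is still retained,
    # reusing the persisted rows instead of scanning the metadata again.
    retained_files = {
        path for path, snapshot_id in rows
        if snapshot_id not in expired_snapshot_ids
    }

    # Set difference plays the role of df.except(): files that no
    # retained snapshot references any more.
    return all_files - retained_files


# Hypothetical sample data: b.parquet is shared by snapshots 1 and 2.
rows = [
    ("a.parquet", 1),
    ("b.parquet", 1),
    ("b.parquet", 2),
    ("c.parquet", 2),
]
expired = find_expired_files(rows, {1})
```

With snapshot 1 expired, only `a.parquet` is returned, because `b.parquet` is still referenced by the retained snapshot 2.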
