rdblue commented on pull request #3255: URL: https://github.com/apache/iceberg/pull/3255#issuecomment-939527365
@myandpr, I have a few questions about the approach of this PR. First, what is the use case that motivates having separate configuration for this rather than using the existing config for snapshot expiration? I think that snapshot expiration policy and processes that actually perform the expiration are two orthogonal things. If you want to run expiration from Flink, that makes sense. But expiring snapshots differently when running in Flink vs some other method seems incorrect to me. Second, how are you identifying the files to delete? There are two methods, incremental and reachability based. Incremental is no longer our recommended way of cleaning up files because it is really hard to reason about certain cases when snapshots are cherry-picked across branches. It isn't necessarily bad to continue using it here, but we should consider whether we can improve it. For example, I think this only cares about expiring snapshots in the current table state, which is an easier problem. Maybe we can extend the operation to support that more directly and avoid some of the hard cases. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
