rdblue commented on pull request #3255:
URL: https://github.com/apache/iceberg/pull/3255#issuecomment-939527365


   @myandpr, I have a few questions about the approach of this PR. First, what 
is the use case that motivates having separate configuration for this rather 
than using the existing config for snapshot expiration? I think that snapshot 
expiration policy and processes that actually perform the expiration are two 
orthogonal things. If you want to run expiration from Flink, that makes sense. 
But expiring snapshots differently when running in Flink vs some other method 
seems incorrect to me.
   
   Second, how are you identifying the files to delete? There are two methods, 
incremental and reachability based. Incremental is no longer our recommended 
way of cleaning up files because it is really hard to reason about certain 
cases when snapshots are cherry-picked across branches. It isn't necessarily 
bad to continue using it here, but we should consider whether we can improve 
it. For example, I think this only cares about expiring snapshots in the 
current table state, which is an easier problem. Maybe we can extend the 
operation to support that more directly and avoid some of the hard cases.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to