amogh-jahagirdar opened a new pull request #4006: URL: https://github.com/apache/iceberg/pull/4006
This is a draft PR addressing https://github.com/apache/iceberg/issues/3897. It builds on the implementation of references in https://github.com/apache/iceberg/pull/3883. While we get clarity on what the APIs should look like, I wanted to start a draft PR on the retention logic based on https://docs.google.com/document/d/1PvxK_0ebEoX3s7nS6-LOJJZdBYr_olTWH9oepNUfJ-A/edit#.

Some aspects I want feedback on:

1.) Currently, for the global expiration age, we set the expiration age in the operation based on a timestamp obtained in the constructor (this can also be overridden in a setter). However, the calculation in this draft uses timestamps taken closer to the time of calculating what should be retained. We should define a single consistent time for all comparisons.

2.) For the retention policy evaluation, we identify the snapshots to retain based on branch policies, such as minSnapshots and max age for the branch, and the global max age. After this, it is possible that we have not retained the minimum number of snapshots at the table level. So we go through the snapshots we would expire, using a heap to identify the latest snapshots to retain in order to satisfy the global table policy. Those snapshots are removed from expiration, and the remaining set is then passed to the removeSnapshots API. This logic may be overkill, so I would like feedback on that.

3.) I would also like feedback on how global snapshot age should fit in with retention. Should it be possible for global snapshot age to override what's on the branch level (as implemented)? This would force users to think carefully about what's defined at the table level and the branch level to make sure they get the retention experience they desire.

There's a lot of code cleanup to do, and I will definitely write tests which span multiple commit graph shapes and retention policies as we get clarity on what we want this to look like.

Thank you!
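To make items 1.) and 2.) easier to discuss, here is a rough, language-neutral sketch in Python of the evaluation order described above. It is not the code in this PR and not the Iceberg API: `Snapshot`, `BranchPolicy`, `snapshots_to_expire`, and all field names are hypothetical, and the global max age is left out on purpose because its precedence is the open question in item 3.). The sketch assumes one `now_ms` value is captured once and used for every age comparison, and that a heap-based selection picks the newest expiration candidates when the table-level minimum is not yet met.

```python
import heapq
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class Snapshot:
    # Hypothetical stand-in for a table snapshot, not an Iceberg class.
    snapshot_id: int
    timestamp_ms: int
    parent_id: Optional[int] = None


@dataclass(frozen=True)
class BranchPolicy:
    # Hypothetical per-branch retention settings.
    head_id: int
    min_snapshots: int
    max_age_ms: int


def snapshots_to_expire(
    snapshots: List[Snapshot],
    branches: List[BranchPolicy],
    table_min_snapshots: int,
    now_ms: int,
) -> Set[int]:
    """Return the ids that would be handed to the removal step."""
    by_id = {s.snapshot_id: s for s in snapshots}
    retained: Set[int] = set()

    # Step 1: walk each branch from its head and keep ancestors while either
    # the branch minimum is not yet met or the snapshot is young enough.
    # Every age check uses the same now_ms (item 1).
    for policy in branches:
        kept = 0
        current = by_id.get(policy.head_id)
        while current is not None:
            within_age = now_ms - current.timestamp_ms <= policy.max_age_ms
            if kept >= policy.min_snapshots and not within_age:
                break
            retained.add(current.snapshot_id)
            kept += 1
            current = by_id.get(current.parent_id)

    # Step 2: if the table-level minimum is still not met, rescue the newest
    # expiration candidates; heapq.nlargest does the heap-based selection.
    candidates = [s for s in snapshots if s.snapshot_id not in retained]
    shortfall = table_min_snapshots - len(retained)
    if shortfall > 0:
        rescued = heapq.nlargest(
            shortfall, candidates, key=lambda s: s.timestamp_ms
        )
        retained.update(s.snapshot_id for s in rescued)

    # Step 3: whatever is left is the set to expire.
    return {s.snapshot_id for s in snapshots if s.snapshot_id not in retained}


# Linear history 1 <- 2 <- 3 <- 4 <- 5 with timestamps 100..500.
history = [
    Snapshot(i, i * 100, i - 1 if i > 1 else None) for i in range(1, 6)
]
main = BranchPolicy(head_id=5, min_snapshots=1, max_age_ms=150)
print(sorted(snapshots_to_expire(history, [main], 3, now_ms=600)))
```

With these made-up inputs, the branch policy alone keeps only snapshot 5, and the table-level minimum of 3 then rescues snapshots 4 and 3 from the candidates. Whether this second pass is worth the extra complexity is exactly the question raised in item 2.).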
