amogh-jahagirdar opened a new pull request #4006:
URL: https://github.com/apache/iceberg/pull/4006


   This is a draft PR addressing 
https://github.com/apache/iceberg/issues/3897. It builds on the implementation 
of references in https://github.com/apache/iceberg/pull/3883. While we get 
clarity on what the APIs should look like, I wanted to start a draft PR for 
the retention logic, based on 
https://docs.google.com/document/d/1PvxK_0ebEoX3s7nS6-LOJJZdBYr_olTWH9oepNUfJ-A/edit#.
   
   Some aspects I want to get feedback on: 
   
   1.) Currently, for the global expiration age, the operation uses a 
timestamp obtained in the constructor (it can also be overridden through a 
setter). However, the retention calculation in this draft uses timestamps 
taken closer to the time we compute what should be retained. We should define 
a single, consistent time for all comparisons.
   
   2.) For the retention policy evaluation, we first identify the snapshots to 
retain based on branch policies, such as minSnapshots and max age for the 
branch, and on the global max age. After this, it is possible that we have not 
retained the minimum number of snapshots required at the table level. So we go 
through the snapshots we would expire, using a heap to identify the latest 
snapshots to retain in order to satisfy the global table policy. Those 
snapshots are removed from the expiration set, and the remaining set is then 
passed to the removeSnapshots API. This logic may be overkill, so I would like 
feedback on it.
   
   
   3.) I would also like feedback on how the global snapshot age should fit in 
with retention. Should it be possible for the global snapshot age to override 
what's defined at the branch level (as implemented)? This would force users to 
think carefully about what's defined at both the table level and the branch 
level to make sure they get the retention behavior they want.
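   
   To make points 1 and 2 easier to discuss, here is a rough, self-contained 
sketch of the top-up step. It is not the code in this PR: `RetentionSketch`, 
`Snap`, `idsToExpire`, and all parameter names are hypothetical, and it assumes 
the branch-level evaluation has already produced a set of retained snapshot ids 
that is a subset of the table's snapshots. The caller computes `cutoffMillis` 
once, so every age comparison uses the same point in time.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Set;

// Hypothetical sketch only; none of these names are Iceberg APIs.
class RetentionSketch {

  // Minimal stand-in for a snapshot: an id and a commit timestamp.
  static final class Snap {
    final long id;
    final long timestampMillis;

    Snap(long id, long timestampMillis) {
      this.id = id;
      this.timestampMillis = timestampMillis;
    }
  }

  // Returns the ids that would be handed to the expiration step.
  // branchRetained: ids already kept by branch policies (assumed subset of all).
  // cutoffMillis: single cutoff computed once by the caller (global max age).
  // tableMinSnapshots: table-level minimum number of snapshots to keep.
  static Set<Long> idsToExpire(
      List<Snap> all, Set<Long> branchRetained, long cutoffMillis, int tableMinSnapshots) {
    Set<Long> retained = new HashSet<>(branchRetained);
    List<Snap> candidates = new ArrayList<>();
    for (Snap s : all) {
      if (retained.contains(s.id)) {
        continue; // already kept by a branch policy
      }
      if (s.timestampMillis >= cutoffMillis) {
        retained.add(s.id); // young enough under the global max age
      } else {
        candidates.add(s); // would be expired unless rescued below
      }
    }

    // Bounded min-heap ordered by timestamp: after the loop it holds the
    // newest `shortfall` candidates, since the oldest is evicted on overflow.
    int shortfall = tableMinSnapshots - retained.size();
    PriorityQueue<Snap> newest =
        new PriorityQueue<>(Comparator.comparingLong((Snap s) -> s.timestampMillis));
    if (shortfall > 0) {
      for (Snap s : candidates) {
        newest.add(s);
        if (newest.size() > shortfall) {
          newest.poll();
        }
      }
    }

    // Everything that was a candidate and was not rescued gets expired.
    Set<Long> expire = new HashSet<>();
    for (Snap s : candidates) {
      expire.add(s.id);
    }
    for (Snap s : newest) {
      expire.remove(s.id);
    }
    return expire;
  }
}
```

   In this sketch the heap never grows beyond the shortfall, so the cost is 
O(n log k) for n candidates and a shortfall of k. Sorting the candidates by 
timestamp and taking the last k would be a simpler alternative if the heap 
turns out to be overkill.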
   
   There's a lot of code cleanup to do, and I will definitely write tests that 
span multiple commit graph shapes and retention policies as we get clarity on 
what we want this to look like.
   
   
   Thank you!  


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


