Hey Manu, I think I understand what you're trying to achieve here and I feel like the most important part is to have an updated version of the retention procedure <https://iceberg.apache.org/spec/#snapshot-retention-policy> to clearly state how this interacts with the other settings as part of the PR.
-Dan On Thu, Jan 16, 2025 at 8:37 PM Yufei Gu <flyrain...@gmail.com> wrote: > It makes sense to have an option to control the max number of snapshots. > Thanks Manu for the proposal. > > Yufei > > > On Thu, Jan 16, 2025 at 7:46 PM Manu Zhang <owenzhang1...@gmail.com> > wrote: > >> Hi all, >> >> Do you have more comments on this feature? Do you have concerns about >> adding a new field to SnapshotRef? >> >> Thanks, >> Manu >> >> On Tue, Jan 7, 2025 at 2:37 PM Manu Zhang <owenzhang1...@gmail.com> >> wrote: >> >>> Hi Ajantha, >>> >>> `history.expire.min-snapshots-to-keep` is the *minimum number of >>> snapshots* we can keep. I'm proposing to decide the *maximum number of >>> snapshots* to keep by count rather than by age. >>> >>> Thanks, >>> Manu >>> >>> On Tue, Jan 7, 2025 at 2:18 PM Ajantha Bhat <ajanthab...@gmail.com> >>> wrote: >>> >>>> Hi Manu, >>>> >>>> We already have `retain_last` and >>>> `history.expire.min-snapshots-to-keep` to retain the snapshots based on >>>> count. Can you please elaborate on why can't we use the same? >>>> >>>> - Ajantha >>>> >>>> On Tue, Jan 7, 2025 at 11:33 AM Walaa Eldin Moustafa < >>>> wa.moust...@gmail.com> wrote: >>>> >>>>> Thanks Manu for starting this discussion. That is definitely a valid >>>>> feature. I have always found maintaining snapshots by day makes it harder >>>>> to provide different types of guarantees/contracts especially when tables >>>>> change rates are diverse or irregular. Maintaining by snapshot count makes >>>>> a lot of sense and prevents table sizes from growing excessively when >>>>> change rate is frequent. >>>>> >>>>> Thanks, >>>>> Walaa. >>>>> >>>>> >>>>> On Mon, Jan 6, 2025 at 8:38 PM Manu Zhang <owenzhang1...@gmail.com> >>>>> wrote: >>>>> >>>>>> Hi all, >>>>>> >>>>>> While maintaining Iceberg tables for our customers, I find it's >>>>>> difficult to set a default snapshot expiration time >>>>>> (`history.expire.max-snapshot-age-ms`) for different workloads. The >>>>>> default >>>>>> value of 5 days looks good for daily batch jobs but is too long for >>>>>> frequently-updated jobs. >>>>>> >>>>>> I'm thinking about adding another option like >>>>>> `history.expire.max-snapshots-to-keep` to keep at most N snapshots. A >>>>>> snapshot will be removed when either its age is larger than >>>>>> `history.expire.max-snapshot-age-ms` or it's the oldest in >>>>>> `history.expire.max-snapshots-to-keep + 1` snapshots. I've created a >>>>>> draft >>>>>> PR to demo the idea[1]. >>>>>> >>>>>> If you agree this is a valid feature request, we also need to update >>>>>> SnapshotRef[2] adding a new field `max-snapshots-to-keep`. Will there be >>>>>> a >>>>>> compatibility issue or too much cost to maintain compatibility? My >>>>>> experiment shows many parsers need to be updated. >>>>>> >>>>>> I'd like to hear your thoughts on this. >>>>>> >>>>>> 1. https://github.com/apache/iceberg/pull/11879 >>>>>> 2. https://iceberg.apache.org/spec/#snapshot-references >>>>>> >>>>>> Happy New Year! >>>>>> Manu >>>>>> >>>>>