Hi Szehon,

This is a good idea considering the use case it intends to solve. I added a few questions and comments in the design doc.
IMO, the alternative options considered in the design doc look cleaner to me. I think it might add to the maintenance burden, since we would now need to remember to remove these metadata-only snapshots. I also wonder whether some of the use cases it intends to address are solvable by metadata alone, e.g. how much data was added in a given time range? To answer these kinds of questions, users might prefer to create a KPI using columns in the dataset.

Regards,
Himadri Pal

On Tue, Jul 9, 2024 at 11:10 PM Steven Wu <stevenz...@gmail.com> wrote:

> I am not totally convinced of the motivation yet.
>
> I thought the snapshot retention window is primarily meant for time travel
> and troubleshooting table changes that happened recently (like a few days
> or weeks).
>
> Is it valuable enough to keep expired snapshots for as long as months or
> years? While metadata files are typically smaller than data files in total
> size, it can still be significant considering the default amount of column
> stats written today (especially for wide tables with many columns).
>
> How long are we going to keep the expired snapshot references by default?
> If it is months/years, it can have major implications on the query
> performance of metadata tables (like snapshots, all_*).
>
> I assume it will also have some performance impact on table loading as a
> lot more expired snapshots are still referenced.
>
> On Tue, Jul 9, 2024 at 6:36 PM Szehon Ho <szehon.apa...@gmail.com> wrote:
>
>> Thanks Peter and Yufei.
>>
>> Yes, in terms of implementation, I noted in the doc we need to add error
>> checks to prevent time-travel / rollback / cherry-pick operations to
>> 'expired' snapshots. I'll make it clearer in the doc which operations
>> we need to check against.
>>
>> I believe DeleteOrphanFiles may be ok as is, because currently the logic
>> walks down the reachable graph and marks those metadata files as
>> 'not-orphan', so it should naturally walk these 'expired' snapshots as well.
>>
>> So, I think the main changes in terms of implementation are going to be
>> adding error checks in those Table APIs, and updating the ExpireSnapshots API.
>>
>>> Do we want to consider expiring snapshots in the middle of the history of
>>> the table?
>>>
>> You mean purging expired snapshots in the middle of the history, right?
>> I think the current mechanism for this is 'tagging' and 'branching'. So
>> interestingly, I was thinking it's related to your other question, and if we
>> don't add error checks to 'tagging' and 'branching' on 'expired' snapshots,
>> it could be handled just as they are handled today for other snapshots.
>> It's one option. We could support it subsequently as well, after the first
>> version and if there's some usage of this.
>>
>> One thing that comes up in this thread and the Google doc is some question
>> about the size of preserved metadata. I had put in the Alternatives
>> section that we could potentially make the ExpireSnapshots purge boolean
>> argument more nuanced, like PURGE, PRESERVE_REFS (snapshot refs are
>> preserved), PRESERVE_METADATA (snapshot refs and all metadata files are
>> preserved), though I am still debating if it's worth it, as users could
>> choose not to use this feature.
>>
>> Thanks
>> Szehon
>>
>> On Tue, Jul 9, 2024 at 6:02 PM Yufei Gu <flyrain...@gmail.com> wrote:
>>
>>> Thank you for the interesting proposal. With a minor specification
>>> change, it could indeed enable different retention periods for data files
>>> and metadata files. This differentiation is useful for two reasons:
>>>
>>> 1. More metadata helps us better understand the table history,
>>>    providing valuable insights.
>>> 2. Users often prioritize data file deletion as it frees up
>>>    significant storage space and removes potentially sensitive data.
>>>
>>> However, adding a boolean property to the specification isn't
>>> necessarily a lightweight solution.
>>> As Peter mentioned, implementing this
>>> change requires modifications in several places. In this context, external
>>> systems like LakeChime or a REST catalog implementation could offer
>>> effective solutions to manage extended metadata retention periods, without
>>> spec changes.
>>>
>>> I am neutral on this proposal (+0) and look forward to seeing more input
>>> from people.
>>> Yufei
>>>
>>> On Mon, Jul 8, 2024 at 10:32 PM Péter Váry <peter.vary.apa...@gmail.com>
>>> wrote:
>>>
>>>> We need to handle expired snapshots in several places differently in
>>>> Iceberg core as well.
>>>> - We need to add checks to prevent scans from reading these snapshots,
>>>>   and throw a meaningful error.
>>>> - We need to add checks to prevent tagging/branching these snapshots
>>>> - We need to update DeleteOrphanFiles in Spark/Flink to not consider
>>>>   files only referenced by the expired snapshots
>>>>
>>>> Some Flink jobs do frequent commits, and in these cases, the size of
>>>> the metadata file becomes a constraining factor too. In this case, we could
>>>> just tell users not to use this feature, and expire the metadata as we do
>>>> now, but I thought it's worth mentioning.
>>>>
>>>> Do we want to consider expiring snapshots in the middle of the history
>>>> of the table?
>>>> When we compact the table, the compaction commits litter the real
>>>> history of the table. Consider the following:
>>>> - S1 writes some data
>>>> - S2 writes some more data
>>>> - S3 compacts the previous 2 commits
>>>> - S4 writes even more data
>>>> From the query engine user's perspective, S3 is a commit which does
>>>> nothing, is not initiated by the user, and most probably is one they don't
>>>> even want to know about. If one can expire a snapshot from the middle of
>>>> the history, that would be nice, so users would see only S1/S2/S4. The only
>>>> downside is that reading S2 is less performant than reading S3, but IMHO
>>>> this could be acceptable for having only user-driven changes in the table
>>>> history.
>>>>
>>>> On Mon, Jul 8, 2024, 20:15 Szehon Ho <szehon.apa...@gmail.com> wrote:
>>>>
>>>>> Thanks for the comments so far. I also thought previously that this
>>>>> functionality would be in an external system, like LakeChime, or a custom
>>>>> catalog extension. But after doing an initial analysis (please double
>>>>> check), I thought it's a small enough change that it would be worth
>>>>> putting in the Iceberg spec/APIs for all users:
>>>>>
>>>>> - Table Spec, only one optional boolean field (on Snapshot, only
>>>>>   set if the functionality is used).
>>>>> - API, only one boolean parameter (on ExpireSnapshots).
>>>>>
>>>>>> I do wonder, will keeping expired snapshots as is slow down
>>>>>> manifest/scan planning though (REST catalog approaches could probably
>>>>>> mitigate this)?
>>>>>>
>>>>> I think it should not slow down manifest/scan planning, because we
>>>>> plan using the current snapshot (or the one we specify via time travel),
>>>>> and we wouldn't read expired snapshots in this case.
>>>>>
>>>>> Thanks
>>>>> Szehon
>>>>>
>>>>> On Mon, Jul 8, 2024 at 10:54 AM John Greene <jgreene1...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> I do agree with the need that this proposal solves, to decouple the
>>>>>> snapshot history from the data deletion. I do wonder, will keeping
>>>>>> expired snapshots as is slow down manifest/scan planning though (REST
>>>>>> catalog approaches could probably mitigate this)?
>>>>>>
>>>>>> On Mon, Jul 8, 2024, 5:34 AM Piotr Findeisen <piotr.findei...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Szehon, Walaa
>>>>>>>
>>>>>>> Thank you Szehon for bringing this up. And thank you Walaa for providing
>>>>>>> more context from a similar existing solution to the problem.
>>>>>>> The choices that LakeChime seems to have made -- to keep information
>>>>>>> in a separate RDBMS and which particular metadata information to
>>>>>>> retain -- indeed look use-case specific, until we observe repeating
>>>>>>> patterns.
>>>>>>> The idea to bake lifecycle changes into the table format spec was
>>>>>>> proposed as an alternative to the idea to bake lifecycle changes into
>>>>>>> the REST catalog spec. It was brought into discussion based on the
>>>>>>> intuition that the REST catalog is a first-class citizen in the Iceberg
>>>>>>> world, just like other catalogs, and so solutions to table-centric
>>>>>>> problems do not need to be limited to the REST catalog. What information
>>>>>>> we retain, and how/whether this is configurable, are open questions
>>>>>>> applicable to both avenues.
>>>>>>>
>>>>>>> As a 3rd/another alternative, we could focus on REST catalog
>>>>>>> *extensions*, without naming snapshot metadata lifecycle, and leave
>>>>>>> the problem up to REST's implementors. That would mean the Iceberg
>>>>>>> project doesn't address the snapshot metadata lifecycle topic directly,
>>>>>>> but instead gives users tools to build solutions around it. At this
>>>>>>> point I am not trying to judge whether it's a good idea or not. It
>>>>>>> probably depends on how important it is to solve the problem and have a
>>>>>>> common solution.
>>>>>>>
>>>>>>> Best,
>>>>>>> Piotr
>>>>>>>
>>>>>>> On Sat, 6 Jul 2024 at 09:46, Walaa Eldin Moustafa <wa.moust...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Szehon,
>>>>>>>>
>>>>>>>> Thanks for sharing this proposal. We have thought along the same
>>>>>>>> lines and implemented an external system (LakeChime [1]) that retains
>>>>>>>> snapshot + partition metadata for longer (the actual internal
>>>>>>>> implementation keeps data for 13 months, but that can be tuned). For
>>>>>>>> efficient analysis, we have kept this data in an RDBMS. My opinion is
>>>>>>>> this may be a better fit for an external system (similar to LakeChime)
>>>>>>>> since it could potentially complicate the Iceberg spec, APIs, or their
>>>>>>>> implementations. Also, the type of metadata tracked can differ
>>>>>>>> depending on the use case.
>>>>>>>> For example,
>>>>>>>> while LakeChime retains partition and operation type metadata, it does
>>>>>>>> not track file-level metadata as there was no specific use case for
>>>>>>>> that.
>>>>>>>>
>>>>>>>> [1]
>>>>>>>> https://www.linkedin.com/blog/engineering/data-management/lakechime-a-data-trigger-service-for-modern-data-lakes
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Walaa.
>>>>>>>>
>>>>>>>> On Fri, Jul 5, 2024 at 11:49 PM Szehon Ho <szehon.apa...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi folks,
>>>>>>>>>
>>>>>>>>> I would like to discuss an idea for an optional extension of
>>>>>>>>> Iceberg's Snapshot metadata lifecycle. Thanks Piotr for replying on
>>>>>>>>> the other thread that this should be a fuller Iceberg format change.
>>>>>>>>>
>>>>>>>>> *Proposal Summary*
>>>>>>>>>
>>>>>>>>> Currently, ExpireSnapshots(long olderThan) purges metadata and
>>>>>>>>> deleted data of a Snapshot together. Purging deleted data often
>>>>>>>>> requires a shorter timeline, due to strict requirements to claw back
>>>>>>>>> unused disk space, fulfill data lifecycle compliance, etc. In many
>>>>>>>>> deployments, this means the 'olderThan' timestamp is set to just a few
>>>>>>>>> days before the current time (the default is 5 days).
>>>>>>>>>
>>>>>>>>> On the other hand, purging metadata could ideally be done on a
>>>>>>>>> more relaxed timeline, such as months or more, to allow for meaningful
>>>>>>>>> historical table analysis.
>>>>>>>>>
>>>>>>>>> We should have an optional way to purge Snapshot metadata
>>>>>>>>> separately from purging deleted data. This would allow us to get the
>>>>>>>>> history of the table, and answer questions like:
>>>>>>>>>
>>>>>>>>> - When was a file/partition added
>>>>>>>>> - When was a file/partition deleted
>>>>>>>>> - How much data was added or removed in time X
>>>>>>>>>
>>>>>>>>> that are currently only possible for data operations within a few
>>>>>>>>> days.
>>>>>>>>>
>>>>>>>>> *Github Proposal*: https://github.com/apache/iceberg/issues/10646
>>>>>>>>> *Google Design Doc*:
>>>>>>>>> https://docs.google.com/document/d/1m5K_XT7bckGfp8VrTe2093wEmEMslcTUE3kU_ohDn6A/edit
>>>>>>>>>
>>>>>>>>> Curious if anyone has thought along these lines and/or sees
>>>>>>>>> obvious issues. Would appreciate any feedback on the proposal.
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>> Szehon