Hi Szehon,

This is a good idea considering the use case it intends to solve. I added a
few questions and comments in the design doc.

IMO, the alternate options considered in the design doc look cleaner to me.

I think it might add to the maintenance burden, now that we need to remember
to remove these metadata-only snapshots.

Also, I wonder whether some of the use cases it intends to address are
solvable by metadata alone - e.g. how much data was added in a given time
range? Maybe to answer these kinds of questions, users would prefer to create
KPIs using columns in the dataset.
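To illustrate the kind of metadata-only aggregation I mean, here is a rough
sketch (Python; the 'timestamp-ms', 'added-records', and 'added-files-size'
keys follow the snapshot summary fields in the table metadata JSON, but the
snapshot data below is made up for illustration):

```python
def data_added_between(snapshots, start_ms, end_ms):
    """Sum the 'added-records' and 'added-files-size' summary fields
    over all snapshots committed in [start_ms, end_ms)."""
    records = 0
    bytes_added = 0
    for snap in snapshots:
        if start_ms <= snap["timestamp-ms"] < end_ms:
            summary = snap.get("summary", {})
            # Summary values are stored as strings in the metadata JSON.
            records += int(summary.get("added-records", 0))
            bytes_added += int(summary.get("added-files-size", 0))
    return records, bytes_added

# Hypothetical snapshot entries, shaped like table metadata JSON snapshots.
snapshots = [
    {"timestamp-ms": 1_720_000_000_000,
     "summary": {"added-records": "100", "added-files-size": "4096"}},
    {"timestamp-ms": 1_720_100_000_000,
     "summary": {"added-records": "50", "added-files-size": "2048"}},
]
print(data_added_between(snapshots, 1_720_000_000_000, 1_720_050_000_000))
# -> (100, 4096)
```

Of course, this only works for snapshots that have not yet been expired,
which is exactly the gap the proposal is trying to close.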


Regards,
Himadri Pal


On Tue, Jul 9, 2024 at 11:10 PM Steven Wu <stevenz...@gmail.com> wrote:

> I am not totally convinced of the motivation yet.
>
> I thought the snapshot retention window is primarily meant for time travel
> and troubleshooting table changes that happened recently (like a few days
> or weeks).
>
> Is it valuable enough to keep expired snapshots for as long as months or
> years? While metadata files are typically smaller than data files in total
> size, their size can still be significant given the default amount of column
> stats written today (especially for wide tables with many columns).
>
> How long are we going to keep the expired snapshot references by default?
> If it is months/years, it can have major implications on the query
> performance of metadata tables (like snapshots, all_*).
>
> I assume it will also have some performance impact on table loading as a
> lot more expired snapshots are still referenced.
>
>
>
>
> On Tue, Jul 9, 2024 at 6:36 PM Szehon Ho <szehon.apa...@gmail.com> wrote:
>
>> Thanks Peter and Yufei.
>>
>> Yes, in terms of implementation, I noted in the doc that we need to add
>> error checks to prevent time-travel / rollback / cherry-pick operations on
>> 'expired' snapshots.  I'll make it clearer in the doc which operations we
>> need to check against.
>>
>> I believe DeleteOrphanFiles may be ok as is, because currently the logic
>> walks down the reachable graph and marks those metadata files as
>> 'not-orphan', so it should naturally walk these 'expired' snapshots as well.
>>
>> So, I think the main changes in terms of implementation are going to be
>> adding error checks in those Table APIs, and updating the ExpireSnapshots
>> API.
>>
>> Do we want to consider expiring snapshots in the middle of the history of
>>> the table?
>>>
>> You mean purging expired snapshots in the middle of the history, right?
>> I think the current mechanism for this is 'tagging' and 'branching'.  So,
>> interestingly, I was thinking it's related to your other question: if we
>> don't add an error check for 'tagging' and 'branching' on an 'expired'
>> snapshot, they could be handled just as other snapshots are handled today.
>> It's one option.  We could also support it subsequently, after the first
>> version and if there's some usage of this.
>>
>> One thing that comes up in this thread and the Google doc is the question
>> of the size of preserved metadata.  I had put in the Alternatives section
>> that we could potentially make the ExpireSnapshots purge boolean argument
>> more nuanced, like PURGE, PRESERVE_REFS (snapshot refs are preserved), and
>> PRESERVE_METADATA (snapshot refs and all metadata files are preserved),
>> though I am still debating if it's worth it, as users could choose not to
>> use this feature.
>>
>> Thanks
>> Szehon
>>
>>
>>
>> On Tue, Jul 9, 2024 at 6:02 PM Yufei Gu <flyrain...@gmail.com> wrote:
>>
>>> Thank you for the interesting proposal. With a minor specification
>>> change, it could indeed enable different retention periods for data files
>>> and metadata files. This differentiation is useful for two reasons:
>>>
>>>    1. More metadata helps us better understand the table history,
>>>    providing valuable insights.
>>>    2. Users often prioritize data file deletion as it frees up
>>>    significant storage space and removes potentially sensitive data.
>>>
>>> However, adding a boolean property to the specification isn't
>>> necessarily a lightweight solution. As Peter mentioned, implementing this
>>> change requires modifications in several places. In this context, external
>>> systems like LakeChime or a REST catalog implementation could offer
>>> effective solutions to manage extended metadata retention periods, without
>>> spec changes.
>>>
>>> I am neutral on this proposal (+0) and look forward to seeing more input
>>> from people.
>>> Yufei
>>>
>>>
>>> On Mon, Jul 8, 2024 at 10:32 PM Péter Váry <peter.vary.apa...@gmail.com>
>>> wrote:
>>>
>>>> We need to handle expired snapshots differently in several places in
>>>> Iceberg core as well:
>>>> - We need to add checks to prevent scans from reading these snapshots,
>>>> and to throw a meaningful error.
>>>> - We need to add checks to prevent tagging/branching these snapshots.
>>>> - We need to update DeleteOrphanFiles in Spark/Flink to not consider
>>>> files only referenced by the expired snapshots.
>>>>
>>>> Some Flink jobs do frequent commits, and in these cases the size of the
>>>> metadata file becomes a constraining factor too. In this case, we could
>>>> just tell users not to use this feature and expire the metadata as we do
>>>> now, but I thought it's worth mentioning.
>>>>
>>>> Do we want to consider expiring snapshots in the middle of the history
>>>> of the table?
>>>> When we compact the table, then the compaction commits litter the real
>>>> history of the table. Consider the following:
>>>> - S1 writes some data
>>>> - S2 writes some more data
>>>> - S3 compacts the previous 2 commits
>>>> - S4 writes even more data
>>>> From the query engine user's perspective, S3 is a commit which does
>>>> nothing, was not initiated by the user, and is one they most probably
>>>> don't even want to know about. If one could expire a snapshot from the
>>>> middle of the history, that would be nice, so users would see only
>>>> S1/S2/S4. The only downside is that reading S2 is less performant than
>>>> reading S3, but IMHO this could be acceptable for having only user-driven
>>>> changes in the table history.
>>>>
>>>>
>>>> On Mon, Jul 8, 2024, 20:15 Szehon Ho <szehon.apa...@gmail.com> wrote:
>>>>
>>>>> Thanks for the comments so far.  I also thought previously that this
>>>>> functionality would be in an external system, like LakeChime, or a custom
>>>>> catalog extension.  But after doing an initial analysis (please double
>>>>> check), I thought it's a small enough change that it would be worth 
>>>>> putting
>>>>> in the Iceberg spec/API's for all users:
>>>>>
>>>>>    - Table Spec, only one optional boolean field (on Snapshot, only
>>>>>    set if the functionality is used).
>>>>>    - API, only one boolean parameter (on ExpireSnapshots).
>>>>>
>>>>> I do wonder, will keeping expired snapshots as is slow down
>>>>>> manifest/scan planning though (REST catalog approaches could probably
>>>>>> mitigate this)?
>>>>>>
>>>>>
>>>>> I think it should not slow down manifest/scan planning, because we
>>>>> plan using the current snapshot (or the one we specify via time travel),
>>>>> and we wouldn't read expired snapshots in this case.
>>>>>
>>>>> Thanks
>>>>> Szehon
>>>>>
>>>>> On Mon, Jul 8, 2024 at 10:54 AM John Greene <jgreene1...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> I do agree with the need that this proposal solves, to decouple the
>>>>>> snapshot history from the data deletion. I do wonder, will keeping 
>>>>>> expired
>>>>>> snapshots as is slow down manifest/scan planning though (REST catalog
>>>>>> approaches could probably mitigate this)?
>>>>>>
>>>>>> On Mon, Jul 8, 2024, 5:34 AM Piotr Findeisen <
>>>>>> piotr.findei...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Szehon, Walaa
>>>>>>>
>>>>>>> Thanks Szehon for bringing this up, and thank you Walaa for providing
>>>>>>> more context from a similar existing solution to the problem.
>>>>>>> The choices that LakeChime seems to have made -- to keep the
>>>>>>> information in a separate RDBMS, and which particular metadata to
>>>>>>> retain -- do indeed look use-case specific, until we observe repeating
>>>>>>> patterns.
>>>>>>> The idea to bake lifecycle changes into the table format spec was
>>>>>>> proposed as an alternative to the idea to bake them into the REST
>>>>>>> catalog spec. It was brought into the discussion based on the intuition
>>>>>>> that the REST catalog is a first-class citizen in the Iceberg world,
>>>>>>> just like other catalogs, and so solutions to table-centric problems do
>>>>>>> not need to be limited to the REST catalog. What information we retain,
>>>>>>> and how/whether this is configurable, are open questions applicable to
>>>>>>> both avenues.
>>>>>>>
>>>>>>> As a third alternative, we could focus on REST catalog *extensions*,
>>>>>>> without naming the snapshot metadata lifecycle, and leave the problem
>>>>>>> up to REST implementors. That would mean the Iceberg project doesn't
>>>>>>> address the snapshot metadata lifecycle topic directly, but instead
>>>>>>> gives users tools to build solutions around it. At this point I am not
>>>>>>> trying to judge whether it's a good idea or not; it probably depends on
>>>>>>> how important it is to solve the problem and have a common solution.
>>>>>>>
>>>>>>> Best,
>>>>>>> Piotr
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Sat, 6 Jul 2024 at 09:46, Walaa Eldin Moustafa <
>>>>>>> wa.moust...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi Szehon,
>>>>>>>>
>>>>>>>> Thanks for sharing this proposal. We have thought along the same
>>>>>>>> lines and implemented an external system (LakeChime [1]) that retains
>>>>>>>> snapshot + partition metadata for longer (actual internal 
>>>>>>>> implementation
>>>>>>>> keeps data for 13 months, but that can be tuned). For efficient 
>>>>>>>> analysis,
>>>>>>>> we have kept this data in an RDBMS. My opinion is this may be a
>>>>>>>> better fit for an external system (similar to LakeChime) since it
>>>>>>>> could potentially
>>>>>>>> to an external system (similar to LakeChime) since it could potentially
>>>>>>>> complicate the Iceberg spec, APIs, or their implementations. Also, the 
>>>>>>>> type
>>>>>>>> of metadata tracked can differ depending on the use case. For example,
>>>>>>>> while LakeChime retains partition and operation type metadata, it does 
>>>>>>>> not
>>>>>>>> track file-level metadata as there was no specific use case for that.
>>>>>>>>
>>>>>>>> [1]
>>>>>>>> https://www.linkedin.com/blog/engineering/data-management/lakechime-a-data-trigger-service-for-modern-data-lakes
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Walaa.
>>>>>>>>
>>>>>>>> On Fri, Jul 5, 2024 at 11:49 PM Szehon Ho <szehon.apa...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi folks,
>>>>>>>>>
>>>>>>>>> I would like to discuss an idea for an optional extension of
>>>>>>>>> Iceberg's Snapshot metadata lifecycle.  Thanks Piotr for replying on 
>>>>>>>>> the
>>>>>>>>> other thread that this should be a fuller Iceberg format change.
>>>>>>>>>
>>>>>>>>> *Proposal Summary*
>>>>>>>>>
>>>>>>>>> Currently, ExpireSnapshots(long olderThan) purges metadata and
>>>>>>>>> deleted data of a Snapshot together.  Purging deleted data often 
>>>>>>>>> requires a
>>>>>>>>> smaller timeline, due to strict requirements to claw back unused disk
>>>>>>>>> space, fulfill data lifecycle compliance, etc.  In many deployments, 
>>>>>>>>> this
>>>>>>>>> means 'olderThan' timestamp is set to just a few days before the 
>>>>>>>>> current
>>>>>>>>> time (the default is 5 days).
>>>>>>>>>
>>>>>>>>> On the other hand, purging metadata could be ideally done on a
>>>>>>>>> more relaxed timeline, such as months or more, to allow for meaningful
>>>>>>>>> historical table analysis.
>>>>>>>>>
>>>>>>>>> We should have an optional way to purge Snapshot metadata
>>>>>>>>> separately from purging deleted data.  This would allow us to get
>>>>>>>>> the history of the table, and answer questions like:
>>>>>>>>> of the table, and answer questions like:
>>>>>>>>>
>>>>>>>>>    - When was a file/partition added
>>>>>>>>>    - When was a file/partition deleted
>>>>>>>>>    - How much data was added or removed in time X
>>>>>>>>>
>>>>>>>>> questions that are currently only answerable for data operations
>>>>>>>>> within the last few days.
>>>>>>>>>
>>>>>>>>> *Github Proposal*:  https://github.com/apache/iceberg/issues/10646
>>>>>>>>> *Google Design Doc*:
>>>>>>>>> https://docs.google.com/document/d/1m5K_XT7bckGfp8VrTe2093wEmEMslcTUE3kU_ohDn6A/edit
>>>>>>>>>
>>>>>>>>> Curious if anyone has thought along these lines and/or sees
>>>>>>>>> obvious issues.  Would appreciate any feedback on the proposal.
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>> Szehon
>>>>>>>>>
>>>>>>>>
