Thanks Peter and Yufei.

Yes, in terms of implementation, I noted in the doc that we need to add
error checks to prevent time-travel / rollback / cherry-pick operations on
'expired' snapshots.  I'll make it clearer in the doc which operations we
need to check.

I believe DeleteOrphanFiles may be ok as is, because currently the logic
walks down the reachable graph and marks those metadata files as
'not-orphan', so it should naturally walk these 'expired' snapshots as well.
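
As a rough illustration of that reasoning (a Python sketch, not the actual
DeleteOrphanFiles code; the Snapshot class, its 'expired' flag, and the
function names are made up for this example), a reachability walk that does
not special-case 'expired' snapshots keeps their metadata files out of the
orphan set:

```python
from dataclasses import dataclass, field


@dataclass
class Snapshot:
    snapshot_id: int
    manifest_list: str
    manifests: list = field(default_factory=list)
    expired: bool = False  # hypothetical flag from the proposal


def reachable_metadata_files(snapshots):
    """Collect every metadata file referenced by a snapshot still listed in table metadata."""
    reachable = set()
    for snap in snapshots:
        # No special case for snap.expired: the walk treats it like any other snapshot.
        reachable.add(snap.manifest_list)
        reachable.update(snap.manifests)
    return reachable


def orphan_files(all_files, snapshots):
    """Files on storage that no listed snapshot references."""
    return set(all_files) - reachable_metadata_files(snapshots)


snapshots = [
    Snapshot(1, "snap-1.avro", ["m1.avro"], expired=True),
    Snapshot(2, "snap-2.avro", ["m2.avro"]),
]
files = ["snap-1.avro", "m1.avro", "snap-2.avro", "m2.avro", "stray.avro"]
orphans = orphan_files(files, snapshots)
```

Here snapshot 1 is 'expired', yet only stray.avro ends up in the orphan set.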

So I think the main implementation changes are going to be adding error
checks in those Table APIs and updating the ExpireSnapshots API.
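
To make the kind of check concrete, here is a minimal Python sketch (not
Iceberg's API; Table, Snapshot, ExpiredSnapshotError, and the method names
are all hypothetical) in which time-travel, rollback, and cherry-pick share
one guard that rejects 'expired' snapshots:

```python
from dataclasses import dataclass


class ExpiredSnapshotError(ValueError):
    """Raised when an operation targets a snapshot marked as expired."""


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    expired: bool = False  # hypothetical flag from the proposal


class Table:
    def __init__(self, snapshots, current_id):
        self._snapshots = {s.snapshot_id: s for s in snapshots}
        self.current_id = current_id

    def _require_live(self, snapshot_id, operation):
        """Shared guard: look up the snapshot and refuse it if it is expired."""
        snap = self._snapshots.get(snapshot_id)
        if snap is None:
            raise KeyError(f"Unknown snapshot {snapshot_id}")
        if snap.expired:
            raise ExpiredSnapshotError(
                f"Cannot {operation} snapshot {snapshot_id}: it is expired"
            )
        return snap

    def time_travel(self, snapshot_id):
        return self._require_live(snapshot_id, "time-travel to")

    def rollback_to(self, snapshot_id):
        self.current_id = self._require_live(snapshot_id, "roll back to").snapshot_id

    def cherry_pick(self, snapshot_id):
        # A real cherry-pick would replay the snapshot's changes; only the guard is shown.
        return self._require_live(snapshot_id, "cherry-pick")


table = Table([Snapshot(1, expired=True), Snapshot(2), Snapshot(3)], current_id=3)
table.rollback_to(2)  # allowed: snapshot 2 is not expired
try:
    table.time_travel(1)  # rejected: snapshot 1 is expired
except ExpiredSnapshotError as err:
    print(err)
```

Keeping the check in one helper would also make it easy to decide later
whether tagging and branching should go through it.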

> Do we want to consider expiring snapshots in the middle of the history of
> the table?
>
You mean purging expired snapshots in the middle of the history, right?  I
think the current mechanism for this is 'tagging' and 'branching'.
Interestingly, I was thinking it's related to your other question: if we
don't add error checks to 'tagging' and 'branching' on 'expired' snapshots,
they could be handled just as other snapshots are handled today.
That's one option.  We could also support it later, after the first
version, if there is some usage of this.

One thing that comes up in this thread and google doc is some question
about the size of preserved metadata.  I had put in the Alternatives
section, that we could potentially make the ExpireSnapshots purge boolean
argument more nuanced like PURGE, PRESERVE_REFS (snapshot refs are
preserved), PRESERVE_METADATA (snapshot refs and all metadata files are
preserved), though I am still debating if it's worth it, as users could
choose not to use this feature.
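
For reference, a small Python sketch of how such a three-valued argument
could look (ExpireMode and retention_plan are hypothetical names, and the
mapping is only my reading of the three options above, not a settled
design):

```python
from enum import Enum


class ExpireMode(Enum):
    PURGE = "purge"                          # today's behavior: nothing is kept
    PRESERVE_REFS = "preserve_refs"          # snapshot refs are preserved
    PRESERVE_METADATA = "preserve_metadata"  # snapshot refs and all metadata files are preserved


def retention_plan(mode):
    """Describe what an expire run would keep under each mode (illustrative only)."""
    return {
        "delete_data_files": True,  # deleted data is purged in every mode
        "keep_snapshot_refs": mode is not ExpireMode.PURGE,
        "keep_metadata_files": mode is ExpireMode.PRESERVE_METADATA,
    }


for mode in ExpireMode:
    print(mode.name, retention_plan(mode))
```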

Thanks
Szehon



On Tue, Jul 9, 2024 at 6:02 PM Yufei Gu <flyrain...@gmail.com> wrote:

> Thank you for the interesting proposal. With a minor specification change,
> it could indeed enable different retention periods for data files and
> metadata files. This differentiation is useful for two reasons:
>
>    1. More metadata helps us better understand the table history,
>    providing valuable insights.
>    2. Users often prioritize data file deletion as it frees up
>    significant storage space and removes potentially sensitive data.
>
> However, adding a boolean property to the specification isn't necessarily
> a lightweight solution. As Peter mentioned, implementing this change
> requires modifications in several places. In this context, external systems
> like LakeChime or a REST catalog implementation could offer effective
> solutions to manage extended metadata retention periods, without spec
> changes.
>
> I am neutral on this proposal (+0) and look forward to seeing more input
> from people.
> Yufei
>
>
> On Mon, Jul 8, 2024 at 10:32 PM Péter Váry <peter.vary.apa...@gmail.com>
> wrote:
>
>> We need to handle expired snapshots in several places differently in
>> Iceberg core as well.
>> - We need to add checks to prevent scans from reading these snapshots and
>> throw a meaningful error.
>> - We need to add checks to prevent tagging/branching these snapshots
>> - We need to update DeleteOrphanFiles in Spark/Flink to not consider
>> files only referenced by the expired snapshots
>>
>> Some Flink jobs do frequent commits, and in these cases, the size of the
>> metadata file becomes a constraining factor too. In this case, we could
>> just tell users not to use this feature, and expire the metadata as we do
>> now, but I thought it's worth mentioning.
>>
>> Do we want to consider expiring snapshots in the middle of the history of
>> the table?
>> When we compact the table, then the compaction commits litter the real
>> history of the table. Consider the following:
>> - S1 writes some data
>> - S2 writes some more data
>> - S3 compacts the previous 2 commits
>> - S4 writes even more data
>> From the query engine user perspective S3 is a commit which does nothing,
>> not initiated by the user, and most probably they don't even want to know
>> of. If one can expire a snapshot from the middle of the history, that would
>> be nice, so users would see only S1/S2/S4. The only downside is that
>> reading S2 is less performant than reading S3, but IMHO this could be
>> acceptable for having only user driven changes in the table history.
>>
>>
>> On Mon, Jul 8, 2024, 20:15 Szehon Ho <szehon.apa...@gmail.com> wrote:
>>
>>> Thanks for the comments so far.  I also thought previously that this
>>> functionality would be in an external system, like LakeChime, or a custom
>>> catalog extension.  But after doing an initial analysis (please double
>>> check), I thought it's a small enough change that it would be worth putting
>>> in the Iceberg spec/API's for all users:
>>>
>>>    - Table Spec, only one optional boolean field (on Snapshot, only set
>>>    if the functionality is used).
>>>    - API, only one boolean parameter (on ExpireSnapshots).
>>>
>>>> I do wonder, will keeping expired snapshots as is slow down
>>>> manifest/scan planning though (REST catalog approaches could probably
>>>> mitigate this)?
>>>>
>>>
>>> I think it should not slow down manifest/scan planning, because we plan
>>> using the current snapshot (or the one we specify via time travel), and we
>>> wouldn't read expired snapshots in this case.
>>>
>>> Thanks
>>> Szehon
>>>
>>> On Mon, Jul 8, 2024 at 10:54 AM John Greene <jgreene1...@gmail.com>
>>> wrote:
>>>
>>>> I do agree with the need that this proposal solves, to decouple the
>>>> snapshot history from the data deletion. I do wonder, will keeping expired
>>>> snapshots as is slow down manifest/scan planning though (REST catalog
>>>> approaches could probably mitigate this)?
>>>>
>>>> On Mon, Jul 8, 2024, 5:34 AM Piotr Findeisen <piotr.findei...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Szehon, Walaa
>>>>>
>>>>> Thank you Szehon for bringing this up. And thank you Walaa for providing
>>>>> more context from a similar existing solution to the problem.
>>>>> The choices that LakeChime seems to have made -- to keep information
>>>>> in a separate RDBMS and which particular metadata information to retain --
>>>>> do look use-case specific, until we observe repeating patterns.
>>>>> The idea to bake lifecycle changes into table format spec was proposed
>>>>> as an alternative to the idea to bake lifecycle changes into REST catalog
>>>>> spec. It was brought into discussion based on the intuition that REST
>>>>> catalog is first-class citizen in Iceberg world, just like other catalogs,
>>>>> and so solutions to table-centric problems do not need to be limited to
>>>>> REST catalog. What information we retain, and how/whether this is
>>>>> configurable, are open questions applicable to both avenues.
>>>>>
>>>>> As a 3rd/another alternative, we could focus on REST catalog
>>>>> *extensions*, without naming snapshot metadata lifecycle, and leave
>>>>> the problem up to REST's implementors. That would mean Iceberg project
>>>>> doesn't address snapshot metadata lifecycle changes topic directly, but
>>>>> instead gives users tools to build solutions around it. At this point I am
>>>>> not trying to judge whether it's a good idea or not. Probably depends how
>>>>> important it is to solve the problem and have a common solution.
>>>>>
>>>>> Best,
>>>>> Piotr
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Sat, 6 Jul 2024 at 09:46, Walaa Eldin Moustafa <
>>>>> wa.moust...@gmail.com> wrote:
>>>>>
>>>>>> Hi Szehon,
>>>>>>
>>>>>> Thanks for sharing this proposal. We have thought along the same
>>>>>> lines and implemented an external system (LakeChime [1]) that retains
>>>>>> snapshot + partition metadata for longer (actual internal implementation
>>>>>> keeps data for 13 months, but that can be tuned). For efficient analysis,
>>>>>> we have kept this data in an RDBMS. My opinion is this may be a better fit
>>>>>> to an external system (similar to LakeChime) since it could potentially
>>>>>> complicate the Iceberg spec, APIs, or their implementations. Also, the type
>>>>>> of metadata tracked can differ depending on the use case. For example,
>>>>>> while LakeChime retains partition and operation type metadata, it does not
>>>>>> track file-level metadata as there was no specific use case for that.
>>>>>>
>>>>>> [1]
>>>>>> https://www.linkedin.com/blog/engineering/data-management/lakechime-a-data-trigger-service-for-modern-data-lakes
>>>>>>
>>>>>> Thanks,
>>>>>> Walaa.
>>>>>>
>>>>>> On Fri, Jul 5, 2024 at 11:49 PM Szehon Ho <szehon.apa...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi folks,
>>>>>>>
>>>>>>> I would like to discuss an idea for an optional extension of
>>>>>>> Iceberg's Snapshot metadata lifecycle.  Thanks Piotr for replying on the
>>>>>>> other thread that this should be a fuller Iceberg format change.
>>>>>>>
>>>>>>> *Proposal Summary*
>>>>>>>
>>>>>>> Currently, ExpireSnapshots(long olderThan) purges metadata and
>>>>>>> deleted data of a Snapshot together.  Purging deleted data often requires a
>>>>>>> smaller timeline, due to strict requirements to claw back unused disk
>>>>>>> space, fulfill data lifecycle compliance, etc.  In many deployments, this
>>>>>>> means 'olderThan' timestamp is set to just a few days before the current
>>>>>>> time (the default is 5 days).
>>>>>>>
>>>>>>> On the other hand, purging metadata could be ideally done on a more
>>>>>>> relaxed timeline, such as months or more, to allow for meaningful
>>>>>>> historical table analysis.
>>>>>>>
>>>>>>> We should have an optional way to purge Snapshot metadata separately
>>>>>>> from purging deleted data.  This would allow us to get history of the
>>>>>>> table, and answer questions like:
>>>>>>>
>>>>>>>    - When was a file/partition added
>>>>>>>    - When was a file/partition deleted
>>>>>>>    - How much data was added or removed in time X
>>>>>>>
>>>>>>> that are currently only possible for data operations within a few
>>>>>>> days.
>>>>>>>
>>>>>>> *Github Proposal*:  https://github.com/apache/iceberg/issues/10646
>>>>>>> *Google Design Doc*:
>>>>>>> https://docs.google.com/document/d/1m5K_XT7bckGfp8VrTe2093wEmEMslcTUE3kU_ohDn6A/edit
>>>>>>>
>>>>>>> Curious if anyone has thought along these lines and/or sees obvious
>>>>>>> issues.  Would appreciate any feedback on the proposal.
>>>>>>>
>>>>>>> Thanks
>>>>>>> Szehon
>>>>>>>
>>>>>>
