I agree with Russell. The right thing to do is to keep snapshots around
long enough that you don't lose the history. I don't think that changing
the read logic would work because you need all the history to ensure you
aren't skipping a snapshot.
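
A minimal sketch of why, assuming each snapshot records only the id of its
parent. `Snapshot` and `snapshots_since` are hypothetical names used for
illustration, not Iceberg's API:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Snapshot:
    # Hypothetical stand-in for snapshot metadata, not Iceberg's API.
    snapshot_id: int
    parent_id: Optional[int]


def snapshots_since(history: Dict[int, Snapshot],
                    from_id: int,
                    current_id: int) -> List[int]:
    """Ids after from_id (exclusive) up to current_id (inclusive), oldest first.

    Walks parent pointers back from current_id. Raises LookupError if the
    chain breaks before from_id is reached, because the reader can then no
    longer prove that nothing was skipped.
    """
    chain: List[int] = []
    cursor: Optional[int] = current_id
    while cursor is not None and cursor != from_id:
        snap = history.get(cursor)
        if snap is None:
            raise LookupError(
                f"snapshot {cursor} is gone; cannot prove continuity "
                f"back to {from_id}")
        chain.append(snap.snapshot_id)
        cursor = snap.parent_id
    if cursor is None:
        raise LookupError(f"{from_id} is not an ancestor of {current_id}")
    return list(reversed(chain))


# Linear history 1 <- 2 <- 3 <- 4, with the client checkpointed at 1.
full = {
    1: Snapshot(1, None),
    2: Snapshot(2, 1),
    3: Snapshot(3, 2),
    4: Snapshot(4, 3),
}
new_ids = snapshots_since(full, from_id=1, current_id=4)

# After snapshots 1 and 2 are expired, the walk stops at the missing
# snapshot 2 and cannot tell how many snapshots lie between it and 1.
expired = {sid: snap for sid, snap in full.items() if sid >= 3}
try:
    snapshots_since(expired, from_id=1, current_id=4)
    chain_broken = False
except LookupError:
    chain_broken = True
```

In this sketch the read can only proceed while the parent chain from the
current snapshot back to the checkpoint is unbroken, which is why keeping
snapshots longer is the safer fix.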

On Mon, Jan 11, 2021 at 11:24 AM Russell Spitzer <[email protected]>
wrote:

> I would probably just extend my expiration interval, if that is possible,
> to fix the issue, since the interval is basically functioning as a
> watermark for state at the moment.
>
> Is our underlying issue here that we cannot determine the lineage of a
> snapshot that has been expired? I.e., we know all the files and which
> snapshots added them, but we cannot determine where our "from" snapshot
> sits in the history since we did the expiration?
>
> On Mon, Jan 11, 2021 at 11:07 AM Filip <[email protected]> wrote:
>
>> Hi team,
>>
>> We've recently run into an edge case that breaks our implementation,
>> which combines the incremental read and expire snapshots features.
>>
>> With incremental read we rely on the client to preserve the snapshot
>> that was last used for reading data as a checkpoint. Every time the
>> client does an incremental read it gets new data (if available) along
>> with the current snapshot, which the client then stores as its new
>> checkpoint.
>>
>> Expire snapshots is scheduled to run and remove snapshots based on
>> age (say, older than N days).
>> But in the edge case where two consecutive write operations are further
>> apart than the expiration interval (*), if the incremental read process
>> doesn't run before the snapshot expiration, the client is left in an
>> inconsistent state because the snapshot it has stored as its checkpoint
>> can no longer be used.
>>
>> So we were looking at either extending the snapshot expiration feature or
>> extending the implementation of incremental read.
>>
>> Here are some details on the option of extending incremental read: add
>> fallback logic for when the provided snapshot is missing, and try to
>> locate the snapshot whose parent is that particular snapshot instead.
>> This would change the inclusiveness of the incremental read: if it
>> currently treats the provided "from" snapshot as exclusive, then when
>> the fallback uses the child snapshot as "from", that snapshot would have
>> to be inclusive.
>>
>> Let me know if you think this edge-case should be supported by Iceberg
>> and if this idea of extending the incremental read logic makes sense or if
>> folks in the community have a better solution for this.
>>
>> (*) We expire snapshots older than 10 days but we observe two
>> consecutive write operations 11 days apart.
>>
>> --
>> Filip Bocse
>>
>

-- 
Ryan Blue
Software Engineer
Netflix
