Thanks Ryan, this is helpful! I will keep what you said in mind when I
explore it.

On Fri, Sep 30, 2022 at 10:42 AM Ryan Blue <b...@tabular.io> wrote:

> It depends on what you want the semantics of the revert to be. Here’s an
> example overwrite:
>
> df.writeTo("db.table")
>     .overwrite(expr("ts >= current_date() and ts < date_add(current_date(), 1)"))
>
> The overwrite expression removes any files written today and replaces them
> with the contents of the DataFrame. Let’s say that replaces
> today/file_A.parquet and today/file_B.parquet with today/file_C.parquet.
> When there are no further changes, it’s easy to revert by replacing C with
> A and B. That means at a minimum that C still needs to exist in the table
> to revert.
>
> But what happens when a new delete is applied to C? Reverting would drop
> the position delete against C, and if the deleted row was also in A or B,
> restoring those files would bring the deleted row back.
>
> For this, we probably also need to know the original filter so that we can
> check for certain conflicts. Right now, that’s not stored anywhere. But we
> could start adding it to Snapshot metadata.
>
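A minimal sketch of the check described above, as a toy Python model rather
than anything in Iceberg. The function name, the dict of position deletes,
and the yesterday/file_X.parquet file are all assumptions for illustration.
It reverts the overwrite only when C is still in the table and no later
position delete targets C:

```python
def revert_overwrite(live, added, removed, position_deletes):
    """Toy check-and-revert for an overwrite commit (hypothetical, not an Iceberg API).

    live: set of data files currently in the table
    added / removed: files the overwrite added and removed
    position_deletes: {data_file: set of row positions deleted after the overwrite}
    """
    # The files written by the overwrite must still exist to be swapped back out.
    missing = added - live
    if missing:
        raise ValueError(f"files no longer in table: {sorted(missing)}")

    # Any later position delete against an added file is treated as a conflict:
    # restoring the old files could bring back a row that was deleted.
    conflicted = sorted(f for f in added if position_deletes.get(f))
    if conflicted:
        raise ValueError(f"later deletes apply to: {conflicted}")

    # File-level revert: drop what the overwrite added, restore what it removed.
    return (live - added) | removed


added = {"today/file_C.parquet"}
removed = {"today/file_A.parquet", "today/file_B.parquet"}
live = {"yesterday/file_X.parquet", "today/file_C.parquet"}  # file_X is made up

print(sorted(revert_overwrite(live, added, removed, position_deletes={})))
```

This version is deliberately conservative and rejects any delete against C.
A finer-grained conflict check would presumably need the original overwrite
filter, which is not stored today.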
> Ryan
>
> On Fri, Sep 30, 2022 at 9:41 AM Pucheng Yang <py...@pinterest.com.invalid>
> wrote:
>
>> Thanks Ryan, how about an overwrite commit (insert overwrite)? What
>> should I be aware of? Thanks.
>>
>> On Fri, Sep 30, 2022 at 9:26 AM Ryan Blue <b...@tabular.io> wrote:
>>
>>> Pucheng,
>>>
>>> I think you'd want to add a new option to the SnapshotManager to revert
>>> a commit by ID. That would need to get the changes from the commit and
>>> reverse them. We'd want to start small because reverting the file-level
>>> changes isn't always the same thing as reverting the semantic changes. But
>>> for simple cases like an append commit, it would work just fine.
>>>
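To make the simple append case concrete, here is a rough sketch in Python.
It is a toy model, not the SnapshotManager API: the Commit class and the
revert_commit helper are hypothetical names, and a commit is reduced to the
sets of files it added and removed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    """Toy stand-in for a snapshot: only the file-level changes it made."""
    commit_id: int
    added: frozenset = frozenset()
    removed: frozenset = frozenset()


def live_files(history):
    """Replay the commits in order to get the current set of data files."""
    files = set()
    for commit in history:
        files -= commit.removed
        files |= commit.added
    return files


def revert_commit(history, commit_id, new_id):
    """Build a new commit that undoes the file-level changes of `commit_id`.

    Hypothetical helper. It refuses to revert when a file the target added
    is no longer live, since a later commit already rewrote or removed it.
    """
    target = next(c for c in history if c.commit_id == commit_id)
    missing = target.added - live_files(history)
    if missing:
        raise ValueError(
            f"cannot revert {commit_id}: files no longer live: {sorted(missing)}"
        )
    # Reverse the changes: remove what was added, add back what was removed.
    return Commit(new_id, added=target.removed, removed=target.added)


history = [
    Commit(1, added=frozenset({"a.parquet"})),
    Commit(2, added=frozenset({"b.parquet"})),  # the append to revert
    Commit(3, added=frozenset({"c.parquet"})),  # a later, unrelated append
]
history.append(revert_commit(history, commit_id=2, new_id=4))
print(sorted(live_files(history)))  # ['a.parquet', 'c.parquet']
```

The revert is recorded as a new commit on top of the history, so later
snapshots are kept, unlike a rollback to an earlier snapshot.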
>>> Ryan
>>>
>>> On Thu, Sep 29, 2022 at 3:13 PM Pucheng Yang <py...@pinterest.com.invalid>
>>> wrote:
>>>
>>>> Thank you, I will take a look.
>>>>
>>>> On Thu, Sep 29, 2022 at 2:40 PM Ye, Jack <yzhao...@amazon.com.invalid>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>>
>>>>>
>>>>> There is a PR published just today for something similar that you
>>>>> might be able to reference:
>>>>> https://github.com/apache/iceberg/pull/5888, which rolls back a
>>>>> compaction commit on conflict and then reapplies the changes. The
>>>>> logic seems similar to what you want: roll back to that specific
>>>>> snapshot and try to reapply the changes you still want.
>>>>>
>>>>>
>>>>>
>>>>> Best,
>>>>>
>>>>> Jack Ye
>>>>>
>>>>>
>>>>>
>>>>> *From: *Pucheng Yang <py...@pinterest.com.INVALID>
>>>>> *Reply-To: *"dev@iceberg.apache.org" <dev@iceberg.apache.org>
>>>>> *Date: *Thursday, September 29, 2022 at 2:27 PM
>>>>> *To: *"dev@iceberg.apache.org" <dev@iceberg.apache.org>
>>>>> *Subject: *[EXTERNAL] Reverting a commit in the table history?
>>>>>
>>>>>
>>>>>
>>>>> Hi all,
>>>>>
>>>>>
>>>>>
>>>>> Has there been any discussion about reverting a commit in the table
>>>>> history?
>>>>>
>>>>>
>>>>>
>>>>> My clients have this use case: they write some data into a
>>>>> partition and later want to revert that write. But because new
>>>>> snapshots have been generated since then, they cannot use snapshot
>>>>> rollback.
>>>>>
>>>>>
>>>>>
>>>>> Any comments are welcome! Thanks!
>>>>>
>>>>>
>>>>>
>>>>> Best,
>>>>>
>>>>> Pucheng
>>>>>
>>>>
>>>
>>> --
>>> Ryan Blue
>>> Tabular
>>>
>>
>
> --
> Ryan Blue
> Tabular
>
