Thanks Ryan, this is helpful! I will keep what you said in mind when I explore it.
On Fri, Sep 30, 2022 at 10:42 AM Ryan Blue <b...@tabular.io> wrote:

> It depends on what you want the semantics of the revert to be. Here’s an
> example overwrite:
>
> df.writeTo("db.table")
>     .overwrite(expr("ts >= today() and ts <= date_add(today(), 1)"))
>
> The overwrite expression removes any files written today and replaces them
> with the contents of the DataFrame. Let’s say that replaces
> today/file_A.parquet and today/file_B.parquet with today/file_C.parquet.
> When there are no further changes, it’s easy to revert by replacing C with
> A and B. That means at a minimum that C still needs to exist in the table
> to revert.
>
> But what happens when there’s a new delete applied to C? Reverting would
> un-delete a position delete against C and if the row was in A or B then it
> would bring back a deleted row.
>
> For this, we probably also need to know the original filter so that we can
> check for certain conflicts. Right now, that’s not stored anywhere. But we
> could start adding it to Snapshot metadata.
>
> Ryan
>
> On Fri, Sep 30, 2022 at 9:41 AM Pucheng Yang <py...@pinterest.com.invalid>
> wrote:
>
>> Thanks Ryan, how about an overwrite commit (insert overwrite)? What
>> should I be aware of? Thanks.
>>
>> On Fri, Sep 30, 2022 at 9:26 AM Ryan Blue <b...@tabular.io> wrote:
>>
>>> Pucheng,
>>>
>>> I think you'd want to add a new option to the SnapshotManager to revert
>>> a commit by ID. That would need to get the changes from the commit and
>>> reverse them. We'd want to start small because reverting the file-level
>>> changes isn't always the same thing as reverting the semantic changes. But
>>> for simple cases like an append commit, it would work just fine.
>>>
>>> Ryan
>>>
>>> On Thu, Sep 29, 2022 at 3:13 PM Pucheng Yang <py...@pinterest.com.invalid>
>>> wrote:
>>>
>>>> Thank you, I will take a look.
>>>>
>>>> On Thu, Sep 29, 2022 at 2:40 PM Ye, Jack <yzhao...@amazon.com.invalid>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> There is a PR published just today for something similar that you
>>>>> might be able to reference:
>>>>> https://github.com/apache/iceberg/pull/5888, which rolls back a
>>>>> compaction commit on conflict and then reapply the changes. The logic
>>>>> seems to be similar as what you want, to rollback to that specific
>>>>> snapshot and try to reapply the ones you still want.
>>>>>
>>>>> Best,
>>>>>
>>>>> Jack Ye
>>>>>
>>>>> *From: *Pucheng Yang <py...@pinterest.com.INVALID>
>>>>> *Reply-To: *"dev@iceberg.apache.org" <dev@iceberg.apache.org>
>>>>> *Date: *Thursday, September 29, 2022 at 2:27 PM
>>>>> *To: *"dev@iceberg.apache.org" <dev@iceberg.apache.org>
>>>>> *Subject: *[EXTERNAL] Reverting a commit in the table history?
>>>>>
>>>>> Hi all,
>>>>>
>>>>> I wonder if any discussion happened about the idea of reverting a
>>>>> commit in the table history?
>>>>>
>>>>> My clients have such a use case: they are writing some data into a
>>>>> partition, and later want to revert that. But since there are new
>>>>> snapshots generated, thus they can not use snapshot rollback.
>>>>>
>>>>> Any comments are welcome! Thanks!
>>>>>
>>>>> Best,
>>>>>
>>>>> Pucheng
>>>>>
>>>>
>>>
>>> --
>>> Ryan Blue
>>> Tabular
>>>
>>
>
> --
> Ryan Blue
> Tabular
>
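
For reference, the file-level revert and the conflict described in the thread can be illustrated with a small toy model in plain Python. This is only a sketch and does not use the Iceberg API: Commit, TableState, RevertConflict and revert_commit are hypothetical names, and position deletes are reduced to a dict from data file to deleted row positions. It assumes the most conservative rule, which is to refuse the revert if a file added by the commit is gone or has deletes against it.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Commit:
    """Toy record of one commit's file-level changes (not an Iceberg class)."""
    added: frozenset    # data files the commit added, e.g. file_C
    removed: frozenset  # data files the commit removed, e.g. file_A and file_B


@dataclass
class TableState:
    """Toy table state: live data files plus position deletes per data file."""
    live_files: set = field(default_factory=set)
    position_deletes: dict = field(default_factory=dict)  # file -> set of row positions


class RevertConflict(Exception):
    """Raised when reversing the file-level changes would not be safe."""


def revert_commit(state: TableState, commit: Commit) -> TableState:
    # The files the commit added must still be in the table; otherwise a later
    # commit already rewrote or removed them and there is nothing to swap back.
    missing = sorted(commit.added - state.live_files)
    if missing:
        raise RevertConflict(f"added files no longer in table: {missing}")

    # A later position delete against an added file would be silently undone,
    # possibly bringing back rows that also existed in the removed files.
    with_deletes = sorted(f for f in commit.added if state.position_deletes.get(f))
    if with_deletes:
        raise RevertConflict(f"position deletes exist against: {with_deletes}")

    # Safe case: drop what the commit added and restore what it removed.
    new_live = (state.live_files - commit.added) | commit.removed
    return TableState(live_files=new_live, position_deletes=dict(state.position_deletes))
```

With the overwrite from the example, reverting while file_C is untouched restores file_A and file_B, while a position delete against file_C makes revert_commit raise instead of guessing. A real implementation could be less strict if the original overwrite filter were stored in the snapshot metadata, as suggested above.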