It depends on what you want the semantics of the revert to be. Here’s an example overwrite:
    df.writeTo("db.table")
        .overwrite(expr("ts >= today() and ts <= date_add(today(), 1)"))

The overwrite expression removes any files written today and replaces them with the contents of the DataFrame. Let’s say that replaces today/file_A.parquet and today/file_B.parquet with today/file_C.parquet.

When there are no further changes, it’s easy to revert by replacing C with A and B. That means at a minimum that C still needs to exist in the table to revert. But what happens when there’s a new delete applied to C? Reverting would discard the position delete against C, and if the deleted row was also in A or B, restoring those files would bring that row back.

For this, we probably also need to know the original filter so that we can check for certain conflicts. Right now, that’s not stored anywhere. But we could start adding it to Snapshot metadata.

Ryan

On Fri, Sep 30, 2022 at 9:41 AM Pucheng Yang <py...@pinterest.com.invalid> wrote:

> Thanks Ryan, how about an overwrite commit (insert overwrite)? What should
> I be aware of? Thanks.
>
> On Fri, Sep 30, 2022 at 9:26 AM Ryan Blue <b...@tabular.io> wrote:
>
>> Pucheng,
>>
>> I think you'd want to add a new option to the SnapshotManager to revert a
>> commit by ID. That would need to get the changes from the commit and
>> reverse them. We'd want to start small because reverting the file-level
>> changes isn't always the same thing as reverting the semantic changes. But
>> for simple cases like an append commit, it would work just fine.
>>
>> Ryan
>>
>> On Thu, Sep 29, 2022 at 3:13 PM Pucheng Yang <py...@pinterest.com.invalid>
>> wrote:
>>
>>> Thank you, I will take a look.
>>>
>>> On Thu, Sep 29, 2022 at 2:40 PM Ye, Jack <yzhao...@amazon.com.invalid>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> There is a PR published just today for something similar that you might
>>>> be able to reference: https://github.com/apache/iceberg/pull/5888,
>>>> which rolls back a compaction commit on conflict and then reapplies the
>>>> changes. The logic seems to be similar to what you want: roll back to
>>>> that specific snapshot and try to reapply the ones you still want.
>>>>
>>>> Best,
>>>>
>>>> Jack Ye
>>>>
>>>> *From: *Pucheng Yang <py...@pinterest.com.INVALID>
>>>> *Reply-To: *"dev@iceberg.apache.org" <dev@iceberg.apache.org>
>>>> *Date: *Thursday, September 29, 2022 at 2:27 PM
>>>> *To: *"dev@iceberg.apache.org" <dev@iceberg.apache.org>
>>>> *Subject: *[EXTERNAL] Reverting a commit in the table history?
>>>>
>>>> Hi all,
>>>>
>>>> I wonder if any discussion happened about the idea of reverting a
>>>> commit in the table history?
>>>>
>>>> My clients have such a use case: they are writing some data into a
>>>> partition, and later want to revert that. But since new snapshots have
>>>> been generated, they cannot use snapshot rollback.
>>>>
>>>> Any comments are welcome! Thanks!
>>>>
>>>> Best,
>>>>
>>>> Pucheng
>>>>
>>>
>>
>> --
>> Ryan Blue
>> Tabular
>>
>
--
Ryan Blue
Tabular
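
For readers following the thread, here is a rough, self-contained sketch of the file-level revert and the conflict check discussed in the top reply. It is not Iceberg code: `Commit`, `RevertConflict`, and `revert_commit` are hypothetical names, the table is modeled as a plain set of file paths, and the rule of refusing whenever a later commit adds deletes against a file the reverted commit added is a deliberately conservative assumption. A finer check would need the original overwrite filter, which, as noted above, is not stored today.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    """Hypothetical model of one snapshot's file-level changes."""
    snapshot_id: int
    added: frozenset = frozenset()
    removed: frozenset = frozenset()
    # Data files targeted by position deletes written in this commit.
    deletes_against: frozenset = frozenset()


class RevertConflict(Exception):
    """Raised when a file-level revert could change row-level semantics."""


def revert_commit(live_files, history, snapshot_id):
    """Return the live file set after undoing one commit, or raise RevertConflict."""
    index = None
    for i, commit in enumerate(history):
        if commit.snapshot_id == snapshot_id:
            index = i
            break
    if index is None:
        raise ValueError(f"unknown snapshot id: {snapshot_id}")

    target = history[index]

    # The files the commit added must still be live, otherwise there is
    # nothing left to swap back out.
    missing = target.added - set(live_files)
    if missing:
        raise RevertConflict(f"files no longer in table: {sorted(missing)}")

    # Conservative assumption: any later delete against an added file blocks
    # the revert, because restoring the removed files could resurrect rows.
    for later in history[index + 1:]:
        hit = later.deletes_against & target.added
        if hit:
            raise RevertConflict(
                f"snapshot {later.snapshot_id} deleted rows in {sorted(hit)}"
            )

    return (set(live_files) - target.added) | target.removed


# The overwrite from the example: A and B replaced by C.
A = "today/file_A.parquet"
B = "today/file_B.parquet"
C = "today/file_C.parquet"
append = Commit(1, added=frozenset({A, B}))
overwrite = Commit(2, added=frozenset({C}), removed=frozenset({A, B}))

# No further changes: the revert simply swaps C back for A and B.
clean_result = revert_commit({C}, [append, overwrite], 2)

# A later position delete against C makes the revert unsafe.
row_delete = Commit(3, deletes_against=frozenset({C}))
try:
    revert_commit({C}, [append, overwrite, row_delete], 2)
    conflict_detected = False
except RevertConflict:
    conflict_detected = True
```

In this sketch, reverting an append is just the special case where `removed` is empty, which matches the point in the quoted reply that simple append commits are the easy place to start.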