Re: [PROPOSAL] Commit Deconfliction

Robert Stupp Tue, 08 Jul 2025 01:20:04 -0700

The general idea of resolving commit conflicts in Polaris is fine.
However, I'm missing some information about the tricky details.

The tricky part is how to detect and in turn how to resolve those
conflicts. That requires knowledge of the change being performed and
its context.

While it sounds simple to let one "append" operation succeed on top of
another conflicting "append" operation, in practice it's not that
simple. At the very least, Iceberg's sequence numbers get in the way
here, because you'd get duplicate sequence numbers in that case.
Other cases, for example writes with "merge on write", are even
trickier (one commit deletes existing data files and adds new ones):
resolving two conflicting "merge on write" operations is very
difficult (or rather, extremely expensive, as you'd have to perform a
`diff` on the data).
Another case that comes to mind is one schema change ("happens
before") dropping a column plus an append referring to the dropped
column ("happens after").
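
To make the first and the last of these cases concrete, here is a
minimal Python sketch. It is not how Polaris or Iceberg implement
commits: `TableState`, `Commit`, `prepare_append`, `prepare_drop`,
`find_problems` and `apply_commit` are hypothetical names, and the
model assumes only that a writer derives an append's sequence number
from the last sequence number of the state it started from, and that a
schema change does not consume a sequence number.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TableState:
    """Hypothetical, heavily simplified stand-in for table metadata."""
    last_sequence_number: int
    columns: frozenset


@dataclass(frozen=True)
class Commit:
    """Hypothetical commit prepared by a writer against some base state."""
    kind: str                               # "append" or "drop_column"
    sequence_number: Optional[int] = None   # only set for appends in this model
    columns_written: frozenset = frozenset()
    dropped_column: str = ""


def prepare_append(base, columns_written):
    # The writer picks the next sequence number relative to the state it saw.
    return Commit("append", base.last_sequence_number + 1,
                  frozenset(columns_written))


def prepare_drop(base, column):
    # Assumption: a schema change does not consume a sequence number.
    return Commit("drop_column", dropped_column=column)


def find_problems(current, commit):
    """List reasons why `commit` cannot be applied to `current` unchanged."""
    problems = []
    if (commit.sequence_number is not None
            and commit.sequence_number <= current.last_sequence_number):
        problems.append("duplicate sequence number")
    if commit.kind == "append":
        missing = commit.columns_written - current.columns
        if missing:
            problems.append("append refers to dropped column(s): "
                            + ", ".join(sorted(missing)))
    return problems


def apply_commit(current, commit):
    """Return the table state after `commit`, without any validation."""
    columns = current.columns
    if commit.kind == "drop_column":
        columns = columns - {commit.dropped_column}
    sequence = commit.sequence_number
    if sequence is None:
        sequence = current.last_sequence_number
    return TableState(sequence, columns)


base = TableState(5, frozenset({"id", "name", "email"}))

# Case 1: two appends prepared against the same base state.
append_a = prepare_append(base, {"id", "name"})
append_b = prepare_append(base, {"id", "name"})
print(find_problems(apply_commit(base, append_a), append_b))

# Case 2: a column drop "happens before" an append that still writes it.
drop = prepare_drop(base, "email")
append_c = prepare_append(base, {"id", "email"})
print(find_problems(apply_commit(base, drop), append_c))
```

In this toy model the second append is flagged for reusing a sequence
number, and the append after the drop is flagged for the missing
column. Even here the server has to know what kind of change each
commit makes and which columns it touches, and that is exactly the
contextual information that is hard to get from a commit today.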

In Nessie we've been thinking about this problem for quite a while
[1]. The outcome every time was that it would be an awesome feature,
but a lot of the necessary contextual information (i.e. what Iceberg
stores and what Iceberg provides in commits) is missing.

IMHO we should think about the actual conflict resolution first and
get the necessary changes into Iceberg.

[1] https://github.com/projectnessie/nessie/issues/2513

On Tue, Jul 8, 2025 at 1:27 AM Dmitri Bourlatchkov <di...@apache.org> wrote:
>
> ... but the Polaris Server still has to reconcile the metadata for
> conflicting changes (before it commits), right?
>
> Thanks,
> Dmitri.
>
> On Mon, Jul 7, 2025 at 7:22 PM Eric Maynard <eric.w.mayn...@gmail.com>
> wrote:
>
> > Hi Dmitri, thanks for checking the doc out.
> >
> > Indeed, in this implementation, the server does not apply any "decision
> > logic" at all to the commits. Or perhaps it's more accurate to say that the
> > decision logic applied is only to inspect the commits and check for their
> > mutual consent to deconflict. The server trusts this mutual consent.
> >
> > There's a small section about other strategies at the end of the doc --
> > essentially, I think we could implement various deconfliction strategies
> > and allow them to be mixed together, like we do with the FileIOFactory
> > implementations for example.
> >
> > --EM
> >
> > On Mon, Jul 7, 2025 at 3:57 PM Dmitri Bourlatchkov <di...@apache.org>
> > wrote:
> >
> > > Hi Eric,
> > >
> > > This sounds like an interesting approach to me.
> > >
> > > I wonder how much decision logic you envision Polaris performing for
> > > de-conflicting? Is it mostly approving based on submitted "Writer" ID
> > > checks, or will Polaris validate actual table changes?
> > >
> > > I added some comments to the doc too.
> > >
> > > Thanks,
> > > Dmitri.
> > >
> > > On Mon, Jul 7, 2025 at 6:33 PM Eric Maynard <eric.w.mayn...@gmail.com>
> > > wrote:
> > >
> > > > Hi all,
> > > >
> > > > Wanted to share this short design doc
> > > > <https://docs.google.com/document/d/1tkqBOYtkcA7fbDmhIAE6_6Jmus5WwP6vS6jA_JHp4Ms>
> > > > for a simple method of allowing conflicting commits to both be committed.
> > > > If implemented, this would allow e.g. two writers doing append-only
> > > > operations to a table in Polaris to always succeed.
> > > >
> > > > If you're interested, please take a look. In the meantime, I'll be
> > > > preparing a small draft PR to serve as a reference implementation.
> > > >
> > > > --EM
> > > >
> > >
> >
