I do like this proposal because it sidesteps the issues Robert mentions: it only gives clients the ability to decide in advance which conflicting commits may succeed. More advanced automatic or server-side deconfliction is a good future direction, but I think it's orthogonal to this proposal.
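To make the "mutual consent" idea concrete, here is a rough sketch of the check as I understand it from this thread. It is not taken from the design doc or from any Polaris code. The names (`Commit`, `writer_id`, `consents_to`, `can_commit`) are all hypothetical, and the assumption that consent is expressed as a set of writer IDs is mine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    # Hypothetical fields; the actual proposal may represent consent differently.
    writer_id: str
    consents_to: frozenset = frozenset()  # writer IDs this client agrees to deconflict with


def mutually_consent(a: Commit, b: Commit) -> bool:
    """Both sides must have opted in ahead of time; one-sided consent is not enough."""
    return b.writer_id in a.consents_to and a.writer_id in b.consents_to


def can_commit(incoming: Commit, committed_since_base: list) -> bool:
    """Decide whether an incoming commit may land despite conflicts.

    committed_since_base holds the commits that were applied after the
    incoming client read the table. The server never inspects the table
    changes themselves; it only checks the consent declared by the clients.
    """
    return all(mutually_consent(incoming, other) for other in committed_since_base)


appender_a = Commit("writer-a", frozenset({"writer-b"}))
appender_b = Commit("writer-b", frozenset({"writer-a"}))
compactor = Commit("writer-c")  # declares no consent

ok_with_consent = can_commit(appender_b, [appender_a])
ok_without_consent = can_commit(compactor, [appender_a])
```

With these assumptions the two appenders can both land, while the writer that declared no consent is rejected as it would be today. Whether two appends are actually safe to stack (sequence numbers, for instance) remains the clients' responsibility in this model.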
On Tue, Jul 8, 2025 at 3:19 AM Robert Stupp <sn...@snazy.de> wrote:

> The general idea to resolve commit-conflicts in Polaris is fine.
> However I miss some information about the tricky details.
>
> The tricky part is how to detect and in turn how to resolve those
> conflicts. That requires knowledge of the change being performed and
> its context.
>
> While it sounds simple to let one "append" operation succeed on top of
> another conflicting "append" operation, it's in practice not that
> simple. At least Iceberg's sequence numbers get in the way here,
> because you'd get duplicate sequence IDs in that case.
> Other cases, for example writes with "merge on write" are even
> trickier (one commit deletes existing and adds new data files) -
> resolving two conflicting "merge on write" operations is very
> difficult (or say: extremely expensive as you'd have to perform a
> `diff` on the data).
> Another case that comes to mind is one schema change ("happens
> before") dropping a column plus an append referring to the dropped
> column ("happens after").
>
> In Nessie we've been thinking about this problem for quite a while
> [1], the outcome every time was that it would be an awesome feature,
> but a lot of necessary contextual information (aka what Iceberg stores
> and what Iceberg provides in commits) is missing.
>
> IMHO we should think about the actual conflict resolution first and
> have the necessary changes in Iceberg.
>
> [1] https://github.com/projectnessie/nessie/issues/2513
>
>
> On Tue, Jul 8, 2025 at 1:27 AM Dmitri Bourlatchkov <di...@apache.org>
> wrote:
> >
> > ... but the Polaris Server still has to reconcile the metadata for
> > conflicting changes (before it commits), right?
> >
> > Thanks,
> > Dmitri.
> >
> > On Mon, Jul 7, 2025 at 7:22 PM Eric Maynard <eric.w.mayn...@gmail.com>
> > wrote:
> >
> > > Hi Dmitri, thanks for checking the doc out.
> > >
> > > Indeed, in this implementation, the server does not apply any "decision
> > > logic" at all to the commits. Or perhaps it's more accurate to say that
> > > the decision logic applied is only to inspect the commits and check for
> > > their mutual consent to deconflict. The server trusts this mutual consent.
> > >
> > > There's a small section about other strategies at the end of the doc --
> > > essentially, I think we could implement various deconfliction strategies
> > > and allow them to be mixed together, like we do with the FileIOFactory
> > > implementations for example.
> > >
> > > --EM
> > >
> > > On Mon, Jul 7, 2025 at 3:57 PM Dmitri Bourlatchkov <di...@apache.org>
> > > wrote:
> > >
> > > > Hi Eric,
> > > >
> > > > This sounds like an interesting approach to me.
> > > >
> > > > I wonder how much decision logic do you envision Polaris to perform
> > > > for de-conflicting? Is it mostly approving based on submitted "Writer"
> > > > ID checks or will Polaris validate actual table changes?
> > > >
> > > > I added some comments to the doc too.
> > > >
> > > > Thanks,
> > > > Dmitri.
> > > >
> > > > On Mon, Jul 7, 2025 at 6:33 PM Eric Maynard <eric.w.mayn...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > Wanted to share this short design doc
> > > > > <https://docs.google.com/document/d/1tkqBOYtkcA7fbDmhIAE6_6Jmus5WwP6vS6jA_JHp4Ms>
> > > > > for a simple method of allowing conflicting commits to both be
> > > > > committed. If implemented, this would allow e.g. two writers doing
> > > > > append-only operations to a table in Polaris to always succeed.
> > > > >
> > > > > If you're interested, please take a look. In the meantime, I'll be
> > > > > preparing a small draft PR to serve as a reference implementation.
> > > > >
> > > > > --EM