Russell, can you clarify? As it stands, the proposal would make the snapshot produced by one of the conflicting commits the latest snapshot, superseding the data added by the other conflicting commits and leading to data loss. That's correct, isn't it?
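To make the concern concrete, here is a toy model in plain Python of two appends built on the same parent snapshot. It does not use Iceberg or Polaris APIs: `Snapshot`, `append`, and `rebase_append` are hypothetical names, and a real Iceberg snapshot tracks files through manifests rather than a flat set. The sketch only shows why promoting one writer's snapshot unchanged would drop the other writer's files and duplicate a sequence number, and what a reconciling server would have to do instead. Whether the proposal behaves like the first case is the open question.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Set


@dataclass(frozen=True)
class Snapshot:
    """Toy stand-in for a table snapshot (hypothetical, not the Iceberg model)."""
    snapshot_id: int
    parent_id: Optional[int]
    sequence_number: int
    data_files: FrozenSet[str]


def append(parent: Snapshot, snapshot_id: int, new_files: Set[str]) -> Snapshot:
    """Build an append snapshot on top of `parent`, as a writer would client-side."""
    return Snapshot(
        snapshot_id=snapshot_id,
        parent_id=parent.snapshot_id,
        sequence_number=parent.sequence_number + 1,
        data_files=parent.data_files | frozenset(new_files),
    )


def rebase_append(current: Snapshot, parent: Snapshot, candidate: Snapshot) -> Snapshot:
    """Replay only the files `candidate` added over `parent` on top of `current`."""
    added = candidate.data_files - parent.data_files
    return Snapshot(
        snapshot_id=candidate.snapshot_id,
        parent_id=current.snapshot_id,
        sequence_number=current.sequence_number + 1,
        data_files=current.data_files | added,
    )


base = Snapshot(snapshot_id=1, parent_id=None, sequence_number=1,
                data_files=frozenset({"a.parquet"}))

# Two writers start from the same parent and append concurrently.
w1 = append(base, snapshot_id=2, new_files={"b.parquet"})
w2 = append(base, snapshot_id=3, new_files={"c.parquet"})

# Case 1: w1 is committed, then w2's snapshot is promoted unchanged.
naive_current = w2
lost = w1.data_files - naive_current.data_files   # files from w1 no longer visible
duplicate_sequence = w1.sequence_number == w2.sequence_number

# Case 2: the server reconciles w2 against the already-committed w1.
reconciled = rebase_append(current=w1, parent=base, candidate=w2)
```

In case 1, `lost` contains `b.parquet` and both snapshots carry the same sequence number, which is the duplicate-sequence issue Robert mentions below. In case 2 the server has to rewrite w2's metadata, which is the reconciliation step Dmitri asks about.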
On Tue, Jul 8, 2025 at 11:22 AM Russell Spitzer <russell.spit...@gmail.com> wrote:

> I do like this proposal because it essentially avoids all the issues that
> Robert mentions by instead just offering the ability for a client to decide
> in advance which commits would succeed. Leaving more advanced automatic or
> server side determined deconfliction is a good future direction but I think
> it’s orthogonal to this proposal.
>
> On Tue, Jul 8, 2025 at 3:19 AM Robert Stupp <sn...@snazy.de> wrote:
>
> > The general idea to resolve commit-conflicts in Polaris is fine.
> > However I miss some information about the tricky details.
> >
> > The tricky part is how to detect and in turn how to resolve those
> > conflicts. That requires knowledge of the change being performed and
> > its context.
> >
> > While it sounds simple to let one "append" operation succeed on top of
> > another conflicting "append" operation, it's in practice not that
> > simple. At least Iceberg's sequence numbers get in the way here,
> > because you'd get duplicate sequence IDs in that case.
> > Other cases, for example writes with "merge on write" are even
> > trickier (one commit deletes existing and adds new data files) -
> > resolving two conflicting "merge on write" operations is very
> > difficult (or say: extremely expensive as you'd have to perform a
> > `diff` on the data).
> > Another case that comes to mind is one schema change ("happens
> > before") dropping a column plus an append referring to the dropped
> > column ("happens after").
> >
> > In Nessie we've been thinking about this problem for quite a while
> > [1], the outcome every time was that it would be an awesome feature,
> > but a lot of necessary contextual information (aka what Iceberg stores
> > and what Iceberg provides in commits) is missing.
> >
> > IMHO we should think about the actual conflict resolution first and
> > have the necessary changes in Iceberg.
> >
> > [1] https://github.com/projectnessie/nessie/issues/2513
> >
> > On Tue, Jul 8, 2025 at 1:27 AM Dmitri Bourlatchkov <di...@apache.org> wrote:
> >
> > > ... but the Polaris Server still has to reconcile the metadata for
> > > conflicting changes (before it commits), right?
> > >
> > > Thanks,
> > > Dmitri.
> > >
> > > On Mon, Jul 7, 2025 at 7:22 PM Eric Maynard <eric.w.mayn...@gmail.com> wrote:
> > >
> > > > Hi Dmitri, thanks for checking the doc out.
> > > >
> > > > Indeed, in this implementation, the server does not apply any "decision
> > > > logic" at all to the commits. Or perhaps it's more accurate to say that the
> > > > decision logic applied is only to inspect the commits and check for their
> > > > mutual consent to deconflict. The server trusts this mutual consent.
> > > >
> > > > There's a small section about other strategies at the end of the doc --
> > > > essentially, I think we could implement various deconfliction strategies
> > > > and allow them to be mixed together, like we do with the FileIOFactory
> > > > implementations for example.
> > > >
> > > > --EM
> > > >
> > > > On Mon, Jul 7, 2025 at 3:57 PM Dmitri Bourlatchkov <di...@apache.org> wrote:
> > > >
> > > > > Hi Eric,
> > > > >
> > > > > This sounds like an interesting approach to me.
> > > > >
> > > > > I wonder how much decision logic do you envision Polaris to perform for
> > > > > de-conflicting? Is it mostly approving based on submitted "Writer" ID checks
> > > > > or will Polaris validate actual table changes?
> > > > >
> > > > > I added some comments to the doc too.
> > > > >
> > > > > Thanks,
> > > > > Dmitri.
> > > > >
> > > > > On Mon, Jul 7, 2025 at 6:33 PM Eric Maynard <eric.w.mayn...@gmail.com> wrote:
> > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > Wanted to share this short design doc
> > > > > > <https://docs.google.com/document/d/1tkqBOYtkcA7fbDmhIAE6_6Jmus5WwP6vS6jA_JHp4Ms>
> > > > > > for a simple method of allowing conflicting commits to both be committed. If
> > > > > > implemented, this would allow e.g. two writers doing append-only operations
> > > > > > to a table in Polaris to always succeed.
> > > > > >
> > > > > > If you're interested, please take a look. In the meantime, I'll be
> > > > > > preparing a small draft PR to serve as a reference implementation.
> > > > > >
> > > > > > --EM