Hi JB, I think your doc also has a lot of valuable information and proposals.
Unfortunately, I'm afraid my point about the lack of discussion of concurrent updates applies to your doc too.

I think it is important to define conflicting change resolution at the persistence layer across all backends because we cannot assume that "natural" implementations (e.g. transaction exceptions in RDBMS) will map well to the service layer code in Polaris. I think Persistence has to expose conflicts the same way regardless of the backing database and do the conversion from "natural" errors to some Polaris-specific form.

More specifically, for example, if two multi-table changes clash, how will this be exposed to the service code? What do the services need to do to "declare" the expected state before the change so that Persistence can detect conflicts?

This applies to all approaches to refactoring Persistence, I believe.

WDYT?

Thanks,
Dmitri.

On Tue, Feb 4, 2025 at 2:57 PM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:

> Hi Dennis
>
> Thanks for starting this thread !
> I will read the resources you mentioned.
>
> As I also started some design proposal doc with Jack, I'm linking to
> this thread (it's a WIP):
>
> https://docs.google.com/document/d/1LlNhEy4cBjjE_um694fcsnizqd3rDm5pbewXkLxvu1o/edit?usp=sharing
> I will continue the code "illustration" tomorrow (my time).
>
> Regards
> JB
>
> On Tue, Feb 4, 2025 at 8:02 PM Dennis Huo <huoi...@gmail.com> wrote:
> >
> > Hello all,
> >
> > We've had some discussions and github issues (
> > https://github.com/apache/polaris/issues/775,
> > https://github.com/apache/polaris/issues/766, etc) scattered between
> > community syncs, slack threads, etc., related to how to adapt the Polaris
> > persistence layer to new DB backends, so I'm hoping we can consolidate
> > discussions towards an incremental path forward that is minimally
> > invasive.
> >
> > I wrote this analysis of the persistence layer in the context of a couple
> > persistence backends that have been suggested (MongoDB, DynamoDB) and also
> > tried to retroactively clarify the current structure and intent of some of
> > the persistence internals:
> >
> > https://docs.google.com/document/d/1U9rprj8w8-Q0SnQvRMvoVlbX996z-89eOkVwWTQaZG0/edit?tab=t.0
> >
> > It's a bit into the weeds, so will be most accessible for folks who have
> > taken a bit of a deep dive into the current
> > PolarisMetaStoreManager/PolarisMetaStoreSession layers.
> >
> > I'm not a MongoDB or DynamoDB expert though, so I'd appreciate any input to
> > help keep me honest on the capabilities :)
> >
> > At a high level, the biggest takeaway is that we *don't* need generalized
> > transactions, and some refactoring will allow us to have an abstract
> > top-level interface where 99% of use cases are covered either by:
> >
> > - Secondary indexes with "UNIQUE" constraints plus single-entity
> >   Compare-and-Swap
> > - Multi-statement transactions for those that support it (FoundationDB,
> >   Postgres)
> >
> > One notable outlier is Iceberg's "commitTransaction" API for multi-table
> > Iceberg transactions, which will require at least a "TransactBatch with
> > conditional writes per entity" semantic. We could evolve this one over
> > time, but on the plus side it seems this is still mostly supported by all
> > the mentioned backends so far.
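
To make the question at the top of this message more concrete, here is a rough, purely illustrative sketch of how a service could "declare" expected state and how Persistence could report a clash in a backend-neutral form. It assumes a per-entity version counter and uses a toy in-memory store in place of a real database. All names (ConflictSketch, ConditionalWrite, CommitResult, InMemoryStore, commitBatch) are hypothetical and are not taken from Polaris or from either design doc.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch only; none of these types exist in Polaris. */
final class ConflictSketch {

  /** A write that declares the version the caller expects (0 means "must not exist yet"). */
  record ConditionalWrite(String entityId, long expectedVersion, String newValue) {}

  /** Backend-neutral outcome: committed, or the entities whose state did not match. */
  record CommitResult(boolean committed, List<String> conflictingEntityIds) {}

  /** Toy in-memory backend standing in for any real database. */
  static final class InMemoryStore {
    private record Versioned(long version, String value) {}

    private final Map<String, Versioned> entities = new HashMap<>();

    /** Applies every write, or none of them if any expected version does not match. */
    synchronized CommitResult commitBatch(List<ConditionalWrite> writes) {
      // Phase 1: check every declared expectation before touching any state.
      List<String> conflicts = new ArrayList<>();
      for (ConditionalWrite w : writes) {
        if (versionOf(w.entityId()) != w.expectedVersion()) {
          conflicts.add(w.entityId());
        }
      }
      if (!conflicts.isEmpty()) {
        // A real backend would translate its "natural" error into this form here.
        return new CommitResult(false, conflicts);
      }
      // Phase 2: all expectations held, so apply the whole batch.
      for (ConditionalWrite w : writes) {
        entities.put(w.entityId(), new Versioned(w.expectedVersion() + 1, w.newValue()));
      }
      return new CommitResult(true, List.of());
    }

    /** Returns the current version of an entity, or 0 if it does not exist. */
    synchronized long versionOf(String entityId) {
      Versioned current = entities.get(entityId);
      return current == null ? 0L : current.version();
    }
  }
}
```

With this shape, two multi-table changes prepared against the same starting state would clash as follows: the first commitBatch call succeeds, and the second returns committed = false together with the ids of the overlapping tables, without applying any of its writes. The service code only ever sees CommitResult, whatever the backing database reports internally.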