Hi Dmitri,

I agree, and it's what I mean in the doc by "PolarisStore interface has to clearly document the methods expectation about the backend" in the cons section.
The interface should not strictly enforce the change resolution, but it should document what the expectations are. Then, each backend implementation can implement the logic specific to the backend.

The PolarisException thrown by a PolarisStorage method will be "propagated" to the rest of the business logic via PolarisMetaStoreManager. But the backend decides when the PolarisException is thrown. For instance, some backends could implement a retry mechanism, some backends can implement a locking mechanism, etc.

At the end of the day, it's up to the backend implementation to define the way it deals with "conflict change resolution". The PolarisStore interface should not be opinionated: imho, it's an implementation concern.

Regards
JB

On Wed, Feb 5, 2025 at 3:05 PM Dmitri Bourlatchkov <dmitri.bourlatch...@dremio.com.invalid> wrote:
>
> Hi JB,
>
> I think your doc also has a lot of valuable information and proposals.
>
> Unfortunately, I'm afraid my point about lack of concurrent updates
> discussion applies to your doc too.
>
> I think it is important to define conflicting change resolution at the
> persistence layer across all backends because we cannot assume that
> "natural" implementations (e.g. transaction exceptions in RDBMS) will map
> well to the service layer code in Polaris. I think Persistence has to
> expose conflicts the same way regardless of the backing database and do the
> conversion from "natural" errors to some Polaris-specific form.
>
> More specifically, for example, if two multi-table changes clash, how will
> this be exposed to the service code? What do the services need to do to
> "declare" what is the expected state before the change to allow Persistence
> to detect conflicts.
>
> This applies to all approaches to refactoring Persistence, I believe.
>
> WDYT?
>
> Thanks,
> Dmitri.
>
> On Tue, Feb 4, 2025 at 2:57 PM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>
> > Hi Dennis
> >
> > Thanks for starting this thread !
> > I will read the resources you mentioned.
> >
> > As I also started some design proposal doc with Jack, I'm linking to
> > this thread (it's a WIP):
> >
> > https://docs.google.com/document/d/1LlNhEy4cBjjE_um694fcsnizqd3rDm5pbewXkLxvu1o/edit?usp=sharing
> > I will continue the code "illustration" tomorrow (my time).
> >
> > Regards
> > JB
> >
> > On Tue, Feb 4, 2025 at 8:02 PM Dennis Huo <huoi...@gmail.com> wrote:
> >
> > > Hello all,
> > >
> > > We've had some discussions and github issues (
> > > https://github.com/apache/polaris/issues/775,
> > > https://github.com/apache/polaris/issues/766, etc) scattered between
> > > community syncs, slack threads, etc., related to how to adapt the Polaris
> > > persistence layer to new DB backends, so I'm hoping we can consolidate
> > > discussions towards an incremental path forward that is minimally invasive.
> > >
> > > I wrote this analysis of the persistence layer in the context of a couple
> > > persistence backends that have been suggested (MongoDB, DynamoDB) and also
> > > tried to retroactively clarify the current structure and intent of some of
> > > the persistence internals:
> > >
> > > https://docs.google.com/document/d/1U9rprj8w8-Q0SnQvRMvoVlbX996z-89eOkVwWTQaZG0/edit?tab=t.0
> > >
> > > It's a bit into the weeds, so will be most accessible for folks who have
> > > taken a bit of a deep dive into the current
> > > PolarisMetaStoreManager/PolarisMetaStoreSession layers.
> > >
> > > I'm not a MongoDB or DynamoDB expert though, so I'd appreciate any input to
> > > help keep me honest on the capabilities :)
> > >
> > > At a high level, the biggest takeaway is that we *don't* need generalized
> > > transactions, and some refactoring will allow us to have an abstract
> > > top-level interface where 99% of use cases are covered either by:
> > >
> > > - Secondary indexes with "UNIQUE" constraints plus single-entity
> > >   Compare-and-Swap
> > > - Multi-statement transactions for those that support it (FoundationDB,
> > >   Postgres)
> > >
> > > One notable outlier is Iceberg's "commitTransaction" API for multi-table
> > > Iceberg transactions, which will require at least a "TransactBatch with
> > > conditional writes per entity" semantic. We could evolve this one over
> > > time, but on the plus side it seems this is still mostly supported by all
> > > the mentioned backends so far.
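To make the two positions in this thread concrete, here is a minimal sketch in Java of a store interface that only documents the conflict contract, with one in-memory backend that honors it through a unique key and single-entity compare-and-swap. Everything in it is an assumption made for illustration: EntityStore stands in for the proposed PolarisStore, StoreConflictException stands in for PolarisException, and the Entity class, method names, and signatures do not come from the Polaris code base or from either design doc.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for PolarisException: the single conflict signal the service layer sees. */
class StoreConflictException extends RuntimeException {
    StoreConflictException(String message) {
        super(message);
    }
}

/** Hypothetical entity: a unique name, a version used for compare-and-swap, and a payload. */
final class Entity {
    final String name;
    final long version;
    final String payload;

    Entity(String name, long version, String payload) {
        this.name = name;
        this.version = version;
        this.payload = payload;
    }
}

/** Hypothetical stand-in for the proposed PolarisStore interface. */
interface EntityStore {
    /**
     * Creates the entity at version 1.
     * Contract: throws StoreConflictException if the name is already taken (UNIQUE constraint).
     */
    Entity create(String name, String payload);

    /**
     * Writes the new payload only if the stored version equals expectedVersion.
     * Contract: throws StoreConflictException when the expected state no longer holds.
     * Whether the backend retries, locks, or fails fast before throwing is its own concern.
     */
    Entity compareAndSwap(String name, long expectedVersion, String newPayload);
}

/** In-memory backend: the map key plays the role of the UNIQUE secondary index. */
class InMemoryEntityStore implements EntityStore {
    private final Map<String, Entity> byName = new ConcurrentHashMap<>();

    @Override
    public Entity create(String name, String payload) {
        Entity created = new Entity(name, 1L, payload);
        // putIfAbsent is atomic, so only one concurrent creator can win.
        if (byName.putIfAbsent(name, created) != null) {
            throw new StoreConflictException("entity already exists: " + name);
        }
        return created;
    }

    @Override
    public Entity compareAndSwap(String name, long expectedVersion, String newPayload) {
        Entity current = byName.get(name);
        if (current == null || current.version != expectedVersion) {
            throw new StoreConflictException("stale version for: " + name);
        }
        Entity next = new Entity(name, expectedVersion + 1, newPayload);
        // replace(key, old, new) is atomic and fails if another writer got in first.
        if (!byName.replace(name, current, next)) {
            throw new StoreConflictException("concurrent update on: " + name);
        }
        return next;
    }
}
```

In this sketch the service code "declares" the expected state by passing expectedVersion, and it sees the same exception type whichever backend is plugged in. A backend with multi-statement transactions could implement the same two methods inside a transaction and translate its native error into the same exception. The multi-table "commitTransaction" case is not covered here, since it would need a batch of such conditional writes applied atomically.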