Hello all,

We've had discussions and GitHub issues (https://github.com/apache/polaris/issues/775, https://github.com/apache/polaris/issues/766, etc.) scattered across community syncs, Slack threads, and elsewhere about how to adapt the Polaris persistence layer to new DB backends. I'm hoping we can consolidate those discussions into an incremental, minimally invasive path forward.

I wrote this analysis of the persistence layer in the context of a couple of persistence backends that have been suggested (MongoDB, DynamoDB), and also tried to retroactively clarify the current structure and intent of some of the persistence internals:

https://docs.google.com/document/d/1U9rprj8w8-Q0SnQvRMvoVlbX996z-89eOkVwWTQaZG0/edit?tab=t.0

It's a bit into the weeds, so it will be most accessible to folks who have taken a deep dive into the current PolarisMetaStoreManager/PolarisMetaStoreSession layers. I'm not a MongoDB or DynamoDB expert though, so I'd appreciate any input to help keep me honest on the capabilities :)

At a high level, the biggest takeaway is that we *don't* need generalized transactions, and some refactoring will allow us to have an abstract top-level interface where 99% of use cases are covered by either:

- Secondary indexes with "UNIQUE" constraints plus single-entity Compare-and-Swap
- Multi-statement transactions for those backends that support them (FoundationDB, Postgres)

One notable outlier is Iceberg's "commitTransaction" API for multi-table Iceberg transactions, which will require at least a "TransactBatch with conditional writes per entity" semantic. We could evolve this one over time, but on the plus side it seems this is still mostly supported by all the backends mentioned so far.
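
To make those primitives a bit more concrete, here is a rough in-memory sketch in Python. It is not Polaris code and does not reflect the actual PolarisMetaStoreManager interfaces; `ToyStore`, `Entity`, and all method names are hypothetical, and a Python dict stands in for the backend. It only illustrates the three semantics mentioned above: a UNIQUE secondary index on (parent, name), a single-entity Compare-and-Swap keyed on a version counter, and an all-or-nothing batch with one version condition per entity.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Entity:
    id: int
    parent_id: int
    name: str
    payload: str      # e.g. a pointer to table metadata (illustrative only)
    version: int = 1  # bumped on every successful write


class ToyStore:
    """Hypothetical stand-in for a backend without general transactions."""

    def __init__(self) -> None:
        self._by_id: Dict[int, Entity] = {}
        # Secondary index with a UNIQUE constraint on (parent_id, name).
        self._name_index: Dict[Tuple[int, str], int] = {}

    def get(self, entity_id: int) -> Optional[Entity]:
        return self._by_id.get(entity_id)

    def create(self, entity: Entity) -> bool:
        """Insert; fails if the id or the (parent_id, name) key is taken."""
        key = (entity.parent_id, entity.name)
        if entity.id in self._by_id or key in self._name_index:
            return False
        self._by_id[entity.id] = entity
        self._name_index[key] = entity.id
        return True

    def compare_and_swap(self, entity_id: int, expected_version: int,
                         payload: str) -> bool:
        """Single-entity CAS: write only if the stored version matches."""
        current = self._by_id.get(entity_id)
        if current is None or current.version != expected_version:
            return False
        self._by_id[entity_id] = replace(
            current, payload=payload, version=current.version + 1)
        return True

    def transact_batch(self, writes: List[Tuple[int, int, str]]) -> bool:
        """Apply (entity_id, expected_version, payload) writes atomically.

        Every per-entity condition is checked before anything is written,
        so a single stale version rejects the whole batch.
        """
        ids = [entity_id for entity_id, _, _ in writes]
        if len(set(ids)) != len(ids):
            return False  # at most one write per entity in a batch
        for entity_id, expected_version, _ in writes:
            current = self._by_id.get(entity_id)
            if current is None or current.version != expected_version:
                return False
        for entity_id, _, payload in writes:
            current = self._by_id[entity_id]
            self._by_id[entity_id] = replace(
                current, payload=payload, version=current.version + 1)
        return True
```

In this toy model a multi-table commit would map to one `transact_batch` call with one conditional write per table, while most other operations would only need `create` (relying on the unique index) or `compare_and_swap`. How each real backend would express these is covered in the doc, and I'd welcome corrections there.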