Hello all,

We've had discussions and GitHub issues (
https://github.com/apache/polaris/issues/775,
https://github.com/apache/polaris/issues/766, etc.) scattered across
community syncs, Slack threads, and elsewhere about how to adapt the Polaris
persistence layer to new DB backends. I'm hoping we can consolidate those
discussions into an incremental, minimally invasive path forward.

I wrote this analysis of the persistence layer in the context of a couple
of persistence backends that have been suggested (MongoDB, DynamoDB), and
also tried to retroactively clarify the current structure and intent of some
of the persistence internals:

https://docs.google.com/document/d/1U9rprj8w8-Q0SnQvRMvoVlbX996z-89eOkVwWTQaZG0/edit?tab=t.0

It gets fairly deep into the weeds, so it will be most accessible to folks
who have already taken a deep dive into the current
PolarisMetaStoreManager/PolarisMetaStoreSession layers.

I'm not a MongoDB or DynamoDB expert though, so I'd appreciate any input to
help keep me honest on the capabilities :)

At a high level, the biggest takeaway is that we *don't* need generalized
transactions, and some refactoring will allow us to have an abstract
top-level interface where 99% of use cases are covered by either:

   - Secondary indexes with "UNIQUE" constraints plus single-entity
   Compare-and-Swap
   - Multi-statement transactions for backends that support them
   (FoundationDB, Postgres)
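
To make the first option concrete, here is a minimal in-memory sketch of a
unique secondary index combined with single-entity compare-and-swap. It is
not Polaris code: the Entity fields, the InMemoryStore class, and the choice
of (parent_id, name) as the unique key are all assumptions for illustration.
A real backend would enforce the uniqueness check and the version check
atomically on its side.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Entity:
    # Hypothetical entity shape; not the actual Polaris entity model.
    entity_id: int
    parent_id: int
    name: str
    version: int = 1


class InMemoryStore:
    """Hypothetical stand-in for a backend with one UNIQUE secondary index."""

    def __init__(self) -> None:
        self._by_id: Dict[int, Entity] = {}
        # Unique secondary index: (parent_id, name) -> entity_id
        self._by_name: Dict[Tuple[int, str], int] = {}

    def get(self, entity_id: int) -> Optional[Entity]:
        return self._by_id.get(entity_id)

    def create(self, entity: Entity) -> bool:
        """Insert only if neither the id nor the unique name key is taken."""
        key = (entity.parent_id, entity.name)
        if entity.entity_id in self._by_id or key in self._by_name:
            return False
        self._by_id[entity.entity_id] = entity
        self._by_name[key] = entity.entity_id
        return True

    def compare_and_swap(self, expected_version: int, updated: Entity) -> bool:
        """Replace one entity only if its stored version matches."""
        current = self._by_id.get(updated.entity_id)
        if current is None or current.version != expected_version:
            return False  # missing entity or stale read
        old_key = (current.parent_id, current.name)
        new_key = (updated.parent_id, updated.name)
        if new_key != old_key:
            if new_key in self._by_name:
                return False  # rename would violate the unique index
            del self._by_name[old_key]
            self._by_name[new_key] = updated.entity_id
        # Bump the version so concurrent writers holding the old one fail.
        self._by_id[updated.entity_id] = replace(
            updated, version=expected_version + 1
        )
        return True
```

In this sketch a rename is a single compare-and-swap on one entity, and
name collisions are rejected by the index rather than by a surrounding
transaction.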

One notable outlier is Iceberg's "commitTransaction" API for multi-table
Iceberg transactions, which will require at least a "TransactBatch with
conditional writes per entity" semantic. We could evolve this one over
time, but on the plus side it seems this is still mostly supported by all
the mentioned backends so far.
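
As a rough illustration of what "TransactBatch with conditional writes per
entity" could mean, the sketch below applies a batch of writes only if every
entity in the batch still has the version the caller expects. The function
name transact_batch, the dict-based store, and the tuple layout are all
hypothetical; a real backend would have to evaluate the conditions and apply
the writes atomically.

```python
from typing import Dict, List, Tuple

# Hypothetical layouts for this sketch:
#   store:  entity_id -> (version, payload)
#   writes: list of (entity_id, expected_version, new_payload)
Store = Dict[int, Tuple[int, str]]
Write = Tuple[int, int, str]


def transact_batch(store: Store, writes: List[Write]) -> bool:
    """Apply all writes, or none, based on per-entity version conditions."""
    # This sketch rejects batches that touch the same entity twice.
    ids = [entity_id for entity_id, _, _ in writes]
    if len(ids) != len(set(ids)):
        return False

    # Phase 1: check every condition before changing anything.
    for entity_id, expected_version, _ in writes:
        current = store.get(entity_id)
        if current is None or current[0] != expected_version:
            return False

    # Phase 2: all conditions held, so apply every write and bump versions.
    for entity_id, expected_version, payload in writes:
        store[entity_id] = (expected_version + 1, payload)
    return True
```

For example, a two-table commit would pass one write per table, and a stale
expected version on either table would leave both tables unchanged.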
