> What about scheduling a meeting (community) specific to persistence ?
> I think it would be great to discuss.
> We can give time to anyone interested to read the proposals, and
> discuss next week ?

That works for me! Are you able to schedule it since you have the Google Meet features?

> I propose to delegate this to the Persistence implementation and not expose
> anything about "indexes" in the interfaces used by Polaris services.
>
> I believe it is best to formulate the interface in terms of what queries
> need to be supported and let the implementation find the most appropriate
> way to do that in each particular case.

Agreed, that is also my proposal. I'm using the term "design" to include both the interface design and the database-specific design since folks seemed to have questions about database-specific details. I've structured the doc now so that you can ignore the database-level proposals if you want. The interface says nothing about indexes (indeed, it is fundamental to my proposal not to expose indexes, because FDB does not have indexes).

Maybe we can point to specific parts of the Java interface to have clarity.

For example, this is the interface for updating an entity from PolarisMetaStoreManager:

  /**
   * Update some properties of this entity assuming it can still be resolved the same way and itself
   * has not changed. If this is not the case we will return false. Else we will update both the
   * internal and visible properties and return true
   *
   * @param session the metastore session
   * @param catalogPath path to that entity. Could be null if this entity is top-level
   * @param entity entity to update, cannot be null
   * @return the entity we updated or null if the client should retry
   */
  @Nonnull
  EntityResult updateEntityPropertiesIfNotChanged(
      @Nonnull PolarisMetaStoreSession session,
      @Nullable List<PolarisEntityCore> catalogPath,
      @Nonnull PolarisBaseEntity entity);

I'd say "queries" imply a particular backend implementation as well -- we should focus on the Java interface, and then the implementation could use a "query" or other APIs as they see fit.

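To make the "focus on the Java interface" point concrete, here is a minimal sketch of an "update if not changed" contract that says nothing about indexes or queries. Everything in it is hypothetical and for illustration only: `EntityStore`, `Entity`, `InMemoryEntityStore` and the version-number scheme are assumptions, not the actual Polaris types. The in-memory backend happens to use an atomic map operation; another backend could satisfy the same contract with a conditional write or a transaction.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

final class PersistenceSketch {
  /** Hypothetical entity: an id, a version bumped on every write, and its properties. */
  static final class Entity {
    final long id;
    final int version;
    final Map<String, String> properties;

    Entity(long id, int version, Map<String, String> properties) {
      this.id = id;
      this.version = version;
      this.properties = Map.copyOf(properties);
    }
  }

  /** Hypothetical backend-neutral contract: no mention of indexes or queries. */
  interface EntityStore {
    Optional<Entity> load(long id);

    /** Returns the written entity, or empty if the caller's view was stale and it should retry. */
    Optional<Entity> updateIfNotChanged(long id, int expectedVersion, Map<String, String> newProps);
  }

  /** One possible backend: an in-memory map with an atomic per-key update. */
  static final class InMemoryEntityStore implements EntityStore {
    private final ConcurrentHashMap<Long, Entity> entities = new ConcurrentHashMap<>();

    void put(Entity entity) {
      entities.put(entity.id, entity);
    }

    @Override
    public Optional<Entity> load(long id) {
      return Optional.ofNullable(entities.get(id));
    }

    @Override
    public Optional<Entity> updateIfNotChanged(
        long id, int expectedVersion, Map<String, String> newProps) {
      Entity[] written = new Entity[1];
      // computeIfPresent runs atomically for this key in ConcurrentHashMap.
      entities.computeIfPresent(id, (key, current) -> {
        if (current.version != expectedVersion) {
          return current; // stored entity changed since the caller read it; keep it
        }
        written[0] = new Entity(id, expectedVersion + 1, newProps);
        return written[0];
      });
      return Optional.ofNullable(written[0]);
    }
  }

  public static void main(String[] args) {
    InMemoryEntityStore store = new InMemoryEntityStore();
    store.put(new Entity(1L, 1, Map.of("owner", "a")));
    System.out.println(store.updateIfNotChanged(1L, 1, Map.of("owner", "b")).isPresent());
    System.out.println(store.updateIfNotChanged(1L, 1, Map.of("owner", "c")).isPresent());
  }
}
```

Callers only see "it was written" or "retry"; whether the backend got there via a query, a conditional put, or a transaction stays an implementation detail.
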
On Wed, Feb 5, 2025 at 10:34 AM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
> Hi Dennis,
>
> What about scheduling a meeting (community) specific to persistence ?
> I think it would be great to discuss.
> We can give time to anyone interested to read the proposals, and
> discuss next week ?
>
> Maybe we can do it on Thursday, Feb 13, 9am PST (same slot as Polaris
> Community Meeting) ?
>
> Thanks
> Regards
> JB
>
> On Tue, Feb 4, 2025 at 8:02 PM Dennis Huo <huoi...@gmail.com> wrote:
> >
> > Hello all,
> >
> > We've had some discussions and github issues (
> > https://github.com/apache/polaris/issues/775,
> > https://github.com/apache/polaris/issues/766, etc) scattered between
> > community syncs, slack threads, etc., related to how to adapt the Polaris
> > persistence layer to new DB backends, so I'm hoping we can consolidate
> > discussions towards an incremental path forward that is minimally invasive.
> >
> > I wrote this analysis of the persistence layer in the context of a couple
> > persistence backends that have been suggested (MongoDB, DynamoDB) and also
> > tried to retroactively clarify the current structure and intent of some of
> > the persistence internals:
> >
> > https://docs.google.com/document/d/1U9rprj8w8-Q0SnQvRMvoVlbX996z-89eOkVwWTQaZG0/edit?tab=t.0
> >
> > It's a bit into the weeds, so will be most accessible for folks who have
> > taken a bit of a deep dive into the current
> > PolarisMetaStoreManager/PolarisMetaStoreSession layers.
> >
> > I'm not a MongoDB or DynamoDB expert though, so I'd appreciate any input to
> > help keep me honest on the capabilities :)
> >
> > At a high level, the biggest takeaway is that we *don't* need generalized
> > transactions, and some refactoring will allow us to have an abstract
> > top-level interface where 99% of use cases are covered either by:
> >
> > - Secondary indexes with "UNIQUE" constraints plus single-entity
> >   Compare-and-Swap
> > - Multi-statement transactions for those that support it (FoundationDB,
> >   Postgres)
> >
> > One notable outlier is Iceberg's "commitTransaction" API for multi-table
> > Iceberg transactions, which will require at least a "TransactBatch with
> > conditional writes per entity" semantic. We could evolve this one over
> > time, but on the plus side it seems this is still mostly supported by all
> > the mentioned backends so far.
>
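
As a rough illustration of the first option in the quoted list (a UNIQUE secondary index plus single-entity Compare-and-Swap), the sketch below emulates both with in-memory maps. All names are hypothetical (`NameIndexStore`, `create`, `rename`, the `parentId/name` key format), and it is a simplification under stated assumptions, not how Polaris or any particular backend does it. In particular, the two steps of `rename` are not atomic together, so a concurrent reader could briefly see both names claimed.

```java
import java.util.concurrent.ConcurrentHashMap;

final class UniqueNameSketch {
  static final class NameIndexStore {
    // Stands in for a secondary index with a UNIQUE constraint on (parentId, name).
    private final ConcurrentHashMap<String, Long> idByName = new ConcurrentHashMap<>();
    // Stands in for the primary records: entity id -> current version.
    private final ConcurrentHashMap<Long, Integer> versionById = new ConcurrentHashMap<>();

    // Simplified key; assumes names do not contain '/'.
    private static String key(long parentId, String name) {
      return parentId + "/" + name;
    }

    /** Creates the entity only if no sibling already holds the name. */
    boolean create(long parentId, String name, long id) {
      // putIfAbsent returns null only when this call claimed the key.
      if (idByName.putIfAbsent(key(parentId, name), id) != null) {
        return false; // uniqueness violation
      }
      versionById.put(id, 1);
      return true;
    }

    /** Renames using a unique-name claim followed by a single-entity compare-and-swap. */
    boolean rename(long parentId, String oldName, String newName, long id, int expectedVersion) {
      if (idByName.putIfAbsent(key(parentId, newName), id) != null) {
        return false; // new name already taken
      }
      // replace(key, old, new) succeeds only if the stored version still equals expectedVersion.
      if (!versionById.replace(id, expectedVersion, expectedVersion + 1)) {
        idByName.remove(key(parentId, newName), id); // undo the claim; caller should retry
        return false;
      }
      idByName.remove(key(parentId, oldName), id); // release the old name
      return true;
    }
  }

  public static void main(String[] args) {
    NameIndexStore store = new NameIndexStore();
    System.out.println(store.create(1L, "t1", 10L));
    System.out.println(store.create(1L, "t1", 11L));
    System.out.println(store.rename(1L, "t1", "t3", 10L, 1));
  }
}
```

A backend with multi-statement transactions could do the claim, the version check, and the release in one transaction instead, which is the second option in the list.
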