Thanks all for the very informative discussion on this topic in the
community sync meeting today!

Let's have a dedicated discussion session as suggested.

Dennis: As to your doc, would you mind isolating caching concerns into a
separate section, so that we can first think about the main Persistence
API/contract and how it works in the simple case (no cache)? The caching
concerns could then be addressed on top of the simple API, discussing where
caching is helpful and how it integrates with Persistence. Ideally, caching
should work with any persistence impl., but I think database-specific caching
is also fine if there are efficiencies to be realised only when some
particular database feature is available. WDYT?
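
To illustrate the first point, a backend-agnostic cache could simply wrap
whatever Persistence implementation is configured. A minimal sketch, where the
Persistence interface and every name in it are hypothetical (not the actual
Polaris API):

  import java.util.Map;
  import java.util.concurrent.ConcurrentHashMap;

  // Hypothetical persistence contract, for illustration only.
  interface Persistence {
    PolarisBaseEntity loadEntity(long catalogId, long entityId);

    void writeEntity(PolarisBaseEntity entity);
  }

  // Backend-agnostic read-through cache that can wrap any Persistence implementation.
  class CachingPersistence implements Persistence {
    private final Persistence delegate;
    private final Map<Long, PolarisBaseEntity> byId = new ConcurrentHashMap<>();

    CachingPersistence(Persistence delegate) {
      this.delegate = delegate;
    }

    @Override
    public PolarisBaseEntity loadEntity(long catalogId, long entityId) {
      // Simplified: keyed by entityId only, assuming ids are unique across catalogs.
      return byId.computeIfAbsent(entityId, id -> delegate.loadEntity(catalogId, id));
    }

    @Override
    public void writeEntity(PolarisBaseEntity entity) {
      delegate.writeEntity(entity);      // write through to the backend first
      byId.put(entity.getId(), entity);  // then keep the cache consistent
    }
  }

A database-specific cache would instead live behind the Persistence
implementation itself, where it can exploit features of that particular
database.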

Re: the existing TransactionWorkspaceMetaStoreManager. I believe the pattern it
implements (intercepting individual calls and re-grouping them into a
"change-many-objects" call) is not logically correct, given the discussion
around scoping conflict detection to each Persistence API invocation. Precisely
because TransactionWorkspaceMetaStoreManager re-groups individual changes into
one "batched" change, a caller of one of its methods still cannot be sure,
after the call succeeds, that the change is effective: the "real" change is
made only later, as part of a batch that includes other changes and may fail
as a whole.
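
To make the concern concrete, here is a minimal sketch of that interception
pattern; class names and signatures are simplified and hypothetical, not the
actual TransactionWorkspaceMetaStoreManager API:

  import java.util.ArrayList;
  import java.util.List;

  // Hypothetical batch-apply contract: all changes succeed or none do.
  interface BatchWriter {
    void writeEntitiesInBatch(List<PolarisBaseEntity> entities);
  }

  // Hypothetical sketch of the "intercept and re-group" pattern.
  class BufferingMetaStoreManager {
    private final List<PolarisBaseEntity> pendingWrites = new ArrayList<>();

    // To the caller this looks like an immediate, individually committed update...
    PolarisBaseEntity updateEntity(PolarisBaseEntity entity) {
      pendingWrites.add(entity);  // ...but the change is only buffered here.
      return entity;              // "success" does not mean the change is durable
    }

    // The real change happens later: all buffered updates are applied as one batch,
    // so a conflict in any of them also fails the previously "successful" calls.
    void commitAsBatch(BatchWriter writer) {
      writer.writeEntitiesInBatch(pendingWrites);
      pendingWrites.clear();
    }
  }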

Do you think it is possible to refactor the code related to
TransactionWorkspaceMetaStoreManager so as to avoid this pattern? We can do
it after the main Persistence API refactoring, of course.

Thanks,
Dmitri.

On Wed, Feb 5, 2025 at 9:06 PM Dennis Huo <huoi...@gmail.com> wrote:

> >
> > What about scheduling a (community) meeting specific to persistence?
> > I think it would be great to discuss.
> > We can give anyone interested time to read the proposals, and
> > discuss next week?
>
>
> That works for me! Are you able to schedule it since you have the Google
> Meet features?
>
> > I propose to delegate this to the Persistence implementation and not expose
> > anything about "indexes" in the interfaces used by Polaris services.
> >
> > I believe it is best to formulate the interface in terms of what queries
> > need to be supported and let the implementation find the most appropriate
> > way to do that in each particular case.
>
>
> Agreed, that is also my proposal. I'm using the term "design" to include
> both the interface design and the database-specific design since folks
> seemed to have questions about database-specific details.
>
>  I've structured the doc now so that you can ignore the database-level
> proposals if you want. The interface says nothing about indexes (indeed, it
> is fundamental to my proposal not to expose indexes, because FDB does not
> have indexes).
>
> Maybe we can point to specific parts of the Java interface for clarity.
>
> For example, this is the interface for updating an entity from
> PolarisMetaStoreManager:
>
>   /**
>    * Update some properties of this entity assuming it can still be resolved
>    * the same way and itself has not changed. If this is not the case we will
>    * return false. Else we will update both the internal and visible
>    * properties and return true
>    *
>    * @param session the metastore session
>    * @param catalogPath path to that entity. Could be null if this entity is top-level
>    * @param entity entity to update, cannot be null
>    * @return the entity we updated or null if the client should retry
>    */
>   @Nonnull
>   EntityResult updateEntityPropertiesIfNotChanged(
>       @Nonnull PolarisMetaStoreSession session,
>       @Nullable List<PolarisEntityCore> catalogPath,
>       @Nonnull PolarisBaseEntity entity);
>
> I'd say "queries" implies a particular backend implementation as well -- we
> should focus on the Java interface, and then each implementation could use a
> "query" or other APIs as it sees fit.
>
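> As an illustration only (not a proposal for the actual interface), a key-value
> backend with no secondary indexes could satisfy the contract above with a
> single-entity compare-and-swap; all helper names in this sketch are
> hypothetical:
>
>   // Hypothetical sketch, not actual Polaris code: a key-value backend with no
>   // secondary indexes implementing a conditional, single-entity update.
>   interface KeyValueStore {
>     PolarisBaseEntity get(String key);
>
>     // Atomically replaces 'expected' with 'update'; returns false if the stored
>     // value no longer equals 'expected' (i.e. a concurrent change happened).
>     boolean compareAndSwap(String key, PolarisBaseEntity expected, PolarisBaseEntity update);
>   }
>
>   class KvEntityUpdater {
>     private final KeyValueStore store;
>
>     KvEntityUpdater(KeyValueStore store) {
>       this.store = store;
>     }
>
>     // Returns the updated entity, or null if the caller should retry.
>     PolarisBaseEntity updateIfNotChanged(PolarisBaseEntity entity) {
>       String key = entity.getCatalogId() + "/" + entity.getId();
>       PolarisBaseEntity current = store.get(key);
>       if (current == null || current.getEntityVersion() != entity.getEntityVersion()) {
>         return null;  // entity disappeared or was modified concurrently: retry
>       }
>       return store.compareAndSwap(key, current, entity) ? entity : null;
>     }
>   }
>
> A SQL backend could instead express the same check as a single conditional
> UPDATE (e.g. "... WHERE id = ? AND entity_version = ?", with hypothetical
> column names), without the Java interface saying anything about either
> approach.
>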
> On Wed, Feb 5, 2025 at 10:34 AM Jean-Baptiste Onofré <j...@nanthrax.net>
> wrote:
>
> > Hi Dennis,
> >
> > What about scheduling a (community) meeting specific to persistence?
> > I think it would be great to discuss.
> > We can give anyone interested time to read the proposals, and
> > discuss next week?
> >
> > Maybe we can do it on Thursday, Feb 13, 9am PST (same slot as the Polaris
> > Community Meeting)?
> >
> > Thanks
> > Regards
> > JB
> >
> > On Tue, Feb 4, 2025 at 8:02 PM Dennis Huo <huoi...@gmail.com> wrote:
> > >
> > > Hello all,
> > >
> > > We've had some discussions and GitHub issues (
> > > https://github.com/apache/polaris/issues/775,
> > > https://github.com/apache/polaris/issues/766, etc.) scattered between
> > > community syncs, Slack threads, etc., related to how to adapt the Polaris
> > > persistence layer to new DB backends, so I'm hoping we can consolidate
> > > discussions towards an incremental path forward that is minimally invasive.
> > >
> > > I wrote this analysis of the persistence layer in the context of a couple of
> > > persistence backends that have been suggested (MongoDB, DynamoDB) and also
> > > tried to retroactively clarify the current structure and intent of some of
> > > the persistence internals:
> > >
> > > https://docs.google.com/document/d/1U9rprj8w8-Q0SnQvRMvoVlbX996z-89eOkVwWTQaZG0/edit?tab=t.0
> > >
> > > It's a bit into the weeds, so it will be most accessible for folks who have
> > > taken a bit of a deep dive into the current
> > > PolarisMetaStoreManager/PolarisMetaStoreSession layers.
> > >
> > > I'm not a MongoDB or DynamoDB expert though, so I'd appreciate any input to
> > > help keep me honest on the capabilities :)
> > >
> > > At a high level, the biggest takeaway is that we *don't* need generalized
> > > transactions, and some refactoring will allow us to have an abstract
> > > top-level interface where 99% of use cases are covered either by:
> > >
> > >    - Secondary indexes with "UNIQUE" constraints plus single-entity
> > >      Compare-and-Swap
> > >    - Multi-statement transactions for those that support it (FoundationDB,
> > >      Postgres)
> > >
> > > One notable outlier is Iceberg's "commitTransaction" API for multi-table
> > > Iceberg transactions, which will require at least a "TransactBatch with
> > > conditional writes per entity" semantic. We could evolve this one over time,
> > > but on the plus side it seems this is still mostly supported by all the
> > > mentioned backends so far.
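> > >
> > > As a rough illustration only (not the actual proposed API), a "TransactBatch
> > > with conditional writes per entity" semantic could take a shape like this
> > > hypothetical sketch:
> > >
> > >   import java.util.List;
> > >
> > >   // Hypothetical: one conditional write per entity, applied atomically as a
> > >   // batch (all writes succeed, or none do).
> > >   class ConditionalWrite {
> > >     final PolarisBaseEntity entity;  // new state to persist
> > >     final int expectedVersion;       // apply only if the stored version still matches
> > >
> > >     ConditionalWrite(PolarisBaseEntity entity, int expectedVersion) {
> > >       this.entity = entity;
> > >       this.expectedVersion = expectedVersion;
> > >     }
> > >   }
> > >
> > >   interface TransactBatch {
> > >     // Returns true if every per-entity condition held and all writes committed
> > >     // atomically; returns false, committing nothing, if any condition failed.
> > >     boolean commit(List<ConditionalWrite> writes);
> > >   }
> > >
> > > Something of this shape maps onto DynamoDB-style transactional writes with
> > > per-item condition checks, as well as onto a single multi-statement
> > > transaction in FoundationDB or Postgres.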
> >
>
