Re: [DISCUSS] Polaris Persistence Contract

Michael Collado Fri, 06 Dec 2024 20:22:24 -0800

That sounds great. It seems like a hard task to get right, but it’s
definitely the right thing to do.


Mike

On Fri, Dec 6, 2024 at 10:47 AM Dmitri Bourlatchkov
<[email protected]> wrote:

> Very good point about business logic in PolarisMetaStoreManagerImpl.
>
> I'd also add that the Resolver mixes different concerns as well; specifically,
> it appears to perform cache invalidation / synchronization as part of
> object "resolution" phase, which also complicates reasoning about what
> services expect from persistence.
>
> Perhaps step 1 could be abstracting the caching layer in a way that
> services do not 'see' cache APIs directly, but interact with some other
> interface, which then can have multiple implementations: with and without
> caching, with and without "remote" caching, etc. WDYT?
>
> Thanks,
> Dmitri.
>
> On Fri, Dec 6, 2024 at 1:02 PM Michael Collado <[email protected]>
> wrote:
>
> > My intention when splitting up the PolarisMetaStoreManager interface was
> > always to cut the ties between the persistence manager and the other
> > responsibilities. For me, the first step seemed to be to break up the
> > interface, then change the consumers to depend on the most specific
> > interface needed to accomplish their tasks (e.g., depend on Grant Manager
> > if they needed to read grant records).
> >
> > Unfortunately, however, without DI support, it's too hard to also break
> > the PolarisMetaStoreManager inheritance on those other interfaces because
> > we don't have a way of knowing that a specific instance of
> > PolarisMetaStoreManager also implements the other interfaces (this is
> > something I think even a partial CDI implementation can unblock for us).
> >
> > I 100% agree on splitting up the interfaces and isolating
> > secrets-management and grant-management. The remote cache interface is a
> > performance concern and, I think, should be underneath the higher-level
> > interfaces.
> >
> > The transaction and consistency guarantees are harder, I think. Not every
> > persistence layer will allow for transactions or for batch entity updates.
> > Personally, I think designing the persistence interface to support
> > transaction-like operations and batch updates will allow for different
> > persistence implementations to operate within the constraints of the
> > specific engine without tying the application to any particular details.
> > Those engines that support a WAL or other form of transaction log can
> > commit appropriately, whereas others may implement a "best-effort"
> > approach for multi-entity updates.
> >
> > One big problem I see is that the PolarisMetaStoreManagerImpl itself
> > isn't really a persistence layer, but an extension of the business logic.
> > That's really where the reliance on a begin/commit transaction workflow is
> > evident. I'd love to see the business logic pulled out of the metastore
> > manager and see it become more of a pure persistence layer. The
> > MetaStoreSession interface could be a hidden detail of some
> > implementations, but wouldn't need to be accessible anywhere else.
> >
> > Anyway, I'm looking forward to some ideas on how we can support some
> > specific NoSQL implementations better. Adding support for one or two more
> > specific backends will help us highlight the broken assumptions we have
> > around the persistence layer today.
> >
> > Mike
> >
> > On Tue, Dec 3, 2024 at 10:05 AM Eric Maynard <[email protected]>
> > wrote:
> >
> > > I think this is a great idea. Even if we put aside the NoSQL / RDBMS
> > > point, simply clarifying the roles & responsibilities of the persistence
> > > interface(s) would be a welcome improvement.
> > >
> > > --EM
> > >
> > > On Tue, Dec 3, 2024 at 5:57 AM Dmitri Bourlatchkov
> > > <[email protected]> wrote:
> > >
> > > > Hi All,
> > > >
> > > > I believe it was already discussed elsewhere that it is valuable to
> > > > allow Apache Polaris to be extensible, and in particular extensible in
> > > > how it interacts with its own Persistence backend (not to be confused
> > > > with Iceberg data storage).
> > > >
> > > > I’d like to formalize the expectations Polaris Core has on Persistence.
> > > > I think it will be extremely valuable for contributors wishing to add
> > > > support for backends beyond the current EclipseLink implementation.
> > > >
> > > > Currently, the closest abstraction layer for Persistence appears to be
> > > > PolarisMetaStoreManager, however this interface combines a few other
> > > > interfaces, not directly related to persistence per se and having
> > > > different concerns:
> > > >
> > > >    - The Grant Manager
> > > >    - Remote Cache
> > > >    - Secrets Manager
> > > >    - Credential Vendor
> > > >
> > > >
> > > > I’d like to propose:
> > > >
> > > >    1. Interface delineation. Split off a “pure” persistence SPI that
> > > >       does not directly deal with grants or caching, but could be used
> > > >       by the grant manager and by caches in their respective contexts.
> > > >       Many of the PolarisMetaStoreManager sub-interfaces are not
> > > >       related to persistence. Once isolated, they will be outside the
> > > >       scope of this discussion.
> > > >
> > > >    2. Bootstrapping. This should probably be an external concern
> > > >       implemented generically for any Persistence implementation.
> > > >
> > > >    3. Consistency guarantees. Catalog API implementations have to
> > > >       perform several changes as part of one logical transaction (e.g.
> > > >       multi-table commit). Several servers acting in a distributed
> > > >       system on the same backend should know what consistency
> > > >       expectations they can have on the Persistence layer in order to
> > > >       function correctly. I think these guarantees should be stated
> > > >       explicitly in java or .md docs for the sake of clarity.
> > > >
> > > >    4. Transactions. I believe that it would be valuable to avoid
> > > >       specifically binding to the RDBMS transaction concept and if
> > > >       possible formulate the Persistence SPI in a way that could be
> > > >       mapped to RDBMS as well as to a NoSQL backend.
> > > >
> > > >
> > > > Please share your thoughts on this.
> > > >
> > > > Thanks,
> > > >
> > > > Dmitri.
> > > >
> > >
> >
>
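
For readers following the thread, the "step 1" idea quoted above (services interact with an interface that can have caching and non-caching implementations, without seeing cache APIs) could look roughly like the sketch below. This is only an illustration: `EntityLookup`, `DirectEntityLookup`, and `CachingEntityLookup` are hypothetical names invented here, not Polaris types, and a plain `Map` stands in for a real backend.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// All names below are hypothetical; none of these types exist in Polaris.

/** What a service would depend on: entity lookup, with no cache API visible. */
interface EntityLookup {
  Optional<String> lookup(String entityName);
}

/** Non-caching implementation: every call goes to the backing store. */
final class DirectEntityLookup implements EntityLookup {
  private final Map<String, String> store; // stand-in for a real persistence backend
  int storeReads = 0; // counts backend hits, only to make the behavior observable

  DirectEntityLookup(Map<String, String> store) {
    this.store = store;
  }

  @Override
  public Optional<String> lookup(String entityName) {
    storeReads++;
    return Optional.ofNullable(store.get(entityName));
  }
}

/** Caching implementation: decorates any other EntityLookup. */
final class CachingEntityLookup implements EntityLookup {
  private final EntityLookup delegate;
  private final Map<String, String> cache = new ConcurrentHashMap<>();

  CachingEntityLookup(EntityLookup delegate) {
    this.delegate = delegate;
  }

  @Override
  public Optional<String> lookup(String entityName) {
    String cached = cache.get(entityName);
    if (cached != null) {
      return Optional.of(cached);
    }
    // Cache miss: ask the delegate and remember only entities that exist.
    Optional<String> loaded = delegate.lookup(entityName);
    loaded.ifPresent(value -> cache.put(entityName, value));
    return loaded;
  }

  /** Invalidation stays inside the caching layer, out of the service's view. */
  void invalidate(String entityName) {
    cache.remove(entityName);
  }
}
```

A service written against `EntityLookup` would behave the same whether it is handed a `DirectEntityLookup` or a `CachingEntityLookup` wrapping one; a "remote" cache would be one more implementation of the same interface.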

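The "transaction-like operations and batch updates" discussed in the thread could, as one possible shape, be expressed as an all-or-nothing batch of conditional updates rather than a begin/commit workflow. The sketch below is an assumption about what such an SPI might look like, not a proposal made in the thread: `AtomicBatchStore`, `Change`, `Versioned`, and `InMemoryBatchStore` are hypothetical names, and the in-memory implementation uses a lock where a real backend would use an RDBMS transaction or a conditional write. It uses records, so it needs Java 16 or later.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// All names below are hypothetical; none of these types exist in Polaris.

/** One conditional update: applied only if the stored version equals expectedVersion. */
record Change(String key, long expectedVersion, String newValue) {}

/** A stored value together with its version counter. */
record Versioned(long version, String value) {}

/** Sketch of a persistence SPI that does not expose begin/commit to callers. */
interface AtomicBatchStore {
  /** Applies all changes or none; returns false if any expected version is stale. */
  boolean commitAll(List<Change> changes);

  /** Returns the current value for the key, or null if the key is absent. */
  Versioned read(String key);
}

/** In-memory stand-in; the lock plays the role of the backend's atomicity mechanism. */
final class InMemoryBatchStore implements AtomicBatchStore {
  private final Map<String, Versioned> data = new HashMap<>();

  @Override
  public synchronized boolean commitAll(List<Change> changes) {
    // Phase 1: validate every expected version before touching anything.
    for (Change c : changes) {
      if (currentVersion(c.key()) != c.expectedVersion()) {
        return false;
      }
    }
    // Phase 2: apply all updates, bumping each version by one.
    for (Change c : changes) {
      data.put(c.key(), new Versioned(c.expectedVersion() + 1, c.newValue()));
    }
    return true;
  }

  @Override
  public synchronized Versioned read(String key) {
    return data.get(key);
  }

  // Keys that were never written are treated as version 0.
  private long currentVersion(String key) {
    Versioned v = data.get(key);
    return v == null ? 0L : v.version();
  }
}
```

With this shape, a multi-table commit is a single `commitAll` call, and a caller that receives `false` re-reads and retries. Whether a "best-effort" backend could honor the all-or-nothing contract is exactly the kind of guarantee the thread suggests documenting explicitly.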