Re: [DISCUSS] Polaris Persistence Contract

Michael Collado Fri, 06 Dec 2024 10:02:18 -0800

My intention when splitting up the PolarisMetaStoreManager interface was
always to cut the ties between the persistence manager and the other
responsibilities. For me, the first step seemed to be breaking up the
interface, then changing each consumer to depend on the most specific
interface it needs to accomplish its tasks (e.g., depend on the Grant
Manager if it needs to read grant records).
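
A minimal sketch of that second step, assuming simplified, hypothetical
interfaces (GrantReader, EntityStore, and PrivilegeChecker are illustrative
names, not the actual Polaris types or signatures). The backend still
implements several interfaces, but the consumer declares only the one it
reads from:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical narrow interface: only what a grant-reading consumer needs.
interface GrantReader {
    List<String> loadGrants(String securableName);
}

// Hypothetical narrow interface for plain entity persistence.
interface EntityStore {
    void writeEntity(String name, String payload);
    String readEntity(String name);
}

// A single backend may still implement both interfaces.
class InMemoryBackend implements GrantReader, EntityStore {
    private final Map<String, String> entities = new HashMap<>();
    private final Map<String, List<String>> grants = new HashMap<>();

    void addGrant(String securableName, String privilege) {
        grants.computeIfAbsent(securableName, k -> new ArrayList<>()).add(privilege);
    }

    @Override
    public List<String> loadGrants(String securableName) {
        // Return an immutable copy so callers cannot mutate backend state.
        return List.copyOf(grants.getOrDefault(securableName, List.of()));
    }

    @Override
    public void writeEntity(String name, String payload) {
        entities.put(name, payload);
    }

    @Override
    public String readEntity(String name) {
        return entities.get(name);
    }
}

// The consumer depends on GrantReader only, not on the combined backend type.
class PrivilegeChecker {
    private final GrantReader grantReader;

    PrivilegeChecker(GrantReader grantReader) {
        this.grantReader = grantReader;
    }

    boolean hasPrivilege(String securableName, String privilege) {
        return grantReader.loadGrants(securableName).contains(privilege);
    }
}
```

With this shape, PrivilegeChecker can be handed any GrantReader, whether or
not that object also happens to be the persistence backend.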

Unfortunately, without DI support, it's too hard to also break the
PolarisMetaStoreManager inheritance from those other interfaces, because we
would have no way of knowing whether a specific instance of
PolarisMetaStoreManager also implements the other interfaces (this is
something I think even a partial CDI implementation can unblock for us).

I 100% agree on splitting up the interfaces and isolating secrets management
and grant management. The remote cache interface is a performance concern
and, I think, should be underneath the higher-level interfaces.

The transaction and consistency guarantees are harder, I think. Not every
persistence layer will allow for transactions or for batch entity updates.
Personally, I think designing the persistence interface to support
transaction-like operations and batch updates will allow for different
persistence implementations to operate within the constraints of the
specific engine without tying the application to any particular details.
Those engines that support a WAL or other form of transaction log can
commit appropriately, whereas others may implement a "best-effort" approach
for multi-entity updates.
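
To make the idea concrete, here is a rough sketch under stated assumptions:
BatchingEntityStore, BatchGuarantee, EntityWrite, and both store classes are
hypothetical names, not existing Polaris APIs, and a null payload merely
simulates a write that the backend rejects. The caller submits one batch and
each engine honors it within its own constraints:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical: what a backend promises for a multi-entity update.
enum BatchGuarantee { ATOMIC, BEST_EFFORT }

final class EntityWrite {
    final String name;
    final String payload; // null simulates a write the backend rejects

    EntityWrite(String name, String payload) {
        this.name = name;
        this.payload = payload;
    }
}

// Hypothetical persistence interface with a transaction-like batch call.
interface BatchingEntityStore {
    BatchGuarantee batchGuarantee();

    // Returns the number of writes that were applied.
    int writeAll(List<EntityWrite> writes);

    String read(String name);
}

// All-or-nothing: changes are staged on a copy and swapped in on success.
class AtomicStore implements BatchingEntityStore {
    private Map<String, String> committed = new HashMap<>();

    public BatchGuarantee batchGuarantee() { return BatchGuarantee.ATOMIC; }

    public int writeAll(List<EntityWrite> writes) {
        Map<String, String> staged = new HashMap<>(committed);
        for (EntityWrite w : writes) {
            if (w.payload == null) {
                return 0; // abort: the staged copy is discarded
            }
            staged.put(w.name, w.payload);
        }
        committed = staged; // "commit" by swapping in the staged state
        return writes.size();
    }

    public String read(String name) { return committed.get(name); }
}

// Best-effort: each write is applied independently; failures are skipped.
class BestEffortStore implements BatchingEntityStore {
    private final Map<String, String> data = new HashMap<>();

    public BatchGuarantee batchGuarantee() { return BatchGuarantee.BEST_EFFORT; }

    public int writeAll(List<EntityWrite> writes) {
        int applied = 0;
        for (EntityWrite w : writes) {
            if (w.payload != null) {
                data.put(w.name, w.payload);
                applied++;
            }
        }
        return applied;
    }

    public String read(String name) { return data.get(name); }
}
```

Exposing the guarantee through something like batchGuarantee() is one
possible way to let the application decide how to react to a partial update
without knowing which engine sits underneath.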

One big problem I see is that the PolarisMetaStoreManagerImpl itself isn't
really a persistence layer, but an extension of the business logic. That's
really where the reliance on a begin/commit transaction workflow is
evident. I'd love to see the business logic pulled out of the metastore
manager and see it become more of a pure persistence layer. The
MetaStoreSession interface could be a hidden detail of some
implementations, but wouldn't need to be accessible anywhere else.

Anyway, I'm looking forward to some ideas on how we can support some
specific NoSQL implementations better. Adding support for one or two more
specific backends will help us highlight the broken assumptions we have
around the persistence layer today.

Mike

On Tue, Dec 3, 2024 at 10:05 AM Eric Maynard <eric.w.mayn...@gmail.com>
wrote:

> I think this is a great idea. Even if we put aside the NoSQL / RDBMS point,
> simply clarifying the roles & responsibilities of the persistence
> interface(s) would be a welcome improvement.
>
> --EM
>
> On Tue, Dec 3, 2024 at 5:57 AM Dmitri Bourlatchkov
> <dmitri.bourlatch...@dremio.com.invalid> wrote:
>
> > Hi All,
> >
> > I believe it was already discussed elsewhere that it is valuable to allow
> > Apache Polaris to be extensible, and in particular extensible in how it
> > interacts with its own Persistence backend (not to be confused with
> > Iceberg data storage).
> >
> > I’d like to formalize the expectations Polaris Core has on Persistence. I
> > think it will be extremely valuable for contributors wishing to add
> > support for backends beyond the current EclipseLink implementation.
> >
> > Currently, the closest abstraction layer for Persistence appears to be
> > PolarisMetaStoreManager; however, this interface combines a few other
> > interfaces that are not directly related to persistence per se and have
> > different concerns:
> >
> >    - The Grant Manager
> >    - Remote Cache
> >    - Secrets Manager
> >    - Credential Vendor
> >
> > I’d like to propose:
> >
> >    1. Interface delineation. Split off a “pure” persistence SPI that does
> >       not directly deal with grants or caching, but could be used by the
> >       grant manager and by caches in their respective contexts. Many of
> >       the PolarisMetaStoreManager sub-interfaces are not related to
> >       persistence. Once isolated, they will be outside the scope of this
> >       discussion.
> >
> >    2. Bootstrapping. This should probably be an external concern
> >       implemented generically for any Persistence implementation.
> >
> >    3. Consistency guarantees. Catalog API implementations have to perform
> >       several changes as part of one logical transaction (e.g. multi-table
> >       commit). Several servers acting in a distributed system on the same
> >       backend should know what consistency expectations they can have on
> >       the Persistence layer in order to function correctly. I think these
> >       guarantees should be stated explicitly in java or .md docs for the
> >       sake of clarity.
> >
> >    4. Transactions. I believe that it would be valuable to avoid
> >       specifically binding to the RDBMS transaction concept and if
> >       possible formulate the Persistence SPI in a way that could be mapped
> >       to RDBMS as well as to a NoSQL backend.
> >
> >
> > Please share your thoughts on this.
> >
> > Thanks,
> >
> > Dmitri.
> >
>
