Re: [DISCUSS] Polaris Persistence Contract

Dmitri Bourlatchkov Fri, 06 Dec 2024 10:47:04 -0800

Very good point about business logic in PolarisMetaStoreManagerImpl.

I'd also add that the Resolver mixes different concerns as well.
Specifically, it appears to perform cache invalidation / synchronization as
part of the object "resolution" phase, which further complicates reasoning
about what services expect from persistence.


Perhaps step 1 could be abstracting the caching layer in a way that
services do not 'see' cache APIs directly, but interact with some other
interface, which then can have multiple implementations: with and without
caching, with and without "remote" caching, etc. WDYT?
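
To make that a bit more concrete, here is a rough Java sketch of what I have
in mind. All names in it (EntityLookup, DirectLookup, CachingLookup) are
hypothetical and are not existing Polaris classes, and a plain Map stands in
for the real persistence backend. It only illustrates the shape of the idea,
not a proposed API.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical interface: the only lookup API that services would see.
interface EntityLookup {
    Optional<String> lookup(String name);
}

// Implementation without caching: every call goes to the backing store.
final class DirectLookup implements EntityLookup {
    private final Map<String, String> store; // stand-in for a real backend
    int storeReads = 0; // counts backend reads, for illustration only

    DirectLookup(Map<String, String> store) {
        this.store = store;
    }

    @Override
    public Optional<String> lookup(String name) {
        storeReads++;
        return Optional.ofNullable(store.get(name));
    }
}

// Implementation with local caching: wraps any other EntityLookup.
final class CachingLookup implements EntityLookup {
    private final EntityLookup delegate;
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    CachingLookup(EntityLookup delegate) {
        this.delegate = delegate;
    }

    @Override
    public Optional<String> lookup(String name) {
        String cached = cache.get(name);
        if (cached != null) {
            return Optional.of(cached);
        }
        // Cache miss: ask the delegate and remember a successful result.
        Optional<String> loaded = delegate.lookup(name);
        loaded.ifPresent(value -> cache.put(name, value));
        return loaded;
    }

    // Invalidation lives in the caching implementation, not in the callers.
    void invalidate(String name) {
        cache.remove(name);
    }
}
```

A service would depend only on EntityLookup, so swapping DirectLookup for
CachingLookup (or for some "remote" caching variant) would be a wiring
decision rather than a code change in the service.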

Thanks,
Dmitri.

On Fri, Dec 6, 2024 at 1:02 PM Michael Collado <collado.m...@gmail.com>
wrote:

> My intention when splitting up the PolarisMetaStoreManager interface was
> always to cut the ties between the persistence manager and the other
> responsibilities. For me, the first step seemed to be to break up the
> interface, then change each consumer to depend on the most specific
> interface needed to accomplish its tasks (e.g., depend on Grant Manager if
> it needed to read grant records).
>
> Unfortunately, however, without DI support, it's too hard to also break the
> PolarisMetaStoreManager inheritance on those other interfaces because we
> don't have a way of knowing that a specific instance of
> PolarisMetaStoreManager also implements the other interfaces (this is
> something I think even a partial CDI implementation can unblock for us).
>
> I 100% agree on splitting up the interfaces and isolating secrets-management
> and grant-management. The remote cache interface is a performance concern
> and, I think, should be underneath the higher-level interfaces.
>
> The transaction and consistency guarantees are harder, I think. Not every
> persistence layer will allow for transactions or for batch entity updates.
> Personally, I think designing the persistence interface to support
> transaction-like operations and batch updates will allow for different
> persistence implementations to operate within the constraints of the
> specific engine without tying the application to any particular details.
> Those engines that support a WAL or other form of transaction log can
> commit appropriately, whereas others may implement a "best-effort" approach
> for multi-entity updates.
>
> One big problem I see is that the PolarisMetaStoreManagerImpl itself isn't
> really a persistence layer, but an extension of the business logic. That's
> really where the reliance on a begin/commit transaction workflow is
> evident. I'd love to see the business logic pulled out of the metastore
> manager and see it become more of a pure persistence layer. The
> MetaStoreSession interface could be a hidden detail of some
> implementations, but wouldn't need to be accessible anywhere else.
>
> Anyway, I'm looking forward to some ideas on how we can support some
> specific NoSQL implementations better. Adding support for one or two more
> specific backends will help us highlight the broken assumptions we have
> around the persistence layer today.
>
> Mike
>
> On Tue, Dec 3, 2024 at 10:05 AM Eric Maynard <eric.w.mayn...@gmail.com>
> wrote:
>
> > I think this is a great idea. Even if we put aside the NoSQL / RDBMS point,
> > simply clarifying the roles & responsibilities of the persistence
> > interface(s) would be a welcome improvement.
> >
> > --EM
> >
> > On Tue, Dec 3, 2024 at 5:57 AM Dmitri Bourlatchkov
> > <dmitri.bourlatch...@dremio.com.invalid> wrote:
> >
> > > Hi All,
> > >
> > > I believe it was already discussed elsewhere that it is valuable to allow
> > > Apache Polaris to be extensible, and in particular extensible in how it
> > > interacts with its own Persistence backend (not to be confused with Iceberg
> > > data storage).
> > >
> > > I’d like to formalize the expectations Polaris Core has on Persistence. I
> > > think it will be extremely valuable for contributors wishing to add support
> > > for backends beyond the current EclipseLink implementation.
> > >
> > > Currently, the closest abstraction layer for Persistence appears to be
> > > PolarisMetaStoreManager. However, this interface combines a few other
> > > interfaces that are not directly related to persistence per se and have
> > > different concerns:
> > >
> > >
> > >    - The Grant Manager
> > >    - Remote Cache
> > >    - Secrets Manager
> > >    - Credential Vendor
> > >
> > > I’d like to propose:
> > >
> > >    1. Interface delineation. Split off a “pure” persistence SPI that does not
> > >       directly deal with grants or caching, but could be used by the grant
> > >       manager and by caches in their respective contexts. Many of the
> > >       PolarisMetaStoreManager sub-interfaces are not related to persistence.
> > >       Once isolated, they will be outside the scope of this discussion.
> > >    2. Bootstrapping. This should probably be an external concern implemented
> > >       generically for any Persistence implementation.
> > >    3. Consistency guarantees. Catalog API implementations have to perform
> > >       several changes as part of one logical transaction (e.g. multi-table
> > >       commit). Several servers acting in a distributed system on the same
> > >       backend should know what consistency expectations they can have on the
> > >       Persistence layer in order to function correctly. I think these
> > >       guarantees should be stated explicitly in java or .md docs for the sake
> > >       of clarity.
> > >    4. Transactions. I believe that it would be valuable to avoid specifically
> > >       binding to the RDBMS transaction concept and if possible formulate the
> > >       Persistence SPI in a way that could be mapped to RDBMS as well as to a
> > >       NoSQL backend.
> > >
> > >
> > > Please share your thoughts on this.
> > >
> > > Thanks,
> > >
> > > Dmitri.
> > >
> >
>
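
On the transaction point quoted above, a "transaction-like" batch operation
that does not assume RDBMS begin/commit could look roughly like the Java
sketch below. Everything in it is an assumption made for illustration:
BatchPersistence, EntityChange and InMemoryAtomicPersistence are hypothetical
names, entity versions are plain counters (0 meaning "does not exist yet"),
and each entity is assumed to appear at most once per batch. The isAtomic()
method is one possible way for a backend to state its guarantee explicitly.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical change record: write newValue to the named entity, but only
// if the entity is still at expectedVersion (0 means "does not exist yet").
final class EntityChange {
    final String name;
    final long expectedVersion;
    final String newValue;

    EntityChange(String name, long expectedVersion, String newValue) {
        this.name = name;
        this.expectedVersion = expectedVersion;
        this.newValue = newValue;
    }
}

// Hypothetical SPI: a multi-entity commit with no begin/commit session.
interface BatchPersistence {
    // Returns true only if every change in the batch was applied.
    boolean commitAll(List<EntityChange> changes);

    // True if a failed commitAll is guaranteed to leave no partial writes.
    boolean isAtomic();
}

// In-memory stand-in for a backend that can apply a batch atomically.
final class InMemoryAtomicPersistence implements BatchPersistence {
    private final Map<String, Long> versions = new HashMap<>();
    private final Map<String, String> values = new HashMap<>();

    @Override
    public synchronized boolean commitAll(List<EntityChange> changes) {
        // Phase 1: check every expected version before writing anything.
        for (EntityChange c : changes) {
            if (version(c.name) != c.expectedVersion) {
                return false; // conflict: nothing has been written yet
            }
        }
        // Phase 2: all checks passed, so apply the whole batch.
        for (EntityChange c : changes) {
            versions.put(c.name, c.expectedVersion + 1);
            values.put(c.name, c.newValue);
        }
        return true;
    }

    @Override
    public boolean isAtomic() {
        return true;
    }

    synchronized long version(String name) {
        return versions.getOrDefault(name, 0L);
    }

    synchronized String value(String name) {
        return values.get(name);
    }
}
```

A "best-effort" backend could implement the same interface, return false from
isAtomic(), and document what callers may observe after a partially applied
batch.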
