Very good point about business logic in PolarisMetaStoreManagerImpl. I'd add that the Resolver mixes different concerns as well: it appears to perform cache invalidation / synchronization as part of the object "resolution" phase, which further complicates reasoning about what services expect from persistence.

Perhaps step 1 could be abstracting the caching layer in a way that services do not 'see' cache APIs directly, but interact with some other interface, which then can have multiple implementations: with and without caching, with and without "remote" caching, etc.

WDYT?

Thanks,
Dmitri.

On Fri, Dec 6, 2024 at 1:02 PM Michael Collado <collado.m...@gmail.com> wrote:

> My intention when splitting up the PolarisMetaStoreManager interface was
> always to cut the ties between the persistence manager and the other
> responsibilities. For me, the first step seemed to break up the interface,
> then change the consumers to depend on the most specific interface needed
> to accomplish its tasks (e.g., depend on Grant Manager if it needed to read
> grant records).
>
> Unfortunately, however, without DI support, it's too hard to also break the
> PolarisMetaStoreManager inheritance on those other interfaces because we
> don't have a way of knowing that a specific instance of
> PolarisMetaStoreManager also implements the other interfaces (this is
> something I think even a partial CDI implementation can unblock for us).
>
> I 100% agree on splitting up the interfaces and isolate secrets-management
> and grant-management. The remote cache interface is a performance concern
> and, I think, should be underneath the higher-level interfaces.
>
> The transaction and consistency guarantees are harder, I think. Not every
> persistence layer will allow for transactions or for batch entity updates.
> Personally, I think designing the persistence interface to support
> transaction-like operations and batch updates will allow for different
> persistence implementations to operate within the constraints of the
> specific engine without tying the application to any particular details.
> Those engines that support a WAL or other form of transaction log can
> commit appropriately, whereas others may implement a "best-effort" approach
> for multi-entity updates.
>
> One big problem I see is that the PolarisMetaStoreManagerImpl itself isn't
> really a persistence layer, but an extension of the business logic. That's
> really where the reliance on a begin/commit transaction workflow is
> evident. I'd love to see the business logic pulled out of the metastore
> manager and see it become more of a pure persistence layer. The
> MetaStoreSession interface could be a hidden detail of some
> implementations, but wouldn't need to be accessible anywhere else.
>
> Anyway, I'm looking forward to some ideas on how we can support some
> specific NoSQL implementations better. Adding support for one or two more
> specific backends will help us highlight the broken assumptions we have
> around the persistence layer today.
>
> Mike
>
> On Tue, Dec 3, 2024 at 10:05 AM Eric Maynard <eric.w.mayn...@gmail.com>
> wrote:
>
> > I think this is a great idea. Even if we put aside the NoSQL / RDBMS point,
> > simply clarifying the roles & responsibilities of the persistence
> > interface(s) would be a welcome improvement.
> >
> > --EM
> >
> > On Tue, Dec 3, 2024 at 5:57 AM Dmitri Bourlatchkov
> > <dmitri.bourlatch...@dremio.com.invalid> wrote:
> >
> > > Hi All,
> > >
> > > I believe it was already discussed elsewhere that it is valuable to allow
> > > Apache Polaris to be extensible, and in particular extensible in how it
> > > interacts with its own Persistence backend (not to be confused with Iceberg
> > > data storage).
> > >
> > > I’d like to formalize the expectations Polaris Core has on Persistence. I
> > > think it will be extremely valuable for contributors wishing to add support
> > > for backends beyond the current EclipseLink implementation.
> > >
> > > Currently, the closest abstraction layer for Persistence appears to be
> > > PolarisMetaStoreManager, however this interface combines a few other
> > > interfaces, not directly related to persistence per se and having different
> > > concerns:
> > >
> > > - The Grant Manager
> > > - Remote Cache
> > > - Secrets Manager
> > > - Credential Vendor
> > >
> > > I’d like to propose:
> > >
> > > 1. Interface delineation. Split off a “pure” persistence SPI that does not
> > >    directly deal with grants or caching, but could be used by the grant
> > >    manager and by caches in their respective contexts. Many of the
> > >    PolarisMetaStoreManager sub-interfaces are not related to persistence.
> > >    Once isolated, they will be outside the scope of this discussion.
> > > 2. Bootstrapping. This should probably be an external concern implemented
> > >    generically for any Persistence implementation.
> > > 3. Consistency guarantees. Catalog API implementations have to perform
> > >    several changes as part of one logical transaction (e.g. multi-table
> > >    commit). Several servers acting in a distributed system on the same
> > >    backend should know what consistency expectations they can have on the
> > >    Persistence layer in order to function correctly. I think these
> > >    guarantees should be stated explicitly in java or .md docs for the sake
> > >    of clarity.
> > > 4. Transactions. I believe that it would be valuable to avoid specifically
> > >    binding to the RDBMS transaction concept and if possible formulate the
> > >    Persistence SPI in a way that could be mapped to RDBMS as well as to a
> > >    NoSQL backend.
> > >
> > > Please share your thoughts on this.
> > >
> > > Thanks,
> > >
> > > Dmitri.
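
To make the caching suggestion at the top of this message a bit more concrete, here is a minimal Java sketch of what "services do not see cache APIs" could look like. Everything in it is an assumption for illustration: EntityLookup, DirectEntityLookup and CachingEntityLookup are hypothetical names, not existing Polaris types, and an in-memory map stands in for a real backend. It only handles local invalidation in a single process; cross-server invalidation and races between concurrent readers and writers are deliberately left out.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical names throughout; none of these types exist in Polaris.

/** The only interface services would depend on; no cache API is visible. */
interface EntityLookup {
  Optional<String> load(String entityId);

  void store(String entityId, String payload);
}

/** Non-caching implementation: every call goes to the backend (a map here). */
final class DirectEntityLookup implements EntityLookup {
  private final Map<String, String> backend = new ConcurrentHashMap<>();
  // Counts backend reads so the effect of a cache in front can be observed.
  final AtomicInteger reads = new AtomicInteger();

  @Override
  public Optional<String> load(String entityId) {
    reads.incrementAndGet();
    return Optional.ofNullable(backend.get(entityId));
  }

  @Override
  public void store(String entityId, String payload) {
    backend.put(entityId, payload);
  }
}

/** Caching decorator: same interface, so callers cannot tell it is there. */
final class CachingEntityLookup implements EntityLookup {
  private final EntityLookup delegate;
  private final Map<String, String> cache = new ConcurrentHashMap<>();

  CachingEntityLookup(EntityLookup delegate) {
    this.delegate = delegate;
  }

  @Override
  public Optional<String> load(String entityId) {
    String cached = cache.get(entityId);
    if (cached != null) {
      return Optional.of(cached);
    }
    // Cache miss: read through the delegate and remember hits only.
    Optional<String> loaded = delegate.load(entityId);
    loaded.ifPresent(value -> cache.put(entityId, value));
    return loaded;
  }

  @Override
  public void store(String entityId, String payload) {
    delegate.store(entityId, payload);
    // Local invalidation only; other servers' caches are not notified.
    cache.remove(entityId);
  }
}
```

A service would receive an EntityLookup and never know which variant it got, e.g. `EntityLookup lookup = new CachingEntityLookup(new DirectEntityLookup());`. A "remote" caching variant could be one more implementation of the same interface, which keeps invalidation and synchronization out of the resolution code path.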