That sounds great. It seems like a hard task to get right, but it’s definitely the right thing to do.
Mike

On Fri, Dec 6, 2024 at 10:47 AM Dmitri Bourlatchkov
<dmitri.bourlatch...@dremio.com.invalid> wrote:

> Very good point about business logic in PolarisMetaStoreManagerImpl.
>
> I'd also add that the Resolver mixes different concerns as well:
> specifically, it appears to perform cache invalidation / synchronization
> as part of the object "resolution" phase, which also complicates
> reasoning about what services expect from persistence.
>
> Perhaps step 1 could be abstracting the caching layer in a way that
> services do not 'see' cache APIs directly, but interact with some other
> interface, which then can have multiple implementations: with and
> without caching, with and without "remote" caching, etc. WDYT?
>
> Thanks,
> Dmitri.
>
> On Fri, Dec 6, 2024 at 1:02 PM Michael Collado <collado.m...@gmail.com>
> wrote:
>
> > My intention when splitting up the PolarisMetaStoreManager interface
> > was always to cut the ties between the persistence manager and the
> > other responsibilities. For me, the first step seemed to be to break
> > up the interface, then change the consumers to depend on the most
> > specific interface needed to accomplish their tasks (e.g., depend on
> > Grant Manager if they needed to read grant records).
> >
> > Unfortunately, however, without DI support, it's too hard to also
> > break the PolarisMetaStoreManager inheritance on those other
> > interfaces because we don't have a way of knowing that a specific
> > instance of PolarisMetaStoreManager also implements the other
> > interfaces (this is something I think even a partial CDI
> > implementation can unblock for us).
> >
> > I 100% agree on splitting up the interfaces and isolating
> > secrets-management and grant-management. The remote cache interface is
> > a performance concern and, I think, should be underneath the
> > higher-level interfaces.
> >
> > The transaction and consistency guarantees are harder, I think.
> > Not every persistence layer will allow for transactions or for batch
> > entity updates. Personally, I think designing the persistence
> > interface to support transaction-like operations and batch updates
> > will allow for different persistence implementations to operate within
> > the constraints of the specific engine without tying the application
> > to any particular details. Those engines that support a WAL or other
> > form of transaction log can commit appropriately, whereas others may
> > implement a "best-effort" approach for multi-entity updates.
> >
> > One big problem I see is that the PolarisMetaStoreManagerImpl itself
> > isn't really a persistence layer, but an extension of the business
> > logic. That's really where the reliance on a begin/commit transaction
> > workflow is evident. I'd love to see the business logic pulled out of
> > the metastore manager and see it become more of a pure persistence
> > layer. The MetaStoreSession interface could be a hidden detail of some
> > implementations, but wouldn't need to be accessible anywhere else.
> >
> > Anyway, I'm looking forward to some ideas on how we can support some
> > specific NoSQL implementations better. Adding support for one or two
> > more specific backends will help us highlight the broken assumptions
> > we have around the persistence layer today.
> >
> > Mike
> >
> > On Tue, Dec 3, 2024 at 10:05 AM Eric Maynard <eric.w.mayn...@gmail.com>
> > wrote:
> >
> > > I think this is a great idea. Even if we put aside the NoSQL / RDBMS
> > > point, simply clarifying the roles & responsibilities of the
> > > persistence interface(s) would be a welcome improvement.
> > >
> > > --EM
> > >
> > > On Tue, Dec 3, 2024 at 5:57 AM Dmitri Bourlatchkov
> > > <dmitri.bourlatch...@dremio.com.invalid> wrote:
> > >
> > > > Hi All,
> > > >
> > > > I believe it was already discussed elsewhere that it is valuable
> > > > to allow Apache Polaris to be extensible, and in particular
> > > > extensible in how it interacts with its own Persistence backend
> > > > (not to be confused with Iceberg data storage).
> > > >
> > > > I’d like to formalize the expectations Polaris Core has on
> > > > Persistence. I think it will be extremely valuable for
> > > > contributors wishing to add support for backends beyond the
> > > > current EclipseLink implementation.
> > > >
> > > > Currently, the closest abstraction layer for Persistence appears
> > > > to be PolarisMetaStoreManager, however this interface combines a
> > > > few other interfaces, not directly related to persistence per se
> > > > and having different concerns:
> > > >
> > > >    - The Grant Manager
> > > >    - Remote Cache
> > > >    - Secrets Manager
> > > >    - Credential Vendor
> > > >
> > > > I’d like to propose:
> > > >
> > > >    1. Interface delineation. Split off a “pure” persistence SPI
> > > >    that does not directly deal with grants or caching, but could
> > > >    be used by the grant manager and by caches in their respective
> > > >    contexts. Many of the PolarisMetaStoreManager sub-interfaces
> > > >    are not related to persistence. Once isolated, they will be
> > > >    outside the scope of this discussion.
> > > >
> > > >    2. Bootstrapping. This should probably be an external concern
> > > >    implemented generically for any Persistence implementation.
> > > >
> > > >    3. Consistency guarantees. Catalog API implementations have to
> > > >    perform several changes as part of one logical transaction
> > > >    (e.g. multi-table commit).
> > > >    Several servers acting in a distributed system on the same
> > > >    backend should know what consistency expectations they can have
> > > >    on the Persistence layer in order to function correctly. I
> > > >    think these guarantees should be stated explicitly in java or
> > > >    .md docs for the sake of clarity.
> > > >
> > > >    4. Transactions. I believe that it would be valuable to avoid
> > > >    specifically binding to the RDBMS transaction concept and if
> > > >    possible formulate the Persistence SPI in a way that could be
> > > >    mapped to RDBMS as well as to a NoSQL backend.
> > > >
> > > > Please share your thoughts on this.
> > > >
> > > > Thanks,
> > > >
> > > > Dmitri.
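
To make the ideas in this thread a little more concrete, here is a rough
Java sketch of what a "pure" persistence SPI with a transaction-like batch
operation, plus a caching layer hidden behind the same interface, might
look like. Everything in it is hypothetical: EntityStore, Change, Entity,
InMemoryEntityStore and CachingEntityStore are illustrative names, not
existing Polaris APIs, and the version-check scheme is just one possible
way to express a multi-entity commit without begin/commit. It assumes
Java 16+ (records) and that ids within one batch are distinct.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Small demo entry point (hypothetical, for illustration only).
final class PersistenceSketchDemo {
    public static void main(String[] args) {
        EntityStore store = new CachingEntityStore(new InMemoryEntityStore());
        boolean first = store.commitAll(List.of(
                new Change("table-a", 0, "v1"), new Change("table-b", 0, "v1")));
        // table-b is now at version 1, so expecting version 0 is stale.
        boolean second = store.commitAll(List.of(
                new Change("table-a", 1, "v2"), new Change("table-b", 0, "v2")));
        System.out.println(first + " " + second + " "
                + store.load("table-a").map(Entity::payload).orElse("absent"));
    }
}

// Hypothetical stored entity; version 0 is reserved for "does not exist".
record Entity(String id, long version, String payload) {}

// Hypothetical conditional write: applies only if the stored version matches.
record Change(String id, long expectedVersion, String payload) {}

// Hypothetical persistence SPI: no grants, no caching, no begin/commit.
interface EntityStore {
    Optional<Entity> load(String id);

    // Applies every change or none of them. An RDBMS backend could map this
    // to one transaction, a NoSQL backend to a conditional batch write.
    boolean commitAll(List<Change> changes);
}

// Toy single-process backend used only to exercise the interface.
final class InMemoryEntityStore implements EntityStore {
    private final Map<String, Entity> entities = new HashMap<>();

    @Override
    public synchronized Optional<Entity> load(String id) {
        return Optional.ofNullable(entities.get(id));
    }

    @Override
    public synchronized boolean commitAll(List<Change> changes) {
        // First pass: verify every precondition before touching any state.
        for (Change c : changes) {
            Entity current = entities.get(c.id());
            long currentVersion = current == null ? 0 : current.version();
            if (currentVersion != c.expectedVersion()) {
                return false;
            }
        }
        // Second pass: all preconditions held, so apply the whole batch.
        for (Change c : changes) {
            entities.put(c.id(),
                    new Entity(c.id(), c.expectedVersion() + 1, c.payload()));
        }
        return true;
    }
}

// Caching as a decorator: callers only ever see EntityStore.
// This local cache ignores cross-server invalidation entirely.
final class CachingEntityStore implements EntityStore {
    private final EntityStore delegate;
    private final Map<String, Entity> cache = new ConcurrentHashMap<>();

    CachingEntityStore(EntityStore delegate) {
        this.delegate = delegate;
    }

    @Override
    public Optional<Entity> load(String id) {
        Entity cached = cache.get(id);
        if (cached != null) {
            return Optional.of(cached);
        }
        Optional<Entity> loaded = delegate.load(id);
        loaded.ifPresent(e -> cache.put(id, e));
        return loaded;
    }

    @Override
    public boolean commitAll(List<Change> changes) {
        boolean committed = delegate.commitAll(changes);
        // Drop touched ids either way; a rejected batch hints at stale data.
        for (Change c : changes) {
            cache.remove(c.id());
        }
        return committed;
    }
}
```

In the demo, the second batch is rejected as a whole because the expected
version for table-b is stale, so table-a keeps its first payload. Because
the caching variant is just another EntityStore, a consumer could be handed
the cached or uncached implementation without changing its own code, which
is roughly the separation suggested above.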