Hi Vignesh,

> Is the lower-level CacheBackend considered the intended caching
> strategy for NoSQL, [...]

Yes.

> or is there still interest in also supporting InMemoryEntityCache
> for consistency with the other backends?

I'm not sure the term "consistency" is applicable here. One can certainly
put the Entity Cache on top of NoSQL Persistence; however, the underlying
implementation is very different between NoSQL and JDBC. So, even with the
Entity Cache in the call path, I do not think we can talk about consistency
of implementation between NoSQL and JDBC.

What does the Entity Cache achieve? IMHO, it prevents multiple database
lookups at the Persistence layer issued by the Resolver. The NoSQL backend
cache achieves the same effect.

Do you see another effect of the Entity Cache that may be worth
implementing on the NoSQL call paths?

> Are there any benchmarks or hit-rate numbers comparing the
> NoSQL persistence cache vs the JDBC/TreeMap entity cache?

I'm not sure it is technically possible to compare the performance of the
Entity Cache vs. the NoSQL backend cache in isolation. The two caches work
on different call paths. I believe meaningful comparisons are possible only
at the common API level, which is the REST Catalog API. Pierre created a
nice benchmarking tool for that [1]. Unfortunately, it does not look like
anyone is available these days to run those benchmarks with scientific
rigour :)

If you're interested, please do make such a comparison and we can certainly
discuss this in more detail. This is probably going to be an iterative
process. I'm posting some preliminary thoughts below.

Attention to the environment setup and to data collection / analysis is
going to be essential to make the JDBC vs. NoSQL comparison meaningful.

I'd suggest creating the testbed so that all resources (Network, Disk, CPU,
Memory) are utilized well below their limits, and comparing response times.

Another approach is to load the system until first failure and compare
saturated requests per second.

[1] https://github.com/apache/polaris-tools/tree/main/benchmarks

Cheers,
Dmitri.
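
For illustration only: a minimal Java sketch of how response-time samples
from two benchmark runs might be summarized for the response-time
comparison suggested above. The class name, the method name, and the sample
values are hypothetical placeholders; they are not measurements and not part
of the polaris-tools benchmarks. The sketch assumes the nearest-rank
percentile definition, which is only one of several common conventions.

```java
import java.util.Arrays;

public class LatencyComparison {

    // Nearest-rank percentile: the smallest sample such that at least
    // p percent of all samples are less than or equal to it.
    static long percentile(long[] samplesMillis, double p) {
        if (samplesMillis.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (p < 0.0 || p > 100.0) {
            throw new IllegalArgumentException("p must be within [0, 100]");
        }
        long[] sorted = samplesMillis.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1]; // rank 0 (p = 0) maps to the minimum
    }

    public static void main(String[] args) {
        // Placeholder values only, NOT measurements from any Polaris run.
        long[] jdbcMillis = {12, 15, 11, 40, 13};
        long[] nosqlMillis = {10, 14, 12, 35, 16};

        for (double p : new double[] {50, 95, 99}) {
            System.out.printf(
                "p%.0f: jdbc=%d ms, nosql=%d ms%n",
                p, percentile(jdbcMillis, p), percentile(nosqlMillis, p));
        }
    }
}
```

Comparing percentiles rather than averages keeps tail latency visible; such
numbers are only meaningful if both runs use the same workload and an
otherwise identical, unsaturated testbed.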

On Thu, Jun 25, 2026 at 8:17 AM vignesh a <[email protected]> wrote:

> Hi Dmitri,
>
> Thanks for the pointer. *Now understand the split.*
>
> At the PolarisMetaStoreManager / Resolver layer, my original observation
> holds: NoSqlMetaStoreManagerFactory.getOrCreateEntityCache() returns null,
> and NoSqlMetaStoreManager doesn't implement the change-tracking methods
> that InMemoryEntityCache needs. So Resolver bypasses the entity cache that
> JDBC and TreeMap backends use.
>
> However, as you noted, NoSQL has caching one level down. The per-realm
> Persistence is wrapped by PersistenceCacheDecorator → CachingPersistenceImpl
> backed by CaffeineCacheBackend (enabled by default via
> polaris.persistence.cache.enable). This cache intercepts fetches and
> reference lookups and invalidates on writes.
>
> A couple of questions before I follow up on #4874:
>
> Is the lower-level CacheBackend considered the intended caching strategy
> for NoSQL, or is there still interest in also supporting
> InMemoryEntityCache for consistency with the other backends?
>
> Are there any benchmarks or hit-rate numbers comparing the NoSQL
> persistence cache vs the JDBC/TreeMap entity cache? It would be useful to
> know how different the behavior is under load.
>
> If the current design is intentional and performs well, I'll update the
> issue with a summary and possibly send a small docs PR clarifying the
> NoSQL caching architecture.
>
> Cheers,
> Vignesh
>
> On Wed, 24 Jun 2026 at 04:30, Dmitri Bourlatchkov <[email protected]> wrote:
>
> > Hi Vignesh,
> >
> > NoSQL Persistence has caching at a different level [1]. Not every
> > Persistence SPI call hits the database.
> >
> > [1]
> > https://github.com/apache/polaris/blob/c6d966d6a701e356284671d81d5ce1af94bf8e7e/persistence/nosql/persistence/api/src/main/java/org/apache/polaris/persistence/nosql/api/cache/CacheBackend.java#L35
> >
> > Cheers,
> > Dmitri.
> >
> > On Tue, Jun 23, 2026 at 4:37 PM vignesh a <[email protected]> wrote:
> >
> > > Hi all,
> > >
> > > I opened GitHub issue #4874
> > > <https://github.com/apache/polaris/issues/4874> after noticing
> > > something while reading through the metastore implementations, and I
> > > wanted to get some feedback before diving into a PR.
> > >
> > > From what I can tell, the NoSQL backend currently doesn't use the
> > > entity cache at all. Since NoSqlMetaStoreManager doesn't implement
> > > change tracking, NoSqlMetaStoreManagerFactory ends up returning null
> > > instead of creating an InMemoryEntityCache.
> > >
> > > That means every Resolver operation - principal lookups, catalog
> > > resolution, privilege checks, location validation, and so on - goes
> > > directly to the backing store. By comparison, the JDBC and in-memory
> > > TreeMap implementations both benefit from the existing cache.
> > >
> > > The details are in issue #4874
> > > <https://github.com/apache/polaris/issues/4874>, but I was curious
> > > about the intent here.
> > >
> > > A few questions:
> > >
> > > - Is this a known limitation, or is it something that simply hasn't
> > >   been addressed yet?
> > > - Is the expected long-term solution to add change tracking support
> > >   for NoSQL?
> > > - Has anyone considered a lighter-weight approach for NoSQL caching,
> > >   or are there consistency concerns that make that undesirable?
> > > - More generally, should we expect similar performance
> > >   characteristics across the supported metastore backends, or is this
> > >   difference intentional?
> > >
> > > The NoSQL backend is a supported production backend, so the lack of
> > > caching stood out to me as a potentially significant behavioral
> > > difference rather than just an implementation detail.
> > >
> > > I'd appreciate any context before I spend time exploring solutions.
> > >
> > > Thanks,
> > >
> > > Vignesh
