On Fri, May 2, 2025 at 7:01 AM Dennis Huo <huoi...@gmail.com> wrote:
> Hello all, > > Following up from today's community sync where we were discussing 1.0 > blockers, I just wanted to finalize discussion on > https://github.com/apache/polaris/issues/761 to check if everyone's > aligned > on the current state of things. > > My understanding is that the originally reported issues regarding > concurrency problems for the EntityCache as a standalone class and its > internal lack of enforcement of cohesion between the "byId" and "byName" > lookups is not actually a concurrency issue for the overall system because > the design of EntityCache today is simply tightly coupled to Resolver.java. Hi Dennis Indeed, most of the discussion focused on the correctness of the EntityCache under concurrency. In my experience working on the EntityCache code, it is because that's where the reported issues happen. All those issues arise from the lack of cohesion between the internal by id and by name internal caches: - Returning a different version of the entity depending on how it is accessed - Returning a different entity using the by-name lookup, in the event of renames - Incorrectly removing a different entity by name in the event of evictions/deletions and renames - Leaking memory in the by-name cache in the event of evictions/deletions and renames The core of the problem comes from the combination of using two distinct primary keys to access an entity (id and name) and the second key being mutable (the entity can be renamed). > In particular, the main way to reason about where the Resolver gets its own > internal guarantees about monotonicity of increasing versions it sees is in > the "bulkValidate" method ( > > https://github.com/apache/polaris/blob/3603fa317ad8313f178c09137965bc0e5c7f256d/polaris-core/src/main/java/org/apache/polaris/core/persistence/resolver/Resolver.java#L545 > ) > where the "loadEntitiesChangeTracking" comes directly from the underlying > persistence store to supply the "minVersions" subsequently requested from > the EntityCache. > One maybe naive question though: the EntityCache cannot be trusted as a regular Cache implementation, and all valid cache lookups must be preceded by a database query. Doesn't it mean that the EntityCache brings none of the benefits one could reasonably expect from a cache but all of its downside (eviction concerns, staleness, memory usage)? I think we also agreed that using an IndexedCache to provide cohesion > between byId and byName would be nice-to-have but in this situation not a > strict requirement for correctness, and if we do encounter any challenges > modifying the EntityCache to use an IndexedCache, we should probably take > our time on it instead of rushing it. > nit: Isn't it rather that correctness is not needed by the Resolver code? Because in my opinion, reworking the EntityCache is the only way to get correctness. Either using an IndexedCache or a Cache+HashBiMap, as currently is in https://github.com/apache/polaris/pull/1339. > We discussed that a followup might be to change make EntityCache > package-private in a separate package with Resolver to better make it clear > that the EntityCache isn't intended to be used independently. > > For now though, are there any remaining concerns about removing the > "1.0-blocker" label from https://github.com/apache/polaris/issues/761 ? > If the Resolver is able to deal with the correctness issues, then the only concern that could possibly remain is the memory leak. If should be rare though, so would it make this qualify just as a major issue and not a blocker for 1.0? If the "blocker" qualifier remains, note that the proposed fix in https://github.com/apache/polaris/pull/1339 is complete and passes all the tests. I just need to rebase `main` because if does not yet include https://github.com/apache/polaris/pull/1193/files. Cheers -- Pierre