On Fri, May 2, 2025 at 7:01 AM Dennis Huo <huoi...@gmail.com> wrote:

> Hello all,
>
> Following up from today's community sync where we were discussing 1.0
> blockers, I just wanted to finalize discussion on
> https://github.com/apache/polaris/issues/761 to check if everyone's
> aligned
> on the current state of things.
>
> My understanding is that the originally reported issues regarding
> concurrency problems for the EntityCache as a standalone class and its
> internal lack of enforcement of cohesion between the "byId" and "byName"
> lookups is not actually a concurrency issue for the overall system because
> the design of EntityCache today is simply tightly coupled to Resolver.java.


Hi Dennis

Indeed, most of the discussion focused on the correctness of the
EntityCache under concurrency.  In my experience working on the EntityCache
code, it is because that's where the reported issues happen.  All those
issues arise from the lack of cohesion between the internal by id and by
name internal caches:

   - Returning a different version of the entity depending on how it is
   accessed
   - Returning a different entity using the by-name lookup, in the event of
   renames
   - Incorrectly removing a different entity by name in the event of
   evictions/deletions and renames
   - Leaking memory in the by-name cache in the event of
   evictions/deletions and renames

The core of the problem comes from the combination of using two distinct
primary keys to access an entity (id and name) and the second key being
mutable (the entity can be renamed).


> In particular, the main way to reason about where the Resolver gets its own
> internal guarantees about monotonicity of increasing versions it sees is in
> the "bulkValidate" method (
>
> https://github.com/apache/polaris/blob/3603fa317ad8313f178c09137965bc0e5c7f256d/polaris-core/src/main/java/org/apache/polaris/core/persistence/resolver/Resolver.java#L545
> )
> where the "loadEntitiesChangeTracking" comes directly from the underlying
> persistence store to supply the "minVersions" subsequently requested from
> the EntityCache.
>

One maybe naive question though: the EntityCache cannot be trusted as a
regular Cache implementation, and all valid cache lookups must be preceded
by a database query.  Doesn't it mean that the EntityCache brings none of
the benefits one could reasonably expect from a cache but all of its
downside (eviction concerns, staleness, memory usage)?

I think we also agreed that using an IndexedCache to provide cohesion
> between byId and byName would be nice-to-have but in this situation not a
> strict requirement for correctness, and if we do encounter any challenges
> modifying the EntityCache to use an IndexedCache, we should probably take
> our time on it instead of rushing it.
>

nit: Isn't it rather that correctness is not needed by the Resolver code?

Because in my opinion, reworking the EntityCache is the only way to get
correctness. Either using an IndexedCache or a Cache+HashBiMap, as
currently is in https://github.com/apache/polaris/pull/1339.


> We discussed that a followup might be to change make EntityCache
> package-private in a separate package with Resolver to better make it clear
> that the EntityCache isn't intended to be used independently.
>
> For now though, are there any remaining concerns about removing the
> "1.0-blocker" label from https://github.com/apache/polaris/issues/761 ?
>

If the Resolver is able to deal with the correctness issues, then the only
concern that could possibly remain is the memory leak.  If should be rare
though, so would it make this qualify just as a major issue and not a
blocker for 1.0?

If the "blocker" qualifier remains, note that the proposed fix in
https://github.com/apache/polaris/pull/1339 is complete and passes all the
tests.  I just need to rebase `main` because if does not yet include
https://github.com/apache/polaris/pull/1193/files.

Cheers
--

Pierre

Reply via email to