Thanks for the thoughtful discussion. Given that no consistency issue and a
rare memory leak, I think we could remove #761 as a 1.0 blocker. We can
continue to work on #1339, which is a good improvement, as well as
providing better structure and clear message that EntityCache and Resolver
should be used together for correctness.

Yufei


On Fri, May 2, 2025 at 1:25 AM Pierre Laporte <pie...@pingtimeout.fr> wrote:

> On Fri, May 2, 2025 at 7:01 AM Dennis Huo <huoi...@gmail.com> wrote:
>
> > Hello all,
> >
> > Following up from today's community sync where we were discussing 1.0
> > blockers, I just wanted to finalize discussion on
> > https://github.com/apache/polaris/issues/761 to check if everyone's
> > aligned
> > on the current state of things.
> >
> > My understanding is that the originally reported issues regarding
> > concurrency problems for the EntityCache as a standalone class and its
> > internal lack of enforcement of cohesion between the "byId" and "byName"
> > lookups is not actually a concurrency issue for the overall system
> because
> > the design of EntityCache today is simply tightly coupled to
> Resolver.java.
>
>
> Hi Dennis
>
> Indeed, most of the discussion focused on the correctness of the
> EntityCache under concurrency.  In my experience working on the EntityCache
> code, it is because that's where the reported issues happen.  All those
> issues arise from the lack of cohesion between the internal by id and by
> name internal caches:
>
>    - Returning a different version of the entity depending on how it is
>    accessed
>    - Returning a different entity using the by-name lookup, in the event of
>    renames
>    - Incorrectly removing a different entity by name in the event of
>    evictions/deletions and renames
>    - Leaking memory in the by-name cache in the event of
>    evictions/deletions and renames
>
> The core of the problem comes from the combination of using two distinct
> primary keys to access an entity (id and name) and the second key being
> mutable (the entity can be renamed).
>
>
> > In particular, the main way to reason about where the Resolver gets its
> own
> > internal guarantees about monotonicity of increasing versions it sees is
> in
> > the "bulkValidate" method (
> >
> >
> https://github.com/apache/polaris/blob/3603fa317ad8313f178c09137965bc0e5c7f256d/polaris-core/src/main/java/org/apache/polaris/core/persistence/resolver/Resolver.java#L545
> > )
> > where the "loadEntitiesChangeTracking" comes directly from the underlying
> > persistence store to supply the "minVersions" subsequently requested from
> > the EntityCache.
> >
>
> One maybe naive question though: the EntityCache cannot be trusted as a
> regular Cache implementation, and all valid cache lookups must be preceded
> by a database query.  Doesn't it mean that the EntityCache brings none of
> the benefits one could reasonably expect from a cache but all of its
> downside (eviction concerns, staleness, memory usage)?
>
> I think we also agreed that using an IndexedCache to provide cohesion
> > between byId and byName would be nice-to-have but in this situation not a
> > strict requirement for correctness, and if we do encounter any challenges
> > modifying the EntityCache to use an IndexedCache, we should probably take
> > our time on it instead of rushing it.
> >
>
> nit: Isn't it rather that correctness is not needed by the Resolver code?
>
> Because in my opinion, reworking the EntityCache is the only way to get
> correctness. Either using an IndexedCache or a Cache+HashBiMap, as
> currently is in https://github.com/apache/polaris/pull/1339.
>
>
> > We discussed that a followup might be to change make EntityCache
> > package-private in a separate package with Resolver to better make it
> clear
> > that the EntityCache isn't intended to be used independently.
> >
> > For now though, are there any remaining concerns about removing the
> > "1.0-blocker" label from https://github.com/apache/polaris/issues/761 ?
> >
>
> If the Resolver is able to deal with the correctness issues, then the only
> concern that could possibly remain is the memory leak.  If should be rare
> though, so would it make this qualify just as a major issue and not a
> blocker for 1.0?
>
> If the "blocker" qualifier remains, note that the proposed fix in
> https://github.com/apache/polaris/pull/1339 is complete and passes all the
> tests.  I just need to rebase `main` because if does not yet include
> https://github.com/apache/polaris/pull/1193/files.
>
> Cheers
> --
>
> Pierre
>

Reply via email to