Ah, I overlooked this one. Thanks for the reminder. In theory all combinations 
of prefetches should be valid, so this is definitely a bug.
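
For concreteness, the kind of combination in question might look roughly like this against the 4.x ObjectSelect property API (the entity and relationship names here are made up for illustration):

```java
// Mixing prefetch semantics on one query: a joint prefetch (resolved
// via a JOIN in the same SQL) alongside a disjoint one (resolved by a
// separate query). In theory any such combination should leave every
// prefetched relationship fully resolved.
List<Artist> artists = ObjectSelect.query(Artist.class)
        .prefetch(Artist.PAINTINGS.joint())
        .prefetch(Artist.GROUPS.disjoint())
        .select(context);
```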

As to the overall caching architecture, the ORM can't work efficiently without 
a cache. The issue is that the cache structure is complex, making it extremely 
hard for a user to reason about: shared vs. local; query vs. objects; objects 
vs. snapshots; read vs. write operations (each using the cache differently). 
Too many dimensions, with unclear interactions between them.
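
For readers less familiar with Cayenne, the first of those dimensions surfaces directly in the query API; a sketch against the 4.x API (the cache group name is made up):

```java
// Query cache scoped to a single ObjectContext:
List<Artist> local = ObjectSelect.query(Artist.class)
        .localCache("artists")       // local query cache
        .select(context);

// Query cache shared by all contexts in the runtime:
List<Artist> shared = ObjectSelect.query(Artist.class)
        .sharedCache("artists")      // shared query cache
        .select(context);

// Independently of either, the fetched objects' snapshots (DataRows)
// land in the shared snapshot cache (DataRowStore) - which is exactly
// the kind of cross-dimension interaction that is hard to reason about.
```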

I am still looking for a single, simple model that could fully replace the 
current one. 
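
One possible ingredient of a simpler model is the TTL-bounded snapshot cache John describes below. A minimal self-contained sketch of the idea, assuming a per-entry TTL and an injectable clock (all names here are hypothetical, not Cayenne API):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Hypothetical sketch (not Cayenne API): a snapshot cache whose entries
// carry a TTL, so stale snapshots read as misses and force a re-fetch
// instead of serving data that may have changed in the database.
class ExpiringSnapshotCache {
    private static final class Entry {
        final Object snapshot;
        final long expiresAt;
        Entry(Object snapshot, long expiresAt) {
            this.snapshot = snapshot;
            this.expiresAt = expiresAt;
        }
    }

    private final Map<Object, Entry> entries = new HashMap<>();
    private final LongSupplier clock; // injectable clock, millis

    ExpiringSnapshotCache(LongSupplier clock) {
        this.clock = clock;
    }

    // Cache a snapshot for ttlMillis; after that it reads as absent.
    void put(Object id, Object snapshot, long ttlMillis) {
        entries.put(id, new Entry(snapshot, clock.getAsLong() + ttlMillis));
    }

    // Returns the cached snapshot, or null if absent or expired.
    Object get(Object id) {
        Entry e = entries.get(id);
        if (e == null || e.expiresAt <= clock.getAsLong()) {
            entries.remove(id); // stale entries count as misses
            return null;
        }
        return e.snapshot;
    }
}
```

A real design would also need per-entity eligibility rules and size bounds for low-memory situations, which the discussion below touches on.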

Andrus 


> On Jan 21, 2025, at 12:55 PM, John Huss <johnth...@gmail.com> wrote:
> 
> Thanks, that's helpful! The other prefetching problem I've had is when
> mixing *joint* prefetches with other types of prefetches (*disjoint* or
> *disjointById*), which is documented here:
> https://github.com/apache/cayenne/pull/624
> In that case the prefetched data is ignored and the relationship value
> reads as null even though it isn't, which is *bad*.
> 
> From a higher-level perspective, I think fetching objects into the
> snapshot cache, where they can live forever, is a bad default. It
> trades correctness (freshness) for performance. I'd like better ways
> of determining which entities are eligible for the snapshot cache and
> how long they can stay there before they go stale; by default I
> wouldn't allow anything in it except objects in the local context.
> 
> There are also problems with caching even for the local context when
> the app gets low on memory: prefetched objects or objects in the
> snapshot cache can be evicted, resulting in a ton of single-row
> fetches as they are re-resolved one by one. That tradeoff may be worth
> it to avoid crashing, but it would be nice for the programmer to be
> able to make a more intentional decision about this behavior.
> 
> On Thu, Jan 16, 2025 at 3:30 PM Andrus Adamchik <aadamc...@gmail.com> wrote:
> 
>> Hi there,
>> 
>> I wanted to share some findings on our object graph refresh algorithms.
>> 
>> For many years I've mostly relied on query cache to refresh data graphs,
>> almost never depending on on-demand faulting and the shared snapshot cache.
>> But recently I came across a few use cases that exposed pretty big holes in
>> our object graph management:
>> 
>> 1. https://issues.apache.org/jira/browse/CAY-2877
>> 
>> Here, instead of collaborating in retrieving data, multiple somewhat
>> unrelated queries stomp on each other, invalidating each other's
>> prefetches.
>> 
>> 2. https://issues.apache.org/jira/browse/CAY-2878
>> 
>> When resolving a certain category of to-one relationships (optional PK to
>> PK), we run a query where we could've taken the object from the cache.
>> 
>> I probably wouldn't have easily identified #1 if #2 had worked as
>> expected, since all those invalidated relationships would've been
>> picked up transparently from the cache. But that, of course, wouldn't
>> have been very efficient.
>> 
>> I suspect there may be more similar issues, but these are the ones I was
>> able to reproduce.
>> 
>> Andrus
>> 
>> 
>> 
>> 
