Hi Jörg,

My guess is that you are running up against the problem where the Model
references its adaptable, directly or indirectly. In that situation, the
model is not collectable on a normal GC cycle because the cache holds it by
a soft reference [1], which is stronger than a weak reference. The
reference path might look like this:

Model Cache Holder (the ModelAdapterFactory)
    | (strong reference)
Model Cache
    | (soft reference)
Model
    | (strong reference)
Resource Resolver
    | (strong reference, maybe indirect)
Resource [the adaptable]


Along this path the resource is reachable at least softly, possibly
indirectly, making it ineligible for collection on normal GC cycles. The
resolver and the resource will not be collected until after the model is
collected. Since the model is not collected until there is memory pressure,
that would explain why you are seeing this (expensive) GC behavior.

When memory pressure occurs, the softly referenced models are collected
before an OOM can happen, which releases both the resolver (no longer
reachable via the model's field) and the resource.
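The chain above can be sketched in plain Java (hypothetical Resource/Model stand-ins, not the actual Sling implementation): as long as the soft reference to the model survives a GC, the resource stays strongly reachable through the model's field.

```java
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

public class ReferenceChainDemo {
    // Hypothetical stand-ins for the classes involved.
    static class Resource {}

    static class Model {
        final Resource adaptable; // strong back-reference to the adaptable
        Model(Resource adaptable) { this.adaptable = adaptable; }
    }

    /**
     * Builds the chain cache -(soft)-> model -(strong)-> resource, drops the
     * strong references, runs a GC, and reports what is still reachable.
     * Returns {modelStillCached, resourceStillReachable}.
     */
    static boolean[] run() {
        Resource resource = new Resource();
        Model model = new Model(resource);

        // The cache holds the model softly, like the ModelAdapterFactory cache.
        SoftReference<Model> cacheEntry = new SoftReference<>(model);
        // Track the adaptable weakly so we can observe its reachability.
        WeakReference<Resource> resourceRef = new WeakReference<>(resource);

        model = null;     // now only softly reachable via the cache
        resource = null;  // now only reachable through the model's field
        System.gc();

        // A normal GC usually leaves soft references intact, so the model --
        // and, through its field, the resource -- typically survive here.
        Model cached = cacheEntry.get();
        boolean resourceReachable = resourceRef.get() != null;
        return new boolean[] { cached != null, resourceReachable };
    }

    public static void main(String[] args) {
        boolean[] r = run();
        System.out.println("model cached: " + r[0]
                + ", resource reachable: " + r[1]);
    }
}
```

The guaranteed part is the invariant: whenever the model is still reachable, the resource it references cannot have been collected.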

The quickest fix is to release the resource resolver at the end of the
model's constructor or in a @PostConstruct method [2]. Alternatively, you
can remove cache=true, provided that does not negatively impact your
application's performance.
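A minimal sketch of that first fix (hypothetical names, and a plain interface instead of the real Sling ResourceResolver so it stays self-contained): read everything the model needs while the resolver is available, then drop the field, so a cached model no longer pins the resolver and its resources.

```java
public class PageModel {
    // Hypothetical stand-in for org.apache.sling.api.resource.ResourceResolver.
    interface ResourceResolver {
        String getUserID();
    }

    private ResourceResolver resolver; // would be injected by Sling Models
    private String userId;             // value extracted while resolver is live

    PageModel(ResourceResolver resolver) {
        this.resolver = resolver;
    }

    // In a real model this would be the @PostConstruct method.
    void init() {
        // Read everything the model needs up front...
        this.userId = resolver.getUserID();
        // ...then drop the reference so a cached model does not keep the
        // resolver (and, through it, the adaptable) alive.
        this.resolver = null;
    }

    String getUserId() { return userId; }

    boolean holdsResolver() { return resolver != null; }
}
```

After init() the model carries only the extracted values, not the resolver itself.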
Another option, though more involved, is to change the entire caching
implementation to better handle these kinds of reference loops, by putting
the cache in the adaptable itself (Resolver, Resource, Request, etc.).
Possible good candidates for this are [4] or request attributes. [4] seems
to be the best candidate overall, especially since you can hook into it
using Closeable ([5]), improving cache eviction even further.

As long as the object holding the cache is not referenced strongly outside
that reference loop (cache holder > cache > model > ... > cache holder),
then the loop's objects would be eligible for GC as soon as the cache
holder is eligible.
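The adaptable-held cache idea can be sketched like this (again with hypothetical stand-in classes; a real implementation might build on the resolver's property map [4] and Closeable values [5]): the cache lives inside the resolver, so the loop resolver -> cache -> model -> resolver becomes unreachable, and collectable, as one unit when the resolver is discarded.

```java
import java.util.HashMap;
import java.util.Map;

public class AdaptableHeldCache {
    // Hypothetical stand-ins, not the actual Sling API.
    static class Model {
        final ResourceResolver resolver; // back-reference that forms the loop
        Model(ResourceResolver resolver) { this.resolver = resolver; }
    }

    static class ResourceResolver implements AutoCloseable {
        private final Map<String, Model> modelCache = new HashMap<>();

        Model adaptTo(String path) {
            // Cache inside the adaptable: the cache's lifetime equals the
            // resolver's, so no global cache keeps models alive.
            return modelCache.computeIfAbsent(path, p -> new Model(this));
        }

        @Override
        public void close() {
            modelCache.clear(); // eager eviction on close, cf. the Closeable hook
        }

        int cacheSize() { return modelCache.size(); }
    }

    public static void main(String[] args) {
        try (ResourceResolver resolver = new ResourceResolver()) {
            Model a = resolver.adaptTo("/content/a");
            Model b = resolver.adaptTo("/content/a");
            System.out.println(a == b); // same cached instance within one resolver
        }
        // Once the resolver goes out of scope, the whole loop
        // (resolver -> cache -> model -> resolver) is unreachable together.
    }
}
```
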

[1]: https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/lang/ref/SoftReference.html
[2]: https://sling.apache.org/documentation/bundles/models.html#a-note-about-cache-true-and-using-the-self-injector
[3]: https://github.com/apache/sling-org-apache-sling-models-impl/pull/18
[4]: https://github.com/apache/sling-org-apache-sling-api/blob/master/src/main/java/org/apache/sling/api/resource/ResourceResolver.java#L888
[5]: https://github.com/apache/sling-org-apache-sling-api/blob/master/src/main/java/org/apache/sling/api/resource/ResourceResolver.java#L610

// Paul


On Tue, Jan 23, 2024 at 6:48 AM Jörg Hoh <jhoh...@googlemail.com.invalid>
wrote:

> Hi Sling community,
>
> I want to share a recent experience I had with Sling Models, Sling Model
> caching and Garbage Collection problems.
>
> I had a case where an AEM instance had massive garbage collection
> problems, but no memory problems. We saw the regular sawtooth pattern in
> the heap consumption, but heavy GC activity (in a stop-the-world manner)
> almost constantly. There was no OutOfMemory situation, so it's not a
> memory leak.
>
> I manually captured a heap dump and found a lot of Sling models being
> referenced by the Sling ModelAdapterFactory cache, and rechecking these
> model classes in detail I found that they specify "cache=true" in their
> @Model annotation. When these statements were removed, the situation
> looked completely different, and garbage collection was normal again.
>
> I don't have a full explanation for this behavior yet. The Sling Models had
> a reference to a ResourceResolver (which was properly closed), but I assume
> that this reference somehow "disabled" the cleaning of the cache on major
> GCs (as it's a WeakHashMap), but tied the collection of these models to the
> collection of the ResourceResolver objects, which have finalizers
> registered. And finalizers are only executed under memory pressure. Having
> this connection might have led to the situation that the SlingModel objects
> were not disposed eagerly, but only alongside the finalizers in situations
> of high memory pressure, in a "stop-the-world" manner.
>
> I'll try to get some more examples of that behavior, but I am not sure
> whether the caching of Sling Models as-is is something we should continue
> to use. This case was quite hard to crack, and I don't know if/how we can
> avoid such a situation by design.
>
> Jörg
>
> --
> https://cqdump.joerghoh.de
>
