mpetris commented on code in PR #1155:
URL: https://github.com/apache/solr/pull/1155#discussion_r1045286686
##########
solr/core/src/java/org/apache/solr/core/TransientSolrCoreCacheDefault.java:
##########
@@ -47,6 +49,14 @@ public class TransientSolrCoreCacheDefault extends
TransientSolrCoreCache {
*/
protected final Cache<String, SolrCore> transientCores;
+ /**
+ * This explicitly models capacity overflow of transientCores with cores
that are still in use. A
+ * cores cache which holds cores evicted from transientCores because of
limited size but which
+ * shall remain cached because they are still referenced or because they are
part of an ongoing
+ * operation (pending ops).
+ */
+ protected final Cache<String, SolrCore> overflowCores;
Review Comment:
> The PR description says _"a separate thread will schedule a ping-pong
behaviour in the eviction handler on a full cache with SolrCores still
referenced"_. Could you be more specific?
> Did you see this ping pong behavior, with a core being evicted, then put
back, then evicted again, and so on?
Since we don't use transient cores in our environment, I tested the
behaviour via the tests in
[TestLazyCores](https://github.com/apache/solr/blob/main/solr/core/src/test/org/apache/solr/core/TestLazyCores.java).
I observed the ping-pong behaviour with the separation in read and write locks
in place in SolrCores and still with the original behaviour in the
TransientSolrCoreCacheDefault but with the difference that the onEvict() method
uses the new write lock from SolrCores and the executor of the Caffeine cache
is set to the ForkJoinPool.commonPool() (to avoid the deadlock when the
Caffeine maintenance task and the eviction happens during a get on the cache
holding a read lock already).
With this scenario the
[testCreateTransientFromAdmin](https://github.com/apache/solr/blob/37c34664bab9a5d6df9ba6f3544d09432845aa51/solr/core/src/test/org/apache/solr/core/TestLazyCores.java#L486)
test shows, [after holding references to five
cores](https://github.com/apache/solr/blob/37c34664bab9a5d6df9ba6f3544d09432845aa51/solr/core/src/test/org/apache/solr/core/TestLazyCores.java#L508)
with a max size of four, that the ForkJoinPool.commonPool() based Caffeine
maintenance task tries to evict cores that get put back in by `onEvict()` over
and over again until the first core gets closed in the
[TestThread](https://github.com/apache/solr/blob/37c34664bab9a5d6df9ba6f3544d09432845aa51/solr/core/src/test/org/apache/solr/core/TestLazyCores.java#L545).
So, yes this behaviour is transient but it seem to be present as long as the
max size is exceeded. Not sure how long this can be but it felt bad to let the
maintenance task go crazy during that time. That's why we came up with the
overflowCores structure.
Note that it might be different when the executor is set to a custom pool.
In Caffeine's BoundedLocalCache the scheduling of the maintenance task happens
more frequently when the ForkJoinPool.commonPool() is used. But it seemed to
make things even more complicated introducing a custom pool so we didn't
explore that road.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]