bruno-roustant commented on code in PR #1155:
URL: https://github.com/apache/solr/pull/1155#discussion_r1043152924
##########
solr/core/src/java/org/apache/solr/core/TransientSolrCoreCacheDefault.java:
##########
@@ -47,6 +49,14 @@ public class TransientSolrCoreCacheDefault extends TransientSolrCoreCache {
*/
protected final Cache<String, SolrCore> transientCores;
+ /**
+ * This explicitly models capacity overflow of transientCores with cores that are still in use. A
+ * cores cache which holds cores evicted from transientCores because of limited size but which
+ * shall remain cached because they are still referenced or because they are part of an ongoing
+ * operation (pending ops).
+ */
+ protected final Cache<String, SolrCore> overflowCores;
Review Comment:
As I understand it, cores evicted from the cache but still in use are put
into this overflowCores cache. There is a hook called when the core is closed,
but no action is enqueued to close the core once that becomes possible, so the
core will remain open until it is closed explicitly. This overflowCores cache
can grow, but no action is taken when it reaches its size limit, and the cores
evicted from overflowCores are no longer counted.
The PR description says _"a separate thread will schedule a ping-pong
behaviour in the eviction handler on a full cache with SolrCores still
referenced"_. Could you be more specific?
Did you actually observe this ping-pong behavior, with a core being evicted,
then put back, then evicted again, and so on?
I believe we should not have this additional overflowCores cache, which
brings complexity and fragility. We should either find a solution that works
with the Caffeine transientCores cache (see my question below), or use another
cache.
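To illustrate the concern, here is a toy sketch (plain JDK collections, not the actual Solr/Caffeine code; all names are made up) of the "un-evict" pattern under discussion: on eviction, an entry that is still referenced is put back into the cache. With a size-bounded cache this can ping-pong, because the put-back keeps the cache over its limit and the same entry is selected for eviction again on the next maintenance pass.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Toy sketch of the evict/put-back ("un-evict") pattern; not Solr code. */
class UnEvictSketch {
  private final int maxSize;
  private final Set<String> inUse = new HashSet<>(); // cores still referenced
  // Access-order LinkedHashMap stands in for an LRU cache.
  private final Map<String, String> cache = new LinkedHashMap<>(16, 0.75f, true);
  int evictionCount = 0; // how many times the eviction handler fired

  UnEvictSketch(int maxSize) {
    this.maxSize = maxSize;
  }

  void markInUse(String name) {
    inUse.add(name);
  }

  int cacheSize() {
    return cache.size();
  }

  void put(String name, String core) {
    cache.put(name, core);
    evictIfNeeded();
  }

  private void evictIfNeeded() {
    while (cache.size() > maxSize) {
      Map.Entry<String, String> eldest = cache.entrySet().iterator().next();
      cache.remove(eldest.getKey());
      evictionCount++;
      if (inUse.contains(eldest.getKey())) {
        // Still referenced: put it back. The cache stays over its limit, so
        // the same entry gets picked again later -> evict/put-back ping-pong.
        cache.put(eldest.getKey(), eldest.getValue());
        return; // a real cache would retry asynchronously
      }
    }
  }
}
```

With a max size of 1 and one pinned entry, three puts already fire the eviction handler three times while the cache stays above its limit, which is the repeated-work pattern the question about ping-pong refers to.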
##########
solr/core/src/java/org/apache/solr/core/SolrCores.java:
##########
@@ -555,23 +658,41 @@ public boolean isCoreLoading(String name) {
}
public void queueCoreToClose(SolrCore coreToClose) {
- synchronized (modifyLock) {
+ READ_WRITE_LOCK.writeLock().lock();
+ try {
pendingCloses.add(coreToClose); // Essentially just queue this core up for closing.
- modifyLock.notifyAll(); // Wakes up closer thread too
+ WRITE_LOCK_CONDITION.signalAll(); // Wakes up closer thread too
+ } finally {
+ READ_WRITE_LOCK.writeLock().unlock();
}
}
/**
* @return the cache holding the transient cores; never null.
*/
public TransientSolrCoreCache getTransientCacheHandler() {
- synchronized (modifyLock) {
- if (transientSolrCoreCacheFactory == null) {
- throw new SolrException(
- SolrException.ErrorCode.SERVER_ERROR,
- getClass().getName() + " not loaded; call load() before using it");
- }
- return transientSolrCoreCacheFactory.getTransientSolrCoreCache();
+ READ_WRITE_LOCK.writeLock().lock();
+ try {
+ return getInternalTransientCacheHandler();
+ } finally {
+ READ_WRITE_LOCK.writeLock().unlock();
+ }
+ }
+
+ /**
+ * Explicitly does not use a lock to provide the flexibility to choose between a read-lock and a
+ * write-lock before calling this method. Choosing a lock and locking before calling this method
+ * is mandatory. Note: Using always a write-lock would be costly. Using always a read-lock would
+ * block all calls which already hold a write-lock.
+ *
+ * @return the cache holding the transient cores; never null.
+ */
+ private TransientSolrCoreCache getInternalTransientCacheHandler() {
Review Comment:
The javadoc should state that either lock must be held when calling this
method.
Would it be clearer to name this method getTransientCacheWithLock()?
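As a sketch of that suggestion (assumed names, not the actual SolrCores code): ReentrantReadWriteLock exposes per-thread hold counts, so the "caller must hold either lock" precondition can be both documented and cheaply checked.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Sketch of an internal accessor whose precondition is "a lock is held". */
class LockedAccessSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final Object cacheHandler = new Object(); // stands in for the transient cache

  /** Caller MUST hold either {@code lock.readLock()} or {@code lock.writeLock()}. */
  Object getTransientCacheWithLock() {
    // ReentrantReadWriteLock can tell us whether the current thread holds a lock.
    if (lock.getReadHoldCount() == 0 && !lock.isWriteLockedByCurrentThread()) {
      throw new IllegalStateException("caller must hold the read or write lock");
    }
    return cacheHandler;
  }

  Object getWithReadLock() {
    lock.readLock().lock();
    try {
      return getTransientCacheWithLock();
    } finally {
      lock.readLock().unlock();
    }
  }
}
```

The check turns a silent misuse (calling the accessor without any lock) into an immediate failure, which is easier to debug than a race.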
##########
solr/core/src/java/org/apache/solr/core/SolrCores.java:
##########
@@ -555,23 +658,41 @@ public boolean isCoreLoading(String name) {
}
public void queueCoreToClose(SolrCore coreToClose) {
- synchronized (modifyLock) {
+ READ_WRITE_LOCK.writeLock().lock();
+ try {
pendingCloses.add(coreToClose); // Essentially just queue this core up for closing.
- modifyLock.notifyAll(); // Wakes up closer thread too
+ WRITE_LOCK_CONDITION.signalAll(); // Wakes up closer thread too
+ } finally {
+ READ_WRITE_LOCK.writeLock().unlock();
}
}
/**
* @return the cache holding the transient cores; never null.
*/
public TransientSolrCoreCache getTransientCacheHandler() {
- synchronized (modifyLock) {
- if (transientSolrCoreCacheFactory == null) {
- throw new SolrException(
- SolrException.ErrorCode.SERVER_ERROR,
- getClass().getName() + " not loaded; call load() before using it");
- }
- return transientSolrCoreCacheFactory.getTransientSolrCoreCache();
+ READ_WRITE_LOCK.writeLock().lock();
Review Comment:
I think this method should take the read lock, and document that.
In any case, a caller would need to take the write lock to modify the state
outside of this method, so this method does not have to enforce the write lock.
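To make the suggestion concrete, here is a minimal sketch (assumed names, simplified exception types, not the actual SolrCores code): the getter only reads state, so a read lock suffices and allows concurrent readers; only methods that mutate the factory reference take the write lock.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Sketch: write lock for mutation, read lock for the read-only getter. */
class ReadMostlyGetterSketch {
  private final ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();
  private Object transientSolrCoreCacheFactory; // set once by load()

  void load(Object factory) {
    rwLock.writeLock().lock(); // mutation -> write lock
    try {
      this.transientSolrCoreCacheFactory = factory;
    } finally {
      rwLock.writeLock().unlock();
    }
  }

  Object getTransientCacheHandler() {
    rwLock.readLock().lock(); // read-only -> read lock, readers don't block each other
    try {
      if (transientSolrCoreCacheFactory == null) {
        throw new IllegalStateException("not loaded; call load() before using it");
      }
      return transientSolrCoreCacheFactory;
    } finally {
      rwLock.readLock().unlock();
    }
  }
}
```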
##########
solr/core/src/java/org/apache/solr/core/TransientSolrCoreCacheDefault.java:
##########
@@ -70,10 +80,15 @@ public TransientSolrCoreCacheDefault(CoreContainer coreContainer) {
Caffeine<String, SolrCore> transientCoresCacheBuilder =
Caffeine.newBuilder()
.initialCapacity(initialCapacity)
- // Use the current thread to queue evicted cores for closing. This ensures the
- // cache max size is respected (with a different thread the max size would be
- // respected asynchronously only eventually).
- .executor(Runnable::run)
+ // Do NOT use the current thread to queue evicted cores for closing. Although this
+ // ensures the cache max size is respected only eventually, it should be a very
+ // short period of time until the cache maintenance task kicks in.
+ // The onEvict method needs the writeLock from SolrCores to operate safely.
+ // However, the current thread most probably has acquired a read-lock already
+ // somewhere up the call stack and would deadlock.
+ // Note that Caffeine cache has an internal maintenance task rescheduling logic which
+ // explicitly checks on the common pool.
+ .executor(ForkJoinPool.commonPool())
Review Comment:
@ben-manes we had a discussion on this subject some time ago. You told us
about a way to pin entries. But in our case here, the in-use status of an
entry is transient; we cannot change the weight of many entries each time they
are used.
So we decided to un-evict (put back) an entry if it is in use. But we now
see the limits of this approach when we have to manage a read lock and a write
lock, and a different thread to put the entry back.
The solution I see would be an eviction approver (a hard no) consulted
before an entry is selected for eviction, so that we prefer not to evict
rather than put back. In this case we accept the O(n) scan for entry
selection, and we also accept that the cache size may exceed the limit,
knowing that Caffeine maintenance tasks will kick in periodically to try to
evict entries again later.
Would that be a possible option for Caffeine?
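The "eviction approver" semantics can be illustrated with plain JDK code (this is not an existing Caffeine feature, just a sketch of the idea): the cache asks a predicate before evicting and simply declines to evict an in-use entry, accepting that the size temporarily exceeds the limit. LinkedHashMap.removeEldestEntry provides exactly this hook, though only for the eldest entry; the proposal above would scan further candidates.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Sketch of "approve before evict": decline eviction of in-use entries. */
class EvictionApproverSketch extends LinkedHashMap<String, String> {
  private final int maxSize;
  final Set<String> inUse = new HashSet<>(); // entries that must not be evicted

  EvictionApproverSketch(int maxSize) {
    super(16, 0.75f, true); // access order, like an LRU
    this.maxSize = maxSize;
  }

  @Override
  protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
    // Approve eviction only when over capacity AND the eldest is not in use.
    // A "no" here means the map keeps the entry and may exceed maxSize,
    // which is the trade-off described above.
    return size() > maxSize && !inUse.contains(eldest.getKey());
  }
}
```

The key property is that a pinned entry is never removed and then re-inserted; the cache just stays over its limit until the entry is released, at which point a later insertion triggers the deferred eviction.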
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]