bruno-roustant commented on code in PR #1155:
URL: https://github.com/apache/solr/pull/1155#discussion_r1043152924
##########
solr/core/src/java/org/apache/solr/core/TransientSolrCoreCacheDefault.java:
##########
@@ -47,6 +49,14 @@ public class TransientSolrCoreCacheDefault extends TransientSolrCoreCache {
*/
protected final Cache<String, SolrCore> transientCores;
+ /**
+ * This explicitly models capacity overflow of transientCores with cores that are still in use. A
+ * cores cache which holds cores evicted from transientCores because of limited size but which
+ * shall remain cached because they are still referenced or because they are part of an ongoing
+ * operation (pending ops).
+ */
+ protected final Cache<String, SolrCore> overflowCores;
Review Comment:
As I understand it, cores evicted from the cache but still in use are put
into this overflowCores cache. There is a hook called when the core is closed,
but no action is enqueued to close the core once that becomes possible, so the
core will remain open until it is closed explicitly. This overflowCores cache
can grow, but no action is taken when it reaches its size limit, and the cores
evicted from overflowCores are no longer counted.
The PR description says _"a separate thread will schedule a ping-pong
behaviour in the eviction handler on a full cache with SolrCores still
referenced"_. Could you be more specific?
Did you actually observe this ping-pong behavior, with a core being evicted,
then put back, then evicted again, and so on?
I believe we should not have this additional overflowCores cache, which
brings complexity and fragility. We should either find a solution that works
with the Caffeine transientCores cache (see my question below), or use another
cache.
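To illustrate the concern, here is a toy sketch (plain JDK collections, not the actual Solr/Caffeine code; all names are made up) of the "un-evict" pattern under discussion: on eviction, an entry that is still referenced is put back into the cache. With a size-bounded cache this can ping-pong, because the put-back keeps the cache over its limit and the same entry is selected for eviction again on the next maintenance pass.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Toy sketch of the evict/put-back ("un-evict") pattern; not Solr code. */
class UnEvictSketch {
  private final int maxSize;
  private final Set<String> inUse = new HashSet<>(); // cores still referenced
  // Access-order LinkedHashMap stands in for an LRU cache.
  private final Map<String, String> cache = new LinkedHashMap<>(16, 0.75f, true);
  int evictionCount = 0; // how many times the eviction handler fired

  UnEvictSketch(int maxSize) {
    this.maxSize = maxSize;
  }

  void markInUse(String name) {
    inUse.add(name);
  }

  int cacheSize() {
    return cache.size();
  }

  void put(String name, String core) {
    cache.put(name, core);
    evictIfNeeded();
  }

  private void evictIfNeeded() {
    while (cache.size() > maxSize) {
      Map.Entry<String, String> eldest = cache.entrySet().iterator().next();
      cache.remove(eldest.getKey());
      evictionCount++;
      if (inUse.contains(eldest.getKey())) {
        // Still referenced: put it back. The cache stays over its limit, so
        // the same entry gets picked again later -> evict/put-back ping-pong.
        cache.put(eldest.getKey(), eldest.getValue());
        return; // a real cache would retry asynchronously
      }
    }
  }
}
```

With a max size of 1 and one pinned entry, three puts already fire the eviction handler three times while the cache stays above its limit, which is the repeated-work pattern the question about ping-pong refers to.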
##########
solr/core/src/java/org/apache/solr/core/SolrCores.java:
##########
@@ -555,23 +658,41 @@ public boolean isCoreLoading(String name) {
}
public void queueCoreToClose(SolrCore coreToClose) {
- synchronized (modifyLock) {
+ READ_WRITE_LOCK.writeLock().lock();
+ try {
pendingCloses.add(coreToClose); // Essentially just queue this core up for closing.
- modifyLock.notifyAll(); // Wakes up closer thread too
+ WRITE_LOCK_CONDITION.signalAll(); // Wakes up closer thread too
+ } finally {
+ READ_WRITE_LOCK.writeLock().unlock();
}
}
/**
* @return the cache holding the transient cores; never null.
*/
public TransientSolrCoreCache getTransientCacheHandler() {
- synchronized (modifyLock) {
- if (transientSolrCoreCacheFactory == null) {
- throw new SolrException(
- SolrException.ErrorCode.SERVER_ERROR,
- getClass().getName() + " not loaded; call load() before using it");
- }
- return transientSolrCoreCacheFactory.getTransientSolrCoreCache();
+ READ_WRITE_LOCK.writeLock().lock();
+ try {
+ return getInternalTransientCacheHandler();
+ } finally {
+ READ_WRITE_LOCK.writeLock().unlock();
+ }
+ }
+
+ /**
+ * Explicitly does not use a lock to provide the flexibility to choose between a read-lock and a
+ * write-lock before calling this method. Choosing a lock and locking before calling this method
+ * is mandatory. Note: Using always a write-lock would be costly. Using always a read-lock would
+ * block all calls which already hold a write-lock.
+ *
+ * @return the cache holding the transient cores; never null.
+ */
+ private TransientSolrCoreCache getInternalTransientCacheHandler() {
Review Comment:
The javadoc should state that either lock must be held when calling this
method.
Would it be clearer to name this method getTransientCacheWithLock()?
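As a sketch of that suggestion (assumed names, not the actual SolrCores code): ReentrantReadWriteLock exposes per-thread hold counts, so the "caller must hold either lock" precondition can be both documented and cheaply checked.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Sketch of an internal accessor whose precondition is "a lock is held". */
class LockedAccessSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final Object cacheHandler = new Object(); // stands in for the transient cache

  /** Caller MUST hold either {@code lock.readLock()} or {@code lock.writeLock()}. */
  Object getTransientCacheWithLock() {
    // ReentrantReadWriteLock can tell us whether the current thread holds a lock.
    if (lock.getReadHoldCount() == 0 && !lock.isWriteLockedByCurrentThread()) {
      throw new IllegalStateException("caller must hold the read or write lock");
    }
    return cacheHandler;
  }

  Object getWithReadLock() {
    lock.readLock().lock();
    try {
      return getTransientCacheWithLock();
    } finally {
      lock.readLock().unlock();
    }
  }
}
```

The check turns a silent misuse (calling the accessor without any lock) into an immediate failure, which is easier to debug than a race.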
##########
solr/core/src/java/org/apache/solr/core/SolrCores.java:
##########
@@ -555,23 +658,41 @@ public boolean isCoreLoading(String name) {
}
public void queueCoreToClose(SolrCore coreToClose) {
- synchronized (modifyLock) {
+ READ_WRITE_LOCK.writeLock().lock();
+ try {
pendingCloses.add(coreToClose); // Essentially just queue this core up for closing.
- modifyLock.notifyAll(); // Wakes up closer thread too
+ WRITE_LOCK_CONDITION.signalAll(); // Wakes up closer thread too
+ } finally {
+ READ_WRITE_LOCK.writeLock().unlock();
}
}
/**
* @return the cache holding the transient cores; never null.
*/
public TransientSolrCoreCache getTransientCacheHandler() {
- synchronized (modifyLock) {
- if (transientSolrCoreCacheFactory == null) {
- throw new SolrException(
- SolrException.ErrorCode.SERVER_ERROR,
- getClass().getName() + " not loaded; call load() before using it");
- }
- return transientSolrCoreCacheFactory.getTransientSolrCoreCache();
+ READ_WRITE_LOCK.writeLock().lock();
Review Comment:
I think this method should take the read lock, and document that.
In any case, a caller would need to take the write lock to modify the state
outside of this method, so this method does not have to enforce the write lock.
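To make the suggestion concrete, here is a minimal sketch (assumed names, simplified exception types, not the actual SolrCores code): the getter only reads state, so a read lock suffices and allows concurrent readers; only methods that mutate the factory reference take the write lock.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Sketch: write lock for mutation, read lock for the read-only getter. */
class ReadMostlyGetterSketch {
  private final ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();
  private Object transientSolrCoreCacheFactory; // set once by load()

  void load(Object factory) {
    rwLock.writeLock().lock(); // mutation -> write lock
    try {
      this.transientSolrCoreCacheFactory = factory;
    } finally {
      rwLock.writeLock().unlock();
    }
  }

  Object getTransientCacheHandler() {
    rwLock.readLock().lock(); // read-only -> read lock, readers don't block each other
    try {
      if (transientSolrCoreCacheFactory == null) {
        throw new IllegalStateException("not loaded; call load() before using it");
      }
      return transientSolrCoreCacheFactory;
    } finally {
      rwLock.readLock().unlock();
    }
  }
}
```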
##########
solr/core/src/java/org/apache/solr/core/TransientSolrCoreCacheDefault.java:
##########
@@ -70,10 +80,15 @@ public TransientSolrCoreCacheDefault(CoreContainer coreContainer) {
Caffeine<String, SolrCore> transientCoresCacheBuilder =
Caffeine.newBuilder()
.initialCapacity(initialCapacity)
- // Use the current thread to queue evicted cores for closing. This ensures the
- // cache max size is respected (with a different thread the max size would be
- // respected asynchronously only eventually).
- .executor(Runnable::run)
+ // Do NOT use the current thread to queue evicted cores for closing. Although this
+ // ensures the cache max size is respected only eventually, it should be a very
+ // short period of time until the cache maintenance task kicks in.
+ // The onEvict method needs the writeLock from SolrCores to operate safely.
+ // However, the current thread most probably has acquired a read-lock already
+ // somewhere up the call stack and would deadlock.
+ // Note that Caffeine cache has an internal maintenance task rescheduling logic which
+ // explicitly checks on the common pool.
+ .executor(ForkJoinPool.commonPool())
Review Comment:
@ben-manes we had a discussion on this subject some time ago. You told us
about a way to pin entries. But in our case here, the in-use status of an
entry is transient; we cannot change the weight of many entries each time they
are used.
So we decided to un-evict (put back) an entry if it is in use. But we now
see the limits of this approach when we have to manage a read lock and a write
lock, and a different thread to put the entry back.
The solution I see would be an eviction approver (a hard no) consulted
before an entry is selected for eviction, so that we prefer not to evict
rather than put back. In this case we accept the O(n) scan for entry
selection, and we also accept that the cache size may exceed the limit,
knowing that Caffeine maintenance tasks will kick in periodically to try to
evict entries again later.
Would that be a possible option for Caffeine?
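The "eviction approver" semantics can be illustrated with plain JDK code (this is not an existing Caffeine feature, just a sketch of the idea): the cache asks a predicate before evicting and simply declines to evict an in-use entry, accepting that the size temporarily exceeds the limit. LinkedHashMap.removeEldestEntry provides exactly this hook, though only for the eldest entry; the proposal above would scan further candidates.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Sketch of "approve before evict": decline eviction of in-use entries. */
class EvictionApproverSketch extends LinkedHashMap<String, String> {
  private final int maxSize;
  final Set<String> inUse = new HashSet<>(); // entries that must not be evicted

  EvictionApproverSketch(int maxSize) {
    super(16, 0.75f, true); // access order, like an LRU
    this.maxSize = maxSize;
  }

  @Override
  protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
    // Approve eviction only when over capacity AND the eldest is not in use.
    // A "no" here means the map keeps the entry and may exceed maxSize,
    // which is the trade-off described above.
    return size() > maxSize && !inUse.contains(eldest.getKey());
  }
}
```

The key property is that a pinned entry is never removed and then re-inserted; the cache just stays over its limit until the entry is released, at which point a later insertion triggers the deferred eviction.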
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]