murblanc commented on a change in pull request #580:
URL: https://github.com/apache/solr/pull/580#discussion_r797700536
##########
File path:
solr/core/src/java/org/apache/solr/core/TransientSolrCoreCacheDefault.java
##########
@@ -89,6 +86,25 @@ public TransientSolrCoreCacheDefault(CoreContainer
coreContainer) {
transientDescriptors = new LinkedHashMap<>(initialCapacity);
}
+ private void onEvict(SolrCore core) {
+ if (coreContainer.hasPendingCoreOps(core.getName())) {
+ if (log.isInfoEnabled()) {
+ log.info("NOT evicting transient core [{}]; it's loading or something
else", core.getName());
+ }
+ transientCores.put(core.getName(), core); // put back
Review comment:
This `put` and the one below being called from a `removalListener`, they
happen **after** the eviction from the cache.
I assume that if another thread tries to access the core (and therefore
tries to create a new one?) it will see the error this PR is fixing, but once
the core is put back here, things should be ok.
Not suggesting any change, just checking my understanding that there might
be some race condition transient errors around this.
##########
File path:
solr/core/src/java/org/apache/solr/core/TransientSolrCoreCacheDefault.java
##########
@@ -89,6 +86,25 @@ public TransientSolrCoreCacheDefault(CoreContainer
coreContainer) {
transientDescriptors = new LinkedHashMap<>(initialCapacity);
}
+ private void onEvict(SolrCore core) {
+ if (coreContainer.hasPendingCoreOps(core.getName())) {
+ if (log.isInfoEnabled()) {
+ log.info("NOT evicting transient core [{}]; it's loading or something
else", core.getName());
+ }
+ transientCores.put(core.getName(), core); // put back
+ } else if (core.getOpenCount() > 1) {
+ if (log.isInfoEnabled()) {
+ log.info("NOT evicting transient core [{}]; it's still in use",
core.getName());
+ }
+ transientCores.put(core.getName(), core); // put back
+ } else {
+ if (log.isInfoEnabled()) {
+ log.info("Closing transient core [{}] evicted from the cache",
core.getName());
+ }
+ coreContainer.queueCoreToClose(core);
Review comment:
What happens if once the core was added here to the
`SolrCores.pendingCloses` but before the close actually happens, some thread
reopens and reloads the core?
##########
File path: solr/core/src/test/org/apache/solr/core/TestLazyCores.java
##########
@@ -212,43 +225,44 @@ public void testCachingLimit() throws Exception {
checkCoresNotLoaded(cc, "collection3", "collection4", "collection6",
"collection7", "collection8", "collection9");
// By putting these in non-alpha order, we're also checking that we're
not just seeing an artifact.
- SolrCore core1 = cc.getCore("collection1");
- SolrCore core3 = cc.getCore("collection3");
- SolrCore core4 = cc.getCore("collection4");
- SolrCore core2 = cc.getCore("collection2");
- SolrCore core5 = cc.getCore("collection5");
+ getCoreAndPutBack(cc, "collection1");
+ getCoreAndPutBack(cc, "collection3");
+ getCoreAndPutBack(cc, "collection4");
+ getCoreAndPutBack(cc, "collection2");
+ getCoreAndPutBack(cc, "collection5");
checkLoadedCores(cc, "collection1", "collection2", "collection3",
"collection4", "collection5");
checkCoresNotLoaded(cc, "collection6", "collection7", "collection8",
"collection9");
// map should be full up, add one more and verify
- SolrCore core6 = cc.getCore("collection6");
+ getCoreAndPutBack(cc, "collection6");
checkLoadedCores(cc, "collection1", "collection2", "collection3",
"collection4", "collection5",
"collection6");
checkCoresNotLoaded(cc, "collection7", "collection8", "collection9");
- SolrCore core7 = cc.getCore("collection7");
+ getCoreAndPutBack(cc, "collection7");
checkLoadedCores(cc, "collection1", "collection2", "collection3",
"collection4", "collection5",
"collection6", "collection7");
checkCoresNotLoaded(cc, "collection8", "collection9");
- SolrCore core8 = cc.getCore("collection8");
+ getCoreAndPutBack(cc, "collection8");
checkLoadedCores(cc, "collection1", "collection4", "collection5",
"collection8");
Review comment:
Help me: why are we sure here that collection1, collection4 and
collection5 cores are loaded whereas collection2 and collection3 for example
might not? @bruno-roustant maybe you know?
##########
File path: solr/core/src/java/org/apache/solr/core/CoreContainer.java
##########
@@ -2284,8 +2285,10 @@ public void run() {
SolrCore core;
while (!container.isShutDown() && (core = solrCores.getCoreToClose()) !=
null) {
+ assert core.getOpenCount() == 1;
Review comment:
assert is only enabled in tests usually, right?
What happens if the core gets reopened while we close it here? Is it a case
of the open count being higher than one in a legitimate way?
##########
File path:
solr/core/src/java/org/apache/solr/core/TransientSolrCoreCacheDefault.java
##########
@@ -89,6 +86,25 @@ public TransientSolrCoreCacheDefault(CoreContainer
coreContainer) {
transientDescriptors = new LinkedHashMap<>(initialCapacity);
}
+ private void onEvict(SolrCore core) {
+ if (coreContainer.hasPendingCoreOps(core.getName())) {
Review comment:
After offline discussion with David, this can happen when a core load
takes too much time (registering in ZK etc) and we try to unload it before it
has completed. Likely deserves a comment here.
This means the cache can grow pretty big, as evictions will not be allowed
for a while, or if we prevent additions to the cache when it's over max size,
this can cause global slowdown on node startup.
If the cache grows big, we need to make sure there is some process shrinking
it back to reasonable size in reasonable time. If we only evict one core when
we add one (I don't know if that's the case), the cache might never reduce in
size.
##########
File path: solr/core/src/test/org/apache/solr/core/TestLazyCores.java
##########
@@ -261,23 +275,16 @@ public void testCachingLimit() throws Exception {
assertNotNull(o);
}
-
- // Note decrementing the count when the core is removed from the
lazyCores list is appropriate, since the
- // refcount is 1 when constructed. anyone _else_ who's opened up one has
to close it.
- core1.close();
- core2.close();
- core3.close();
- core4.close();
- core5.close();
- core6.close();
- core7.close();
- core8.close();
- core9.close();
} finally {
cc.shutdown();
}
}
+ private void getCoreAndPutBack(CoreContainer cc, String name) {
Review comment:
`getAndCloseCore` a better name maybe?
And comment that core needs to be closed for it to be an eviction candidate.
##########
File path: solr/core/src/java/org/apache/solr/core/CoreContainer.java
##########
@@ -2284,8 +2285,10 @@ public void run() {
SolrCore core;
while (!container.isShutDown() && (core = solrCores.getCoreToClose()) !=
null) {
+ assert core.getOpenCount() == 1;
try {
- core.close();
+ MDCLoggingContext.setCore(core);
Review comment:
Nice. Wasn't obvious for me at first glance that it was for setting the
right logging context for the `close()` call.
##########
File path:
solr/core/src/java/org/apache/solr/core/TransientSolrCoreCacheDefault.java
##########
@@ -89,6 +86,25 @@ public TransientSolrCoreCacheDefault(CoreContainer
coreContainer) {
transientDescriptors = new LinkedHashMap<>(initialCapacity);
}
+ private void onEvict(SolrCore core) {
+ if (coreContainer.hasPendingCoreOps(core.getName())) {
+ if (log.isInfoEnabled()) {
+ log.info("NOT evicting transient core [{}]; it's loading or something
else", core.getName());
Review comment:
Need to update comment line 72 above since the cache max size is not
strictly enforced anymore. I don't think the cache can grow extremely big
unless all the cores we try to evict happen to be still in use, which is
unlikely.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]