Chris M. Hostetter created SOLR-17200:
-----------------------------------------
Summary: Race conditions on startup using /health?requireHealthyCores=true
Key: SOLR-17200
URL: https://issues.apache.org/jira/browse/SOLR-17200
Project: Solr
Issue Type: Bug
Security Level: Public (Default Security Level. Issues are Public)
Reporter: Chris M. Hostetter
There seem to be at least two possible thread race conditions that can cause
{{/health?requireHealthyCores=true}} to return a false positive while
{{CoreContainer}} is in the process of starting up:
# If the request comes in _after_ {{CoreContainer}} has initialized
{{healthCheckHandler}} but _before_ initializing & running the
{{coreLoadExecutor}}
# A more complex situation where the request comes in _while_
{{coreLoadExecutor}} is loading cores, and all of the cores that have
_finished_ initialization are "active" in SolrCloud, but other SolrCores remain
to be initialized (and may need recovery)
In both cases, the root of the issue is that {{requireHealthyCores=true}} works
by checking...
{code:java}
Collection<CloudDescriptor> coreDescriptors =
    coreContainer.getCores().stream()
        .map(c -> c.getCoreDescriptor().getCloudDescriptor())
        .collect(Collectors.toList());
long unhealthyCores = findUnhealthyCores(coreDescriptors, clusterState);
{code}
...but that means the only {{CloudDescriptor}}s that are checked are the ones
that come from _loaded_ cores (which is what {{coreContainer.getCores()}}
returns). Any {{currentlyLoadingCores}} (registered by {{CoreContainer}}
calling {{solrCores.markCoreAsLoading(cd)}} before starting the
{{coreLoadExecutor}}) are not considered.
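To make the failure mode concrete, here is a minimal, self-contained sketch of the logic described above. It does not use any Solr classes: {{HealthCheckSketch}}, {{ReplicaState}}, {{healthyLoadedOnly}} and {{healthyIncludingLoading}} are hypothetical names invented for illustration. The stricter variant is only one possible approach and is not taken from any actual patch.

```java
import java.util.List;
import java.util.Map;

class HealthCheckSketch {

    /** Hypothetical stand-in for a replica's state; not a Solr type. */
    enum ReplicaState { ACTIVE, RECOVERING, DOWN }

    /**
     * Mirrors the reported behavior: only cores that have finished
     * loading are inspected. With zero loaded cores this is vacuously true.
     */
    static boolean healthyLoadedOnly(Map<String, ReplicaState> loadedCores) {
        return loadedCores.values().stream()
                .allMatch(state -> state == ReplicaState.ACTIVE);
    }

    /**
     * Assumed stricter variant: any core still registered as loading
     * makes the node unhealthy, regardless of the loaded cores.
     */
    static boolean healthyIncludingLoading(Map<String, ReplicaState> loadedCores,
                                           List<String> loadingCores) {
        return loadingCores.isEmpty() && healthyLoadedOnly(loadedCores);
    }

    public static void main(String[] args) {
        // First situation: no core has been loaded yet.
        Map<String, ReplicaState> noneLoaded = Map.of();
        System.out.println(healthyLoadedOnly(noneLoaded)); // true (false positive)

        // Second situation: core1 is loaded and active, core2 is still loading.
        Map<String, ReplicaState> partlyLoaded = Map.of("core1", ReplicaState.ACTIVE);
        List<String> stillLoading = List.of("core2");
        System.out.println(healthyLoadedOnly(partlyLoaded)); // true (false positive)
        System.out.println(healthyIncludingLoading(partlyLoaded, stillLoading)); // false
    }
}
```

Note that in the first situation the stricter variant only helps if the pending cores have already been marked as loading when the request arrives; the sketch makes no claim about that ordering.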