Chris M. Hostetter created SOLR-16918:
-----------------------------------------
Summary: MetricUtils.instrumentedExecutorService() is not safe in
any excutors with a lifecycle that isn't forever
Key: SOLR-16918
URL: https://issues.apache.org/jira/browse/SOLR-16918
Project: Solr
Issue Type: Bug
Security Level: Public (Default Security Level. Issues are Public)
Reporter: Chris M. Hostetter
I have a custom Solr RequestHandler that internally uses a subclass of
{{HttpShardHandler}} for dispatching some special types of distributed requests.
I recently upgraded form Solr 9.1 to Solr 9.3 - and in doing so started getting
the following exceptions on SolrCore RELOAD...
{noformat}
2023-07-31 17:46:19.548 ERROR (qtp1690101810-23) [] o.a.s.s.HttpSolrCall 500
Exception => org.apache.solr.common.SolrException: Unable to reload core
[foo_shard1_replica_n2]
at org.apache.solr.core.CoreContainer.reload(CoreContainer.java:2028)
org.apache.solr.common.SolrException: Unable to reload core
[foo_shard1_replica_n2]
at org.apache.solr.core.CoreContainer.reload(CoreContainer.java:2028)
~[?:?]
at org.apache.solr.core.CoreContainer.reload(CoreContainer.java:1946)
~[?:?]
at
org.apache.solr.handler.admin.CoreAdminOperation.lambda$static$2(CoreAdminOperation.java:136)
~[?:?]
at
org.apache.solr.handler.admin.CoreAdminOperation.execute(CoreAdminOperation.java:400)
~[?:?]
at
org.apache.solr.handler.admin.CoreAdminHandler$CallInfo.call(CoreAdminHandler.java:353)
~[?:?]
at
org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:219)
~[?:?]
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:224)
~[?:?]
at
org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:928) ~[?:?]
...
Caused by: org.apache.solr.common.SolrException: A metric named
QUERY./bar.shardHandler.threadPool.httpShardExecutor.pool.size already exists
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:1228) ~[?:?]
at org.apache.solr.core.SolrCore.reload(SolrCore.java:788) ~[?:?]
at org.apache.solr.core.CoreContainer.reload(CoreContainer.java:1978)
~[?:?]
... 62 more
Caused by: java.lang.IllegalArgumentException: A metric named
QUERY./bar.shardHandler.threadPool.httpShardExecutor.pool.size already exists
at
com.codahale.metrics.MetricRegistry.register(MetricRegistry.java:168)
~[metrics-core-4.2.19.jar:4.2.19]
at
com.codahale.metrics.MetricRegistry.registerGauge(MetricRegistry.java:88)
~[metrics-core-4.2.19.jar:4.2.19]
at
com.codahale.metrics.InstrumentedExecutorService.<init>(InstrumentedExecutorService.java:61)
~[metrics-core-4.2.19.jar:4.2.19]
at
org.apache.solr.util.stats.MetricUtils.instrumentedExecutorService(MetricUtils.java:751)
~[?:?]
at
org.apache.solr.handler.component.HttpShardHandlerFactory.initializeMetrics(HttpShardHandlerFactory.java:415)
~[?:?]
...
{noformat}
The problem evidently comes from {{{}HttpShardHandler{}}}'s usage of
{{MetricUtils.instrumentedExecutorService()}} which returns an instance of the
(third-party) dropwizard {{{}InstrumentedExecutorService{}}}.
{{InstrumentedExecutorService}} made a change in their 4.2.x releases (now used
by Solr 9.3 ... not sure if it was in 9.2) so that in addition to some previous
{{Metric}} instances that {{InstrumentedExecutorService}} registered in older
versions, it also includes some "type specific" metrics that are _Gauges_
driven by the behavior of the delegate {{{}ExecutorService{}}}...
[https://github.com/dropwizard/metrics/commit/5f7ba7f4eb15bab0ddd458b56cdd313ea795eba7]
The lifecycles of Gauges are extremely finicky, because {{MetricRegistry}}
can't just "re-use" existing instances (like it can with {{{}Meter{}}},
{{{}Counter{}}}, {{{}Timer{}}}, etc...). This is why Solr's
{{SolrMetricManager}} jumps through a lot of hoops when registering gauges -
and even has a custom {{GaugeWrapper}} - to ensure we can handle things like
SolrCore {{RELOAD}} (where we have two copies of the SolrCore, and all of it's
associated plugins, running at once - and reporting metrics - until the new
SolrCore is fully initialized and we swap over to it)
But {{InstrumentedExecutorService}} and it's new use of Gauges doesn't know
anything about {{SolrMetricManager}} - it doesn't even de-register it's Gauges
in it's {{shutdown()}} method, so you can't even do this...
{code:java}
(new InstrumentedExecutorService(delegate, registry, "xyz")).shutdown();
(new InstrumentedExecutorService(delegate, registry, "xyz")).shutdown();
{code}
...w/o triggering this type of exception.
----
{{HttpShardHandler}} (and {{{}UpdateShardHandler{}}}) _tend_ to only be
initialized at the {{CoreContainer}} level, so I think it's unlikely you'll
encounter this kind of SolrCore RELOAD error in "stock" Solr - but
{{MetricUtils.instrumentedExecutorService(...)}} is really just a time bomb
waiting to go off.
If anyone starts using method in other places in solr code, it's unlikely any
tests will notice the problem unless they do a core reload.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]