[
https://issues.apache.org/jira/browse/SOLR-16918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chris M. Hostetter updated SOLR-16918:
--------------------------------------
Attachment: SOLR-16918.test.patch
> MetricUtils.instrumentedExecutorService() is not safe in any executors with a
> lifecycle that isn't forever
> ---------------------------------------------------------------------------------------------------------
>
> Key: SOLR-16918
> URL: https://issues.apache.org/jira/browse/SOLR-16918
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Chris M. Hostetter
> Priority: Major
> Attachments: SOLR-16918.test.patch
>
>
> I have a custom Solr RequestHandler that internally uses a subclass of
> {{HttpShardHandler}} for dispatching some special types of distributed
> requests.
> I recently upgraded from Solr 9.1 to Solr 9.3 - and in doing so started
> getting the following exceptions on SolrCore RELOAD...
> {noformat}
> 2023-07-31 17:46:19.548 ERROR (qtp1690101810-23) [] o.a.s.s.HttpSolrCall 500 Exception => org.apache.solr.common.SolrException: Unable to reload core [foo_shard1_replica_n2]
> at org.apache.solr.core.CoreContainer.reload(CoreContainer.java:2028)
> org.apache.solr.common.SolrException: Unable to reload core [foo_shard1_replica_n2]
> at org.apache.solr.core.CoreContainer.reload(CoreContainer.java:2028) ~[?:?]
> at org.apache.solr.core.CoreContainer.reload(CoreContainer.java:1946) ~[?:?]
> at org.apache.solr.handler.admin.CoreAdminOperation.lambda$static$2(CoreAdminOperation.java:136) ~[?:?]
> at org.apache.solr.handler.admin.CoreAdminOperation.execute(CoreAdminOperation.java:400) ~[?:?]
> at org.apache.solr.handler.admin.CoreAdminHandler$CallInfo.call(CoreAdminHandler.java:353) ~[?:?]
> at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:219) ~[?:?]
> at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:224) ~[?:?]
> at org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:928) ~[?:?]
> ...
> Caused by: org.apache.solr.common.SolrException: A metric named QUERY./bar.shardHandler.threadPool.httpShardExecutor.pool.size already exists
> at org.apache.solr.core.SolrCore.<init>(SolrCore.java:1228) ~[?:?]
> at org.apache.solr.core.SolrCore.reload(SolrCore.java:788) ~[?:?]
> at org.apache.solr.core.CoreContainer.reload(CoreContainer.java:1978) ~[?:?]
> ... 62 more
> Caused by: java.lang.IllegalArgumentException: A metric named QUERY./bar.shardHandler.threadPool.httpShardExecutor.pool.size already exists
> at com.codahale.metrics.MetricRegistry.register(MetricRegistry.java:168) ~[metrics-core-4.2.19.jar:4.2.19]
> at com.codahale.metrics.MetricRegistry.registerGauge(MetricRegistry.java:88) ~[metrics-core-4.2.19.jar:4.2.19]
> at com.codahale.metrics.InstrumentedExecutorService.<init>(InstrumentedExecutorService.java:61) ~[metrics-core-4.2.19.jar:4.2.19]
> at org.apache.solr.util.stats.MetricUtils.instrumentedExecutorService(MetricUtils.java:751) ~[?:?]
> at org.apache.solr.handler.component.HttpShardHandlerFactory.initializeMetrics(HttpShardHandlerFactory.java:415) ~[?:?]
> ...
> {noformat}
> The problem evidently comes from {{HttpShardHandler}}'s usage of
> {{MetricUtils.instrumentedExecutorService()}}, which returns an instance of
> the (third-party) Dropwizard {{InstrumentedExecutorService}}.
> {{InstrumentedExecutorService}} changed in the 4.2.x releases (now
> used by Solr 9.3 ... not sure if it was in 9.2) so that, in addition to the
> {{Metric}} instances that it registered in older versions, it also
> registers some "type specific" metrics that are
> _Gauges_ driven by the behavior of the delegate {{ExecutorService}}...
> [https://github.com/dropwizard/metrics/commit/5f7ba7f4eb15bab0ddd458b56cdd313ea795eba7]
> The lifecycles of Gauges are extremely finicky, because {{MetricRegistry}}
> can't just "re-use" existing instances (like it can with {{Meter}},
> {{Counter}}, {{Timer}}, etc...). This is why Solr's
> {{SolrMetricManager}} jumps through a lot of hoops when registering gauges -
> and even has a custom {{GaugeWrapper}} - to ensure we can handle things like
> SolrCore {{RELOAD}} (where we have two copies of the SolrCore, and all of
> its associated plugins, running at once - and reporting metrics - until the
> new SolrCore is fully initialized and we swap over to it).
> But {{InstrumentedExecutorService}} and its new use of Gauges doesn't know
> anything about {{SolrMetricManager}} - it doesn't even de-register its
> Gauges in its {{shutdown()}} method, so you can't even do this...
> {code:java}
> (new InstrumentedExecutorService(delegate, registry, "xyz")).shutdown();
> (new InstrumentedExecutorService(delegate, registry, "xyz")).shutdown();
> {code}
> ...without triggering this type of exception.
> ----
> {{HttpShardHandler}} (and {{UpdateShardHandler}}) _tend_ to only be
> initialized at the {{CoreContainer}} level, so I think it's unlikely you'll
> encounter this kind of SolrCore RELOAD error in "stock" Solr - but
> {{MetricUtils.instrumentedExecutorService(...)}} is really just a time bomb
> waiting to go off.
> If anyone starts using this method in other places in Solr code, it's unlikely
> any tests will notice the problem unless they do a core reload.
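
The double-registration failure described above can be reproduced in miniature without any Solr or Dropwizard classes. The following is only a sketch under stated assumptions: {{TinyRegistry}} and {{InstrumentedPool}} are hypothetical stand-ins invented for illustration (they are not {{MetricRegistry}} or {{InstrumentedExecutorService}}), and they model just one behavior from the report, namely that registering a gauge under an existing name throws {{IllegalArgumentException}}.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

public class GaugeLifecycleSketch {

    /** Hypothetical stand-in for a metric registry; NOT the Dropwizard MetricRegistry. */
    static final class TinyRegistry {
        private final Map<String, Supplier<?>> gauges = new ConcurrentHashMap<>();

        /** Gauges cannot be re-used, so a duplicate name is rejected. */
        void registerGauge(String name, Supplier<?> gauge) {
            if (gauges.putIfAbsent(name, gauge) != null) {
                throw new IllegalArgumentException("A metric named " + name + " already exists");
            }
        }

        boolean remove(String name) {
            return gauges.remove(name) != null;
        }
    }

    /** Hypothetical stand-in for an executor wrapper that registers a gauge in its constructor. */
    static final class InstrumentedPool {
        private final TinyRegistry registry;
        private final String gaugeName;

        InstrumentedPool(TinyRegistry registry, String name) {
            this.registry = registry;
            this.gaugeName = name + ".pool.size";
            registry.registerGauge(gaugeName, () -> 0);
        }

        /** Models the behavior described in the report: the gauge is left registered. */
        void shutdown() {
            // intentionally does not touch the registry
        }

        /** Illustrative alternative: remove the gauge when the pool goes away. */
        void shutdownAndDeregister() {
            registry.remove(gaugeName);
        }
    }

    /** Returns true if creating a second pool under the same name throws. */
    static boolean secondRegistrationFails(boolean deregisterFirst) {
        TinyRegistry registry = new TinyRegistry();
        InstrumentedPool first = new InstrumentedPool(registry, "xyz");
        if (deregisterFirst) {
            first.shutdownAndDeregister();
        } else {
            first.shutdown();
        }
        try {
            new InstrumentedPool(registry, "xyz");
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("plain shutdown, second registration fails: "
            + secondRegistrationFails(false));
        System.out.println("de-register on shutdown, second registration fails: "
            + secondRegistrationFails(true));
    }
}
```

Note that de-registering on shutdown only covers the sequential case. During a SolrCore RELOAD the old and new cores run at the same time, so the new instance would still collide with the old instance's gauge; handling that overlap is what the report says {{SolrMetricManager}} and {{GaugeWrapper}} exist for.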