[ 
https://issues.apache.org/jira/browse/SOLR-16918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris M. Hostetter updated SOLR-16918:
--------------------------------------
    Attachment: SOLR-16918.test.patch

> MetricUtils.instrumentedExecutorService() is not safe in any excutors with a 
> lifecycle that isn't forever
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-16918
>                 URL: https://issues.apache.org/jira/browse/SOLR-16918
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Chris M. Hostetter
>            Priority: Major
>         Attachments: SOLR-16918.test.patch
>
>
> I have a custom Solr RequestHandler that internally uses a subclass of 
> {{HttpShardHandler}} for dispatching some special types of distributed 
> requests.
> I recently upgraded form Solr 9.1 to Solr 9.3 - and in doing so started 
> getting the following exceptions on SolrCore RELOAD...
> {noformat}
> 2023-07-31 17:46:19.548 ERROR (qtp1690101810-23) [] o.a.s.s.HttpSolrCall 500 
> Exception => org.apache.solr.common.SolrException: Unable to reload core 
> [foo_shard1_replica_n2]
>       at org.apache.solr.core.CoreContainer.reload(CoreContainer.java:2028)
> org.apache.solr.common.SolrException: Unable to reload core 
> [foo_shard1_replica_n2]
>       at org.apache.solr.core.CoreContainer.reload(CoreContainer.java:2028) 
> ~[?:?]
>       at org.apache.solr.core.CoreContainer.reload(CoreContainer.java:1946) 
> ~[?:?]
>       at 
> org.apache.solr.handler.admin.CoreAdminOperation.lambda$static$2(CoreAdminOperation.java:136)
>  ~[?:?]
>       at 
> org.apache.solr.handler.admin.CoreAdminOperation.execute(CoreAdminOperation.java:400)
>  ~[?:?]
>       at 
> org.apache.solr.handler.admin.CoreAdminHandler$CallInfo.call(CoreAdminHandler.java:353)
>  ~[?:?]
>       at 
> org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:219)
>  ~[?:?]
>       at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:224)
>  ~[?:?]
>       at 
> org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:928) ~[?:?]
> ...
> Caused by: org.apache.solr.common.SolrException: A metric named 
> QUERY./bar.shardHandler.threadPool.httpShardExecutor.pool.size already exists
>       at org.apache.solr.core.SolrCore.<init>(SolrCore.java:1228) ~[?:?]
>       at org.apache.solr.core.SolrCore.reload(SolrCore.java:788) ~[?:?]
>       at org.apache.solr.core.CoreContainer.reload(CoreContainer.java:1978) 
> ~[?:?]
>       ... 62 more
> Caused by: java.lang.IllegalArgumentException: A metric named 
> QUERY./bar.shardHandler.threadPool.httpShardExecutor.pool.size already exists
>       at 
> com.codahale.metrics.MetricRegistry.register(MetricRegistry.java:168) 
> ~[metrics-core-4.2.19.jar:4.2.19]
>       at 
> com.codahale.metrics.MetricRegistry.registerGauge(MetricRegistry.java:88) 
> ~[metrics-core-4.2.19.jar:4.2.19]
>       at 
> com.codahale.metrics.InstrumentedExecutorService.<init>(InstrumentedExecutorService.java:61)
>  ~[metrics-core-4.2.19.jar:4.2.19]
>       at 
> org.apache.solr.util.stats.MetricUtils.instrumentedExecutorService(MetricUtils.java:751)
>  ~[?:?]
>       at 
> org.apache.solr.handler.component.HttpShardHandlerFactory.initializeMetrics(HttpShardHandlerFactory.java:415)
>  ~[?:?]
> ...
> {noformat}
> The problem evidently comes from {{{}HttpShardHandler{}}}'s usage of 
> {{MetricUtils.instrumentedExecutorService()}} which returns an instance of 
> the (third-party) dropwizard {{{}InstrumentedExecutorService{}}}.
> {{InstrumentedExecutorService}} made a change in their 4.2.x releases (now 
> used by Solr 9.3 ... not sure if it was in 9.2) so that in addition to some 
> previous {{Metric}} instances that {{InstrumentedExecutorService}} registered 
> in older versions, it also includes some "type specific" metrics that are 
> _Gauges_ driven by the behavior of the delegate {{{}ExecutorService{}}}...
> [https://github.com/dropwizard/metrics/commit/5f7ba7f4eb15bab0ddd458b56cdd313ea795eba7]
> The lifecycles of Gauges are extremely finicky, because {{MetricRegistry}} 
> can't just "re-use" existing instances (like it can with {{{}Meter{}}}, 
> {{{}Counter{}}}, {{{}Timer{}}}, etc...). This is why Solr's 
> {{SolrMetricManager}} jumps through a lot of hoops when registering gauges - 
> and even has a custom {{GaugeWrapper}} - to ensure we can handle things like 
> SolrCore {{RELOAD}} (where we have two copies of the SolrCore, and all of 
> it's associated plugins, running at once - and reporting metrics - until the 
> new SolrCore is fully initialized and we swap over to it)
> But {{InstrumentedExecutorService}} and it's new use of Gauges doesn't know 
> anything about {{SolrMetricManager}} - it doesn't even de-register it's 
> Gauges in it's {{shutdown()}} method, so you can't even do this...
> {code:java}
> (new InstrumentedExecutorService(delegate, registry, "xyz")).shutdown();
> (new InstrumentedExecutorService(delegate, registry, "xyz")).shutdown();
> {code}
> ...w/o triggering this type of exception.
> ----
> {{HttpShardHandler}} (and {{{}UpdateShardHandler{}}}) _tend_ to only be 
> initialized at the {{CoreContainer}} level, so I think it's unlikely you'll 
> encounter this kind of SolrCore RELOAD error in "stock" Solr - but 
> {{MetricUtils.instrumentedExecutorService(...)}} is really just a time bomb 
> waiting to go off.
> If anyone starts using method in other places in solr code, it's unlikely any 
> tests will notice the problem unless they do a core reload.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to