[
https://issues.apache.org/jira/browse/SOLR-17419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jason Gerlowski updated SOLR-17419:
-----------------------------------
Attachment: shardhandler-perf-graph.png
> Improve HttpShardHandler performance in many-shard collections
> --------------------------------------------------------------
>
> Key: SOLR-17419
> URL: https://issues.apache.org/jira/browse/SOLR-17419
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Components: SolrCloud
> Affects Versions: 9.0, 9.6.1
> Reporter: Jason Gerlowski
> Priority: Major
> Attachments: shardhandler-perf-graph.png
>
>
> In Solr 8, HttpShardHandler sent shard requests by submitting Callables to
> an ExecutorService. As a result, both the "request-sending" and
> "response-awaiting" happened asynchronously to the original request thread.
> {code:java}
> @Override
> public void submit(final ShardRequest sreq, final String shard, final ModifiableSolrParams params) {
>   ShardRequestor shardRequestor = new ShardRequestor(sreq, shard, params, this); // Callable
>   try {
>     shardRequestor.init();
>     pending.add(completionService.submit(shardRequestor));
>   } finally {
>     shardRequestor.end();
>   }
> }
> {code}
> However, in Solr 9.x HttpShardHandler ditched the
> ExecutorService/per-request-thread approach in favor of [sending all requests
> serially using
> "SolrClient.requestAsync"|https://github.com/apache/solr/blob/main/solr/core/src/java/org/apache/solr/handler/component/HttpShardHandler.java#L163].
> SOLR-14354 made this change in an effort to avoid unnecessary thread and
> CPU context switching. As Dat described in SOLR-14354:
> {quote}after sending a request that thread basically do nothing just waiting
> for response from other side. That thread will be swapped out and CPU will
> try to handle another thread (this is called context switch, CPU will save
> the context of the current thread and switch to another one). When some data
> (not all) come back, that thread will be called to parsing these data, then
> it will wait until more data come back. So there will be lots of context
> switching in CPU. That is quite inefficient
> {quote}
> This approach comes with a downside, though: all the shard requests are sent
> serially. If sending each request takes ~1ms, a user is unlikely to notice
> the cost in a collection with 5 or 10 shards. But the cost scales linearly
> with shard count, so in *a collection with 50 shards, this approach would
> bake a ~50ms delay into the critical path of every single query!*
> This issue is intended to reevaluate whether there's a better way to balance
> these concerns. Ideally we can come up with an approach that improves all
> scenarios. Lacking that, maybe Solr could choose among several approaches
> semi-intelligently based on the number of shards or other factors?
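
The trade-off described in the issue can be illustrated with a small, JDK-only model. This is a rough sketch and not Solr code: the class ShardDispatchSketch, its method names, the Thread.sleep stand-in for "sending a request", and the shard-count threshold are all hypothetical and chosen only for illustration. It shows the linear cost of serial dispatch (shards x per-send cost), a way to compare it against a pooled fan-out, and one possible shape for a shard-count-based choice between the two.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical model, not Solr code: compares how long dispatching shard
// requests takes when done serially versus when handed to a thread pool.
class ShardDispatchSketch {

  enum Dispatch { SERIAL, POOLED }

  // Linear cost from the issue text: total delay = shards * per-send cost.
  static long estimatedSerialDelayMillis(int shards, long perSendMillis) {
    return shards * perSendMillis;
  }

  // Assumed heuristic: stay serial for small collections, fan out above a limit.
  static Dispatch chooseDispatch(int shards, int serialShardLimit) {
    return shards <= serialShardLimit ? Dispatch.SERIAL : Dispatch.POOLED;
  }

  // Stand-in for the blocking part of sending one shard request.
  static void simulateSend(long sendMillis) {
    try {
      Thread.sleep(sendMillis);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  // The calling thread performs every send itself, one after another.
  static long serialDispatchNanos(int shards, long sendMillis) {
    long start = System.nanoTime();
    for (int i = 0; i < shards; i++) {
      simulateSend(sendMillis);
    }
    return System.nanoTime() - start;
  }

  // Sends are handed to a pool with one thread per shard, so they overlap.
  static long pooledDispatchNanos(int shards, long sendMillis) {
    ExecutorService pool = Executors.newFixedThreadPool(shards);
    try {
      long start = System.nanoTime();
      List<CompletableFuture<Void>> sends = new ArrayList<>();
      for (int i = 0; i < shards; i++) {
        sends.add(CompletableFuture.runAsync(() -> simulateSend(sendMillis), pool));
      }
      // Wait until every simulated send has finished.
      CompletableFuture.allOf(sends.toArray(new CompletableFuture[0])).join();
      return System.nanoTime() - start;
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    int shards = 50;
    long sendMillis = 1; // the ~1ms per-send figure assumed in the issue
    System.out.println("estimated serial delay (ms): "
        + estimatedSerialDelayMillis(shards, sendMillis));
    System.out.println("measured serial (ms): "
        + serialDispatchNanos(shards, sendMillis) / 1_000_000);
    System.out.println("measured pooled (ms): "
        + pooledDispatchNanos(shards, sendMillis) / 1_000_000);
    System.out.println("strategy with a limit of 10 shards: "
        + chooseDispatch(shards, 10));
  }
}
```

The pooled variant deliberately uses one thread per shard, which is the kind of thread usage SOLR-14354 set out to reduce, so the measured numbers only show the dispatch-latency side of the trade-off and say nothing about context-switching overhead under load.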