Jason Gerlowski created SOLR-17419:
--------------------------------------
Summary: Improve HttpShardHandler performance in many-shard collections
Key: SOLR-17419
URL: https://issues.apache.org/jira/browse/SOLR-17419
Project: Solr
Issue Type: Bug
Security Level: Public (Default Security Level. Issues are Public)
Components: SolrCloud
Affects Versions: 9.6.1, 9.0
Reporter: Jason Gerlowski
In Solr 8, HttpShardHandler sent shard-requests by submitting Callables to an
ExecutorService. As a result, both the "request-sending" and
"response-awaiting" happened asynchronously to the original request-thread.
{code:java}
@Override
public void submit(final ShardRequest sreq, final String shard, final ModifiableSolrParams params) {
  // ShardRequestor is a Callable
  ShardRequestor shardRequestor = new ShardRequestor(sreq, shard, params, this);
  try {
    shardRequestor.init();
    pending.add(completionService.submit(shardRequestor));
  } finally {
    shardRequestor.end();
  }
}
{code}
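For readers without the Solr 8 source at hand, the pattern above can be approximated with plain java.util.concurrent. This is only a rough sketch: fakeShardCall, queryAll and the one-thread-per-shard pool size are assumptions made up for illustration, and none of it is Solr's actual ShardRequestor logic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ExecutorFanOutSketch {
  // Hypothetical stand-in for a shard request; a real one would do network I/O.
  static String fakeShardCall(String shard) {
    return "response-from-" + shard;
  }

  // Submits one Callable per shard, then collects responses in completion order.
  static List<String> queryAll(List<String> shards) {
    // Assumption for the sketch: one pool thread per shard.
    ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, shards.size()));
    try {
      CompletionService<String> completionService = new ExecutorCompletionService<>(pool);
      List<Future<String>> pending = new ArrayList<>();
      for (String shard : shards) {
        // submit() only enqueues the work; sending and awaiting both happen on a pool thread.
        pending.add(completionService.submit(() -> fakeShardCall(shard)));
      }
      List<String> responses = new ArrayList<>();
      for (int i = 0; i < pending.size(); i++) {
        // take() blocks until the next shard response (whichever it is) has arrived.
        responses.add(completionService.take().get());
      }
      return responses;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while awaiting shard responses", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("shard request failed", e);
    } finally {
      pool.shutdown();
    }
  }
}
```

The point to notice is that the request thread only pays for the submit() calls; the cost of actually sending each request lands on a pool thread.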
However, in Solr 9.x HttpShardHandler ditched the
ExecutorService/per-request-thread approach in favor of [sending all requests
serially using
"SolrClient.requestAsync"|https://github.com/apache/solr/blob/main/solr/core/src/java/org/apache/solr/handler/component/HttpShardHandler.java#L163].
SOLR-14354, which made this change, did this in an effort to avoid unnecessary
thread and CPU context-switching. As Dat described in SOLR-14354:
{quote}after sending a request that thread basically do nothing just waiting
for response from other side. That thread will be swapped out and CPU will try
to handle another thread (this is called context switch, CPU will save the
context of the current thread and switch to another one). When some data (not
all) come back, that thread will be called to parsing these data, then it will
wait until more data come back. So there will be lots of context switching in
CPU. That is quite inefficient
{quote}
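To make the Solr 9 shape concrete, here is a minimal sketch of "issue each async request in turn, then wait". The requestAsync method below is a hypothetical stand-in built on CompletableFuture; it is not the real SolrClient.requestAsync signature, and the class and method names are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

final class SerialAsyncSendSketch {
  // Hypothetical async call. Any work done in this method before it returns
  // (building the request, handing it to the I/O layer) runs on the caller's thread.
  static CompletableFuture<String> requestAsync(String shard) {
    return CompletableFuture.supplyAsync(() -> "response-from-" + shard);
  }

  static List<String> queryAll(List<String> shards) {
    List<CompletableFuture<String>> pending = new ArrayList<>();
    for (String shard : shards) {
      // Requests are issued one after another on the request thread.
      pending.add(requestAsync(shard));
    }
    List<String> responses = new ArrayList<>();
    for (CompletableFuture<String> future : pending) {
      // join() waits for each response; results come back in shard order.
      responses.add(future.join());
    }
    return responses;
  }
}
```

No thread sits parked per request here, which is the benefit Dat describes. The trade-off is that the for-loop issuing the requests runs entirely on the request thread.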
This approach comes with a downside though: all the shard requests are sent
serially. If sending each request takes ~1ms, then a user is unlikely to notice
this in a collection with 5 or 10 shards. But the cost scales linearly with
shard count, so *in a collection with 50 shards this approach would bake a ~50ms
delay into the critical path of every single query!*
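The arithmetic behind that claim can be written down as a back-of-envelope model. It assumes a constant per-request send cost (the ~1ms figure above is itself only an estimate), and the "senders" parameter is a hypothetical knob added for comparison, not an existing Solr setting.

```java
final class SerialSendCostSketch {
  // Time until the last shard request has been sent when one thread sends them in turn.
  static long serialDispatchMillis(int shardCount, long perSendMillis) {
    return shardCount * perSendMillis;
  }

  // Same estimate if the sends were spread evenly over several sender threads.
  // "senders" is a hypothetical parameter used only for this comparison.
  static long parallelDispatchMillis(int shardCount, long perSendMillis, int senders) {
    if (senders < 1) {
      throw new IllegalArgumentException("senders must be >= 1");
    }
    long sendsPerThread = (shardCount + senders - 1) / senders; // ceiling division
    return sendsPerThread * perSendMillis;
  }
}
```

Under these assumptions, serialDispatchMillis(50, 1) gives 50 while serialDispatchMillis(5, 1) gives 5, matching the figures above. For comparison, parallelDispatchMillis(50, 1, 10) gives 5. The model ignores thread hand-off and context-switch costs, which are exactly what SOLR-14354 set out to reduce.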
This issue is intended to reevaluate whether there's a better way to balance
these concerns. Ideally we can come up with an approach that improves all
scenarios. Lacking that, maybe Solr could choose between one of several
approaches semi-intelligently based on the number of shards or other factors?
--
This message was sent by Atlassian Jira
(v8.20.10#820010)