Hi devs,
While working on SOLR-17869 two things occurred to me:
1. Discounting the `timeAllowed`
When `timeAllowed` is > 0 `TopGroupsShardRequestFactory` discounts the time
already spent at the query coordinator when creating shard requests. This
happens ONLY for the grouping queries, for all other queries `timeAllowed` is
passed as-is to shard requests.
The net effect is that for other query types the same value of `timeAllowed` is
applied to the processing at the coordinator as well as to the per-shard local
processing. Since `timeAllowed` is a wall-clock time relative to the time of
the local request, not discounting the time already spent increases the
likelihood of the query timing out at the coordinator (because even if shard
requests all fit within the `timeAllowed` *as started at each shard*, it still
takes some time to return the results and post-process them at the coordinator
where timeAllowed was started much earlier).
From this POV it seems that `timeAllowed` should always be discounted before
sending shard requests, for all types of queries, which is not the case today.
Another question comes to mind, too: should we discount other types of query
limits, too? For other limit types the same “budget” is applied both at the
coordinator and locally per-shard - however, CPU and memory limits are not tied
to an absolute point in time so maybe it doesn’t make sense to discount them by
the amount spent at the coordinator, unless we want to limit the “end-to-end”
impact of a query across both local and distrib sub-requests. WDYT?
2. QueryCommand.timeAllowed
During the upgrade to Lucene 10 a few blocks of code were commented out in
SolrIndexSearcher, and one of them was this (SolrIndexSearcher:309):
/* final long timeAllowed = cmd.getTimeAllowed();
if (timeAllowed > 0) {
setTimeout(new QueryTimeoutImpl(timeAllowed));
}*/
I was surprised that none of the tests failed with this change and started
digging. QueryCommand.timeAllowed is initialized from request params. But it
looks like this property is no longer needed because now for each request we
initialize the QueryLimits from the same request params, which serve the same
purpose.
What’s more, I think the commented-out section was a bug because it would call
`IndexSearcher.setTimeout` on a searcher instance that was used for multiple
queries, and that setting would persist even for other subsequent requests.
I propose to remove `QueryCommand.timeAllowed` altogether.
—
Andrzej Białecki
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]