Yes, I understand that … OTOH I don’t think it would make sense to move this 
particular logic to a servlet filter now, without first doing the other 
refactoring / splitting of SolrDispatchFilter. There needs to be more support 
for connecting the processing at the filter level with the processing at the 
Solr level.

Take the parameters you suggested, for example: TimeAllowedLimit doesn’t have 
access to HTTP headers, and reaching for them there would cut across too many 
layers anyway. It’s not really convenient to modify the HTTP query params of 
the original request either, so the only way to pass additional information on 
to the other filters would be HttpServletRequest.setAttribute. But then 
HttpSolrCall.init() would have to extract those attributes and add them to the 
SolrParams.
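As a concrete illustration of that hand-off (a sketch only: the attribute name is invented, and a plain Map stands in for HttpServletRequest's attribute API, so nothing here is actual Solr code):

```java
import java.util.HashMap;
import java.util.Map;

public class DeadlineAttributeSketch {
  // Hypothetical attribute key; a real patch would pick its own name.
  static final String DEADLINE_ATTR = "solr.request.deadline";

  // Filter side: stamp the request with an absolute deadline.
  // The Map stands in for HttpServletRequest.setAttribute().
  static void tagRequest(Map<String, Object> attrs, long nowMs, long timeAllowedMs) {
    attrs.put(DEADLINE_ATTR, nowMs + timeAllowedMs);
  }

  // HttpSolrCall.init() side: pull the attribute back out so it can be
  // merged into the SolrParams; returns null when no filter set it.
  static Long extractDeadline(Map<String, Object> attrs) {
    Object v = attrs.get(DEADLINE_ATTR);
    return (v instanceof Long) ? (Long) v : null;
  }

  public static void main(String[] args) {
    Map<String, Object> attrs = new HashMap<>();
    tagRequest(attrs, 1_758_575_850_605L, 10_000L);
    System.out.println("deadline=" + extractDeadline(attrs)); // prints deadline=1758575860605
  }
}
```

This keeps the filter decoupled from Solr internals, but it also shows the cost described above: init() has to know about every attribute any filter might set.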

I guess my point is that until we’ve done the groundwork for refactoring 
SolrDispatchFilter, adding the timeAllowed logic there still looks like the 
least messy approach.

> On 23 Sep 2025, at 14:50, Gus Heck <[email protected]> wrote:
> 
> I am generally looking for ways to make HttpSolrCall and SolrDispatchFilter
> smaller and simpler. They both do way too many things (security, tracing,
> dispatch to fundamentally different functions, rate limiting, etc.). I'm
> not enthusiastic about adding another essentially request-wrapping
> functionality to the existing bloat.
> 
> On Tue, Sep 23, 2025 at 8:34 AM Andrzej Białecki <[email protected]> wrote:
> 
>> Gus,
>> 
>> Hmm… this sounds better than using NOW, which may be significantly off if
>> nodes are out of sync and as a result could trigger immediate termination
>> of sub-requests.
>> 
>> The downside to this approach is that we can’t properly account for the
>> in-flight time (between tagging and sending a shard request up to the
>> receiving node starting the processing) but that seems a small price to pay
>> and it should only introduce a small and consistent error.
>> 
>> We could add this logic somewhere in HttpSolrCall.init() instead of a
>> servlet filter, because at this point it’s still very early into the
>> request processing but we already have the request params. WDYT?
>> 
>>> On 22 Sep 2025, at 23:43, Gus Heck <[email protected]> wrote:
>>> 
>>> For timeAllowed, the proper design would seem to be to establish a "must
>>> finish by" time, at initial query receipt, and then pass that to the
>>> shards. The fly in that ointment would be clock skew, so one probably
>>> needs to also send along a notion of when the sub-request was made, which
>>> the recipient can compare to their own clock. This is similar to
>>> discounting but would implicitly allow network request time to be "free"
>>> (untimed), but the advantage is that it also leaves the option that in a
>>> cluster where nodes run on machines with trustworthy clocks the skew
>>> check can be skipped, and skew checking code can be independent of all
>>> other request processing code.
>>> 
>>> Main request: timeAllowed=*1*0000
>>> Coordinator clock at request receipt: 17585758*5*06*05*
>>> Subrequest finishBefore=17585758*6*06*05*
>>> 
>>> Optionally on initial request: mitigateClockSkew=true (this could also
>>> default to true if we like)
>>> which then causes the subrequest to also contain:
>>> clockSkewCheck=17585758*5*06*73* (assuming it took 68 ms to decide to
>>> send the sub-request)
>>> 
>>> If the clockSkewCheck is sent, the receiving node checks the supplied
>>> value against its local clock and applies the (positive or negative)
>>> difference to the finishBefore value.
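The scheme above reduces to two pieces of arithmetic. A minimal sketch, with invented method names; finishBefore and clockSkewCheck are absolute epoch millis as in the worked example:

```java
public class ClockSkewSketch {
  // Coordinator side: absolute "must finish by" time, computed at request receipt.
  static long finishBefore(long coordinatorNowMs, long timeAllowedMs) {
    return coordinatorNowMs + timeAllowedMs;
  }

  // Receiver side: skew = local clock minus the coordinator's clock at send
  // time. Shifting finishBefore by that difference restates the deadline in
  // the local clock; network transit time is folded in, i.e. it goes untimed.
  static long adjustForSkew(long finishBefore, long clockSkewCheck, long localNowMs) {
    return finishBefore + (localNowMs - clockSkewCheck);
  }

  public static void main(String[] args) {
    long fb = finishBefore(1_758_575_850_605L, 10_000L); // 1758575860605
    // Receiver's clock reads 500 ms later than the coordinator's send stamp.
    long adjusted = adjustForSkew(fb, 1_758_575_850_673L, 1_758_575_851_173L);
    System.out.println(adjusted - fb); // prints 500
  }
}
```

With mitigateClockSkew unset the receiver would just use finishBefore as-is, so the adjustment stays an independent, optional step.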
>>> 
>>> That way all logic is simple and consistent for checking the limit (just
>>> compare to current time), systems that have synchronized clocks can
>>> benefit from it, and most importantly, the skew adjustment code can be a
>>> super simple HttpServletFilter that doesn't need to be worked into any of
>>> the existing code paths.
>>> 
>>> Then we could also allow the user to send the "finishBefore" to the
>>> coordinator if the original requesting system also has a reliably
>>> synchronized clock.
>>> 
>>> On Mon, Sep 22, 2025 at 10:21 AM David Smiley <[email protected]>
>>> wrote:
>>> 
>>>> I think it definitely makes sense to discount timeAllowed for the shard
>>>> requests.  I was thinking about this and related matters recently at my
>>>> last job. I recall thinking that a possible solution is to use the NOW
>>>> passed along from coordinator to shard as the epoch to base the time
>>>> allowance on.  It's captured early by the coordinator. On the other
>>>> hand, it could be considered a hack / abuse of the intent of NOW.  Maybe
>>>> a separate parameter (or HTTP header) should be used to pass this along.
>>>> 
>>>> I don't think it's worth bothering doing this for memory or CPU limits.
>>>> But I'm not against it.
>>>> 
>>>> On Mon, Sep 22, 2025 at 10:11 AM Andrzej Białecki <[email protected]>
>>>> wrote:
>>>> 
>>>>> Hi devs,
>>>>> 
>>>>> While working on SOLR-17869 two things occurred to me:
>>>>> 
>>>>> 1. Discounting the `timeAllowed`
>>>>> 
>>>>> When `timeAllowed` is > 0, `TopGroupsShardRequestFactory` discounts
>>>>> the time already spent at the query coordinator when creating shard
>>>>> requests. This happens ONLY for grouping queries; for all other query
>>>>> types `timeAllowed` is passed as-is to shard requests.
>>>>> 
>>>>> The net effect is that for other query types the same value of
>>>>> `timeAllowed` is applied to the processing at the coordinator as well
>>>>> as to the per-shard local processing. Since `timeAllowed` is a
>>>>> wall-clock limit relative to the start of the local request, not
>>>>> discounting the time already spent increases the likelihood of the
>>>>> query timing out at the coordinator (because even if shard requests
>>>>> all fit within the `timeAllowed` *as started at each shard*, it still
>>>>> takes some time to return the results and post-process them at the
>>>>> coordinator, where `timeAllowed` started much earlier).
>>>>> 
>>>>> From this POV it seems that `timeAllowed` should always be discounted
>>>>> before sending shard requests, for all types of queries, which is not
>>>>> the case today.
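The discount described above is a one-line subtraction. A sketch with hypothetical method and parameter names; the floor value is an assumption so that sub-requests are never sent with a zero or negative budget:

```java
public class TimeAllowedDiscountSketch {
  // Subtract the time already spent at the coordinator from timeAllowed
  // before building shard requests, flooring at a minimal budget.
  static long discount(long timeAllowedMs, long requestStartMs, long nowMs, long minBudgetMs) {
    long alreadySpent = nowMs - requestStartMs;
    return Math.max(timeAllowedMs - alreadySpent, minBudgetMs);
  }

  public static void main(String[] args) {
    // 10 s budget, 68 ms already spent at the coordinator.
    System.out.println(discount(10_000L, 0L, 68L, 1L)); // prints 9932
  }
}
```

Per the thread above, this is essentially what TopGroupsShardRequestFactory already does for grouping queries; applying it uniformly would move the same subtraction into the common shard-request path.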
>>>>> 
>>>>> Another question comes to mind, too: should we discount other types
>>>>> of query limits as well? For other limit types the same “budget” is
>>>>> applied both at the coordinator and locally per-shard. However, CPU
>>>>> and memory limits are not tied to an absolute point in time, so maybe
>>>>> it doesn’t make sense to discount them by the amount spent at the
>>>>> coordinator, unless we want to limit the “end-to-end” impact of a
>>>>> query across both local and distributed sub-requests. WDYT?
>>>>> 
>>>>> 
>>>>> 2. QueryCommand.timeAllowed
>>>>> 
>>>>> During the upgrade to Lucene 10 a few blocks of code were commented
>>>>> out in SolrIndexSearcher, and one of them was this
>>>>> (SolrIndexSearcher:309):
>>>>> 
>>>>> /* final long timeAllowed = cmd.getTimeAllowed();
>>>>> if (timeAllowed > 0) {
>>>>>   setTimeout(new QueryTimeoutImpl(timeAllowed));
>>>>> } */
>>>>> 
>>>>> I was surprised that none of the tests failed with this change and
>>>>> started digging. QueryCommand.timeAllowed is initialized from request
>>>>> params. But it looks like this property is no longer needed, because
>>>>> we now initialize QueryLimits for each request from the same request
>>>>> params, which serve the same purpose.
>>>>> 
>>>>> What’s more, I think the commented-out section was a bug, because it
>>>>> would call `IndexSearcher.setTimeout` on a searcher instance that was
>>>>> used for multiple queries, and that setting would persist even for
>>>>> subsequent requests.
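To make the "would persist even for subsequent requests" point concrete, here is a toy model (not Solr or Lucene code): one mutable timeout field shared by all requests, standing in for calling setTimeout on a searcher instance that outlives a single request:

```java
import java.util.concurrent.atomic.AtomicReference;

public class SharedTimeoutBugSketch {
  // Stands in for the long-lived searcher's single timeout slot.
  static final AtomicReference<Long> sharedTimeout = new AtomicReference<>(null);

  // Stands in for the commented-out block: only sets, never clears.
  static void handleRequest(Long timeAllowedMs) {
    if (timeAllowedMs != null && timeAllowedMs > 0) {
      sharedTimeout.set(timeAllowedMs);
    }
  }

  public static void main(String[] args) {
    handleRequest(500L);  // request A asks for a 500 ms limit
    handleRequest(null);  // request B asks for no limit...
    System.out.println(sharedTimeout.get()); // ...but prints 500: the limit leaked
  }
}
```

Per-request QueryLimits avoid this shape of bug entirely, since the limit lives in the request context rather than on the shared searcher.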
>>>>> 
>>>>> I propose to remove `QueryCommand.timeAllowed` altogether.
>>>>> 
>>>>> —
>>>>> 
>>>>> Andrzej Białecki
>>>>> 
>>>>> 
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: [email protected]
>>>>> For additional commands, e-mail: [email protected]
>>>>> 
>>>>> 
>>>> 
>>> 
>>> 
>>> --
>>> http://www.needhamsoftware.com (work)
>>> https://a.co/d/b2sZLD9 (my fantasy fiction book)
>> 
>> 
> 

