[ https://issues.apache.org/jira/browse/SOLR-14607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17318161#comment-17318161 ]
Christine Poerschke commented on SOLR-14607:
--------------------------------------------
Returning to this, I wonder if it might be helpful to brainstorm some
scenarios and possibilities, away from and complementary to the code/pull request
discussions?
----
imaginary setup:
* our collection contains 1000 documents
* 500 documents match the query {{foobar:hello}}
* we want to rerank the best 100 of the 500 matching documents
* we want to return the best 10 documents after reranking
* we want features to be returned for the documents
search parameters:
{code}
q=foobar:hello&rows=10&rq={!ltr model=myModel reRankDocs=100}&fl=id,foobar,[features],score
{code}
assumptions (just to help us think about different scenarios):
* matching the query takes 1ms per document i.e. 500ms for the 500 documents
that match the query
* reranking the documents takes 10ms per document i.e. 1000ms for the best 100
of the 500 matching documents
scenario 1: no timeAllowed limit
* 500ms + 1000ms = 1500ms
* the search completes in one and a half seconds, returning full results and
feature values
scenario 2a: timeAllowed=5ms
* the search logically hits the timeAllowed limit whilst matching the query,
after matching 5 documents
* there is no time left for reranking
* 5 documents is less than the rows=10 and also less than reRankDocs=100
scenario 2b: timeAllowed=80ms
* the search logically hits the timeAllowed limit whilst matching the query,
after matching 80 documents
* there is no time left for reranking
* 80 documents is more than the rows=10 but less than reRankDocs=100
scenario 2c: timeAllowed=123ms
* the search logically hits the timeAllowed limit whilst matching the query,
after matching 123 documents
* there is no time left for reranking
* 123 documents is more than the rows=10 and also more than reRankDocs=100
scenario 3a: timeAllowed=550ms
* the search spends 500ms matching the query
* 50ms are left for reranking and in that time the features for 5 documents
could be computed
* 5 documents is less than the rows=10 and it is also less than reRankDocs=100
scenario 3b: timeAllowed=750ms
* the search spends 500ms matching the query
* 250ms are left for reranking and in that time the features for 25 documents
could be computed
* 25 documents is more than the rows=10 but less than reRankDocs=100
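To double-check the arithmetic in the scenarios above, here is a minimal Java sketch of the imaginary cost model (1ms per matched document, 10ms per re-ranked document, rows=10, reRankDocs=100). The class, method and constant names are hypothetical, NO_LIMIT simply stands in for "no timeAllowed", and the model assumes (as scenarios 2a to 2c do) that re-ranking only starts once all 500 matching documents have been collected.

{code:java}
// Hypothetical cost model for the imaginary setup; this is not Solr code.
final class TimeAllowedModel {
    static final int MATCHING_DOCS = 500;    // documents matching foobar:hello
    static final int MATCH_MS_PER_DOC = 1;   // assumed matching cost per document
    static final int RERANK_MS_PER_DOC = 10; // assumed feature + model cost per document
    static final int RERANK_DOCS = 100;      // reRankDocs=100
    static final int NO_LIMIT = Integer.MAX_VALUE; // stands in for "no timeAllowed"

    // Documents matched before the budget runs out (at most all 500).
    static int matched(int timeAllowedMs) {
        return Math.min(MATCHING_DOCS, timeAllowedMs / MATCH_MS_PER_DOC);
    }

    // Documents re-ranked in the time left after matching, capped at reRankDocs.
    static int reranked(int timeAllowedMs) {
        if (matched(timeAllowedMs) < MATCHING_DOCS) {
            return 0; // limit hit whilst matching: no time left for re-ranking
        }
        int leftMs = timeAllowedMs - MATCHING_DOCS * MATCH_MS_PER_DOC;
        return Math.min(RERANK_DOCS, leftMs / RERANK_MS_PER_DOC);
    }

    // Total simulated time spent on matching plus re-ranking.
    static int elapsedMs(int timeAllowedMs) {
        return matched(timeAllowedMs) * MATCH_MS_PER_DOC
                + reranked(timeAllowedMs) * RERANK_MS_PER_DOC;
    }

    public static void main(String[] args) {
        int[] budgets = {NO_LIMIT, 5, 80, 123, 550, 750};
        for (int t : budgets) {
            System.out.printf("timeAllowed=%s matched=%d reranked=%d elapsed=%dms%n",
                    t == NO_LIMIT ? "none" : t + "ms",
                    matched(t), reranked(t), elapsedMs(t));
        }
    }
}
{code}

Running main reproduces the numbers above: 500 matched and 100 re-ranked in 1500ms for scenario 1, no re-ranked documents for scenarios 2a to 2c, and 5 and 25 re-ranked documents for scenarios 3a and 3b.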
----
brainstorming possibilities:
all scenarios:
* possibility 0: disallow use of {{timeAllowed}} with {{ltr}} re-ranking
scenario 2:
* possibility 1: don't compute feature values, don't do re-ranking, and return the
partial results with the existing {{partialResults}} flag set in the response.
From the flag, the caller understands that re-ranking did not happen.
* possibility 2: don't do re-ranking but do compute and return feature values.
Return the partial results with the existing {{partialResults}} flag set in the
response. From the flag, the caller understands that re-ranking did not happen,
and from the presence of feature values the caller understands that
{{timeAllowed}} was not fully respected, i.e. after the allowed time was used up,
additional time was still spent computing features.
* possibility 3: do compute and return feature values, do perform re-ranking, and
return the results with the existing {{partialResults}} flag set in the response.
From the response, the caller understands that {{timeAllowed}} was not fully
respected, i.e. after the allowed time was used up, additional time was still
spent computing features and re-ranking.
scenario 3:
* possibility 4: don't compute remaining features, don't return any features,
don't do re-ranking, return the results based on original scores and indicate
via a new {{rerankingOmitted=true}} or similar flag in the response that
re-ranking was not done.
* possibility 5: do compute remaining features, return all features, don't do
re-ranking, return the results based on original scores and indicate via a new
{{rerankingOmitted=true}} flag in the response that re-ranking was skipped. Via
the presence of feature values and the presence of the flag, the caller
understands that {{timeAllowed}} was not fully respected.
* possibility 6: do compute remaining features, return all features, and do
perform re-ranking. The caller understands from documentation that
{{timeAllowed}} is not applied during re-ranking.
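To compare the possibilities side by side, they can be summarised as a small table in code. This is a hypothetical sketch only: the enum and its field names are made up for illustration, and {{rerankingOmitted}} is the new flag suggested above, not an existing Solr response key.

{code:java}
// Hypothetical summary of the brainstormed possibilities; not Solr code.
// "rerankingOmitted" is the newly suggested flag, not an existing response key.
enum Possibility {
    //   scenarios features rerank  partialResults rerankingOmitted
    P0("all",     false,   false,  false,         false), // timeAllowed + ltr disallowed
    P1("2",       false,   false,  true,          false),
    P2("2",       true,    false,  true,          false),
    P3("2",       true,    true,   true,          false),
    P4("3",       false,   false,  false,         true),
    P5("3",       true,    false,  false,         true),
    P6("3",       true,    true,   false,         false);

    final String scenarios;
    final boolean returnsFeatures;
    final boolean reranks;
    final boolean setsPartialResults;
    final boolean setsRerankingOmitted;

    Possibility(String scenarios, boolean returnsFeatures, boolean reranks,
                boolean setsPartialResults, boolean setsRerankingOmitted) {
        this.scenarios = scenarios;
        this.returnsFeatures = returnsFeatures;
        this.reranks = reranks;
        this.setsPartialResults = setsPartialResults;
        this.setsRerankingOmitted = setsRerankingOmitted;
    }

    // In every possibility listed, returning features or re-ranking requires
    // work after the limit was hit, i.e. timeAllowed is not fully respected.
    boolean exceedsTimeAllowed() {
        return returnsFeatures || reranks;
    }

    public static void main(String[] args) {
        for (Possibility p : values()) {
            System.out.printf("%s (scenario %s): features=%b rerank=%b"
                    + " partialResults=%b rerankingOmitted=%b exceedsTimeAllowed=%b%n",
                    p, p.scenarios, p.returnsFeatures, p.reranks,
                    p.setsPartialResults, p.setsRerankingOmitted, p.exceedsTimeAllowed());
        }
    }
}
{code}

Laid out like this, only possibilities 0, 1 and 4 stay within {{timeAllowed}}; the others trade the time limit for more complete results.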
> LTR Query, timeAllowed parameter causes a timeout exception with no result
> --------------------------------------------------------------------------
>
> Key: SOLR-14607
> URL: https://issues.apache.org/jira/browse/SOLR-14607
> Project: Solr
> Issue Type: Improvement
> Components: contrib - LTR
> Affects Versions: main (9.0)
> Reporter: Dawn
> Priority: Minor
> Attachments: SOLR-14607-poc.patch
>
> Time Spent: 2h 10m
> Remaining Estimate: 0h
>
> When using LTR with the timeAllowed parameter enabled, the LTR feature queries
> may trigger the 'ExitableFilterAtomicReader.checkAndThrow' timeout checks.
> If a timeout occurs at this point, an ExitingReaderException is thrown,
> leading to a null result.
> Exception information:
> {code:java}
> The request took too long to iterate over terms. Timeout: timeoutAt: 50321611131050 (System.nanoTime(): 50321639573838), TermsEnum=org.apache.lucene.codecs.blocktree.SegmentTermsEnum@62eaeeaa
> {code}
>
> This exception could be caught within LTR so that partial results are
> returned rather than null.
> This exception occurs in two places:
> 1. 'LTRScoringQuery.createWeight' or 'LTRScoringQuery.createWeightsParallel'.
> This is the loading stage; ending directly on a timeout is acceptable here.
> 2. 'ModelWeight.scorer'. This is the stage that evaluates each document; here
> the exception can be caught and the documents computed so far returned.
>
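The second place named in the quoted description ('ModelWeight.scorer') suggests catching the timeout and keeping the documents already evaluated. A minimal, Lucene-independent sketch of that catch-and-keep pattern follows; TimeLimitExceeded is a hypothetical stand-in for ExitingReaderException, and PartialRescorer, rescore and ScoredDoc are illustrative names, not LTR APIs. In Solr the resulting partial flag would presumably map onto the existing {{partialResults}} response flag.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntToDoubleFunction;

// Hypothetical stand-in for Lucene's ExitableDirectoryReader.ExitingReaderException.
final class TimeLimitExceeded extends RuntimeException {
    TimeLimitExceeded(String message) {
        super(message);
    }
}

final class PartialRescorer {
    static final class ScoredDoc {
        final int doc;
        final double score;

        ScoredDoc(int doc, double score) {
            this.doc = doc;
            this.score = score;
        }
    }

    static final class Result {
        final List<ScoredDoc> scored; // documents evaluated before the timeout
        final boolean partial;        // true if the timeout cut evaluation short

        Result(List<ScoredDoc> scored, boolean partial) {
            this.scored = scored;
            this.partial = partial;
        }
    }

    // Evaluates the model on each document in turn; on a timeout, keeps the
    // documents already evaluated instead of discarding everything.
    static Result rescore(int[] docIds, IntToDoubleFunction model) {
        List<ScoredDoc> scored = new ArrayList<>();
        try {
            for (int doc : docIds) {
                scored.add(new ScoredDoc(doc, model.applyAsDouble(doc)));
            }
        } catch (TimeLimitExceeded e) {
            return new Result(scored, true);
        }
        return new Result(scored, false);
    }

    public static void main(String[] args) {
        // Simulated model whose timeout check fires on the fourth document.
        IntToDoubleFunction model = doc -> {
            if (doc == 3) {
                throw new TimeLimitExceeded("The request took too long");
            }
            return doc * 0.5;
        };
        Result r = rescore(new int[] {0, 1, 2, 3, 4}, model);
        System.out.println("partial=" + r.partial + " scored=" + r.scored.size());
    }
}
{code}

Here main prints {{partial=true scored=3}}: documents 0 to 2 keep their scores and the caller learns that the list is incomplete, rather than receiving nothing.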
--
This message was sent by Atlassian Jira
(v8.3.4#803005)