jpountz commented on issue #14561:
URL: https://github.com/apache/lucene/issues/14561#issuecomment-2834173590

   ```java
   new TopScoreDocCollectorManager(1, 5);
   ```
   
   The total hits threshold is 5 here, which means that the collector should 
count hits at least until it can confirm that at least 5 documents match the 
query. If there are more than 5 hits, it may return any number in 
`[5, number of matches]` in combination with 
`TotalHits.Relation.GREATER_THAN_OR_EQUAL_TO`.
   
   In the test you shared, the query matches 10 documents, and the collector 
says that the query matched 6 documents or more, which is correct.
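
   To make the rule above concrete, here is a minimal sketch in plain Java that checks whether a reported hit count is a valid answer. It is only a model of the contract as described in this comment, not Lucene code: the `HitCountContract` class, its `Relation` enum (a local stand-in for `TotalHits.Relation`) and the `isConsistent` method are all hypothetical names made up for illustration.

```java
// Hypothetical model of the hit-count contract described above; not part of Lucene.
class HitCountContract {

    // Local stand-in for Lucene's TotalHits.Relation (assumption, for illustration only).
    enum Relation { EQUAL_TO, GREATER_THAN_OR_EQUAL_TO }

    /**
     * Returns true if (reported, relation) is a correct answer for a query that
     * really matches actualMatches documents, given the configured threshold.
     */
    static boolean isConsistent(long reported, Relation relation,
                                long actualMatches, int totalHitsThreshold) {
        if (relation == Relation.EQUAL_TO) {
            // An exact count must be exactly right.
            return reported == actualMatches;
        }
        // A lower bound must have reached the threshold and must never
        // exceed the real number of matches.
        return reported >= totalHitsThreshold && reported <= actualMatches;
    }

    public static void main(String[] args) {
        Relation gte = Relation.GREATER_THAN_OR_EQUAL_TO;
        // The case from the test: 10 real matches, threshold 5, collector reports ">= 6".
        System.out.println(isConsistent(6, gte, 10, 5));   // true
        // Stopping right at the threshold is equally valid.
        System.out.println(isConsistent(5, gte, 10, 5));   // true
        // Claiming more hits than actually exist would be a bug.
        System.out.println(isConsistent(11, gte, 10, 5));  // false
    }
}
```

   Under this model, every value from 5 to 10 is an acceptable lower bound for the test case, so a change that moves the reported count from one of those values to another does not break the contract.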
   
   In practice, the better a query is at skipping, the closer the returned hit 
count will be to the `totalHitsThreshold` because the query/collector will be 
able to skip hits very efficiently after having collected the first 
`totalHitsThreshold` ones. This is consistent with the change at #4511, which 
makes queries that don't index frequencies much better at skipping by returning 
lower (while still correct) score upper bounds.
   
   FWIW we released many similar changes to the returned hit counts in the past 
for various queries.
   
   For reference, some applications like Elasticsearch made the decision to 
never return a hit count that is greater than the configured 
`totalHitsThreshold`, to avoid setting user expectations too high about how 
many hits would be counted. So this change wouldn't even be noticed by 
Elasticsearch users (except for a good speedup, hopefully!).
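
   As a rough illustration of why such a cap hides the change, here is a sketch of an application-side helper that never shows more than the threshold. This is an assumption about how an application could do it, not Elasticsearch's actual code; `CappedHitCount` and `display` are hypothetical names.

```java
// Hypothetical application-side helper; not taken from Elasticsearch or Lucene.
class CappedHitCount {

    /**
     * Formats a hit count for display. Lower bounds that reached the threshold
     * are always shown as ">=threshold", whatever value the collector returned.
     */
    static String display(long reported, boolean isLowerBound, int totalHitsThreshold) {
        if (isLowerBound && reported >= totalHitsThreshold) {
            return ">=" + totalHitsThreshold;
        }
        // Exact counts are shown unchanged.
        return Long.toString(reported);
    }

    public static void main(String[] args) {
        // Two different lower bounds for the same query look identical to users.
        System.out.println(display(9, true, 5));  // >=5
        System.out.println(display(6, true, 5));  // >=5
        // An exact count below the threshold is shown as is.
        System.out.println(display(3, false, 5)); // 3
    }
}
```

   With a helper like this, a collector reporting 9 before the change and 6 after it produces the same output, which is why only the speedup would be visible.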


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

