jpountz commented on issue #14561: URL: https://github.com/apache/lucene/issues/14561#issuecomment-2834173590
```java
new TopScoreDocCollectorManager(1, 5);
```

The total hits threshold is 5 here, which means that the collector must keep counting hits at least until it can confirm that that many hits match the query. If there are more than 5 hits, it may return any number in `[5, number of matches]` in combination with `TotalHits.Relation.GREATER_THAN_OR_EQUAL_TO`. In the test you shared, the query matches 10 documents, and the collector says that the query matched 6 documents or more, which is correct.

In practice, the better a query is at skipping, the closer the returned hit count will be to the `totalHitsThreshold`, because the query/collector will be able to skip hits very efficiently after having collected the first `totalHitsThreshold` ones.

This is consistent with the change at #4511, which makes queries that don't index frequencies much better at skipping by returning lower (while still correct) score upper bounds. FWIW we have released many similar changes to the returned hit counts in the past for various queries.

For reference, some applications like Elasticsearch made the decision to never return a hit count that is greater than the configured `totalHitsThreshold`, to avoid setting expectations too high with users about how many hits would be counted. So this change wouldn't even be noticed by Elasticsearch users (except for a good speedup, hopefully!).
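
To make the contract above concrete, here is a minimal, dependency-free sketch. It does not use Lucene or Elasticsearch code: `HitCountSketch`, `HitCount`, `isConsistent` and `cap` are hypothetical names, and the local `Relation` enum only mirrors the two values of `TotalHits.Relation`. It checks whether a reported count is a valid answer for a given number of matches, and shows one way an application could cap the count at the threshold, assuming it is acceptable to report any capped value as a lower bound.

```java
public final class HitCountSketch {

    // Local stand-in for the two values of Lucene's TotalHits.Relation,
    // redefined here so the sketch needs no dependencies.
    enum Relation { EQUAL_TO, GREATER_THAN_OR_EQUAL_TO }

    // Hypothetical holder for a reported hit count and how to interpret it.
    record HitCount(long value, Relation relation) {}

    // A reported count is a correct answer if it is exact, or if it is a
    // lower bound that does not exceed the real number of matches.
    static boolean isConsistent(HitCount reported, long actualMatches) {
        return switch (reported.relation()) {
            case EQUAL_TO -> reported.value() == actualMatches;
            case GREATER_THAN_OR_EQUAL_TO -> reported.value() <= actualMatches;
        };
    }

    // Assumed application-side policy: never expose a count above the
    // threshold; anything larger is reported as "threshold or more".
    static HitCount cap(HitCount reported, long totalHitsThreshold) {
        if (reported.value() > totalHitsThreshold) {
            return new HitCount(totalHitsThreshold, Relation.GREATER_THAN_OR_EQUAL_TO);
        }
        return reported;
    }

    public static void main(String[] args) {
        // The case from the shared test: 10 matches, collector reports "6 or more".
        HitCount reported = new HitCount(6, Relation.GREATER_THAN_OR_EQUAL_TO);
        System.out.println(isConsistent(reported, 10)); // true
        System.out.println(cap(reported, 5).value());   // 5
    }
}
```

With a cap like this, a collector reporting "6 or more" and one reporting "9 or more" both surface as "5 or more", which is why a change in how early the collector stops counting would not be visible to users of such an application.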