[ 
https://issues.apache.org/jira/browse/LUCENE-8060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16555890#comment-16555890
 ] 

Adrien Grand commented on LUCENE-8060:
--------------------------------------

Our current hit count estimations are terrible. I don't think any user would
want to rely on them or even display them in a UI. The problem is that hit
counts are useful from a UI perspective, for instance for pagination, or to
improve the user experience by giving users a sense of how many matches there
are and building confidence in the search engine by showing that a lot of
content matches their query.

I think an OK trade-off that would address both of the above use-cases would
be to only count up to a certain hit count. For instance, if you allow users to
paginate up to page 10 and have 20 hits per page, you only need to count up to
200 hits to know how many pages to display. Similarly, if your end goal is only
to show users that you have lots of content, you could count only up to e.g.
10,000 matches and show something like "more than 10,000 hits" in the UI if
that number is reached. In both cases, this should help keep the counting
overhead contained so that it doesn't end up being the bottleneck of query
processing.
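
To make the arithmetic concrete, here is a minimal sketch of how a UI could
use a capped count. The class and method names (HitCountDisplay, countCap,
pagesToDisplay, label) are hypothetical and not part of Lucene; the only
assumption is that the search layer reports a count that never exceeds the cap.

```java
class HitCountDisplay {
    /** Largest count the UI ever needs: maxPages pages of hitsPerPage hits. */
    static int countCap(int maxPages, int hitsPerPage) {
        return maxPages * hitsPerPage;
    }

    /** Number of pages to render for a (possibly capped) count, rounding up. */
    static int pagesToDisplay(int cappedCount, int hitsPerPage) {
        return (cappedCount + hitsPerPage - 1) / hitsPerPage;
    }

    /** Label for the UI; once the cap is reached the count is only a lower bound. */
    static String label(int cappedCount, int cap) {
        if (cappedCount >= cap) {
            return "more than " + cap + " hits";
        }
        return cappedCount + " hits";
    }

    public static void main(String[] args) {
        int cap = countCap(10, 20); // 10 pages of 20 hits
        System.out.println(cap);
        System.out.println(pagesToDisplay(45, 20));
        System.out.println(label(10000, 10000));
    }
}
```

With 10 pages of 20 hits the cap is 200, matching the example above; a count
that reaches the cap is shown with the "more than" wording instead of an exact
number.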

I believe both TopScoreDocCollector and TopFieldCollector could easily be
changed to replace `boolean trackTotalHits` with something like
`int maxTotalHits`, so that we would stop counting after visiting maxTotalHits
documents.
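
As a rough illustration of the idea, here is a toy collector written against
plain Java collections rather than the real Lucene collector API. Everything
in it (BoundedCountCollector, Hit, totalHitsIsLowerBound) is a hypothetical
stand-in; only the `maxTotalHits` name comes from the proposal above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class BoundedCountCollector {
    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    private static final Comparator<Hit> BY_SCORE =
        Comparator.comparingDouble((Hit h) -> h.score);

    private final int numHits;
    private final int maxTotalHits;
    // Min-heap on score: the head is the weakest hit currently kept.
    private final PriorityQueue<Hit> queue = new PriorityQueue<>(BY_SCORE);
    private int totalHits;

    BoundedCountCollector(int numHits, int maxTotalHits) {
        if (numHits <= 0) throw new IllegalArgumentException("numHits must be > 0");
        this.numHits = numHits;
        this.maxTotalHits = maxTotalHits;
    }

    void collect(int doc, float score) {
        // Stop counting once the requested bound has been reached.
        if (totalHits < maxTotalHits) {
            totalHits++;
        }
        if (queue.size() < numHits) {
            queue.add(new Hit(doc, score));
        } else if (score > queue.peek().score) {
            queue.poll();
            queue.add(new Hit(doc, score));
        }
    }

    int totalHits() { return totalHits; }

    /** True when counting stopped at the bound, so totalHits is only a lower bound. */
    boolean totalHitsIsLowerBound() { return totalHits >= maxTotalHits; }

    /** Kept hits, best score first. */
    List<Hit> topHits() {
        List<Hit> hits = new ArrayList<>(queue);
        hits.sort(BY_SCORE.reversed());
        return hits;
    }
}
```

This toy version still visits every document. The point of the bound is that
once it is reached, an exact count is no longer needed, which is the condition
under which the optimizations mentioned in the issue description can apply.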

Regarding integration in IndexSearcher, I can think of three options:
 - hardcode a value for this parameter, maybe 10,000, and rename
TopDocs.totalHits to make sure users get a compile error
 - add a parameter to the search() methods to require users to pass a
maxTotalHits
 - add a required constructor argument to IndexSearcher that would affect all
search() methods

We could also make the top docs collectors just compute a ScoreDoc[] (i.e. no
total hits) and require users to compute the hit count separately, but I'm
concerned that it would make simple usage of Lucene harder.

Opinions?

> Require users to tell us whether they need total hit counts
> -----------------------------------------------------------
>
>                 Key: LUCENE-8060
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8060
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>             Fix For: master (8.0)
>
>
> We are getting optimizations when hit counts are not required (sorted 
> indexes, MAXSCORE, short-circuiting of phrase queries) but our users won't 
> benefit from them unless we disable exact hit counts by default or we require 
> them to tell us whether hit counts are required.
> I think making hit counts approximate by default is going to be a bit trappy, 
> so I'm rather leaning towards requiring users to tell us explicitly whether 
> they need total hit counts. I can think of two ways to do that: either by 
> passing a boolean to the IndexSearcher constructor or by adding a boolean to 
> all methods that produce TopDocs instances. I like the latter better but I'm 
> open to discussion or other ideas?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

