[ https://issues.apache.org/jira/browse/LUCENE-8060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16556214#comment-16556214 ]
Adrien Grand commented on LUCENE-8060:
--------------------------------------

{quote}I don't know a lot about how the current estimation code works{quote}

It assumes that the density of matches is the same across the whole index. So if docs are collected exactly until doc id 1000 and there are 1M documents in the index, it just multiplies the number of collected documents by 1000. This is often a bad estimate and we have no idea how large the error is.

{quote}would that even be possible?{quote}

I'm not aware of ways to efficiently get good estimates for queries that match many documents, especially conjunctions. So the error bound would be terrible in those cases, I'm afraid. Maybe we could give a lower bound and an upper bound, or an enum that would say whether the hit count is accurate or a lower bound of the actual hit count.

{quote}i would go so far as to suggest that in that situation, hardcoding maxTotalHits/minExactTotalHits to "0" (ie: don't bother trying to track exactly at all) would be fine.{quote}

OK. Thanks for the feedback!

> Require users to tell us whether they need total hit counts
> -----------------------------------------------------------
>
>                 Key: LUCENE-8060
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8060
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>             Fix For: master (8.0)
>
>
> We are getting optimizations when hit counts are not required (sorted
> indexes, MAXSCORE, short-circuiting of phrase queries) but our users won't
> benefit from them unless we disable exact hit counts by default or we require
> them to tell us whether hit counts are required.
> I think making hit counts approximate by default is going to be a bit trappy,
> so I'm rather leaning towards requiring users to tell us explicitly whether
> they need total hit counts.
> I can think of two ways to do that: either by
> passing a boolean to the IndexSearcher constructor or by adding a boolean to
> all methods that produce TopDocs instances. I like the latter better but I'm
> open to discussion or other ideas?
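
For illustration, here is a minimal Java sketch of the two ideas in the comment: the uniform-density extrapolation (collected hits scaled by the ratio of index size to documents visited) and an enum stating whether a reported count is accurate or only a lower bound. Every name here (`HitCountSketch`, `Relation`, `HitCount`, `extrapolate`, `report`) is hypothetical and invented for this example. It is not taken from the Lucene code base and does not claim to match how Lucene implements either idea.

```java
final class HitCountSketch {

    /** Hypothetical flag: is the reported count accurate, or only a lower bound? */
    enum Relation { EXACT, LOWER_BOUND }

    /** Hypothetical value type pairing a count with how it should be read. */
    static final class HitCount {
        final long value;
        final Relation relation;

        HitCount(long value, Relation relation) {
            this.value = value;
            this.relation = relation;
        }
    }

    /**
     * Uniform-density extrapolation: assumes matches are spread evenly over
     * the index, so hits seen in the first docsVisited documents are scaled
     * by maxDoc / docsVisited. Nothing bounds the error of this estimate.
     */
    static long extrapolate(long collected, long docsVisited, long maxDoc) {
        if (docsVisited <= 0 || docsVisited > maxDoc) {
            throw new IllegalArgumentException("docsVisited must be in [1, maxDoc]");
        }
        return collected * maxDoc / docsVisited;
    }

    /**
     * Alternative to guessing: report what was actually counted, and mark it
     * as a lower bound whenever collection stopped before the end of the index.
     */
    static HitCount report(long collected, boolean stoppedEarly) {
        return new HitCount(collected, stoppedEarly ? Relation.LOWER_BOUND : Relation.EXACT);
    }

    public static void main(String[] args) {
        // 40 hits collected in the first 1000 of 1M documents: scaled by 1000.
        System.out.println("estimate = " + extrapolate(40, 1_000, 1_000_000));

        HitCount hits = report(40, true);
        System.out.println(hits.relation + " " + hits.value);
    }
}
```

With the numbers from the comment (1000 documents visited out of 1M), the extrapolation multiplies the collected count by 1000, whereas the enum-based report makes no guess and only states that at least 40 documents match.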