[ https://issues.apache.org/jira/browse/LUCENE-4100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16203384#comment-16203384 ]
Robert Muir commented on LUCENE-4100:
-------------------------------------

{quote}
This is what I wanted to do first but I didn't like the fact that it would allow passing needsScores=false and needsTotalHits=false, which doesn't make sense. If you still prefer the two booleans approach despite this, I'm happy to make the change.
{quote}

Why doesn't it make sense? If I do a query, sorting by reverse time (recency), and retrieve the top 20, then I don't need scores, so why do I need an exact hit count too? I think an approximation would suffice.

{quote}
I agree with that statement, but how do we compute a good estimate? It sounds challenging as the number of collected documents might be much less than the actual number of hits while the cost of the scorer can be highly overestimated, e.g. for phrase queries. Should I return the number of collected documents and add documentation that this is a lower bound of the total number of hits?
{quote}

I think naively we want to base it on where we early terminate (as opposed to maxdoc), but I get the idea with many clauses. Still, I think this estimate may be "good enough" because as you paginate, the estimate would get better?

> Maxscore - Efficient Scoring
> ----------------------------
>
> Key: LUCENE-4100
> URL: https://issues.apache.org/jira/browse/LUCENE-4100
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/codecs, core/query/scoring, core/search
> Affects Versions: 4.0-ALPHA
> Reporter: Stefan Pohl
> Labels: api-change, gsoc2014, patch, performance
> Fix For: 4.9, 6.0
>
> Attachments: LUCENE-4100.patch, LUCENE-4100.patch,
> contrib_maxscore.tgz, maxscore.patch
>
>
> At Berlin Buzzwords 2012, I will be presenting 'maxscore', an efficient
> algorithm first published in the IR domain in 1995 by H. Turtle & J. Flood,
> that I find deserves more attention among Lucene users (and developers).
> I implemented a proof of concept and did some performance measurements with
> example queries and lucenebench, the package of Mike McCandless, resulting in
> very significant speedups.
> This ticket is to start the discussion on including the implementation
> in Lucene's codebase. Because the technique requires awareness of it
> from the Lucene user/developer, it seems best for it to become a contrib/module
> package so that it can be chosen consciously.
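
The two hit-count options raised in the comment above (report the collected count as a lower bound, or base an estimate on where collection terminated early) can be sketched roughly as follows. This is only one possible reading, not the patch's implementation: it assumes a single segment and hits spread uniformly over doc IDs, and the class and method names are hypothetical, not Lucene API.

```java
// Hypothetical sketch, not Lucene API: two ways to report a hit count
// after a collector has terminated early.
final class HitCountEstimate {

    /** Lower bound: the number of documents actually collected. */
    static long lowerBound(long collected) {
        return collected;
    }

    /**
     * Naive extrapolation from the early-termination point.
     * ASSUMPTION: hits are spread uniformly over doc IDs, so the hit
     * density seen in [0, lastDocVisited] holds up to maxDoc.
     */
    static long extrapolate(long collected, int lastDocVisited, int maxDoc) {
        if (collected == 0 || lastDocVisited < 0) {
            return 0;
        }
        long scanned = (long) lastDocVisited + 1;
        long estimate = Math.round((double) collected * maxDoc / scanned);
        // Never report fewer hits than were actually collected.
        return Math.max(collected, estimate);
    }

    public static void main(String[] args) {
        // 20 hits collected after visiting doc IDs 0..999 of 1,000,000.
        System.out.println(lowerBound(20));                  // prints 20
        System.out.println(extrapolate(20, 999, 1_000_000)); // prints 20000
    }
}
```

Under these assumptions, paginating deeper means more documents are visited before termination, so the extrapolation covers a larger share of the index and leans less on the uniformity assumption. It does not address the concern about overestimated scorer cost for queries such as phrase queries.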