[jira] [Updated] (LUCENE-4100) Maxscore - Efficient Scoring

Adrien Grand (JIRA) Thu, 12 Oct 2017 08:51:19 -0700

     [ https://issues.apache.org/jira/browse/LUCENE-4100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Adrien Grand updated LUCENE-4100:
---------------------------------
    Attachment: LUCENE-4100.patch

Here is a patch:
 - more docs and tests
 - replaces needsScores with a SearchMode enum as suggested by Robert
 - the MAXSCORE optimization works with top-level disjunctions and filtered 
disjunctions (FILTER or MUST_NOT)
 - TopScoreDocsCollector sets the totalHitCount to -1 when the optimization is 
used since the total hit count is unknown
 - MaxScoreScorer was changed to reason on integers rather than doubles to 
avoid floating-point arithmetic issues. To do that it scales all max scores 
into 0..2^16, rounding up when working on the max scores of sub clauses, and 
down when rounding the min competitive score in order to make sure to not miss 
matches (at the cost of potentially more false positives, but this is fine)
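
To illustrate the rounding scheme from the last bullet, here is a minimal 
sketch. It is not the code from the patch: the class name ScaledScores, the 
method names and the upperBound parameter (some score that no clause can 
exceed) are assumptions made for this example only. It only shows why rounding 
max scores up and the min competitive score down cannot cause a competitive 
match to be skipped.

```java
/** Hypothetical helper, not part of Lucene: integer scaling of score bounds. */
final class ScaledScores {

    /** Scores are mapped into 0..2^16, as described in the bullet above. */
    private static final long SCALE = 1L << 16;

    /** Scale a clause's max score, rounding UP so it is never under-estimated. */
    static long scaleMaxScore(double maxScore, double upperBound) {
        return (long) Math.ceil(maxScore / upperBound * SCALE);
    }

    /** Scale the min competitive score, rounding DOWN so it is never over-estimated. */
    static long scaleMinCompetitiveScore(double minScore, double upperBound) {
        return (long) Math.floor(minScore / upperBound * SCALE);
    }

    /**
     * A document matching only the given clauses can be skipped if even the
     * sum of their (rounded-up) max scores stays strictly below the
     * (rounded-down) min competitive score. Rounding in these directions can
     * only make this test fail more often, which yields false positives but
     * never a missed match.
     */
    static boolean canSkip(long[] scaledMaxScores, long scaledMinCompetitive) {
        long sum = 0;
        for (long max : scaledMaxScores) {
            sum += max;
        }
        return sum < scaledMinCompetitive;
    }

    public static void main(String[] args) {
        double upperBound = 1.0; // assumed global bound on any clause score
        long[] clauses = {
            scaleMaxScore(0.1, upperBound),
            scaleMaxScore(0.1, upperBound)
        };
        // Two clauses capped at 0.1 each cannot reach 0.3, so this is skippable.
        System.out.println(canSkip(clauses, scaleMinCompetitiveScore(0.3, upperBound)));
        // They can reach 0.2, so the document must still be scored.
        System.out.println(canSkip(clauses, scaleMinCompetitiveScore(0.2, upperBound)));
    }
}
```

Since all comparisons happen on longs, the result does not depend on the order 
in which the max scores are summed, which is not guaranteed with float or 
double sums.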

The patch is already huge (due to the needsScores/searchMode change mostly) so I 
wanted to do the strict minimum here for this feature to be useful, but we'll 
need follow-ups to make the optimization work with the paging collector, 
conjunctions that have more than one scoring clause, TopFieldCollector when the 
first sort field is the score, integrate it with IndexSearcher (currently you 
need to create the collector manually to use it), etc.

> Maxscore - Efficient Scoring
> ----------------------------
>
>                 Key: LUCENE-4100
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4100
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/codecs, core/query/scoring, core/search
>    Affects Versions: 4.0-ALPHA
>            Reporter: Stefan Pohl
>              Labels: api-change, gsoc2014, patch, performance
>             Fix For: 4.9, 6.0
>
>         Attachments: LUCENE-4100.patch, LUCENE-4100.patch, 
> contrib_maxscore.tgz, maxscore.patch
>
>
> At Berlin Buzzwords 2012, I will be presenting 'maxscore', an efficient 
> algorithm first published in the IR domain in 1995 by H. Turtle & J. Flood, 
> that I find deserves more attention among Lucene users (and developers).
> I implemented a proof of concept and did some performance measurements with 
> example queries and lucenebench, the package of Mike McCandless, resulting in 
> very significant speedups.
> This ticket is meant to start the discussion on including the implementation 
> in Lucene's codebase. Because the technique requires awareness from the 
> Lucene user/developer, it seems best for it to become a contrib/module 
> package so that it can be chosen consciously.



