[jira] Commented: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for multiple subreaders

Doug Cutting (JIRA) Wed, 17 Dec 2008 10:08:08 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12657466#action_12657466
 ]


Doug Cutting commented on LUCENE-1483:
--------------------------------------

bq. I would actually be fine with keeping HitCollector, adding a default 
"setNextReader" method, that either throws UOE or (if we are strongly against 
exceptions) returns "false" indicating it cannot handle sequential readers.

Could we instead add a new HitCollector subclass that adds the setNextReader 
method, then use 'instanceof' to decide whether to wrap or not?
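
A minimal sketch of that idea, assuming stand-in types rather than the real Lucene classes: HitCollector here is reduced to its single collect callback, and SegmentAwareCollector, RebasingCollector, CollectorDispatch and the int docBase parameter of setNextReader are hypothetical names chosen only for illustration.

```java
// Stand-in for Lucene's HitCollector, reduced to its single callback.
abstract class HitCollector {
    abstract void collect(int doc, float score);
}

// Hypothetical subclass for collectors that understand per-segment doc ids.
abstract class SegmentAwareCollector extends HitCollector {
    // Called before hits from the next subreader are delivered;
    // docBase is that subreader's offset in the top-level doc id space.
    abstract void setNextReader(int docBase);
}

// Adapter for legacy collectors: rebases segment-relative ids to top-level ids.
final class RebasingCollector extends SegmentAwareCollector {
    private final HitCollector delegate;
    private int docBase;

    RebasingCollector(HitCollector delegate) {
        this.delegate = delegate;
    }

    @Override
    void setNextReader(int docBase) {
        this.docBase = docBase;
    }

    @Override
    void collect(int doc, float score) {
        delegate.collect(docBase + doc, score);
    }
}

final class CollectorDispatch {
    // Wrap only when the collector cannot handle sequential readers itself.
    static SegmentAwareCollector adapt(HitCollector collector) {
        if (collector instanceof SegmentAwareCollector) {
            return (SegmentAwareCollector) collector;
        }
        return new RebasingCollector(collector);
    }
}
```

With this shape, existing HitCollector implementations keep working unchanged and see top-level doc ids, while new collectors opt in by extending the subclass. No exception or boolean return value is needed to signal the capability.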

bq. I really don't fully understand BooleanScorer!

The original version of BooleanScorer uses a ~16k array to score windows of 
docs.  So it scores docs 0-16k first, then docs 16k-32k, etc.  For each window 
it iterates through all query terms and accumulates a score in 
table[doc%16k].  It also stores in the table a bitmask representing which 
terms contributed to the score.  Non-zero scores are chained in a linked 
list.  At the end of scoring each window it then iterates through the linked 
list and, if the bitmask matches the boolean constraints, collects a hit.  For 
boolean queries with lots of frequent terms this can be much faster, since it 
does not need to update a priority queue for each posting, instead performing 
constant-time operations per posting.  The only downside is that it results in 
hits being delivered out-of-order within the window, which means it cannot be 
nested within other scorers.  But it works well as a top-level scorer.

The new BooleanScorer2 implementation instead works by merging priority queues 
of postings, albeit with some clever tricks.  For example, a pure conjunction 
(all terms required) does not require a priority queue.  Instead it sorts the 
posting streams at the start, then repeatedly skips the first stream to the 
last stream's current doc.  If the first ever equals the last, then all 
streams are on the same doc and there's a hit.  When some terms are required 
and some terms are optional, the conjunction can be evaluated first, then the 
optional terms can all skip to the match and be added to the score.  Thus the 
conjunction can reduce the number of priority queue updates for the optional 
terms.  Does that help any?
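
The two strategies described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual Lucene code: BooleanScoringSketch, Hit, windowed and conjunction are hypothetical names, postings are plain sorted int arrays, every posting contributes a score of 1.0, the window is 16 slots instead of ~16k, and skipping is a linear scan rather than a real skip list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class BooleanScoringSketch {
    interface Hit { void collect(int doc, float score); }

    static final int WINDOW = 16; // the description uses ~16k; kept tiny here

    // Windowed scoring. postings[t] is the sorted doc list of term t (at most
    // 32 terms, since the bitmask is an int). Bit t of requiredMask marks
    // term t as required; a mask of 0 makes every term optional.
    static void windowed(int[][] postings, int requiredMask, int maxDoc, Hit out) {
        float[] score = new float[WINDOW];
        int[] bits = new int[WINDOW];
        int[] next = new int[WINDOW];         // linked list of touched slots
        int[] pos = new int[postings.length]; // cursor into each posting list
        for (int base = 0; base < maxDoc; base += WINDOW) {
            int head = -1;
            for (int t = 0; t < postings.length; t++) {
                while (pos[t] < postings[t].length && postings[t][pos[t]] < base + WINDOW) {
                    int slot = postings[t][pos[t]++] % WINDOW;
                    if (bits[slot] == 0) { // first touch in this window: chain it
                        next[slot] = head;
                        head = slot;
                    }
                    bits[slot] |= 1 << t;  // constant-time work per posting
                    score[slot] += 1.0f;
                }
            }
            // Walk the chain; hits come out in chain order, not doc order.
            for (int slot = head; slot != -1; slot = next[slot]) {
                if ((bits[slot] & requiredMask) == requiredMask) {
                    out.collect(base + slot, score[slot]);
                }
                bits[slot] = 0; // reset the slot for the next window
                score[slot] = 0f;
            }
        }
    }

    // Pure conjunction without a priority queue: sort the streams once, then
    // keep skipping the first stream to the last stream's current doc.
    static List<Integer> conjunction(int[][] postings) {
        List<Integer> hits = new ArrayList<>();
        int n = postings.length;
        if (n == 0) return hits;
        for (int[] p : postings) if (p.length == 0) return hits;
        int[][] streams = postings.clone();
        Arrays.sort(streams, (a, b) -> Integer.compare(a[0], b[0]));
        int[] pos = new int[n];
        int first = 0, last = n - 1;
        while (true) {
            int lastDoc = streams[last][pos[last]];
            if (streams[first][pos[first]] == lastDoc) {
                hits.add(lastDoc); // first equals last: every stream is on this doc
                if (++pos[last] == streams[last].length) return hits;
            } else {
                // skipTo(lastDoc) on the first stream, which then becomes the last
                while (streams[first][pos[first]] < lastDoc) {
                    if (++pos[first] == streams[first].length) return hits;
                }
                last = first;
                first = (first + 1) % n;
            }
        }
    }
}
```

For the posting lists {1, 5, 20} and {5, 7, 20, 21} with both terms optional, windowed collects docs 7, 5, 1 from the first window and then 21, 20 from the second. That is the out-of-order delivery within a window mentioned above. conjunction returns docs in increasing order, which is what allows that style of scorer to be nested.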


> Change IndexSearcher to use MultiSearcher semantics for multiple subreaders
> ---------------------------------------------------------------------------
>
>                 Key: LUCENE-1483
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1483
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.9
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
> LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
> LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
> LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch
>
>
> FieldCache and Filters are forced down to a single segment reader, allowing 
> for individual segment reloading on reopen.
