[ 
https://issues.apache.org/jira/browse/LUCENE-893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12558386#action_12558386
 ] 

Paul Elschot commented on LUCENE-893:
-------------------------------------

I think the different results of 26 May 2007 for conjunction queries and 
disjunction queries may be caused by the use of TermScorer.skipTo() in 
conjunctions and TermScorer.next() in disjunctions.

That points to different optimal buffer sizes for conjunctions (smaller because 
of the skipping) and for disjunctions (larger because all postings are going to 
be needed).

LUCENE-430 is about reducing term buffer size for the case when the buffer is 
not going to be used completely because of the small number of documents 
containing the term.

In all, I think it makes sense to allow the (conjunction/disjunction)Scorer to 
choose the maximum buffer size for the term, and let the term itself choose a 
lower value when it needs less than that.
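
As a rough sketch of that idea (this is not Lucene API: the class name, the 
constants and the bytes-per-posting estimate are all assumptions made up for 
illustration), the scorer would pass its maximum and the term would cap it by 
what its document frequency can actually use:

```java
// Hypothetical sketch, not Lucene code: all names and constants are assumed.
class BufferSizeSketch {
    // Assumed upper bounds chosen by the scorer type.
    static final int CONJUNCTION_MAX_BUFFER = 1024;      // smaller: skipTo() jumps around
    static final int DISJUNCTION_MAX_BUFFER = 16 * 1024; // larger: next() reads everything
    // Assumed rough size of one posting on disk, and an assumed lower bound.
    static final int BYTES_PER_POSTING_ESTIMATE = 4;
    static final int MIN_BUFFER = 64;

    /**
     * The scorer supplies the maximum; the term lowers it when its
     * document frequency means the buffer would never be filled.
     */
    static int chooseBufferSize(int scorerMaxBuffer, int docFreq) {
        long needed = (long) docFreq * BYTES_PER_POSTING_ESTIMATE;
        long atLeastMin = Math.max(needed, MIN_BUFFER);
        return (int) Math.min((long) scorerMaxBuffer, atLeastMin);
    }
}
```

With these assumed numbers, a term in 10 documents gets the 64 byte minimum 
under either scorer, while a term in a million documents gets the full maximum 
that the scorer allows.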

Another way to promote sequential reading for disjunction queries is to process 
all their terms sequentially, i.e. one term at a time. In Lucene this is 
currently done by Filters for prefix queries and ranges. Unfortunately this 
cannot be done when the combined frequency of the terms in each document is 
needed. In that case DisjunctionSumScorer could be used, with larger buffers on 
the terms that are contained in many documents.
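
To illustrate the term-at-a-time approach and why it loses the combined 
frequency, here is a minimal sketch. It uses plain int arrays as stand-ins for 
postings lists and java.util.BitSet for the result; the class and method names 
are hypothetical and this is not how the Lucene Filters are actually written:

```java
import java.util.BitSet;

// Hypothetical sketch: int[] arrays stand in for the postings of each term.
class TermAtATimeSketch {
    /**
     * Reads each term's postings completely before moving to the next term,
     * so every postings list is read sequentially. Only "matches at least
     * one term" survives; how many terms matched a document is not recorded.
     */
    static BitSet matchAny(int[][] postingsPerTerm, int maxDoc) {
        BitSet matches = new BitSet(maxDoc);
        for (int[] postings : postingsPerTerm) {
            for (int doc : postings) {
                matches.set(doc);
            }
        }
        return matches;
    }
}
```

A document that contains two of the terms ends up as a single set bit, the 
same as a document that contains only one of them, which is why this does not 
work when the scorer needs the combined frequency per document.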

> Increase buffer sizes used during searching
> -------------------------------------------
>
>                 Key: LUCENE-893
>                 URL: https://issues.apache.org/jira/browse/LUCENE-893
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: 2.1
>            Reporter: Michael McCandless
>
> Spinoff of LUCENE-888.
> In LUCENE-888 we increased buffer sizes that impact indexing and found
> substantial (10-18%) overall performance gains.
> It's very likely that we can also gain some performance for searching
> by increasing the read buffers in BufferedIndexInput used by
> searching.
> We need to test performance impact to verify and then pick a good
> overall default buffer size, also being careful not to add too much
> overall HEAP RAM usage because a potentially very large number of
> BufferedIndexInput instances are created during searching
> (# segments X # index files per segment).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


