[jira] Commented: (LUCENE-1593) Optimizations to TopScoreDocCollector and TopFieldCollector

Michael McCandless (JIRA) Wed, 29 Apr 2009 13:56:38 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-1593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12704323#action_12704323
 ]


Michael McCandless commented on LUCENE-1593:
--------------------------------------------

{quote}
I think we should have an issue handling interface deprecation in general for
2.9, since just deprecating Weight does not solve it. You'd also have to
deprecate the Searchable.search* methods that accept Weight, but Searchable is
an interface, so you might want to deprecate it entirely and create an
AbstractSearchable? That also deserves its own thread, don't you think?
{quote}
Yes, and this presumably depends on the outcome of the first "how much can 
change in 3.0" thread.
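The compatibility concern in the quoted paragraph can be illustrated with a minimal Java sketch. All names here (LegacySearchable, BaseSearchable, MapSearchable, docFreq, containsTerm) are hypothetical stand-ins, not Lucene's Searchable API. The sketch only shows why an abstract base class, unlike a pre-Java-8 interface, lets a later release add a method without breaking existing implementers.

```java
import java.util.Map;

/** Hypothetical stand-in for an existing interface such as Searchable. */
interface LegacySearchable {
  int docFreq(String term);
}

/**
 * Hypothetical abstract replacement. Implementers extend this class instead of
 * implementing the interface directly, so a later release can add concrete
 * methods without breaking them. (Before Java 8 default methods, adding a
 * method to an interface broke every existing implementer.)
 */
abstract class BaseSearchable implements LegacySearchable {
  /** A method "added later", with a body that subclasses inherit unchanged. */
  public boolean containsTerm(String term) {
    return docFreq(term) > 0;
  }
}

/** A user subclass written before containsTerm() existed; it still compiles. */
class MapSearchable extends BaseSearchable {
  private final Map<String, Integer> freqs;

  MapSearchable(Map<String, Integer> freqs) {
    this.freqs = freqs;
  }

  @Override
  public int docFreq(String term) {
    return freqs.getOrDefault(term, 0);
  }
}
```

The trade-off is that a Java class can extend only one base class, which is one reason such a change would need its own discussion.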

bq. I thought that perhaps we can make the following change

Once again I'm lacking clarity... there are many possible, related
improvements to searching:

  * This "top" vs "not-top" scorer difference being more explicit

  * Merging Query/Filter (LUCENE-1518), allowing Filter as a clause to
    BooleanQuery (LUCENE-1345): it still feels like Query should be a
    subclass of Filter, since Query "simply" adds scoring to a
    Filter.

  * Pushing random-access filters down to the TermScorers, and
    pre-multiplying in deletes when possible (LUCENE-1536)

  * Similarly pushing "bottomValue" down to TermScorers for
    field-sorted searching

  * Having a single query make both a "cheap" and an "expensive"
    scorer, so that all "cheap" scorers are checked first and the
    expensive ones are checked only if those pass (LUCENE-1252)

  * The possible "Scorer.check" (LUCENE-1614), to test whether a doc
    passes without calling next()

  * For AND scoring, carefully picking the order in which to test the
    iterators, and maybe also choosing when to use "check" instead of
    "advance" for some of them.

  * "Multiplying out" compound queries. E.g., +X (A OR B) makes a
    nested BooleanQuery; multiplying it out and then somehow sharing a
    single iterator for X's TermScorer should give better performance.
    Other "structural" optimizations could also apply.

  * Far-out, and not really affecting APIs, but still related: source
    code specialization (LUCENE-1594) to get speedups

I'm not yet sure what steps to take now (and how) vs later...
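Three of the ideas in the list (checking cheap scorers before expensive ones, a check() that tests a doc without advancing, and choosing the order in which AND clauses are tested) can be sketched together in Java. This is a hypothetical illustration, not Lucene's Scorer API: the Clause interface, its check() and cost() methods, and the use of a sorted int[] as the leading iterator are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * Hypothetical sketch, not Lucene's API: an AND query that walks one leading
 * iterator with next() and tests every other clause with a random-access
 * check(), evaluating cheaper clauses first so expensive ones only see
 * docs that already passed.
 */
final class CheapFirstConjunction {

  /** Hypothetical clause: random-access membership test plus a cost estimate. */
  interface Clause {
    boolean check(int doc);
    int cost();
  }

  /** Returns the docs from leadDocs (increasing doc IDs) that pass every clause. */
  static List<Integer> matches(int[] leadDocs, List<Clause> clauses) {
    List<Clause> ordered = new ArrayList<>(clauses);
    ordered.sort(Comparator.comparingInt(Clause::cost)); // cheapest clauses first
    List<Integer> hits = new ArrayList<>();
    outer:
    for (int doc : leadDocs) {
      for (Clause c : ordered) {
        if (!c.check(doc)) {
          continue outer; // short-circuit: pricier clauses never see this doc
        }
      }
      hits.add(doc);
    }
    return hits;
  }
}
```

A real conjunction would also weigh how selective each clause is, not just its per-call cost; the sketch sorts on a single assumed cost() number for simplicity.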


> Optimizations to TopScoreDocCollector and TopFieldCollector
> -----------------------------------------------------------
>
>                 Key: LUCENE-1593
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1593
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Shai Erera
>             Fix For: 2.9
>
>         Attachments: LUCENE-1593.patch, PerfTest.java
>
>
> This is a spin-off of LUCENE-1575 and proposes to optimize TSDC and TFC code 
> to remove unnecessary checks. The plan is:
> # Ensure that IndexSearcher processes segments in increasing doc ID order,
> instead of ordering them by numDocs().
> # Change TSDC and TFC's code to not use the doc ID as a tie-breaker. New docs
> will always have larger IDs and therefore cannot win a tie.
> # Pre-populate HitQueue with sentinel values in TSDC (score =
> Float.NEGATIVE_INFINITY) and remove the reusableSD == null check.
> # Also, when updating the queue, change the top entry in place and then call
> adjustTop().
> # Some methods in Sort explicitly add SortField.FIELD_DOC as a "tie-breaker"
> for the last SortField. But doing so should not be necessary (since we
> already break ties by doc ID), and is in fact less efficient (once the above
> optimization is in).
> # Investigate PQ: can we deprecate insert() and keep only
> insertWithOverflow()? Add an addDummyObjects method that populates the
> queue without "arranging" it, i.e. just stores the objects in the array
> (this could be used to pre-populate sentinel values)?
> I will post a patch as well as some perf measurements as soon as I have them.
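Items 2 through 4 of the plan can be sketched as a small Java collector. This is a hypothetical illustration that assumes item 1 holds (docs are collected in increasing doc ID order); TopScoreSketch and its parallel-array min-heap are stand-ins, not Lucene's HitQueue or PriorityQueue. The heap's own ordering still uses the doc ID so results come out in a stable order; only the per-hit test in collect() drops the tie-break.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch, not Lucene's code: a size-k min-heap (weakest hit at
 * index 0) pre-filled with sentinels, so collect() needs no null or size check.
 */
final class TopScoreSketch {
  private final float[] scores; // heap of scores, weakest at index 0
  private final int[] docs;     // doc IDs parallel to scores

  TopScoreSketch(int k) {
    scores = new float[k];
    docs = new int[k];
    // Sentinels: any finite score beats NEGATIVE_INFINITY, so the heap starts "full".
    Arrays.fill(scores, Float.NEGATIVE_INFINITY);
    Arrays.fill(docs, Integer.MAX_VALUE);
  }

  /** Assumes docs arrive in increasing doc ID order. */
  void collect(int doc, float score) {
    // On a tie with the weakest entry the new doc has the larger ID and loses,
    // so a plain '<=' test suffices: no doc ID tie-break here.
    if (score <= scores[0]) {
      return;
    }
    scores[0] = score; // change the top entry in place...
    docs[0] = doc;
    adjustTop();       // ...then sift it down (no pop followed by push)
  }

  /** Lower score is weaker; on equal scores the larger doc ID is weaker. */
  private boolean lessThan(int i, int j) {
    return scores[i] < scores[j] || (scores[i] == scores[j] && docs[i] > docs[j]);
  }

  /** Restores the heap property after the top entry was replaced. */
  private void adjustTop() {
    int i = 0;
    int n = scores.length;
    while (true) {
      int left = 2 * i + 1;
      int right = left + 1;
      int weakest = i;
      if (left < n && lessThan(left, weakest)) weakest = left;
      if (right < n && lessThan(right, weakest)) weakest = right;
      if (weakest == i) return;
      swap(i, weakest);
      i = weakest;
    }
  }

  private void swap(int a, int b) {
    float s = scores[a]; scores[a] = scores[b]; scores[b] = s;
    int d = docs[a]; docs[a] = docs[b]; docs[b] = d;
  }

  /** Returns collected doc IDs, best first, skipping unused sentinels. */
  int[] topDocs() {
    Integer[] idx = new Integer[scores.length];
    for (int i = 0; i < idx.length; i++) idx[i] = i;
    Arrays.sort(idx, (a, b) -> lessThan(a, b) ? 1 : lessThan(b, a) ? -1 : 0);
    return Arrays.stream(idx)
        .filter(i -> scores[i] != Float.NEGATIVE_INFINITY)
        .mapToInt(i -> docs[i])
        .toArray();
  }
}
```

For example, collecting scores 1.0, 3.0, 2.0, 3.0, 0.5, 2.0 for docs 0 to 5 with k = 3 keeps docs 1, 3 and 2; doc 5 ties doc 2's score but is rejected by the single comparison in collect().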
