[jira] Commented: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for multiple subreaders

Michael McCandless (JIRA) Sun, 28 Dec 2008 06:38:08 -0800

    [ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12659436#action_12659436 ]


Michael McCandless commented on LUCENE-1483:
--------------------------------------------

{quote}
> Only other option I see off hand is a comparator that can do both, but not as 
> clean and probably adds a check in tightly looped code.
{quote}
Right, I wanted to avoid an inner-loop check by swapping out the comparator in 
between segments.  Though, modern CPUs are quite good when an if-statement 
consistently goes one way, so a single comparator that does internal 
switching might perform fine.  Still, if we fix the API to return a 
new comparator, we can then allow both options.

I think in some cases we'd even fall back to VAL comparison.
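
A minimal sketch of what "return a new comparator" could look like.  All 
names here (Segment, SegmentComparator, OrdComparator, ValComparator, 
nextSegment) are made up for illustration and are not the API in the patch, 
and the bottom value is only handed over at the transition to keep it short.  
The ORD comparator swaps itself out for VAL at the first transition, while a 
comparator that prefers internal switching just returns itself:

```java
import java.util.Arrays;

/** Illustrative only: these types are hypothetical, not Lucene classes. */
final class ComparatorSwapSketch {

    /** One segment: its unique terms in sorted order and each doc's ordinal. */
    static final class Segment {
        final String[] sortedTerms;
        final int[] docToOrd;

        Segment(String[] sortedTerms, int[] docToOrd) {
            this.sortedTerms = sortedTerms;
            this.docToOrd = docToOrd;
        }

        String value(int doc) {
            return sortedTerms[docToOrd[doc]];
        }
    }

    interface SegmentComparator {
        /** Positive if the bottom value sorts after the doc's value, negative if before. */
        int compareBottom(int doc);

        /** Called on a segment transition; returns the comparator to use from then on. */
        SegmentComparator nextSegment(Segment next, String bottom);
    }

    /** ORD: one binary search per transition, then int compares per doc. */
    static final class OrdComparator implements SegmentComparator {
        private final Segment seg;
        private final boolean bottomPresent;
        private final int bottomOrd; // ord of bottom, or its insertion point if absent

        OrdComparator(Segment seg, String bottom) {
            int idx = Arrays.binarySearch(seg.sortedTerms, bottom);
            this.seg = seg;
            this.bottomPresent = idx >= 0;
            this.bottomOrd = bottomPresent ? idx : -idx - 1;
        }

        @Override
        public int compareBottom(int doc) {
            int ord = seg.docToOrd[doc];
            if (bottomPresent) {
                return Integer.compare(bottomOrd, ord);
            }
            // Bottom is absent and lies just below the term at bottomOrd.
            return ord < bottomOrd ? 1 : -1;
        }

        @Override
        public SegmentComparator nextSegment(Segment next, String bottom) {
            // Swap instead of branching per doc: later segments use VAL.
            return new ValComparator(next, bottom);
        }
    }

    /** VAL: no per-transition cost, but a String compare per doc. */
    static final class ValComparator implements SegmentComparator {
        private Segment seg;
        private String bottom;

        ValComparator(Segment seg, String bottom) {
            this.seg = seg;
            this.bottom = bottom;
        }

        @Override
        public int compareBottom(int doc) {
            return bottom.compareTo(seg.value(doc));
        }

        @Override
        public SegmentComparator nextSegment(Segment next, String bottom) {
            // Same comparator carries on; only its segment state changes.
            this.seg = next;
            this.bottom = bottom;
            return this;
        }
    }
}
```

The caller only ever does cmp = cmp.nextSegment(...) between segments, so 
the per-doc loop stays free of any mode check whichever option is used.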

{quote}
>  Is largest to smallest best though?
{quote}

Good question; it's not obvious.  We should try both, and perhaps allow for the 
collector to optionally specify the order.

My thinking was the first large segment using ORD is "free" (because ORD is 
only costly on switching segments).  If there are many hits, likely the queue 
has done most of the work it'll do (i.e., the majority of the total # insertions 
will have been done), unless search is "degenerate".  Perhaps the second 
segment, if large, warrants ORD, but then sometime soonish you'd switch to 
ORDDEM or VAL.

The "long tail" of tiny segments would then normally be zipped through w/ 
hardly any insertions, so a higher insertion cost (with zero segment transition 
cost) is OK.

But you're right: if we do the tiny segments first, then the queue would be 
small so transition cost is lower.
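
To make trying both orders cheap, the visit order could be a simple knob.  
A rough sketch, assuming segments are described only by their sizes; 
SegmentOrderSketch, Order and visitOrder are hypothetical names, not 
anything in the patch:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Illustrative only: hypothetical helper, not a Lucene class. */
final class SegmentOrderSketch {

    enum Order { LARGEST_FIRST, SMALLEST_FIRST }

    /** Returns segment indexes in the order they would be searched. */
    static List<Integer> visitOrder(int[] segmentSizes, Order order) {
        List<Integer> indexes = new ArrayList<>();
        for (int i = 0; i < segmentSizes.length; i++) {
            indexes.add(i);
        }
        Comparator<Integer> bySize = Comparator.comparingInt(i -> segmentSizes[i]);
        indexes.sort(order == Order.LARGEST_FIRST ? bySize.reversed() : bySize);
        return indexes;
    }
}
```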

We should make it simple to override a method to implement your own "search 
plan", and then provide a default heuristic that decides when to switch 
comparators.  Probably that default heuristic should be based on how often 
compare was actually invoked for the segment.  EG if the String sort is 
secondary to a numeric sort then even if there are many hits, if the numeric 
sort mostly wins (doesn't have many compare(...) == 0's) then the String sort 
should probably immediately switch to VAL after the first segment.
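
Roughly, the overridable hook plus default heuristic might look like the 
sketch below.  SearchPlan, nextMode and the 0.1 compares-per-hit cutoff are 
all assumptions for illustration (the cutoff is not measured), and only ORD 
and VAL are modeled:

```java
/** Illustrative only: a hypothetical "search plan" hook, not a Lucene API. */
final class SearchPlanSketch {

    enum Mode { ORD, VAL }

    /** Subclass and override nextMode to supply a custom plan. */
    static class SearchPlan {
        // Assumed cutoff: below this many compares per hit, ORD's
        // per-transition cost is unlikely to pay for itself.
        static final double MIN_COMPARES_PER_HIT = 0.1;

        /**
         * Picks the comparator mode for the next segment from how often
         * compare was actually invoked in the segment just finished.
         */
        public Mode nextMode(Mode current, long compareCalls, long hits) {
            if (hits == 0) {
                return current; // nothing observed, keep what we have
            }
            double comparesPerHit = (double) compareCalls / hits;
            return comparesPerHit < MIN_COMPARES_PER_HIT ? Mode.VAL : Mode.ORD;
        }
    }
}
```

With these numbers, a secondary String sort that was consulted for 5 of 
1000 hits in the first segment would drop to VAL right away, while one 
consulted for 900 of 1000 would stay on ORD.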

> Change IndexSearcher to use MultiSearcher semantics for multiple subreaders
> ---------------------------------------------------------------------------
>
>                 Key: LUCENE-1483
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1483
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.9
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
> LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
> LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
> LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
> LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
> LUCENE-1483.patch, sortBench.py, sortCollate.py
>
>
> FieldCache and Filters are forced down to a single segment reader, allowing 
> for individual segment reloading on reopen.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
