[jira] Updated: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for multiple subreaders

Michael McCandless (JIRA) Sat, 13 Dec 2008 13:45:09 -0800

     [ 
https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Michael McCandless updated LUCENE-1483:
---------------------------------------

    Attachment: LUCENE-1483.patch

Attached an initial patch (derived from one of the earlier patches).
A lot of work remains.  TestSort (and likely others) fail.

{quote}
> That's where I don't follow, though - it's not ords in the queue, right? It's 
> ScoreDocs. That's what's getting me at the moment. 
{quote}

Exactly -- so I built a first cut at the alternative "copy value"
approach, where the comparator (new FieldComparator abstract class) is
responsible for holding the values it needs for docs inserted into the
queue.  I also added TopFieldValueDocCollector (extends DocCollector),
and ByValueFieldSortedHitQueue (extends PriorityQueue) that interacts
with the FieldComparators.  (We can change these names...).  I updated
IndexSearcher to use this new queue for field sorting.
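To make the "copy value" idea concrete, here is a minimal plain-Java sketch. It is not the patch: SlotComparator, IntSlotComparator and TopNCollector are hypothetical names, and a plain int[] stands in for the FieldCache array. The queue holds slot numbers; the comparator copies a doc's value into a slot when the doc is collected, and every later comparison reads only those copied values.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.PriorityQueue;

// Hypothetical sketch of the "copy value" approach; these names are
// illustrative and are not the classes in LUCENE-1483.patch.
abstract class SlotComparator {
    /** Copy the sort value of {@code doc} into queue slot {@code slot}. */
    abstract void copy(int slot, int doc);

    /** Compare the values already copied into two slots. */
    abstract int compare(int slot1, int slot2);
}

/** Ascending int sort; fieldValues stands in for a FieldCache int[] array. */
class IntSlotComparator extends SlotComparator {
    private final int[] fieldValues; // indexed by docID
    private final int[] slotValues;  // one entry per queue slot

    IntSlotComparator(int[] fieldValues, int numSlots) {
        this.fieldValues = fieldValues;
        this.slotValues = new int[numSlots];
    }

    @Override void copy(int slot, int doc) { slotValues[slot] = fieldValues[doc]; }

    @Override int compare(int slot1, int slot2) {
        return Integer.compare(slotValues[slot1], slotValues[slot2]);
    }
}

/** Keeps the best n docs; the comparator needs n + 1 slots (n queued, 1 staging). */
class TopNCollector {
    private final SlotComparator cmp;
    private final int n;
    private final int[] slotDocs;
    private final Comparator<Integer> worstFirst;
    private final PriorityQueue<Integer> queue; // slot numbers, worst at the head
    private int spare;                          // staging slot, never in the queue

    TopNCollector(SlotComparator cmp, int n) {
        if (n <= 0) throw new IllegalArgumentException("n must be positive");
        this.cmp = cmp;
        this.n = n;
        this.slotDocs = new int[n + 1];
        // Larger value is worse; on equal values the larger docID is worse.
        this.worstFirst = (a, b) -> {
            int c = cmp.compare(b, a);
            return c != 0 ? c : Integer.compare(slotDocs[b], slotDocs[a]);
        };
        this.queue = new PriorityQueue<>(n, worstFirst);
        this.spare = n;
    }

    /** Docs must arrive in increasing docID order. */
    void collect(int doc) {
        boolean full = queue.size() == n;
        int slot = full ? spare : queue.size();
        cmp.copy(slot, doc); // the only read of the per-doc array for this doc
        slotDocs[slot] = doc;
        if (!full) {
            queue.add(slot);
        } else if (cmp.compare(slot, queue.peek()) < 0) {
            // Strictly better than the bottom; a tie loses because its docID is larger.
            spare = queue.poll(); // the evicted slot becomes the next staging slot
            queue.add(slot);
        }
    }

    /** Returns the collected docIDs, best first. */
    int[] topDocs() {
        Integer[] slots = queue.toArray(new Integer[0]);
        Arrays.sort(slots, worstFirst.reversed());
        int[] docs = new int[slots.length];
        for (int i = 0; i < slots.length; i++) docs[i] = slotDocs[slots[i]];
        return docs;
    }
}

class SlotQueueDemo {
    public static void main(String[] args) {
        int[] values = {5, 3, 9, 3, 1, 7}; // sort-field value for docs 0..5
        int n = 3;
        TopNCollector c = new TopNCollector(new IntSlotComparator(values, n + 1), n);
        for (int doc = 0; doc < values.length; doc++) c.collect(doc);
        System.out.println(Arrays.toString(c.topDocs())); // prints [4, 1, 3]
    }
}
```

The extra staging slot means a candidate's value is copied once and never moved: if it beats the bottom of the queue, the evicted slot simply becomes the next staging slot.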

This patch only handles SortField.{DOC,SCORE,INT} for now, but I think the
approach shows surprising early promise: I'm seeing a sizable
performance gain for the "sort by int field" case (13.76 sec vs 17.95
sec for 300 queries getting the top 100 hits from 1M results), i.e. 23%
less time.  I verified that, for the test sort alg (above), it produces the
right results (at least the top 40 docs match).

I didn't expect such a performance gain (I was hoping for not much
performance loss, actually).  I think it may be that although the
initial value copy adds some cost, the within-queue comparisons are
then faster because you don't have to deref back to the fieldcache
array.  It seems we keep accidentally discovering performance gains
here :)
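A rough way to see the deref argument: the hypothetical sketch below counts reads of the big per-document array (again a plain int[] standing in for the FieldCache) when the queue holds docIDs versus copied values. It counts lookups only and says nothing about timing, so whether fewer, more local reads explain the speedup remains the guess above.

```java
import java.util.PriorityQueue;

// Hypothetical sketch: count reads of the large per-document array (a
// stand-in for a FieldCache int[]) under two ways of keeping the n docs
// with the smallest values. It counts lookups only, not time.
class DerefCount {
    static long reads;

    static int read(int[] fieldCache, int doc) {
        reads++;
        return fieldCache[doc];
    }

    /** Queue of docIDs: every comparison goes back to the array. Requires n >= 1. */
    static long byDocId(int[] fieldCache, int n) {
        reads = 0;
        PriorityQueue<Integer> pq = new PriorityQueue<>(
            (a, b) -> Integer.compare(read(fieldCache, b), read(fieldCache, a))); // worst at head
        for (int doc = 0; doc < fieldCache.length; doc++) {
            if (pq.size() < n) {
                pq.add(doc);
            } else if (read(fieldCache, doc) < read(fieldCache, pq.peek())) {
                pq.poll();
                pq.add(doc);
            }
        }
        return reads;
    }

    /** Queue of {value, doc} pairs: each value is read once, at collect time. Requires n >= 1. */
    static long byCopiedValue(int[] fieldCache, int n) {
        reads = 0;
        PriorityQueue<int[]> pq = new PriorityQueue<>((a, b) -> Integer.compare(b[0], a[0]));
        for (int doc = 0; doc < fieldCache.length; doc++) {
            int v = read(fieldCache, doc); // the copy
            if (pq.size() < n) {
                pq.add(new int[] {v, doc});
            } else if (v < pq.peek()[0]) {
                pq.poll();
                pq.add(new int[] {v, doc});
            }
        }
        return reads;
    }

    public static void main(String[] args) {
        int[] fieldCache = new int[1000];
        for (int i = 0; i < fieldCache.length; i++) fieldCache[i] = (i * 7919) % 1000;
        System.out.println("by docID:        " + byDocId(fieldCache, 10));
        System.out.println("by copied value: " + byCopiedValue(fieldCache, 10)); // 1000: one read per doc
    }
}
```

Once the queue is full, the docID version pays at least two array reads per collected doc just for the bottom check, plus more during heap reordering; the copy version pays exactly one.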

If we go forward with this approach I think it'd mean deprecating
FieldSortedHitQueue & ScoreDocComparator, because I think there's no
back-compatible way to migrate forward.  I also like that this
approach means we only need an iterator interface to FieldCache
values (for LUCENE-831).
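The iterator point follows from the copy happening once per collected doc, in increasing docID order, so the comparator never needs random access into the values. A sketch under that assumption; IntValuesIterator and its advanceTo method are hypothetical illustrations, not the LUCENE-831 API, and the sparse source is just one example of storage that is easy to walk forward but not to index directly.

```java
// Hypothetical sketch: IntValuesIterator is an illustrative interface,
// not the API proposed in LUCENE-831.
interface IntValuesIterator {
    /** Move forward to {@code doc} (never backwards) and return its value. */
    int advanceTo(int doc);
}

/** Sparse source: docs[] is strictly increasing; docs without an entry read as 0. */
class SparseIntValuesIterator implements IntValuesIterator {
    private final int[] docs;
    private final int[] vals;
    private int pos = 0;
    private int lastDoc = -1;

    SparseIntValuesIterator(int[] docs, int[] vals) {
        this.docs = docs;
        this.vals = vals;
    }

    @Override
    public int advanceTo(int doc) {
        if (doc < lastDoc) throw new IllegalStateException("targets must not go backwards");
        lastDoc = doc;
        while (pos < docs.length && docs[pos] < doc) pos++;
        return (pos < docs.length && docs[pos] == doc) ? vals[pos] : 0;
    }
}

/** Slot comparator that only ever walks its value source forward. */
class IteratorIntComparator {
    private final IntValuesIterator values;
    private final int[] slotValues;

    IteratorIntComparator(IntValuesIterator values, int numSlots) {
        this.values = values;
        this.slotValues = new int[numSlots];
    }

    /** Called once per collected doc, in increasing docID order. */
    void copy(int slot, int doc) { slotValues[slot] = values.advanceTo(doc); }

    int compare(int slot1, int slot2) { return Integer.compare(slotValues[slot1], slotValues[slot2]); }

    int value(int slot) { return slotValues[slot]; }
}

class IteratorDemo {
    public static void main(String[] args) {
        IteratorIntComparator c = new IteratorIntComparator(
            new SparseIntValuesIterator(new int[] {2, 5, 9}, new int[] {40, 10, 30}), 3);
        c.copy(0, 1); // doc 1 has no entry, so 0
        c.copy(1, 5); // 10
        c.copy(2, 9); // 30
        System.out.println(c.compare(1, 2)); // -1: 10 sorts before 30
    }
}
```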

Mark, can you look this over, see if it makes sense, and maybe try to
tackle the other sort types?  String will be the most interesting, but
I think it's very doable.


> Change IndexSearcher to use MultiSearcher semantics for multiple subreaders
> ---------------------------------------------------------------------------
>
>                 Key: LUCENE-1483
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1483
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.9
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
> LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
> LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch
>
>
> FieldCache and Filters are forced down to a single segment reader, allowing 
> for individual segment reloading on reopen.
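Because each hit carries a copy of its value, one queue can span segments even though each segment has its own FieldCache. A hypothetical sketch of that per-segment loop, with a docBase offset mapping segment-local docIDs to top-level ones; each int[] stands in for one segment's values, and PerSegmentTopN and Hit are illustrative names only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch: each int[] is one segment's values (a per-segment
// FieldCache stand-in), indexed by the segment-local docID.
class PerSegmentTopN {
    /** A hit carrying its top-level docID and the value copied when it was collected. */
    static final class Hit {
        final int globalDoc;
        final int value;
        Hit(int globalDoc, int value) { this.globalDoc = globalDoc; this.value = value; }
    }

    /** Top n hits by ascending value across all segments, best first. Requires n >= 1. */
    static List<Hit> topN(List<int[]> segments, int n) {
        // Worst hit at the head: larger value is worse; on ties the larger docID is worse.
        Comparator<Hit> worstFirst = (a, b) -> a.value != b.value
            ? Integer.compare(b.value, a.value)
            : Integer.compare(b.globalDoc, a.globalDoc);
        PriorityQueue<Hit> pq = new PriorityQueue<>(worstFirst);
        int docBase = 0;
        for (int[] segment : segments) { // one segment reader at a time
            for (int doc = 0; doc < segment.length; doc++) {
                Hit hit = new Hit(docBase + doc, segment[doc]); // copy the value now
                if (pq.size() < n) {
                    pq.add(hit);
                } else if (worstFirst.compare(hit, pq.peek()) > 0) { // strictly better than the worst
                    pq.poll();
                    pq.add(hit);
                }
            }
            docBase += segment.length; // the next segment's docIDs start here
        }
        List<Hit> out = new ArrayList<>(pq);
        out.sort(worstFirst.reversed());
        return out;
    }

    public static void main(String[] args) {
        List<Hit> hits = topN(Arrays.asList(new int[] {5, 2}, new int[] {1, 2, 8}), 3);
        for (Hit h : hits) System.out.println(h.globalDoc + " -> " + h.value);
        // prints "2 -> 1", "1 -> 2", "3 -> 2" on separate lines
    }
}
```

Entries from earlier segments stay comparable after the loop moves on because nothing in the queue points back into an earlier segment's array.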


