[
https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael McCandless updated LUCENE-1483:
---------------------------------------
Attachment: LUCENE-1483.patch
Attached initial patch (derived from one of the earlier patches).
A lot of work remains. TestSort (and likely others) fail.
{quote}
> That's where I don't follow though - it's not ords in the queue, right? It's
> ScoreDocs. That's what's getting me at the moment.
{quote}
Exactly -- so I built a first cut of the alternative "copy value"
approach, where the comparator (new FieldComparator abstract class) is
responsible for holding the values it needs for docs inserted into the
queue. I also added TopFieldValueDocCollector (extends DocCollector),
and ByValueFieldSortedHitQueue (extends PriorityQueue) that interacts
with the FieldComparators. (We can change these names...). I updated
IndexSearcher to use this new queue for field sorting.
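To make the "copy value" idea concrete, here is a minimal Java sketch of
how a comparator can own the values for queued docs. It is not the code
in the patch: SlotComparator, IntSlotComparator and TopNCollector are
hypothetical names, a plain int[] stands in for the FieldCache values,
and a linear scan stands in for the real priority queue.

```java
import java.util.Arrays;

// Hypothetical sketch only; these are not the classes from the patch.
abstract class SlotComparator {
    // Copy the sort value of doc into the given queue slot.
    abstract void copy(int slot, int doc);
    // Compare the values already held in two slots.
    abstract int compare(int slotA, int slotB);
}

final class IntSlotComparator extends SlotComparator {
    private final int[] fieldValues; // assumed stand-in for per-document FieldCache ints
    private final int[] slotValues;  // values owned by the comparator, one per slot

    IntSlotComparator(int[] fieldValues, int numSlots) {
        this.fieldValues = fieldValues;
        this.slotValues = new int[numSlots];
    }

    @Override void copy(int slot, int doc) { slotValues[slot] = fieldValues[doc]; }

    @Override int compare(int slotA, int slotB) {
        // No lookup into fieldValues here: only the copied values are read.
        return Integer.compare(slotValues[slotA], slotValues[slotB]);
    }
}

final class TopNCollector {
    private final SlotComparator cmp;
    private final int[] docs; // docs[slot] = doc id held in that slot
    private final int spare;  // extra slot used to stage a candidate value
    private int size = 0;

    // The comparator must have room for n + 1 slots (n hits plus the spare).
    TopNCollector(SlotComparator cmp, int n) {
        this.cmp = cmp;
        this.docs = new int[n];
        this.spare = n;
    }

    void collect(int doc) {
        if (size < docs.length) {          // queue not full yet: take the next free slot
            cmp.copy(size, doc);
            docs[size++] = doc;
            return;
        }
        int worst = 0;                     // slot holding the largest value (ascending sort)
        for (int s = 1; s < size; s++) {
            if (cmp.compare(s, worst) > 0) worst = s;
        }
        cmp.copy(spare, doc);              // stage the candidate, then compare slot to slot
        if (cmp.compare(spare, worst) < 0) {
            cmp.copy(worst, doc);
            docs[worst] = doc;
        }
    }

    // Doc ids of the collected hits, ordered by ascending sort value.
    int[] topDocs() {
        Integer[] slots = new Integer[size];
        for (int s = 0; s < size; s++) slots[s] = s;
        Arrays.sort(slots, (a, b) -> cmp.compare(a, b));
        int[] result = new int[size];
        for (int i = 0; i < size; i++) result[i] = docs[slots[i]];
        return result;
    }
}
```

In this sketch the collector never sees a field value; it only asks the
comparator to copy into a slot or to compare two slots, so the
comparator needs to read each doc's value just once, at insert time.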
This patch only handles SortField.{DOC,SCORE,INT} for now, but I think
the approach shows surprising early promise: I'm seeing a sizable
performance gain for the "sort by int field" case (13.76 sec vs 17.95
sec for 300 queries getting the top 100 hits from 1M results), which is
about 23% less time. I verified that for the test sort alg (above) it
produces the right results (at least the top 40 docs match).
I didn't expect such a performance gain (I was hoping for not much
performance loss, actually). I think it may be that although the
initial value copy adds some cost, the within-queue comparisons are
then faster because you don't have to deref back to the FieldCache
array. It seems we keep accidentally discovering performance gains
here :)
If we go forward with this approach I think it'd mean deprecating
FieldSortedHitQueue & ScoreDocComparator, because I think there's no
back-compatible way to migrate forward. I also like that this
approach means we only need an iterator interface to FieldCache
values (for LUCENE-831).
Mark, can you look this over, see if it makes sense, and maybe try to
tackle the other sort types? String will be the most interesting, but
I think it's very doable.
> Change IndexSearcher to use MultiSearcher semantics for multiple subreaders
> ---------------------------------------------------------------------------
>
> Key: LUCENE-1483
> URL: https://issues.apache.org/jira/browse/LUCENE-1483
> Project: Lucene - Java
> Issue Type: Improvement
> Affects Versions: 2.9
> Reporter: Mark Miller
> Priority: Minor
> Attachments: LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch,
> LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch,
> LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch
>
>
> FieldCache and Filters are forced down to a single segment reader, allowing
> for individual segment reloading on reopen.