[
https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12669227#action_12669227
]
Michael McCandless commented on LUCENE-1483:
--------------------------------------------
bq. After a reopen, I could tear apart the MultiSegmentReader and create my own
MultiReader, wrapping each reader. The only real difference between the two is
getVersion() and isCurrent(), right?
That's a neat idea! But there are other differences (eg you wouldn't be able
to make changes, I think? Because no IndexReader has the SegmentInfos?).
> Change IndexSearcher multisegment searches to search each individual segment
> using a single HitCollector
> --------------------------------------------------------------------------------------------------------
>
> Key: LUCENE-1483
> URL: https://issues.apache.org/jira/browse/LUCENE-1483
> Project: Lucene - Java
> Issue Type: Improvement
> Affects Versions: 2.9
> Reporter: Mark Miller
> Priority: Minor
> Fix For: 2.9
>
> Attachments: LUCENE-1483-backcompat.patch, LUCENE-1483-partial.patch,
> LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch,
> LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch,
> LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch,
> LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch,
> LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch,
> LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch,
> LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch,
> LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch,
> LUCENE-1483.patch, LUCENE-1483.patch, sortBench.py, sortCollate.py
>
>
> This issue changes how an IndexSearcher searches over multiple segments. The
> current method of searching multiple segments is to use a MultiSegmentReader
> and treat all of the segments as one. This causes filters and FieldCaches to
> be keyed to the MultiReader and makes reopen expensive. If only a few
> segments change, the FieldCache is still loaded for all of them.
> This patch changes things by searching each individual segment one at a time,
> but sharing the HitCollector used across each segment. This allows
> FieldCaches and Filters to be keyed on individual SegmentReaders, making
> reopen much cheaper. FieldCache loading over multiple segments can be much
> faster as well - with the old method, all unique terms for every segment is
> enumerated against each segment - because of the likely logarithmic change in
> terms per segment, this can be very wasteful. Searching individual segments
> avoids this cost. The term/document statistics from the multireader are used
> to score results for each segment.
> When sorting, its more difficult to use a single HitCollector for each sub
> searcher. Ordinals are not comparable across segments. To account for this, a
> new field sort enabled HitCollector is introduced that is able to collect and
> sort across segments (because of its ability to compare ordinals across
> segments). This TopFieldCollector class will collect the values/ordinals for
> a given segment, and upon moving to the next segment, translate any
> ordinals/values so that they can be compared against the values for the new
> segment. This is done lazily.
> All and all, the switch seems to provide numerous performance benefits, in
> both sorted and non sorted search. We were seeing a good loss on indices with
> lots of segments (1000?) and certain queue sizes / queries, but the latest
> results seem to show thats been mostly taken care of (you shouldnt be using
> such a large queue on such a segmented index anyway).
> * Introduces
> ** MultiReaderHitCollector - a HitCollector that can collect across multiple
> IndexReaders. Old HitCollectors are wrapped to support multiple IndexReaders.
> ** TopFieldCollector - a HitCollector that can compare values/ordinals across
> IndexReaders and sort on fields.
> ** FieldValueHitQueue - a Priority queue that is part of the
> TopFieldCollector implementation.
> ** FieldComparator - a new Comparator class that works across IndexReaders.
> Part of the TopFieldCollector implementation.
> ** FieldComparatorSource - new class to allow for custom Comparators.
> * Alters
> ** IndexSearcher uses a single HitCollector to collect hits against each
> individual SegmentReader. All the other changes stem from this ;)
> * Deprecates
> ** TopFieldDocCollector
> ** FieldSortedHitQueue
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]