[ https://issues.apache.org/jira/browse/LUCENE-6065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14219528#comment-14219528 ]
Uwe Schindler edited comment on LUCENE-6065 at 11/20/14 4:11 PM:
-----------------------------------------------------------------
Maybe I was a little bit too complicated in my explanation, sorry. The main
problem I have is: _a public search API where all public methods are final and
the whole implementation is protected_, which is a horror when it comes to the
delegation pattern used by a filtering API. This feels like Analyzer, which is
unintuitive ([~mikemccand] also pointed to the complexity in analysis in his
mailing list post about making a better Lucene) :-)
was (Author: thetaphi):
Maybe I was a little bit too complicated in my explanation, sorry. The main
problem I have is: _a public search API where all public methods are final and
the whole implementation is protected_
> remove "foreign readers" from merge, fix LeafReader instead.
> ------------------------------------------------------------
>
> Key: LUCENE-6065
> URL: https://issues.apache.org/jira/browse/LUCENE-6065
> Project: Lucene - Core
> Issue Type: Task
> Reporter: Robert Muir
> Attachments: LUCENE-6065.patch
>
>
> Currently, SegmentMerger supports two classes of citizens being merged:
> # SegmentReader
> # "foreign reader" (e.g. some FilterReader)
> It does an instanceof check and executes the merge differently. In the
> SegmentReader case: stored fields and term vectors are bulk-merged, norms and
> docvalues are transferred directly without piling up on the heap, CRC32
> verification runs with IO locality of the data being merged, etc. Otherwise,
> we treat it as a "foreign" reader and it's slow.
> This is just the low level; it gets worse as you wrap with more stuff. A
> great example there is SortingMergePolicy: not only will it have the
> low-level slowdowns listed above, it will e.g. cache/pile up OrdinalMaps for
> all string docvalues fields being merged, plus other silliness that just
> makes matters worse.
> Another use case is 5.0 users wishing to upgrade from fieldcache to
> docvalues. This should be possible to implement with a simple incremental
> transition based on a mergepolicy that uses UninvertingReader. But we
> shouldn't populate internal fieldcache entries unnecessarily on merge and
> spike RAM until all those segment cores are released, and other things like
> bulk merge of stored fields and not piling up norms should still work: it's
> completely unrelated.
> There are more problems we can fix if we clean this up:
> checkindex/checkreader can run efficiently where it doesn't need to RAM spike
> like merging, we can remove the checkIntegrity() method completely from
> LeafReader, since it can always be accomplished on producers, etc. In general
> it would be nice to just have one codepath for merging that is as efficient
> as we can make it, and to support things like index modifications during
> merge.
> I spent a few weeks writing 3 different implementations to fix this
> (interface, optional abstract class, "fix LeafReader"), and the latter is the
> only one I don't completely hate: I think our APIs should be efficient for
> indexing as well as search.
> So the proposal is simple: refactor LeafReader to just require the producer
> APIs as abstract methods (and FilterReaders should work on that). The
> search-oriented APIs can just be final methods that defer to those.
> So we would add 5 abstract methods, but implement 10 current methods as final
> based on those, and then merging would always be efficient.
> {code}
> // new abstract codec-based apis
>
> /**
>  * Expert: retrieve thread-private TermVectorsReader
>  * @throws AlreadyClosedException if this reader is closed
>  * @lucene.internal
>  */
> protected abstract TermVectorsReader getTermVectorsReader();
>
> /**
>  * Expert: retrieve thread-private StoredFieldsReader
>  * @throws AlreadyClosedException if this reader is closed
>  * @lucene.internal
>  */
> protected abstract StoredFieldsReader getFieldsReader();
>
> /**
>  * Expert: retrieve underlying NormsProducer
>  * @throws AlreadyClosedException if this reader is closed
>  * @lucene.internal
>  */
> protected abstract NormsProducer getNormsReader();
>
> /**
>  * Expert: retrieve underlying DocValuesProducer
>  * @throws AlreadyClosedException if this reader is closed
>  * @lucene.internal
>  */
> protected abstract DocValuesProducer getDocValuesReader();
>
> /**
>  * Expert: retrieve underlying FieldsProducer
>  * @throws AlreadyClosedException if this reader is closed
>  * @lucene.internal
>  */
> protected abstract FieldsProducer getPostingsReader();
>
> // user/search oriented public apis based on the above
> public final Fields fields();
> public final void document(int, StoredFieldVisitor);
> public final Fields getTermVectors(int);
> public final NumericDocValues getNumericDocValues(String);
> public final Bits getDocsWithField(String);
> public final BinaryDocValues getBinaryDocValues(String);
> public final SortedDocValues getSortedDocValues(String);
> public final SortedNumericDocValues getSortedNumericDocValues(String);
> public final SortedSetDocValues getSortedSetDocValues(String);
> public final NumericDocValues getNormValues(String);
> {code}
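
To make the shape of the proposal and the delegation concern concrete, here is a minimal, self-contained sketch. It is not Lucene code: `NormsSource`, `SketchLeafReader`, `SketchSegmentReader`, `SketchFilterReader` and `SketchMerger` are hypothetical stand-ins invented for illustration, and only one of the five producer methods is modelled. It assumes all classes live in the same package, which is what lets the filter call the protected method on its delegate.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a codec producer such as NormsProducer.
interface NormsSource {
    long[] norms(String field);
}

// Simplified stand-in for the proposed LeafReader shape.
abstract class SketchLeafReader {
    // Producer-level API: the only method a subclass implements.
    protected abstract NormsSource getNormsReader();

    // Search-level API: final, and defined purely in terms of the producer.
    public final long[] getNormValues(String field) {
        return getNormsReader().norms(field);
    }
}

// Plays the role of a SegmentReader: owns the data directly.
class SketchSegmentReader extends SketchLeafReader {
    private final Map<String, long[]> normsByField = new HashMap<>();

    SketchSegmentReader put(String field, long... values) {
        normsByField.put(field, values);
        return this;
    }

    @Override
    protected NormsSource getNormsReader() {
        return normsByField::get;
    }
}

// Plays the role of a FilterReader: wraps the delegate's producer.
class SketchFilterReader extends SketchLeafReader {
    protected final SketchLeafReader in;

    SketchFilterReader(SketchLeafReader in) {
        this.in = in;
    }

    @Override
    protected NormsSource getNormsReader() {
        // Compiles only because both classes share a package. A subclass in
        // another package may not call a protected method on a reference of
        // the base type, which is one reading of the delegation complaint.
        NormsSource delegate = in.getNormsReader();
        return field -> {
            long[] values = delegate.norms(field);
            if (values == null) {
                return null;
            }
            long[] doubled = new long[values.length];
            for (int i = 0; i < values.length; i++) {
                doubled[i] = values[i] * 2; // arbitrary transformation for the demo
            }
            return doubled;
        };
    }
}

// One merge code path: no instanceof check, every reader exposes a producer.
final class SketchMerger {
    static long sumNorms(String field, SketchLeafReader... readers) {
        long total = 0;
        for (SketchLeafReader reader : readers) {
            for (long value : reader.getNormsReader().norms(field)) {
                total += value;
            }
        }
        return total;
    }
}

class SketchDemo {
    public static void main(String[] args) {
        SketchLeafReader base = new SketchSegmentReader().put("body", 1, 2, 3);
        SketchLeafReader filtered = new SketchFilterReader(base);
        System.out.println(Arrays.toString(filtered.getNormValues("body")));
        System.out.println(SketchMerger.sumNorms("body", base, filtered));
    }
}
```

In this sketch the merger treats the plain reader and the wrapped reader identically, which is the efficiency argument in the issue description. Moving `SketchFilterReader` to a different package would make `in.getNormsReader()` a compile error under Java's protected-access rules, so a real filtering API would need the producer accessors to be public or would need some other hook.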