Let Codec consume entire document
---------------------------------
Key: LUCENE-2935
URL: https://issues.apache.org/jira/browse/LUCENE-2935
Project: Lucene - Java
Issue Type: Improvement
Components: Codecs, Index
Affects Versions: CSF branch, 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
Fix For: CSF branch, 4.0
Currently the codec API is limited to consuming Terms & Postings upon a segment
flush. To enable stored fields & DocValues to make use of the Codec abstraction,
codecs should allow pulling a consumer ahead of flush time and consuming all
values from a document's fields through a consumer API. An alternative to
consuming the entire document would be extending FieldsConsumer to return a
StoredValueConsumer / DocValuesConsumer side by side with the TermsConsumer, as
is currently done in the DocValues branch. Yet, extending this has proven to
be very tricky and error-prone for several reasons:
* FieldsConsumer requires a SegmentWriteState, which might differ at flush time
from the one available when the document is consumed. SegmentWriteState must
therefore be created twice: (1) when the first DocValues field is indexed and
(2) when the segment is flushed.
* FieldsConsumers are currently pulled for each indexed field whether or not
there are terms to be indexed. Yet, if we use something like DocValuesCodec,
which essentially wraps another codec and creates FieldsConsumers on demand, the
wrapped codec's consumer might not be initialized even if the field is indexed.
This causes problems once such a field is opened but the files required by that
codec are missing. I added some hacky logic to work around this, which we
should avoid.
* SegmentCodecs are created for each SegmentWriteState, which might yield wrong
codec IDs depending on how field numbers are assigned. We currently depend on
the fact that all fields for a segment, and therefore their codecs, are known
when SegmentCodecs are built. To enable consuming per-document values in codecs,
we need to build them incrementally.
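A minimal sketch of what incremental codec ID assignment could look like,
assuming codecs are identified by name and each field is bound to exactly one
codec. IncrementalSegmentCodecs and register() are hypothetical names used only
for illustration; they are not Lucene's SegmentCodecs API. A codec receives the
next free ID the first time any field using it is seen, so IDs already handed
out stay stable no matter when later fields arrive.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Lucene code: per-segment codec IDs are assigned
// as fields show up, instead of once after all fields of the segment are known.
final class IncrementalSegmentCodecs {
  private final Map<String, Integer> idByCodec = new HashMap<>();
  private final List<String> codecs = new ArrayList<>(); // index == codec ID
  private final Map<String, Integer> idByField = new HashMap<>();

  // Returns the codec ID for the field, registering the codec on first use.
  int register(String fieldName, String codecName) {
    Integer existing = idByField.get(fieldName);
    if (existing != null) {
      return existing; // a field keeps the ID it was first given
    }
    int id = idByCodec.computeIfAbsent(codecName, name -> {
      codecs.add(name);
      return codecs.size() - 1; // next free ID
    });
    idByField.put(fieldName, id);
    return id;
  }

  int codecCount() {
    return codecs.size();
  }
}
```

Because an ID never changes once assigned, a per-document consumer pulled
before flush can rely on it even though more fields may still be added to the
segment.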
Codecs should instead provide a DocumentConsumer, created prior to flush, side
by side with the FieldsConsumer. This is also a prerequisite for
LUCENE-2621.
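A rough sketch of the proposed shape, assuming a much simplified API. Every
type and method below (DocumentConsumer.addField/flush, Codec.documentConsumer,
BufferingCodec, DocumentWriterSketch) is hypothetical and only illustrates the
idea: the per-document consumer is pulled once, before flush, is fed each
document's field values as documents are indexed, and is drained at flush. The
FieldsConsumer side for terms and postings is omitted.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified types; none of these are Lucene classes.
interface DocumentConsumer {
  void addField(int docId, String field, String value); // called while indexing, before flush
  List<String> flush();                                 // called once at segment flush
}

interface Codec {
  // Pulled ahead of flush, side by side with the (omitted) FieldsConsumer.
  DocumentConsumer documentConsumer(String segmentName);
}

// Toy codec that buffers "segment/doc/field=value" records in memory.
final class BufferingCodec implements Codec {
  @Override
  public DocumentConsumer documentConsumer(String segmentName) {
    return new DocumentConsumer() {
      private final List<String> records = new ArrayList<>();

      @Override
      public void addField(int docId, String field, String value) {
        records.add(segmentName + "/" + docId + "/" + field + "=" + value);
      }

      @Override
      public List<String> flush() {
        List<String> out = new ArrayList<>(records);
        records.clear(); // buffer is empty after each flush
        return out;
      }
    };
  }
}

// Stand-in for the indexing chain: consumes whole documents, not just terms.
final class DocumentWriterSketch {
  private final Codec codec;
  private final String segmentName;
  private DocumentConsumer docConsumer; // created lazily, on the first document
  private int nextDocId;

  DocumentWriterSketch(Codec codec, String segmentName) {
    this.codec = codec;
    this.segmentName = segmentName;
  }

  // Each inner array is a {fieldName, value} pair.
  void addDocument(String[][] fields) {
    if (docConsumer == null) {
      docConsumer = codec.documentConsumer(segmentName);
    }
    int docId = nextDocId++;
    for (String[] f : fields) {
      docConsumer.addField(docId, f[0], f[1]);
    }
  }

  List<String> flush() {
    if (docConsumer == null) {
      return new ArrayList<>(); // no documents were added
    }
    return docConsumer.flush();
  }
}
```

Since the consumer exists before flush, it does not need a flush-time
SegmentWriteState to start receiving values, which is the double-creation
problem described above.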