Let Codec consume entire document
---------------------------------

                 Key: LUCENE-2935
                 URL: https://issues.apache.org/jira/browse/LUCENE-2935
             Project: Lucene - Java
          Issue Type: Improvement
          Components: Codecs, Index
    Affects Versions: CSF branch, 4.0
            Reporter: Simon Willnauer
            Assignee: Simon Willnauer
             Fix For: CSF branch, 4.0


Currently the codec API is limited to consuming Terms & Postings upon a segment 
flush. To enable stored fields & DocValues to make use of the Codec abstraction, 
codecs should allow pulling a consumer ahead of flush time and consuming all 
values from a document's fields through a consumer API. An alternative to 
consuming the entire document would be extending FieldsConsumer to return a 
StoredValueConsumer / DocValuesConsumer, as is done in the DocValues branch 
right now, side by side with the TermsConsumer. Yet, extending this has proven 
to be very tricky and error-prone for several reasons:
* FieldsConsumer requires a SegmentWriteState, which might be different upon 
flush compared to when the document is consumed. SegmentWriteState must 
therefore be created twice: (1) when the first DocValues field is indexed and 
(2) when the segment is flushed. 
* FieldsConsumers are currently pulled for each indexed field, no matter 
whether there are terms to be indexed or not. Yet, if we use something like 
DocValuesCodec, which essentially wraps another codec and creates a 
FieldsConsumer on demand, the wrapped codec's consumer might not be initialized 
even if the field is indexed. This causes problems once such a field is opened 
but the files required by that codec are missing. I added some harsh logic to 
work around this, which should be avoided.
* SegmentCodecs are created for each SegmentWriteState, which might yield wrong 
codec IDs depending on how field numbers are assigned. We currently depend on 
the fact that all fields for a segment, and therefore their codecs, are known 
when SegmentCodecs are built. To enable consuming per-document values in 
codecs, we need to do that incrementally.
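
To illustrate what assigning codec IDs incrementally could look like, here is a 
minimal Java sketch. It is not the SegmentCodecs implementation: the class 
IncrementalCodecIds, its methods, and the codec names are hypothetical and 
exist only for this example. The assumption is that a codec receives the next 
free ID the first time any field uses it, and that a field keeps the ID it was 
first registered with.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical registry (not a Lucene class) that hands out codec IDs as
 * fields show up, instead of requiring all fields to be known up front.
 */
final class IncrementalCodecIds {
    // codec name -> ID, in order of first use
    private final Map<String, Integer> codecToId = new LinkedHashMap<>();
    // field name -> ID of the codec the field was first registered with
    private final Map<String, Integer> fieldToCodecId = new LinkedHashMap<>();

    /** Registers a field on first sight and returns its codec ID. */
    int register(String fieldName, String codecName) {
        Integer existing = fieldToCodecId.get(fieldName);
        if (existing != null) {
            return existing; // a field keeps its first assignment
        }
        Integer id = codecToId.get(codecName);
        if (id == null) {
            id = codecToId.size(); // next free ID
            codecToId.put(codecName, id);
        }
        fieldToCodecId.put(fieldName, id);
        return id;
    }

    /** Number of distinct codecs seen so far. */
    int codecCount() {
        return codecToId.size();
    }
}

class IncrementalCodecIdsDemo {
    public static void main(String[] args) {
        IncrementalCodecIds ids = new IncrementalCodecIds();
        // Fields arrive one document at a time; the codec names are made up.
        System.out.println(ids.register("title", "Standard"));
        System.out.println(ids.register("price", "DocValues"));
        System.out.println(ids.register("body", "Standard"));
    }
}
```

With a scheme like this, IDs depend only on the order in which codecs are first 
seen, not on the complete set of fields in the segment.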

Codecs should instead provide a DocumentConsumer, created prior to flush, side 
by side with the FieldsConsumer. This is also a prerequisite for LUCENE-2621.
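
As a rough illustration of the proposal, the following Java sketch shows a 
codec exposing a per-document consumer next to a flush-time consumer. None of 
these types are Lucene's actual API: DocumentConsumer here is only a guess at 
the shape such an interface might take, and FlushConsumer, SketchCodec and 
RecordingCodec are hypothetical stand-ins invented for this example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical per-document consumer, pulled while documents are indexed. */
interface DocumentConsumer {
    void startDocument(int docId);
    void addField(String fieldName, Object value);
    void finishDocument();
}

/** Hypothetical stand-in for the flush-time terms/postings consumer. */
interface FlushConsumer {
    void flush(String segmentName);
}

/** Hypothetical codec that offers both consumers side by side. */
interface SketchCodec {
    DocumentConsumer documentConsumer(); // available ahead of flush
    FlushConsumer flushConsumer();       // used at flush time, as today
}

/** Toy codec that records the calls it receives, in order. */
final class RecordingCodec implements SketchCodec {
    final List<String> events = new ArrayList<>();

    @Override
    public DocumentConsumer documentConsumer() {
        return new DocumentConsumer() {
            private int currentDoc = -1;

            @Override
            public void startDocument(int docId) {
                currentDoc = docId;
                events.add("start:" + docId);
            }

            @Override
            public void addField(String fieldName, Object value) {
                events.add("doc" + currentDoc + ":" + fieldName + "=" + value);
            }

            @Override
            public void finishDocument() {
                events.add("finish:" + currentDoc);
                currentDoc = -1;
            }
        };
    }

    @Override
    public FlushConsumer flushConsumer() {
        return segmentName -> events.add("flush:" + segmentName);
    }
}

class DocumentConsumerSketch {
    public static void main(String[] args) {
        RecordingCodec codec = new RecordingCodec();
        DocumentConsumer docs = codec.documentConsumer();
        docs.startDocument(0);
        docs.addField("price", 42); // a per-document value, seen before flush
        docs.finishDocument();
        codec.flushConsumer().flush("_0");
        System.out.println(codec.events);
    }
}
```

The point of the sketch is only the ordering: per-document values reach the 
codec as each document is consumed, while the flush-time path stays separate.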

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
