[ 
https://issues.apache.org/jira/browse/LUCENE-2881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992156#comment-12992156
 ] 

Simon Willnauer commented on LUCENE-2881:
-----------------------------------------

{quote}
New patch that removes the tracking of 'hasVectors' and 'hasProx' in 
SegmentInfo. Instead SegmentInfo now has a reference to its corresponding 
FieldInfos.
{quote}

Wow! Nice work, Michael! I like how you preserve bw compat in SegmentInfo, and 
FieldInfos is now bound to SegmentInfo - yay! This solves two problems at once 
for the DocValues branch.
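
To make the quoted change concrete, here is a minimal sketch of a segment that derives 'hasVectors' and 'hasProx' from its FieldInfos instead of tracking them itself. The Sketch* classes are hypothetical stand-ins written for illustration; they are not the classes in the patch, and the rule used for hasProx (an indexed field that does not omit term frequencies and positions) is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a per-field metadata entry; not Lucene's FieldInfo.
final class SketchFieldInfo {
    final String name;
    final boolean isIndexed;
    final boolean storeTermVector;
    final boolean omitTermFreqAndPositions;

    SketchFieldInfo(String name, boolean isIndexed, boolean storeTermVector,
                    boolean omitTermFreqAndPositions) {
        this.name = name;
        this.isIndexed = isIndexed;
        this.storeTermVector = storeTermVector;
        this.omitTermFreqAndPositions = omitTermFreqAndPositions;
    }
}

// Hypothetical stand-in for the per-segment collection of field metadata.
final class SketchFieldInfos {
    private final List<SketchFieldInfo> fields = new ArrayList<>();

    void add(SketchFieldInfo fi) {
        fields.add(fi);
    }

    // True if at least one field in this segment stores term vectors.
    boolean hasVectors() {
        for (SketchFieldInfo fi : fields) {
            if (fi.storeTermVector) return true;
        }
        return false;
    }

    // Assumed rule: true if at least one indexed field keeps freqs/positions.
    boolean hasProx() {
        for (SketchFieldInfo fi : fields) {
            if (fi.isIndexed && !fi.omitTermFreqAndPositions) return true;
        }
        return false;
    }
}

// The segment holds a reference to its own field metadata and delegates to it,
// so there are no separate boolean flags to keep in sync.
final class SketchSegmentInfo {
    final String name;
    private final SketchFieldInfos fieldInfos;

    SketchSegmentInfo(String name, SketchFieldInfos fieldInfos) {
        this.name = name;
        this.fieldInfos = fieldInfos;
    }

    boolean hasVectors() { return fieldInfos.hasVectors(); }
    boolean hasProx() { return fieldInfos.hasProx(); }
}

final class SegmentFlagsDemo {
    public static void main(String[] args) {
        SketchFieldInfos infos = new SketchFieldInfos();
        infos.add(new SketchFieldInfo("id", true, false, true));
        SketchSegmentInfo segment = new SketchSegmentInfo("_0", infos);
        System.out.println(segment.hasVectors() + " " + segment.hasProx());
    }
}
```

The point of the delegation is that the answer always reflects the fields actually present in that segment.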

bq. The alternative would be to rewrite the FieldInfos instead of just copying 
the files, but then we have to rewrite the cfs files.

I think copying over is fine. Ideally we will move all those booleans etc. to 
the codec level so that we don't need that at all. Once stored fields and 
vectors are written by the codec, we can push all that into the PreFlex codec 
(maybe!?) and get rid of the bw compat code.

I think you should commit that patch. I'll port to docvalues and run some tests 
that rely on this issue.

> Track FieldInfo per segment instead of per-IW-session
> -----------------------------------------------------
>
>                 Key: LUCENE-2881
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2881
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: Realtime Branch, CSF branch, 4.0
>            Reporter: Simon Willnauer
>            Assignee: Michael Busch
>             Fix For: Realtime Branch, CSF branch, 4.0
>
>         Attachments: lucene-2881.patch, lucene-2881.patch
>
>
> Currently FieldInfo is tracked per IW session to guarantee consistent global 
> field naming / ordering. IW carries FI instances over from previous segments, 
> which also carries over field properties like isIndexed etc. While having 
> consistent field ordering per IW session appears to be important for bulk 
> merging of stored fields etc., carrying over other properties might become 
> problematic with Lucene's Codec support. Codecs that rely on consistent 
> properties in FI will fail if FI properties are carried over.
> The DocValuesCodec (DocValuesBranch), for instance, writes files per segment 
> and field (using the field id within the file name). Yet, if a particular 
> segment has no DocValues indexed but a previous segment in the same 
> IW session had DocValues, FieldInfo#docValues will be true, since those 
> values are reused from previous segments.
> We already work around this "limitation" in SegmentInfo with properties like 
> hasVectors or hasProx, which are really something we should manage per Codec & 
> Segment. Ideally FieldInfo would be managed per Segment and Codec such that 
> its properties are valid per segment. It also seems to be necessary to bind 
> FieldInfoS to SegmentInfo logically since it's really just per-segment 
> metadata.
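
One way to picture the goal in the description above: keep field numbering consistent across the IW session, but keep the properties themselves per segment. The sketch below is a simplified model under that assumption. GlobalFieldNumbers, SegmentFieldInfo and SegmentFieldInfos are hypothetical names made up for illustration; they are not Lucene's API and not necessarily how the patch does it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical session-wide registry: only the name -> number mapping is shared.
final class GlobalFieldNumbers {
    private final Map<String, Integer> numbers = new LinkedHashMap<>();

    // Returns the session-wide number for a field, assigning the next free
    // number the first time the name is seen.
    synchronized int numberFor(String fieldName) {
        Integer n = numbers.get(fieldName);
        if (n == null) {
            n = numbers.size();
            numbers.put(fieldName, n);
        }
        return n;
    }
}

// Hypothetical per-segment field entry; its properties start out unset.
final class SegmentFieldInfo {
    final String name;
    final int number;
    boolean docValues;

    SegmentFieldInfo(String name, int number) {
        this.name = name;
        this.number = number;
    }
}

// Hypothetical per-segment collection: numbers come from the shared registry,
// while properties are recorded only from what this segment actually indexed.
final class SegmentFieldInfos {
    private final GlobalFieldNumbers globalNumbers;
    private final Map<String, SegmentFieldInfo> byName = new LinkedHashMap<>();

    SegmentFieldInfos(GlobalFieldNumbers globalNumbers) {
        this.globalNumbers = globalNumbers;
    }

    SegmentFieldInfo addOrUpdate(String name, boolean docValues) {
        SegmentFieldInfo fi = byName.get(name);
        if (fi == null) {
            fi = new SegmentFieldInfo(name, globalNumbers.numberFor(name));
            byName.put(name, fi);
        }
        fi.docValues |= docValues; // sticky within this segment only
        return fi;
    }

    SegmentFieldInfo get(String name) {
        return byName.get(name);
    }
}

final class PerSegmentFieldInfoDemo {
    public static void main(String[] args) {
        GlobalFieldNumbers numbers = new GlobalFieldNumbers();

        SegmentFieldInfos seg0 = new SegmentFieldInfos(numbers);
        seg0.addOrUpdate("title", false);
        seg0.addOrUpdate("price", true);   // segment 0 has DocValues for "price"

        SegmentFieldInfos seg1 = new SegmentFieldInfos(numbers);
        seg1.addOrUpdate("price", false);  // segment 1 indexed no DocValues

        // Same field number in both segments, so field ordering stays consistent.
        System.out.println(seg0.get("price").number == seg1.get("price").number); // true
        // The flag is not inherited from the earlier segment.
        System.out.println(seg1.get("price").docValues); // false
    }
}
```

In this model a codec that writes files per segment and field would see docValues == false for the second segment, which is the behaviour the description asks for.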


        
