[
https://issues.apache.org/jira/browse/LUCENE-2985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010798#comment-13010798
]
Simon Willnauer commented on LUCENE-2985:
-----------------------------------------
bq. I wonder if we should pass the segmentCodecsBuilder to FieldInfos? This
way, FieldInfos.add/update could set the codecID, instead of caller doing it
after the fact (in DocFieldProcessorPerThread)?
Here is the thing: I first added it to FieldInfos, since that appears to be the
place for that kind of stuff. The first problem is that
DocFieldProcessorPerThread caches the FI for each DFPPerField, so I would
really need to add it to each FieldInfo (FI, not FIs). Further, having another
invariant in FIs that only applies while we are writing is something I tried to
avoid in the first place. Finally, SegmentCodecs is somewhat internal to the
SegmentInfo rather than to the FieldInfos, and I tried to couple the two only
through the codec ID. I agree this would be easier and less disruptive in the
code, and I'd love to find a better way to do it. Apart from this part in
DocFieldProcessorPerThread, everything is smooth though :/
> Build SegmentCodecs incrementally for consistent codecIDs during indexing
> -------------------------------------------------------------------------
>
> Key: LUCENE-2985
> URL: https://issues.apache.org/jira/browse/LUCENE-2985
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Codecs, Index
> Affects Versions: CSF branch, 4.0
> Reporter: Simon Willnauer
> Assignee: Simon Willnauer
> Fix For: CSF branch, 4.0
>
> Attachments: LUCENE-2985.patch
>
>
> Currently we build the SegmentCodecs during flush, which is fine as long as
> no codec needs to know which fields it should handle. This will change with
> DocValues or when we expose StoredFields / TermVectors via Codec (see
> LUCENE-2621 or LUCENE-2935). The other downside is that we don't have a
> consistent view of which codec belongs to which field during indexing, and
> all FieldInfo instances are unassigned (set to -1). Instead we should build
> the SegmentCodecs incrementally as fields come in, so that no matter when a
> codec needs to be selected to process a document / field, we have the right
> codec ID assigned.
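
To illustrate the idea in the issue description, here is a minimal sketch of
assigning codec IDs incrementally as fields arrive, instead of computing them
once at flush. It is not the code from LUCENE-2985.patch and does not use
Lucene's actual SegmentCodecs or FieldInfo API: the class IncrementalCodecIds,
its methods, and the use of plain codec names are all assumptions made for
illustration. Only the -1 "unassigned" value comes from the description above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only; not Lucene's SegmentCodecs or its builder.
final class IncrementalCodecIds {
    // Mirrors the "unassigned (set to -1)" state from the issue description.
    static final int UNASSIGNED = -1;

    private final Map<String, Integer> idByCodec = new HashMap<>();
    private final List<String> codecs = new ArrayList<>();
    private final Map<String, Integer> idByField = new HashMap<>();

    /**
     * Assigns a codec ID to a field the first time the field is seen.
     * A field that already has an ID keeps it, so the view stays
     * consistent for the whole indexing session.
     */
    int assign(String field, String codecName) {
        Integer existing = idByField.get(field);
        if (existing != null) {
            return existing;
        }
        Integer id = idByCodec.get(codecName);
        if (id == null) {
            // First field using this codec: the next free index becomes its ID.
            id = codecs.size();
            codecs.add(codecName);
            idByCodec.put(codecName, id);
        }
        idByField.put(field, id);
        return id;
    }

    /** Returns the codec ID for a field, or UNASSIGNED if it was never added. */
    int codecIdOf(String field) {
        Integer id = idByField.get(field);
        return id == null ? UNASSIGNED : id;
    }

    /** Snapshot of the codecs seen so far, indexed by codec ID. */
    List<String> codecs() {
        return new ArrayList<>(codecs);
    }
}
```

In this sketch the per-field state holds only an integer ID, which matches the
intent stated in the comment above of coupling field metadata and codec
bookkeeping through the codec ID alone.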