[ https://issues.apache.org/jira/browse/LUCENE-2881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Simon Willnauer updated LUCENE-2881: ------------------------------------ Attachment: LUCENE-2881.patch Attached next iteration. * merged FieldInfoBiMap into FieldInfos * fixed AutoBoxing issues * reverted the deprecation of SegmentInfo#hasVector / hasProx since they where causing many failures due to missing files on exceptions. This also allowed me to make SegmentInfo.clearFilesCache private again. *the FieldInfoBiMap now asserts that if you add a field via setIfNotSet that if either the number or the name exists they are the same as in the mapping. * added lazy loading of FieldInfos in SegmentInfo to prevent opening the CFS file while opening SegmentInfo. SR#CoreReader now passes in the already opened CFSReader to load the FieldInfos * when we open a IW we load the BiMap directly from the SegmentInfos * SegmentInfos now owns the FieldInfoBiMap and writes the map upon commit into an additional .fnx file outside of the CFS file. The SegmentInfos persist the bimap only if it has changed otherwise references the previously written map. I still have one nocommit since I want to add more tests for the persisted global map. Beside that all tests pass and I would appreciate a review though. Mike would you be so kind and let beast break it? > Track FieldInfo per segment instead of per-IW-session > ----------------------------------------------------- > > Key: LUCENE-2881 > URL: https://issues.apache.org/jira/browse/LUCENE-2881 > Project: Lucene - Java > Issue Type: Improvement > Affects Versions: Realtime Branch, CSF branch, 4.0 > Reporter: Simon Willnauer > Assignee: Michael Busch > Fix For: Realtime Branch, CSF branch, 4.0 > > Attachments: LUCENE-2881.patch, LUCENE-2881.patch, lucene-2881.patch, > lucene-2881.patch, lucene-2881.patch, lucene-2881.patch, lucene-2881.patch > > > Currently FieldInfo is tracked per IW session to guarantee consistent global > field-naming / ordering. IW carries FI instances over from previous segments > which also carries over field properties like isIndexed etc. While having > consistent field ordering per IW session appears to be important due to bulk > merging stored fields etc. carrying over other properties might become > problematic with Lucene's Codec support. Codecs that rely on consistent > properties in FI will fail if FI properties are carried over. > The DocValuesCodec (DocValuesBranch) for instance writes files per segment > and field (using the field id within the file name). Yet, if a segment has no > DocValues indexed in a particular segment but a previous segment in the same > IW session had DocValues, FieldInfo#docValues will be true since those > values are reused from previous segments. > We already work around this "limitation" in SegmentInfo with properties like > hasVectors or hasProx which is really something we should manage per Codec & > Segment. Ideally FieldInfo would be managed per Segment and Codec such that > its properties are valid per segment. It also seems to be necessary to bind > FieldInfoS to SegmentInfo logically since its really just per segment > metadata. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org