[ https://issues.apache.org/jira/browse/LUCENE-2881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008530#comment-13008530 ]
Michael McCandless commented on LUCENE-2881: -------------------------------------------- Patch looks much better -- I think it's close! Comments: * I think we should put header (id + version, ie CodecUtil.write/readHeader) on fnx file? * When does FieldNumberBiMap init from another...? (We have ctor that takes a FieldNumberBiMap in... but I don't see who uses that). * Who uses BiMap.entries()? Looks like just for testing? Can you add comment saying "just for testing.."? * FieldInfos ctor that loads from index file name but makes a new bimap seems spooky...? Is this only used by tests now...? * The passed in "version" to SIS.readGlobalFieldMap is unused * I'm a little worried that we name the new file _X.fnx, because it will appear that this file 'belongs' to segment X, which is dangerous because in some recovery cases we will remove all files associated w/ a given segment (ie, _X.*). Maybe, we can name it without the leading _? Ie, 0.fnx, 1.fnx, etc.? * In IW.getGlobalFieldNumberMap... shouldn't that "legacy" logic be moved inside SegmentInfos? Ie it's a bit weird now how we first pull an (empty) biMap from the legacy SegmentInfos then we go and populate it ourselves...? Then we don't need this method in IW (we just ask the SIS for the bimap). * Should FieldInfo.putInternal(FieldInfo) assert that the global bimap "matches"? (Ie, it makes no effort to update the global bimap). * The addition of "si.hasProx = hasProx" in SegmentInfo.clone isn't needed; we pass hasProx to the ctor of that si. * Why do we default SegmentInfos.format now...? Seems spooky? * In SegmentInfos.rollbackCommit shouldn't we set the pendingMapVersion to -1? Should we backport this to 3.x (after sufficient aging)? > Track FieldInfo per segment instead of per-IW-session > ----------------------------------------------------- > > Key: LUCENE-2881 > URL: https://issues.apache.org/jira/browse/LUCENE-2881 > Project: Lucene - Java > Issue Type: Improvement > Affects Versions: Realtime Branch, CSF branch, 4.0 > Reporter: Simon Willnauer > Assignee: Simon Willnauer > Fix For: Realtime Branch, CSF branch, 4.0 > > Attachments: LUCENE-2881.patch, LUCENE-2881.patch, LUCENE-2881.patch, > lucene-2881.patch, lucene-2881.patch, lucene-2881.patch, lucene-2881.patch, > lucene-2881.patch > > > Currently FieldInfo is tracked per IW session to guarantee consistent global > field-naming / ordering. IW carries FI instances over from previous segments > which also carries over field properties like isIndexed etc. While having > consistent field ordering per IW session appears to be important due to bulk > merging stored fields etc. carrying over other properties might become > problematic with Lucene's Codec support. Codecs that rely on consistent > properties in FI will fail if FI properties are carried over. > The DocValuesCodec (DocValuesBranch) for instance writes files per segment > and field (using the field id within the file name). Yet, if a segment has no > DocValues indexed in a particular segment but a previous segment in the same > IW session had DocValues, FieldInfo#docValues will be true since those > values are reused from previous segments. > We already work around this "limitation" in SegmentInfo with properties like > hasVectors or hasProx which is really something we should manage per Codec & > Segment. Ideally FieldInfo would be managed per Segment and Codec such that > its properties are valid per segment. It also seems to be necessary to bind > FieldInfoS to SegmentInfo logically since its really just per segment > metadata. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org