[ https://issues.apache.org/jira/browse/LUCENE-2881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008550#comment-13008550 ]
Simon Willnauer commented on LUCENE-2881: ----------------------------------------- {quote} I think we should put header (id + version, ie CodecUtil.write/readHeader) on fnx file? {quote} man, I told myself to add it about 10 times during that patch :) bq. When does FieldNumberBiMap init from another...? ah legacy {quote} I'm a little worried that we name the new file _X.fnx, because it will appear that this file 'belongs' to segment X, which is dangerous because in some recovery cases we will remove all files associated w/ a given segment (ie, _X.*). Maybe, we can name it without the leading _? Ie, 0.fnx, 1.fnx, etc.? {quote} right, good catch! that we can simple remove and then it should be clear that its not a file belonging to a certain segment bq. In IW.getGlobalFieldNumberMap... shouldn't that "legacy" logic be.... yeah thats is where is should be though. I will move it in. bq. The addition of "si.hasProx = hasProx" in SegmentInfo.clone isn't... true I will remove bq. Why do we default SegmentInfos.format now...? Seems spooky? this hasn't been used in SIS before so I think it didn't matter before. Yet, I check the format in files() so if you create the SIS without reading it its set to 0. I can certainly make that work with default to 0 but it seemed just natural to have it assigned the current_format. I think its fine.... bq. In SegmentInfos.rollbackCommit shouldn't we set the pendingMapVersion to -1 ah good catch! thanks I will fix those issues and upload another patch. Thanks mike for reviewing!!!! > Track FieldInfo per segment instead of per-IW-session > ----------------------------------------------------- > > Key: LUCENE-2881 > URL: https://issues.apache.org/jira/browse/LUCENE-2881 > Project: Lucene - Java > Issue Type: Improvement > Affects Versions: Realtime Branch, CSF branch, 4.0 > Reporter: Simon Willnauer > Assignee: Simon Willnauer > Fix For: Realtime Branch, CSF branch, 4.0 > > Attachments: LUCENE-2881.patch, LUCENE-2881.patch, LUCENE-2881.patch, > lucene-2881.patch, lucene-2881.patch, lucene-2881.patch, lucene-2881.patch, > lucene-2881.patch > > > Currently FieldInfo is tracked per IW session to guarantee consistent global > field-naming / ordering. IW carries FI instances over from previous segments > which also carries over field properties like isIndexed etc. While having > consistent field ordering per IW session appears to be important due to bulk > merging stored fields etc. carrying over other properties might become > problematic with Lucene's Codec support. Codecs that rely on consistent > properties in FI will fail if FI properties are carried over. > The DocValuesCodec (DocValuesBranch) for instance writes files per segment > and field (using the field id within the file name). Yet, if a segment has no > DocValues indexed in a particular segment but a previous segment in the same > IW session had DocValues, FieldInfo#docValues will be true since those > values are reused from previous segments. > We already work around this "limitation" in SegmentInfo with properties like > hasVectors or hasProx which is really something we should manage per Codec & > Segment. Ideally FieldInfo would be managed per Segment and Codec such that > its properties are valid per segment. It also seems to be necessary to bind > FieldInfoS to SegmentInfo logically since its really just per segment > metadata. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org