[ https://issues.apache.org/jira/browse/LUCENE-2881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13000180#comment-13000180 ]
Simon Willnauer commented on LUCENE-2881: ----------------------------------------- Thanks for the clarification, michael! bq. ... the smallest number available in the local map is picked (to keep the numbers dense). ... Oh man I was not aware of this. I got totally confused... see next comment {quote} Hmm, after writing this example down I'm realizing that it would be better to just always pick the next available global field number for a new field, then, at least until we get DWPTs, we should never get different numbers across segments, I think? The disadvantage would be that FieldInfos could have "gaps" in the numbers. I implemented the current approach because I wanted to avoid those gaps, but having them would probably not be a big deal? {quote} This is how I thought it works but I obviously got confused by global vs. local this is also why I had trouble to understand how that failure could ever have happened. But after looking at the code again this makes sense ;) I don't see any problems in FieldInfo number gaps. this should work just fine and guarantee the bulk copy just for now at least. > Track FieldInfo per segment instead of per-IW-session > ----------------------------------------------------- > > Key: LUCENE-2881 > URL: https://issues.apache.org/jira/browse/LUCENE-2881 > Project: Lucene - Java > Issue Type: Improvement > Affects Versions: Realtime Branch, CSF branch, 4.0 > Reporter: Simon Willnauer > Assignee: Michael Busch > Fix For: Realtime Branch, CSF branch, 4.0 > > Attachments: lucene-2881.patch, lucene-2881.patch, lucene-2881.patch, > lucene-2881.patch, lucene-2881.patch > > > Currently FieldInfo is tracked per IW session to guarantee consistent global > field-naming / ordering. IW carries FI instances over from previous segments > which also carries over field properties like isIndexed etc. While having > consistent field ordering per IW session appears to be important due to bulk > merging stored fields etc. carrying over other properties might become > problematic with Lucene's Codec support. Codecs that rely on consistent > properties in FI will fail if FI properties are carried over. > The DocValuesCodec (DocValuesBranch) for instance writes files per segment > and field (using the field id within the file name). Yet, if a segment has no > DocValues indexed in a particular segment but a previous segment in the same > IW session had DocValues, FieldInfo#docValues will be true since those > values are reused from previous segments. > We already work around this "limitation" in SegmentInfo with properties like > hasVectors or hasProx which is really something we should manage per Codec & > Segment. Ideally FieldInfo would be managed per Segment and Codec such that > its properties are valid per segment. It also seems to be necessary to bind > FieldInfoS to SegmentInfo logically since its really just per segment > metadata. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org