[ https://issues.apache.org/jira/browse/LUCENE-2881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13000581#comment-13000581 ]
Simon Willnauer commented on LUCENE-2881:
-----------------------------------------

For the record, Robert reverted the changes made by this issue since we have been experiencing a fair number of [problems|https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/5420/] lately. They are eventually reproducible with:

{code}
ant test -Dtestcase=SimpleFacetsTest -Dtestmethod=testFacetSingleValued -Dtests.seed=-4971136915249645135:5200209917417531291 -Dtests.multiplier=3
ant test -Dtestcase=SimpleFacetsTest -Dtestmethod=testFacetSingleValuedFcs -Dtests.seed=-4971136915249645135:-3738166620811568832 -Dtests.multiplier=3
ant test -Dtestcase=SimpleFacetsTest -Dtestmethod=testFacetPrefixMultiValued -Dtests.seed=-4971136915249645135:4594369826150277150 -Dtests.multiplier=3
ant test -Dtestcase=SimpleFacetsTest -Dtestmethod=testFacetPrefixSingleValued -Dtests.seed=-4971136915249645135:-7702531001769827248 -Dtests.multiplier=3
ant test -Dtestcase=SimpleFacetsTest -Dtestmethod=testFacetPrefixSingleValuedFcs -Dtests.seed=-4971136915249645135:698398490325732548 -Dtests.multiplier=3
{code}

I found the problem causing this: certain field numbers get mixed up when the FieldInfos are built initially in IndexWriter and the first segment loaded has gaps in its field numbering. FieldInfos ignores the FieldInfo's number if the FieldInfo does not exist yet and tries to assign a new "local" field number. But if the next available field number is x while the actual FI's number is greater than x, the newly added FI is set to x instead.

In other words, let's say we have 2 segments:

{code}
seg1 : { fields : [(a:0, c:2)] }
seg2 : { fields : [(a:0, b:1, c:2)] }
{code}

If we load seg1's FI we end up with

{code}
fields : [(a:0, c:1)]
{code}

Then we add seg2's FIs and end up with

{code}
fields : [(a:0, c:1, b:2)]
{code}

This would also explain the TestNRTThreads.testNRTThreads failure, where bulkMerge could not be applied due to different field numbers across segments.

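To make the renumbering above concrete, here is a minimal, self-contained sketch. It is not Lucene's FieldInfos code and not the pending patch: the class `FieldNumberingSketch`, both method names, and the fallback policy in the second method are hypothetical and only model the behaviour described in this comment. The first method hands out the next free local number and ignores the number stored in the segment; the second keeps the segment's number whenever it is still free.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the numbering problem; NOT Lucene's FieldInfos.
class FieldNumberingSketch {

    // Mimics the described bug: an unknown field gets the next free
    // local number, and the number stored in the segment is ignored.
    static Map<String, Integer> loadIgnoringNumbers(
            List<List<Map.Entry<String, Integer>>> segments) {
        Map<String, Integer> numbers = new LinkedHashMap<>();
        for (List<Map.Entry<String, Integer>> segment : segments) {
            for (Map.Entry<String, Integer> field : segment) {
                if (!numbers.containsKey(field.getKey())) {
                    numbers.put(field.getKey(), numbers.size());
                }
            }
        }
        return numbers;
    }

    // One possible alternative (an assumption, not the actual fix):
    // keep the segment's number, and only move to the next unused
    // number if that one is already taken by another field.
    static Map<String, Integer> loadPreservingNumbers(
            List<List<Map.Entry<String, Integer>>> segments) {
        Map<String, Integer> numbers = new LinkedHashMap<>();
        for (List<Map.Entry<String, Integer>> segment : segments) {
            for (Map.Entry<String, Integer> field : segment) {
                if (numbers.containsKey(field.getKey())) {
                    continue;
                }
                int number = field.getValue();
                while (numbers.containsValue(number)) {
                    number++;
                }
                numbers.put(field.getKey(), number);
            }
        }
        return numbers;
    }

    public static void main(String[] args) {
        // seg1 has a gap at 1; seg2 uses 0, 1, 2 without gaps.
        List<List<Map.Entry<String, Integer>>> segments = List.of(
                List.of(Map.entry("a", 0), Map.entry("c", 2)),
                List.of(Map.entry("a", 0), Map.entry("b", 1), Map.entry("c", 2)));

        System.out.println(loadIgnoringNumbers(segments));   // {a=0, c=1, b=2}
        System.out.println(loadPreservingNumbers(segments)); // {a=0, c=2, b=1}
    }
}
```

With the first method the session-wide numbers of b and c are swapped relative to seg2, which is the kind of mismatch across segments that would prevent bulk merging. With the second, both segments agree with the session-wide numbering.
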
I will upload a patch tomorrow.

> Track FieldInfo per segment instead of per-IW-session
> -----------------------------------------------------
>
>                 Key: LUCENE-2881
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2881
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: Realtime Branch, CSF branch, 4.0
>            Reporter: Simon Willnauer
>            Assignee: Michael Busch
>             Fix For: Realtime Branch, CSF branch, 4.0
>
>         Attachments: lucene-2881.patch, lucene-2881.patch, lucene-2881.patch,
> lucene-2881.patch, lucene-2881.patch
>
>
> Currently FieldInfo is tracked per IW session to guarantee consistent global
> field-naming / ordering. IW carries FI instances over from previous segments
> which also carries over field properties like isIndexed etc. While having
> consistent field ordering per IW session appears to be important due to bulk
> merging stored fields etc. carrying over other properties might become
> problematic with Lucene's Codec support. Codecs that rely on consistent
> properties in FI will fail if FI properties are carried over.
> The DocValuesCodec (DocValuesBranch) for instance writes files per segment
> and field (using the field id within the file name). Yet, if a segment has no
> DocValues indexed in a particular segment but a previous segment in the same
> IW session had DocValues, FieldInfo#docValues will be true since those
> values are reused from previous segments.
> We already work around this "limitation" in SegmentInfo with properties like
> hasVectors or hasProx which is really something we should manage per Codec &
> Segment. Ideally FieldInfo would be managed per Segment and Codec such that
> its properties are valid per segment. It also seems to be necessary to bind
> FieldInfoS to SegmentInfo logically since it's really just per segment
> metadata.

--
This message is automatically generated by JIRA.