[https://issues.apache.org/jira/browse/LUCENE-2881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13000581#comment-13000581]
Simon Willnauer commented on LUCENE-2881:
-----------------------------------------
For the record, Robert reverted the changes made by this issue, since we have
been experiencing a fair number of
[problems|https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/5420/]
lately.
Possibly reproducible with:
{code}
ant test -Dtestcase=SimpleFacetsTest -Dtestmethod=testFacetSingleValued -Dtests.seed=-4971136915249645135:5200209917417531291 -Dtests.multiplier=3
ant test -Dtestcase=SimpleFacetsTest -Dtestmethod=testFacetSingleValuedFcs -Dtests.seed=-4971136915249645135:-3738166620811568832 -Dtests.multiplier=3
ant test -Dtestcase=SimpleFacetsTest -Dtestmethod=testFacetPrefixMultiValued -Dtests.seed=-4971136915249645135:4594369826150277150 -Dtests.multiplier=3
ant test -Dtestcase=SimpleFacetsTest -Dtestmethod=testFacetPrefixSingleValued -Dtests.seed=-4971136915249645135:-7702531001769827248 -Dtests.multiplier=3
ant test -Dtestcase=SimpleFacetsTest -Dtestmethod=testFacetPrefixSingleValuedFcs -Dtests.seed=-4971136915249645135:698398490325732548 -Dtests.multiplier=3
{code}
I found the cause: certain field numbers get mixed up when the FieldInfos are
initially built in IndexWriter and the first segment loaded has gaps in its
field numbering.
FieldInfos ignores the FieldInfo's number if the FieldInfo does not exist yet
and instead assigns a new "local" field number. So if the next available field
number is x while the FI's actual number is greater than x, the newly added FI
is set to x instead.
In other words, let's say we have two segments:
{code}
seg1 : { fields : [(a:0, c:2)] }
seg2 : { fields : [(a:0, b:1, c:2)] }
{code}
If we load seg1's FIs, we end up with
{code}fields : [(a:0, c:1)] {code}
Then we add seg2's FIs and end up with
{code}fields : [(a:0, c:1, b:2)] {code}
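To make the mix-up concrete, here is a minimal sketch that replays the two segments above. It models FieldInfos as a plain name-to-number map; the class FieldNumberingDemo, its load method and the preserve switch are hypothetical and are not Lucene's API. With preserve off it reproduces the behaviour described above; with it on it shows one possible fix (keep a field's number from its segment unless another field already owns it), which is an assumption for illustration, not necessarily what the patch will do.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the field numbering described above; it is NOT
// Lucene's FieldInfos API, just a name -> number map fed segment by segment.
public class FieldNumberingDemo {

    // Smallest number above every number already handed out (0 if empty).
    static int nextNumber(Map<String, Integer> fields) {
        int max = -1;
        for (int n : fields.values()) {
            max = Math.max(max, n);
        }
        return max + 1;
    }

    // preserve == false: buggy behaviour, an unseen field always gets the next
    // "local" number and the number it had in its segment is ignored.
    // preserve == true: hypothetical fix, keep the segment's number unless
    // another field already owns it.
    static Map<String, Integer> load(boolean preserve,
                                     List<List<Map.Entry<String, Integer>>> segments) {
        Map<String, Integer> fields = new LinkedHashMap<>();
        for (List<Map.Entry<String, Integer>> segment : segments) {
            for (Map.Entry<String, Integer> fi : segment) {
                if (fields.containsKey(fi.getKey())) {
                    continue; // field already known: keep its existing number
                }
                int number = fi.getValue();
                if (!preserve || fields.containsValue(number)) {
                    number = nextNumber(fields);
                }
                fields.put(fi.getKey(), number);
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> seg1 =
            List.of(Map.entry("a", 0), Map.entry("c", 2));
        List<Map.Entry<String, Integer>> seg2 =
            List.of(Map.entry("a", 0), Map.entry("b", 1), Map.entry("c", 2));

        System.out.println(load(false, List.of(seg1, seg2))); // {a=0, c=1, b=2}
        System.out.println(load(true, List.of(seg1, seg2)));  // {a=0, c=2, b=1}
    }
}
```

The buggy path prints the same mapping as the example above, while preserving numbers keeps c at 2 and slots b into the gap at 1.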
This also explains the TestNRTThreads.testNRTThreads failure, where bulkMerge
could not be applied due to different field numbers across segments.
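Why the renumbering defeats bulk merging can be sketched the same way. The assumption here is that a bulk merge copies a segment's stored-field bytes verbatim and that those bytes identify fields by number, so the copy is only safe if every field keeps the same number in the merged FieldInfos. BulkMergeCheck.canBulkCopy is a hypothetical helper, not the actual SegmentMerger logic.

```java
import java.util.Map;

// Hypothetical illustration, not Lucene's SegmentMerger code.
public class BulkMergeCheck {

    // Assumed rule: raw copying is only safe when each field of the source
    // segment has the same number in the target (merged) mapping, because the
    // copied bytes refer to fields by number rather than by name.
    static boolean canBulkCopy(Map<String, Integer> segment, Map<String, Integer> merged) {
        for (Map.Entry<String, Integer> e : segment.entrySet()) {
            // merged.get may return null for an unknown field; equals(null) is false
            if (!e.getValue().equals(merged.get(e.getKey()))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, Integer> seg2 = Map.of("a", 0, "b", 1, "c", 2);
        Map<String, Integer> renumbered = Map.of("a", 0, "c", 1, "b", 2);
        Map<String, Integer> preserved = Map.of("a", 0, "c", 2, "b", 1);

        System.out.println(canBulkCopy(seg2, renumbered)); // false
        System.out.println(canBulkCopy(seg2, preserved));  // true
    }
}
```

With the renumbered mapping from the example, seg2's fields b and c no longer line up, so under this assumption the merge has to fall back to the slower field-by-field path.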
I will upload a patch tomorrow.
> Track FieldInfo per segment instead of per-IW-session
> -----------------------------------------------------
>
> Key: LUCENE-2881
> URL: https://issues.apache.org/jira/browse/LUCENE-2881
> Project: Lucene - Java
> Issue Type: Improvement
> Affects Versions: Realtime Branch, CSF branch, 4.0
> Reporter: Simon Willnauer
> Assignee: Michael Busch
> Fix For: Realtime Branch, CSF branch, 4.0
>
> Attachments: lucene-2881.patch, lucene-2881.patch, lucene-2881.patch,
> lucene-2881.patch, lucene-2881.patch
>
>
> Currently FieldInfo is tracked per IW session to guarantee consistent global
> field naming / ordering. IW carries FI instances over from previous segments,
> which also carries over field properties like isIndexed etc. While consistent
> field ordering per IW session appears to be important for bulk merging of
> stored fields etc., carrying over other properties might become problematic
> with Lucene's Codec support. Codecs that rely on consistent properties in FI
> will fail if FI properties are carried over.
> The DocValuesCodec (DocValuesBranch), for instance, writes files per segment
> and field (using the field id within the file name). Yet if a segment has no
> DocValues indexed but a previous segment in the same IW session had
> DocValues, FieldInfo#docValues will be true, since those values are reused
> from previous segments.
> We already work around this "limitation" in SegmentInfo with properties like
> hasVectors or hasProx, which is really something we should manage per Codec &
> Segment. Ideally FieldInfo would be managed per Segment and Codec such that
> its properties are valid per segment. It also seems necessary to bind
> FieldInfos to SegmentInfo logically, since it's really just per-segment
> metadata.
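The DocValues problem quoted above boils down to a flag that is never cleared. This hypothetical sketch (DocValuesFlagDemo is not Lucene code) follows one field over two segments, where only the first actually indexes DocValues, and compares a flag carried across the IW session with one recomputed per segment.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not Lucene code: one field observed over several
// segments, recording whether each segment claims to have DocValues for it.
public class DocValuesFlagDemo {

    // actual.get(i) says whether segment i really indexed DocValues for the field.
    // sessionWide == true models FieldInfo reused across the IW session, where the
    // flag is never cleared once set; false models FieldInfos rebuilt per segment.
    static List<Boolean> recordedFlags(List<Boolean> actual, boolean sessionWide) {
        List<Boolean> recorded = new ArrayList<>();
        boolean carried = false;
        for (boolean hasDocValues : actual) {
            carried |= hasDocValues; // sticky flag: stays true once any segment set it
            recorded.add(sessionWide ? carried : hasDocValues);
        }
        return recorded;
    }

    public static void main(String[] args) {
        List<Boolean> actual = List.of(true, false); // only segment 0 has DocValues
        System.out.println(recordedFlags(actual, true));  // [true, true]
        System.out.println(recordedFlags(actual, false)); // [true, false]
    }
}
```

With session-wide tracking the second segment is still flagged, so a codec that writes per-segment, per-field files could go looking for DocValues files that were never written for that segment.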
--
This message is automatically generated by JIRA.