[ 
https://issues.apache.org/jira/browse/LUCENE-6062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14213617#comment-14213617
 ] 

Shai Erera commented on LUCENE-6062:
------------------------------------

I found the problem. With your change to the test, you created the following 
scenario: update a non-existing NDV field in a segment that has other NDV 
fields (note that without this change, the test ensures that you can update a 
non-existing NDV field in a segment without any other NDV fields).

What happens is that in this code of SegmentDocValuesProducer:

{code}
          if (baseProducer == null) {
            // the base producer gets all the fields, so the Codec can validate properly
            baseProducer = segDocValues.getDocValuesProducer(docValuesGen, si, IOContext.READ, dir, dvFormat, fieldInfos);
            dvGens.add(docValuesGen);
            dvProducers.add(baseProducer);
          }
{code}

We pass all the fieldInfos, which now also contain an FI for 'ndv'. But that 
field was never written to the base segment file (the .cfs), and so it cannot 
be found there...

Not yet sure how to resolve it. We pass all the FIs because e.g. 
Lucene50DocValuesProducer verifies that every field it encounters in the 
metadata file has a matching entry in the given FieldInfos (to check for index 
corruption). So we cannot just pass only the FIs with dvGen=-1. On the other 
hand, we do have a case here where the base .cfs never had an instance of that 
field ... it's like we need to know in which 'gen' a DV field was introduced. 
Then we can pass to baseProducer all the FIs whose startGen==-1...
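
To make the difference between the two filters concrete, here is a minimal, 
self-contained sketch. It does not use Lucene classes: FieldSketch is a 
hypothetical stand-in for FieldInfo, and startGen is only the idea floated 
above, not an existing Lucene attribute. The field "bar" is also an assumption 
added for illustration (a field written with the base segment and updated 
later).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class StartGenSketch {

    /** Hypothetical stand-in for FieldInfo; startGen is NOT a real Lucene attribute. */
    static final class FieldSketch {
        final String name;
        final long dvGen;    // generation of the latest DV update, -1 if never updated
        final long startGen; // assumed: generation in which the DV field first appeared, -1 = base segment

        FieldSketch(String name, long dvGen, long startGen) {
            this.name = name;
            this.dvGen = dvGen;
            this.startGen = startGen;
        }
    }

    /** Illustrative field set: two fields from the base segment, one created by an update. */
    static List<FieldSketch> exampleFields() {
        return Arrays.asList(
            new FieldSketch("foo", -1, -1), // written with the base segment, never updated
            new FieldSketch("bar", 2, -1),  // written with the base segment, updated in gen 2
            new FieldSketch("ndv", 1, 1));  // did not exist in the base segment, created by an update
    }

    /** Filter on dvGen == -1: drops "bar", although the base files do contain it. */
    static List<String> byDvGen(List<FieldSketch> all) {
        List<String> out = new ArrayList<>();
        for (FieldSketch f : all) {
            if (f.dvGen == -1) {
                out.add(f.name);
            }
        }
        return out;
    }

    /** Filter on the assumed startGen == -1: keeps exactly the fields the base files contain. */
    static List<String> byStartGen(List<FieldSketch> all) {
        List<String> out = new ArrayList<>();
        for (FieldSketch f : all) {
            if (f.startGen == -1) {
                out.add(f.name);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<FieldSketch> fields = exampleFields();
        System.out.println("dvGen == -1:    " + byDvGen(fields));
        System.out.println("startGen == -1: " + byStartGen(fields));
    }
}
```

In this toy model, filtering on dvGen hides "bar" from the base producer even 
though the base files contain it, which is the case the metadata check would 
reject, while filtering on the assumed startGen excludes only 'ndv'.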

> Index corruption from numeric DV updates
> ----------------------------------------
>
>                 Key: LUCENE-6062
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6062
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Michael McCandless
>             Fix For: 4.10.3, 5.0, Trunk
>
>
> I hit this while working on LUCENE-6005: when cutting over 
> TestNumericDocValuesUpdates to the new Document2 API, I accidentally enabled 
> additional docValues in the test, and hit this:
> {noformat}
> There was 1 failure:
> 1) testUpdateSegmentWithNoDocValues(org.apache.lucene.index.TestNumericDocValuesUpdates)
> java.io.FileNotFoundException: _1_Asserting_0.dvm in dir=RAMDirectory@259847e5 lockFactory=org.apache.lucene.store.SingleInstanceLockFactory@30981eab
>       at __randomizedtesting.SeedInfo.seed([0:7C88A439A551C47D]:0)
>       at org.apache.lucene.store.MockDirectoryWrapper.openInput(MockDirectoryWrapper.java:645)
>       at org.apache.lucene.store.Directory.openChecksumInput(Directory.java:110)
>       at org.apache.lucene.codecs.lucene50.Lucene50DocValuesProducer.<init>(Lucene50DocValuesProducer.java:130)
>       at org.apache.lucene.codecs.lucene50.Lucene50DocValuesFormat.fieldsProducer(Lucene50DocValuesFormat.java:182)
>       at org.apache.lucene.codecs.asserting.AssertingDocValuesFormat.fieldsProducer(AssertingDocValuesFormat.java:66)
>       at org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsReader.<init>(PerFieldDocValuesFormat.java:267)
>       at org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat.fieldsProducer(PerFieldDocValuesFormat.java:357)
>       at org.apache.lucene.index.SegmentDocValues.newDocValuesProducer(SegmentDocValues.java:51)
>       at org.apache.lucene.index.SegmentDocValues.getDocValuesProducer(SegmentDocValues.java:68)
>       at org.apache.lucene.index.SegmentDocValuesProducer.<init>(SegmentDocValuesProducer.java:63)
>       at org.apache.lucene.index.SegmentReader.initDocValuesProducer(SegmentReader.java:167)
>       at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:109)
>       at org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:58)
>       at org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:50)
>       at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:556)
>       at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:50)
>       at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:63)
>       at org.apache.lucene.index.TestNumericDocValuesUpdates.testUpdateSegmentWithNoDocValues(TestNumericDocValuesUpdates.java:769)
> {noformat}
> A one-line change to the existing test (on trunk) causes this corruption:
> {noformat}
> Index: lucene/core/src/test/org/apache/lucene/index/TestNumericDocValuesUpdates.java
> ===================================================================
> --- lucene/core/src/test/org/apache/lucene/index/TestNumericDocValuesUpdates.java     (revision 1639580)
> +++ lucene/core/src/test/org/apache/lucene/index/TestNumericDocValuesUpdates.java     (working copy)
> @@ -750,6 +750,7 @@
>      // second segment with no NDV
>      doc = new Document();
>      doc.add(new StringField("id", "doc1", Store.NO));
> +    doc.add(new NumericDocValuesField("foo", 3));
>      writer.addDocument(doc);
>      doc = new Document();
>      doc.add(new StringField("id", "doc2", Store.NO)); // document that isn't updated
> {noformat}
> For some reason, the base doc values for the 2nd segment are not being 
> written, but clearly should be (to hold field "foo")... I'm not sure why.



