[ https://issues.apache.org/jira/browse/LUCENE-7791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15976656#comment-15976656 ]

Przemysław Szeremiota commented on LUCENE-7791:
-----------------------------------------------

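OK, here it is. It fails on branch_6_5 and passes with the patch. It is a 
rudimentary test covering only NumericDocValuesWriter, and it fails with an AIOOBE 
(reported here as an AssertionError from FixedBitSet.get, because the test runs 
with assertions enabled):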

{code}
   [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestIndexSorting -Dtests.method=testEmptyNonSortedIntField -Dtests.seed=B1B45F478095D85D -Dtests.slow=true -Dtests.locale=fr-BE -Dtests.timezone=Canada/Mountain -Dtests.asserts=true -Dtests.file.encoding=ISO-8859-1
   [junit4] FAILURE 0.02s | TestIndexSorting.testEmptyNonSortedIntField <<<
   [junit4]    > Throwable #1: java.lang.AssertionError: index=127, numBits=64
   [junit4]    >        at __randomizedtesting.SeedInfo.seed([B1B45F478095D85D:EA0CCD4D0DFEC9E8]:0)
   [junit4]    >        at org.apache.lucene.util.FixedBitSet.get(FixedBitSet.java:181)
   [junit4]    >        at org.apache.lucene.index.NumericDocValuesWriter$SortingNumericIterator.next(NumericDocValuesWriter.java:257)
   [junit4]    >        at org.apache.lucene.index.NumericDocValuesWriter$SortingNumericIterator.next(NumericDocValuesWriter.java:228)
   [junit4]    >        at org.apache.lucene.codecs.memory.MemoryDocValuesConsumer.addNumericField(MemoryDocValuesConsumer.java:112)
   [junit4]    >        at org.apache.lucene.codecs.memory.MemoryDocValuesConsumer.addNumericField(MemoryDocValuesConsumer.java:91)
   [junit4]    >        at org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsWriter.addNumericField(PerFieldDocValuesFormat.java:111)
   [junit4]    >        at org.apache.lucene.index.NumericDocValuesWriter.flush(NumericDocValuesWriter.java:96)
   [junit4]    >        at org.apache.lucene.index.DefaultIndexingChain.writeDocValues(DefaultIndexingChain.java:258)
   [junit4]    >        at org.apache.lucene.index.DefaultIndexingChain.flush(DefaultIndexingChain.java:142)
   [junit4]    >        at org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:444)
   [junit4]    >        at org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:539)
   [junit4]    >        at org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:653)
   [junit4]    >        at org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:3007)
   [junit4]    >        at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:3242)
   [junit4]    >        at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:3205)
   [junit4]    >        at org.apache.lucene.index.TestIndexSorting.testEmptyNonSortedIntField(TestIndexSorting.java:774)
   [junit4]    >        at java.lang.Thread.run(Thread.java:745)
{code}
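The `index=127, numBits=64` values in the trace can be reproduced with a small self-contained model, sketched here under stated assumptions: a writer whose docsWithField bitset is pre-allocated at 64 bits and grows only inside addValue(), and an index sort that happens to reverse document order. `TrailingDocsDemo` and its members are hypothetical names, not Lucene classes, and the growth policy is an assumption. With 128 documents where only document 10 has a value, the bitset never grows, yet the first document in sorted order maps back to original ID 127.

```java
// Hypothetical model of the flush-time condition; not Lucene's writer classes.
class TrailingDocsDemo {
    long[] docsWithField = new long[1]; // pre-allocated: one 64-bit word
    int numBits = 64;

    // Like the writers' addValue(): only called for docs that have a value,
    // so the bitset grows at most up to the last doc that has a value.
    void addValue(int docID) {
        if (docID >= numBits) {
            int newNumBits = Math.max(docID + 1, numBits * 2); // assumed growth policy
            docsWithField = java.util.Arrays.copyOf(docsWithField, (newNumBits + 63) >>> 6);
            numBits = newNumBits;
        }
        docsWithField[docID >> 6] |= 1L << docID;
    }

    // Walks docs in sorted order and returns the first original doc ID that
    // falls outside the bitset's capacity, or -1 if every lookup is in range.
    int firstOutOfRange(int[] newToOld) {
        for (int newDoc = 0; newDoc < newToOld.length; newDoc++) {
            int oldDoc = newToOld[newDoc];
            if (oldDoc >= numBits) {
                return oldDoc;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int maxDoc = 128;
        TrailingDocsDemo writer = new TrailingDocsDemo();
        writer.addValue(10); // docs 11..127 never call addValue()

        int[] newToOld = new int[maxDoc];
        for (int i = 0; i < maxDoc; i++) {
            newToOld[i] = maxDoc - 1 - i; // assumed index sort: reverses doc order
        }
        System.out.println("index=" + writer.firstOutOfRange(newToOld) + ", numBits=" + writer.numBits);
    }
}
```

Running `main` prints `index=127, numBits=64`. Calling `addValue(127)` beforehand would grow the bitset to 128 bits and bring every lookup back in range, which is why the bug only appears when the trailing documents lack a value.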

> AIOOBE on flush+sort
> --------------------
>
>                 Key: LUCENE-7791
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7791
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/index
>    Affects Versions: 6.5
>            Reporter: Przemysław Szeremiota
>              Labels: patch
>         Attachments: sortflush.patch
>
>
> On the released 6.5.0 version, flushing a sorted index throws 
> ArrayIndexOutOfBoundsException in NumericDocValuesWriter, NormValuesWriter and 
> BinaryDocValuesWriter.
> The new sorting iterators (e.g. NumericDocValuesWriter$SortingNumericIterator) 
> look up documents in FixedBitSets or PackedValues by remapped (sorted) document 
> ID, without checking the ranges of those BitSets/Values, which are sized by 
> original document IDs. The FixedBitSets can be sparse not only between 
> documents that have the field, but also after the last (in original order) 
> document with the field, because the writer's addValue() is never called for 
> trailing documents without a value. The range of useful values therefore 
> differs once IDs are remapped, and bounds checking should be done for the 
> remapped ID, not the original one.
> We were hit by this bug because our indexes are built from independent 
> sources by partially updating fragments of documents, so there are always 
> some documents without values in some fields.
> As I understand it, the bug shows up when:
> - maxDoc is greater than 64 (64 bits is the pre-allocated size of the writers' 
> FixedBitSets)
> - some number of the last documents have no value for the field (so the 
> FixedBitSet is never reallocated to maxDoc)
> Also, the range check on a field's values is currently based on the original 
> ID (e.g. "upto < size"), so flushing can silently lose some values, even 
> without hitting an AIOOBE.
> I will attach a patch resolving the issue in some of the writers. I am not sure 
> whether the other writers from LUCENE-7579 have similar bugs; the patch 
> resolved our indexing issues, but please check the changes from LUCENE-7579 to 
> confirm there are no additional bugs in the other flush-sorting writers.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
