[
https://issues.apache.org/jira/browse/LUCENE-7791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15976656#comment-15976656
]
Przemysław Szeremiota commented on LUCENE-7791:
-----------------------------------------------
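OK, here it is. The test fails on branch_6_5 and passes with the patch. It is
a rudimentary test covering only NumericDocValuesWriter; it fails with AIOOBE
(reported as an AssertionError in the run below, since assertions are enabled):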
{code}
[junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestIndexSorting -Dtests.method=testEmptyNonSortedIntField -Dtests.seed=B1B45F478095D85D -Dtests.slow=true -Dtests.locale=fr-BE -Dtests.timezone=Canada/Mountain -Dtests.asserts=true -Dtests.file.encoding=ISO-8859-1
[junit4] FAILURE 0.02s | TestIndexSorting.testEmptyNonSortedIntField <<<
[junit4] > Throwable #1: java.lang.AssertionError: index=127, numBits=64
[junit4] > at __randomizedtesting.SeedInfo.seed([B1B45F478095D85D:EA0CCD4D0DFEC9E8]:0)
[junit4] > at org.apache.lucene.util.FixedBitSet.get(FixedBitSet.java:181)
[junit4] > at org.apache.lucene.index.NumericDocValuesWriter$SortingNumericIterator.next(NumericDocValuesWriter.java:257)
[junit4] > at org.apache.lucene.index.NumericDocValuesWriter$SortingNumericIterator.next(NumericDocValuesWriter.java:228)
[junit4] > at org.apache.lucene.codecs.memory.MemoryDocValuesConsumer.addNumericField(MemoryDocValuesConsumer.java:112)
[junit4] > at org.apache.lucene.codecs.memory.MemoryDocValuesConsumer.addNumericField(MemoryDocValuesConsumer.java:91)
[junit4] > at org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsWriter.addNumericField(PerFieldDocValuesFormat.java:111)
[junit4] > at org.apache.lucene.index.NumericDocValuesWriter.flush(NumericDocValuesWriter.java:96)
[junit4] > at org.apache.lucene.index.DefaultIndexingChain.writeDocValues(DefaultIndexingChain.java:258)
[junit4] > at org.apache.lucene.index.DefaultIndexingChain.flush(DefaultIndexingChain.java:142)
[junit4] > at org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:444)
[junit4] > at org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:539)
[junit4] > at org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:653)
[junit4] > at org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:3007)
[junit4] > at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:3242)
[junit4] > at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:3205)
[junit4] > at org.apache.lucene.index.TestIndexSorting.testEmptyNonSortedIntField(TestIndexSorting.java:774)
[junit4] > at java.lang.Thread.run(Thread.java:745)
{code}
> AIOOBE on flush+sort
> --------------------
>
> Key: LUCENE-7791
> URL: https://issues.apache.org/jira/browse/LUCENE-7791
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/index
> Affects Versions: 6.5
> Reporter: Przemysław Szeremiota
> Labels: patch
> Attachments: sortflush.patch
>
>
> On the released 6.5.0 version, a flush operation on a sorted index throws
> ArrayIndexOutOfBoundsException in NumericDocValuesWriter, NormValuesWriter and
> BinaryDocValuesWriter.
> The new SortingXXXIterators (e.g. SortingNumericIterator) look up documents
> in FixedBitSets or PackedValues by remapped (sorted) document ID, without
> checking the BitSets'/Values' ranges, which are based on original document
> IDs. Meanwhile, FixedBitSets can be sparse not only between documents with
> fields, but also after the last (original) document with a given field
> (because the writer's addValue() is not called for trailing documents without
> values for that field). So the remapped (sorted) IDs can cover a different
> useful range of values, and bounds checking should be done on the remapped
> ID, not the original one.
> We were hit by this bug because our indexes are built from independent
> sources by partially updating fragments of documents, so there are always
> some documents without values in some fields.
> As I understand this bug, it shows up when:
> - maxDoc is greater than 64 (64 is the pre-allocated size of the writers'
> FixedBitSets)
> - some number of the last documents added have empty fields (so the
> FixedBitSet won't be reallocated to maxDoc)
> Also, the check on the range of values for a given field now happens based
> on the original ID (e.g. "upto < size"), so flushing can now lose some values,
> even without hitting AIOOBE.
> I will attach a patch resolving the issue in some writers. For the other
> writers from LUCENE-7579, I am not sure whether they have similar bugs. The
> patch resolved our indexing issues; please check the changes from LUCENE-7579
> to confirm there are no additional bugs in the other flush-sorting writers.
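
The failure mode described above can be sketched in plain Java, without Lucene on the classpath. This is only an illustration under stated assumptions, not the code from sortflush.patch: TinyBitSet, the newToOld array and the three lookup methods are hypothetical stand-ins for FixedBitSet, the sort map and the iterator's lookup. It assumes a bit set pre-allocated for 64 documents, a maxDoc of 128, a single value recorded for original document 3, and a sort that reverses document order.

```java
public class SortedFlushSketch {

    /** Hypothetical minimal fixed-size bit set; NOT Lucene's FixedBitSet. */
    static final class TinyBitSet {
        final long[] words;

        TinyBitSet(int numBits) {
            words = new long[(numBits + 63) >>> 6];
        }

        void set(int index) {
            words[index >>> 6] |= 1L << index;
        }

        /** No bounds check: an index past the last word throws AIOOBE. */
        boolean get(int index) {
            return (words[index >>> 6] & (1L << index)) != 0;
        }

        int length() {
            return words.length << 6;
        }
    }

    /** Looks up the remapped ID directly, as in the reported failure. */
    static boolean hasValueUnchecked(TinyBitSet docsWithField, int[] newToOld, int newDoc) {
        return docsWithField.get(newToOld[newDoc]);
    }

    /**
     * One reading of the "upto < size" concern (an assumption): the bound is
     * applied to the iteration counter instead of the ID that is looked up.
     */
    static boolean hasValueBoundedOnCounter(TinyBitSet docsWithField, int[] newToOld,
                                            int newDoc, int size) {
        return newDoc < size && docsWithField.get(newToOld[newDoc]);
    }

    /** Checks the remapped ID against the bit set's range before the lookup. */
    static boolean hasValueGuarded(TinyBitSet docsWithField, int[] newToOld, int newDoc) {
        int oldDoc = newToOld[newDoc];
        return oldDoc < docsWithField.length() && docsWithField.get(oldDoc);
    }

    public static void main(String[] args) {
        int maxDoc = 128;
        TinyBitSet docsWithField = new TinyBitSet(64); // never grown past 64 bits
        docsWithField.set(3);                          // only original doc 3 has a value
        int size = 4;                                  // values recorded for original docs 0..3

        int[] newToOld = new int[maxDoc];
        for (int newDoc = 0; newDoc < maxDoc; newDoc++) {
            newToOld[newDoc] = maxDoc - 1 - newDoc;    // reversed order: new 0 -> old 127
        }

        try {
            hasValueUnchecked(docsWithField, newToOld, 0);
            System.out.println("unchecked lookup: no exception");
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("unchecked lookup: " + e);
        }
        System.out.println("counter-bounded, new doc 124: "
            + hasValueBoundedOnCounter(docsWithField, newToOld, 124, size));
        System.out.println("guarded, new doc 0: "
            + hasValueGuarded(docsWithField, newToOld, 0));
        System.out.println("guarded, new doc 124: "
            + hasValueGuarded(docsWithField, newToOld, 124));
    }
}
```

In this sketch, new document 124 maps back to original document 3, the only one with a value. The counter-bounded variant drops that value without throwing, while the guarded variant bounds the remapped ID and still finds it.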