[ https://issues.apache.org/jira/browse/LUCENE-7791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15977769#comment-15977769 ]
Jim Ferenczi commented on LUCENE-7791: -------------------------------------- Ok this is now merged to 6.x and 6.5 and the fix will be part of the 6.5.1 release. I still need to adapt the test for master. Thanks [~przemosz] and [~mikemccand] ! > AIOOBE on flush+sort > -------------------- > > Key: LUCENE-7791 > URL: https://issues.apache.org/jira/browse/LUCENE-7791 > Project: Lucene - Core > Issue Type: Bug > Components: core/index > Affects Versions: 6.5 > Reporter: Przemysław Szeremiota > Labels: patch > Fix For: master (7.0), 6.6, 6.5.1 > > Attachments: LUCENE-7791.patch, sortflush.patch, sortflush-test.patch > > > On released 6.5.0 version, flushing operation on sorted index throws > ArrayIndexOutOfBoudException in NumericDocValuesWriter, NormValuesWriter and > BinaryDocValuesWriter. > New SortedXXXIterators are looking up documents in FixedBitSets or > PackedValues based on remapped (sorted) document ID, without checking > BitSets/Values ranges, which are based on original document IDs. Meanwhile > FixedBitSets can be sparse not only in between documents with fields, but > also after last (originally) document with given field (because writer's > addValue() is not called for last documents without values for fields). So > remapped (sorted) values range can have different useful values range and > bounds checking should be done for remapped and not original ID. > We were hit by this bug because our indexes are built from independent > sources by partial updating fragments of documents, so there is always some > documents without values in some fields. > As I understand this bug, it shows when: > - maxDoc is greater than 64 (64 is pre-allocated size for writers > FixedBitSets) > - some number of last taken documents have empty fields (so FixedBitSet won't > be reallocated to maxDoc) > Also, check for range of values for given field is now happening based on > original ID (e.g. "upto < size"), so flushing can now lost some values, even > without hitting AIOOBE. > I will attach patch resolving issues with some writers; for other writers > from LUCENE-7579, I am not sure if there are similar bugs in them; patch > resolved our indexing issues, please check changes from LUCENE-7579 for > confirmation of lack of additional bugs in other flush-sorting writers. -- This message was sent by Atlassian JIRA (v6.3.15#6346) --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org