[ https://issues.apache.org/jira/browse/LUCENE-7791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15978372#comment-15978372 ]

Jim Ferenczi commented on LUCENE-7791:
--------------------------------------

I don't know what happened, but the fix for NormValuesWriter is not in my
patch. I'll push the fix shortly with additional tests for this case. Thanks
for checking.

> AIOOBE on flush+sort
> --------------------
>
>                 Key: LUCENE-7791
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7791
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/index
>    Affects Versions: 6.5
>            Reporter: Przemysław Szeremiota
>              Labels: patch
>             Fix For: master (7.0), 6.6, 6.5.1
>
>         Attachments: LUCENE-7791.patch, sortflush.patch, sortflush-test.patch
>
>
> On the released 6.5.0 version, flushing a sorted index throws
> ArrayIndexOutOfBoundsException in NumericDocValuesWriter, NormValuesWriter and
> BinaryDocValuesWriter.
> The new SortedXXXIterators look up documents in FixedBitSets or
> PackedValues by remapped (sorted) document ID, without checking the
> BitSet/Values ranges, which are based on original document IDs. These
> FixedBitSets can be sparse not only between documents that have the field,
> but also after the last document (in original order) that has it, because
> the writer's addValue() is never called for trailing documents without a
> value for the field. So the remapped (sorted) ID range can differ from the
> range that actually holds values, and bounds checking should be done on the
> remapped ID, not the original one.
> We were hit by this bug because our indexes are built from independent
> sources by partially updating fragments of documents, so there are always
> some documents without values in some fields.
> As I understand it, this bug shows up when:
> - maxDoc is greater than 64 (64 is the pre-allocated size of the writers'
> FixedBitSets), and
> - some of the last documents added have no value for the field (so the
> FixedBitSet is never grown to maxDoc).
> Also, the range check on a field's values is currently done against the
> original ID (e.g. "upto < size"), so flushing can silently lose some values,
> even without hitting an AIOOBE.
> I will attach a patch that resolves the issue in some of the writers. For
> the other writers from LUCENE-7579, I am not sure whether they have similar
> bugs. The patch resolved our indexing issues, but please review the changes
> from LUCENE-7579 to confirm there are no additional bugs in the other
> flush-sorting writers.
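
The failure mode described above can be reproduced in a toy model. This sketch is hypothetical and is not Lucene's code: `GrowableBits` stands in for a writer's docsWithField FixedBitSet (64 bits pre-allocated, grown only when a value is added), and the `oldForNew`/`newForOld` arrays stand in for the index-sort doc map. It shows an unchecked lookup in sorted order throwing ArrayIndexOutOfBoundsException once maxDoc exceeds 64 and the trailing documents have no value. It also shows one safer pattern, assuming values are remapped into a maxDoc-sized array indexed by new doc ID.

```java
import java.util.Arrays;

/** Hypothetical toy model of the LUCENE-7791 failure mode; not Lucene's actual writer code. */
public class SortedFlushSketch {

    /** Stand-in for a writer's docsWithField bitset: 64 bits up front, grown only on set(). */
    static final class GrowableBits {
        long[] words = new long[1]; // 64 bits pre-allocated

        void set(int doc) {
            int word = doc >> 6;
            if (word >= words.length) {
                words = Arrays.copyOf(words, Math.max(word + 1, words.length * 2));
            }
            words[word] |= 1L << doc; // Java masks a long shift distance to doc & 63
        }

        int numBits() { return words.length << 6; }

        /** Unchecked lookup: throws ArrayIndexOutOfBoundsException past the allocated words. */
        boolean get(int doc) { return (words[doc >> 6] & (1L << doc)) != 0; }
    }

    /** Buggy pattern: walk docs in sorted order and probe the bitset with no range check. */
    static int countUnchecked(GrowableBits bits, int[] oldForNew) {
        int count = 0;
        for (int oldDoc : oldForNew) {
            if (bits.get(oldDoc)) count++;
        }
        return count;
    }

    /** Safer pattern: copy "has value" into a maxDoc-sized array indexed by new doc ID. */
    static boolean[] remapByNewDoc(GrowableBits bits, int[] newForOld, int maxDoc) {
        boolean[] hasValue = new boolean[maxDoc];
        int limit = Math.min(maxDoc, bits.numBits()); // docs past the bitset never got a value
        for (int oldDoc = 0; oldDoc < limit; oldDoc++) {
            if (bits.get(oldDoc)) hasValue[newForOld[oldDoc]] = true;
        }
        return hasValue;
    }

    public static void main(String[] args) {
        int maxDoc = 100;
        GrowableBits bits = new GrowableBits();
        for (int doc = 0; doc < 10; doc++) bits.set(doc); // docs 10..99 have no value

        // Hypothetical index sort that reverses the documents: new doc i is old doc 99 - i.
        int[] oldForNew = new int[maxDoc];
        int[] newForOld = new int[maxDoc];
        for (int i = 0; i < maxDoc; i++) {
            oldForNew[i] = maxDoc - 1 - i;
            newForOld[maxDoc - 1 - i] = i;
        }

        try {
            countUnchecked(bits, oldForNew);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("unchecked lookup failed: " + e);
        }

        boolean[] hasValue = remapByNewDoc(bits, newForOld, maxDoc);
        int count = 0;
        for (boolean b : hasValue) if (b) count++;
        System.out.println("docs with value after sort: " + count); // prints 10
    }
}
```

Sizing the remapped array to maxDoc and iterating only over the range the bitset actually covers avoids both problems the report mentions: no lookup can run past the allocated words, and no value is dropped by a range check done in the wrong ID space. This is one possible approach, not necessarily the one taken in the attached patches.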



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
