[ 
https://issues.apache.org/jira/browse/LUCENE-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832689#action_12832689
 ] 

Tom Burton-West commented on LUCENE-2257:
-----------------------------------------

Hi Michael,

Thanks for your help. We mounted one of the indexes with 2.4 billion terms on 
our dev server and tested with and without the patch. (I discovered that 
queries containing Korean characters consistently trigger the bug.) 
With the patch, we don't see any ArrayIndexOutOfBoundsExceptions. We are 
going to do a bit more testing before we put this into production. (We 
temporarily rolled our production indexes back to indexes that split the 
terms over 2 segments and therefore don't trigger the bug.)

Other than walking through the code in the debugger, is there some systematic 
way of looking for any other places where an int is used that might also have 
problems when we have more than about 2.1 billion terms?

Tom

> relax the per-segment max unique term limit
> -------------------------------------------
>
>                 Key: LUCENE-2257
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2257
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9.2, 3.0.1, 3.1
>
>         Attachments: LUCENE-2257.patch
>
>
> Lucene can't handle more than 2.1B (limit of a signed 32-bit int) unique 
> terms in a single segment.
> But I think we can improve this to termIndexInterval (default 128) * 2.1B.  
> There is one place (internal API only) where Lucene uses an int but should 
> use a long.
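
For readers following the arithmetic, here is a minimal Java sketch of the two numbers in the description above: the proposed relaxed limit (termIndexInterval * 2.1B) and what happens to a term count such as 2.4 billion when it is narrowed to an int. The class and method names are hypothetical and are not part of Lucene. Whether a negative value like this is what produced the reported exception is an assumption, not something stated in the issue.

```java
public class TermLimitSketch {
    // Largest value a signed 32-bit int can hold: 2,147,483,647 (about 2.1B).
    static final int INT_LIMIT = Integer.MAX_VALUE;

    // Default termIndexInterval quoted in the issue description.
    static final int TERM_INDEX_INTERVAL = 128;

    /** Relaxed limit from the description: termIndexInterval * 2.1B, computed as a long. */
    static long relaxedLimit(int termIndexInterval) {
        // Widen to long before multiplying so the product itself cannot overflow.
        return (long) termIndexInterval * INT_LIMIT;
    }

    /** Narrows a term position to int, as code that "uses an int but should use a long" would. */
    static int narrowToInt(long termPosition) {
        return (int) termPosition; // keeps only the low 32 bits
    }

    public static void main(String[] args) {
        // 128 * 2,147,483,647
        System.out.println(relaxedLimit(TERM_INDEX_INTERVAL)); // prints 274877906816

        // A position inside a 2.4 billion term segment no longer fits in an int.
        System.out.println(narrowToInt(2_400_000_000L)); // prints -1894967296
    }
}
```

A negative value used as an array index throws ArrayIndexOutOfBoundsException in Java, which is at least consistent with the symptom reported in the comment.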

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
