[ https://issues.apache.org/jira/browse/LUCENE-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832729#action_12832729 ]
Michael McCandless commented on LUCENE-2257:
--------------------------------------------

bq. With the patch, we don't see any ArrayIndexOutOfBounds exceptions.

Great! And the results look correct?

bq. Other than walking through the code in the debugger, is there some systematic way of looking for any other places where an int is used that might also have problems when we have over 2.1x billion terms?

Not that I know of! The code that handles the term dict lookup is fairly contained, in TermInfosReader and SegmentTermEnum. I think scrutinizing the code and testing (as you're doing) is the only way.

I just looked again -- there are a few places where int is still being used. First are two methods (get(int position) and scanEnum) in TermInfosReader that are actually dead code (package private & unused). Second is int SegmentTermEnum.scanTo, but this is fine because it's never asked to scan more than termIndexInterval terms.

I'll attach a patch that additionally just removes that dead code.

> relax the per-segment max unique term limit
> -------------------------------------------
>
>                 Key: LUCENE-2257
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2257
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9.2, 3.0.1, 3.1
>
>         Attachments: LUCENE-2257.patch, LUCENE-2257.patch
>
>
> Lucene can't handle more than 2.1B (limit of signed 32 bit int) unique terms
> in a single segment.
> But I think we can improve this to termIndexInterval (default 128) * 2.1B.
> There is one place (internal API only) where Lucene uses an int but should
> use a long.
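
To illustrate the int-versus-long concern discussed in this comment, here is a minimal Java sketch of how a term position computed from an index entry can overflow. It is not Lucene's actual code: the class name TermOrdinalSketch and its methods are hypothetical, and the only value taken from the issue is the default termIndexInterval of 128. It assumes a term's position is derived by multiplying an int index offset by the interval.

```java
public class TermOrdinalSketch {
    // Default termIndexInterval, as stated in the issue description.
    static final int TERM_INDEX_INTERVAL = 128;

    // Hypothetical "before" version: int * int is evaluated in 32-bit
    // arithmetic, so the product silently wraps once it exceeds 2^31 - 1.
    static int positionAsInt(int indexOffset) {
        return indexOffset * TERM_INDEX_INTERVAL;
    }

    // Hypothetical "after" version: widening one operand to long makes the
    // whole multiplication happen in 64-bit arithmetic.
    static long positionAsLong(int indexOffset) {
        return (long) indexOffset * TERM_INDEX_INTERVAL;
    }

    // Upper bound on addressable terms when the index offset stays an int
    // but the term position is a long: termIndexInterval * (2^31 - 1).
    static long maxTerms() {
        return (long) Integer.MAX_VALUE * TERM_INDEX_INTERVAL;
    }

    public static void main(String[] args) {
        // 20 million index entries correspond to 2.56 billion terms,
        // which is past the signed 32-bit limit.
        int indexOffset = 20_000_000;
        System.out.println("int position:  " + positionAsInt(indexOffset));  // wraps to a negative value
        System.out.println("long position: " + positionAsLong(indexOffset));
        System.out.println("max terms:     " + maxTerms());
    }
}
```

A wrapped, negative position of this kind is the sort of value that would surface as an ArrayIndexOutOfBounds exception if it were later used as an index. A bounded scan such as the one described for SegmentTermEnum.scanTo never counts past termIndexInterval, which is why an int is safe there.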