[ https://issues.apache.org/jira/browse/LUCENE-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832698#action_12832698 ]
Robert Muir commented on LUCENE-2257:
-------------------------------------

bq. (I discovered that queries containing Korean characters would consistently trigger the bug).

This makes sense because Hangul is sorted towards the end of the term dictionary. You can see this visually here: http://unicode.org/roadmaps/bmp/

> relax the per-segment max unique term limit
> -------------------------------------------
>
>                 Key: LUCENE-2257
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2257
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9.2, 3.0.1, 3.1
>
>         Attachments: LUCENE-2257.patch
>
>
> Lucene can't handle more than 2.1B (limit of signed 32 bit int) unique terms in a single segment.
> But I think we can improve this to termIndexInterval (default 128) * 2.1B.
> There is one place (internal API only) where Lucene uses an int but should use a long.
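
To make the numbers in the issue description concrete, here is a minimal sketch of the arithmetic. It is not Lucene code: the class and method names (TermLimitSketch, relaxedLimit, positionAsInt, positionAsLong) are hypothetical, and the only values taken from the issue are the signed 32-bit limit and the default termIndexInterval of 128. It shows how a term position derived from an index entry can overflow when computed as an int, and why doing the same multiplication as a long raises the ceiling to roughly termIndexInterval * 2.1B.

```java
public class TermLimitSketch {
    // Default interval quoted in the issue description.
    static final int TERM_INDEX_INTERVAL = 128;

    // Hypothetical helper: upper bound on unique terms if only every
    // termIndexInterval-th term has to be addressable with an int.
    static long relaxedLimit(int termIndexInterval) {
        return (long) termIndexInterval * Integer.MAX_VALUE;
    }

    // Hypothetical helper: term position computed in 32-bit arithmetic.
    // The product silently wraps around once it exceeds Integer.MAX_VALUE.
    static int positionAsInt(int indexEntry, int termIndexInterval) {
        return indexEntry * termIndexInterval;
    }

    // Same computation, widened to 64 bits before multiplying.
    static long positionAsLong(int indexEntry, int termIndexInterval) {
        return (long) indexEntry * termIndexInterval;
    }

    public static void main(String[] args) {
        System.out.println("int limit:     " + Integer.MAX_VALUE);
        System.out.println("relaxed limit: " + relaxedLimit(TERM_INDEX_INTERVAL));

        // An index entry far enough into the dictionary that
        // entry * 128 no longer fits in a signed 32-bit int.
        int entry = 20_000_000;
        System.out.println("as int:  " + positionAsInt(entry, TERM_INDEX_INTERVAL));
        System.out.println("as long: " + positionAsLong(entry, TERM_INDEX_INTERVAL));
    }
}
```

With these assumptions, only positions late in the sorted term dictionary wrap around, which would be consistent with the report that terms sorting near the end, such as Hangul, triggered the bug.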