[ https://issues.apache.org/jira/browse/LUCENE-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12782623#action_12782623 ]
Robert Muir commented on LUCENE-2069:
-------------------------------------

Damn, we have to use the limit form of codePointAt, just to be sure. If the term text truly ends with an unpaired lead surrogate, codePointAt could pair it with a stale trail surrogate left over from a previous token...

> fix LowerCaseFilter for unicode 4.0
> -----------------------------------
>
>                 Key: LUCENE-2069
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2069
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Analysis
>            Reporter: Robert Muir
>            Assignee: Robert Muir
>            Priority: Minor
>             Fix For: 3.1
>
>         Attachments: LUCENE-2069.patch, LUCENE-2069.patch, LUCENE-2069.patch,
>                      LUCENE-2069.patch, LUCENE-2069.patch
>
>
> Lowercase supplementary characters correctly.
> This only fixes the filter; the LowerCaseTokenizer is part of a more complex
> issue (CharTokenizer).
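
The concern raised in the comment can be illustrated with a small standalone sketch. This is not the code from the attached patches: the class name LimitCodePointDemo and the method lowerCaseInPlace are hypothetical, and the sketch assumes that the lowercase mapping of a code point occupies the same number of chars as the original. It only uses the standard java.lang.Character methods codePointAt(char[], int), codePointAt(char[], int, int), toLowerCase(int) and toChars(int, char[], int).

```java
public class LimitCodePointDemo {

    /**
     * Hypothetical helper: lowercases the first {@code length} chars of
     * {@code buffer} in place, one code point at a time.
     */
    public static void lowerCaseInPlace(char[] buffer, int length) {
        for (int i = 0; i < length; ) {
            // Limit form: never looks at chars at or beyond `length`, so a
            // stale trail surrogate after the term cannot be paired with a
            // lead surrogate that ends the term.
            int cp = Character.codePointAt(buffer, i, length);
            int lower = Character.toLowerCase(cp);
            // Assumption: `lower` needs as many chars as `cp` did.
            i += Character.toChars(lower, buffer, i);
        }
    }

    public static void main(String[] args) {
        // A reused buffer: the current term is the first 3 chars and ends
        // with an unpaired lead surrogate (U+D801). Index 3 still holds a
        // trail surrogate (U+DC00) left behind by an earlier, longer token.
        char[] buffer = {'A', 'B', '\uD801', '\uDC00'};
        int length = 3;

        // Without a limit, the lead surrogate is paired with the stale char.
        int unbounded = Character.codePointAt(buffer, 2);
        // With the limit, the lead surrogate is returned on its own.
        int bounded = Character.codePointAt(buffer, 2, length);

        System.out.println(Integer.toHexString(unbounded)); // 10400
        System.out.println(Integer.toHexString(bounded));   // d801

        lowerCaseInPlace(buffer, length);
        // The term is now "ab" followed by the untouched lead surrogate, and
        // the stale char beyond the term has not been read or rewritten.
        System.out.println(buffer[0] + "" + buffer[1]);      // ab
    }
}
```

With the two-argument form, the loop would have treated U+D801 U+DC00 as the single code point U+10400 and written its lowercase form U+10428 back, touching a char beyond the end of the term.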