[jira] Commented: (LUCENE-2069) fix LowerCaseFilter for unicode 4.0

Robert Muir (JIRA) Wed, 25 Nov 2009 13:19:05 -0800

    [ https://issues.apache.org/jira/browse/LUCENE-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12782623#action_12782623 ]


Robert Muir commented on LUCENE-2069:
-------------------------------------

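Damn, we have to use the limit form of codePointAt (the overload that takes 
an end index), just to be sure.

If the term text truly ends with an unpaired lead surrogate, the unbounded 
codePointAt could pair it with a leftover trail surrogate from a previous 
token that sits just past the end of the term in the buffer...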
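
A minimal sketch of the difference, for illustration only. The class and 
method names are hypothetical and not taken from the attached patch; the only 
real APIs used are the JDK's Character.codePointAt, Character.toChars and 
Character.toLowerCase. It assumes lowercasing a code point never changes how 
many chars it occupies, and it uses U+10400 (which lowercases to U+10428) as 
the supplementary character.

```java
// Sketch only: class and method names are hypothetical, not from the patch.
class LimitedCodePointDemo {

    // Lowercases buffer[0..length) in place using the limit form of
    // codePointAt, so a lead surrogate at length-1 is never paired with
    // whatever happens to be stored at buffer[length].
    static void lowerCaseBounded(char[] buffer, int length) {
        for (int i = 0; i < length; ) {
            int cp = Character.codePointAt(buffer, i, length);
            i += Character.toChars(Character.toLowerCase(cp), buffer, i);
        }
    }

    // Same loop with the unbounded form, which may look ahead as far as
    // buffer.length and therefore past the end of the current term.
    static void lowerCaseUnbounded(char[] buffer, int length) {
        for (int i = 0; i < length; ) {
            int cp = Character.codePointAt(buffer, i);
            i += Character.toChars(Character.toLowerCase(cp), buffer, i);
        }
    }

    public static void main(String[] args) {
        // Term is 'A' plus an unpaired lead surrogate U+D801 (length 2).
        // Index 2 holds a stale trail surrogate U+DC00 from an earlier token.
        char[] a = {'A', '\uD801', '\uDC00'};
        lowerCaseBounded(a, 2);
        System.out.println(Integer.toHexString(a[2])); // stale char untouched

        char[] b = {'A', '\uD801', '\uDC00'};
        lowerCaseUnbounded(b, 2);
        System.out.println(Integer.toHexString(b[2])); // stale char rewritten
    }
}
```

With the bounded form the unpaired lead surrogate is passed through as is and 
nothing beyond the term length is read or written. With the unbounded form 
the loop treats U+D801 U+DC00 as one supplementary character and writes its 
lowercase form back, touching a char that is not part of the term.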


> fix LowerCaseFilter for unicode 4.0
> -----------------------------------
>
>                 Key: LUCENE-2069
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2069
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Analysis
>            Reporter: Robert Muir
>            Assignee: Robert Muir
>            Priority: Minor
>             Fix For: 3.1
>
>         Attachments: LUCENE-2069.patch, LUCENE-2069.patch, LUCENE-2069.patch, 
> LUCENE-2069.patch, LUCENE-2069.patch
>
>
> Lowercase supplementary characters correctly. 
> This only fixes the filter; the LowerCaseTokenizer is part of a more complex 
> issue (CharTokenizer).

