[jira] Updated: (LUCENE-2183) Supplementary Character Handling in CharTokenizer

Simon Willnauer (JIRA) Fri, 15 Jan 2010 13:01:19 -0800

    [ https://issues.apache.org/jira/browse/LUCENE-2183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Simon Willnauer updated LUCENE-2183:
------------------------------------

    Attachment: LUCENE-2183.patch

I updated the patch to make use of the nice reflection utils and ported all 
subclasses of CharTokenizer to the int-based API.
Because the patch adds a Version argument to the CharTokenizer ctors, it 
leaves a lot of code that uses deprecated API.
I haven't changed all usages of the deprecated ctors yet; IMO that should be 
done in another issue.

> Supplementary Character Handling in CharTokenizer
> -------------------------------------------------
>
>                 Key: LUCENE-2183
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2183
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Analysis
>            Reporter: Simon Willnauer
>             Fix For: 3.1
>
>         Attachments: LUCENE-2183.patch, LUCENE-2183.patch
>
>
> CharTokenizer is an abstract base class for all Tokenizers operating on a 
> character level. Yet, those tokenizers still use char primitives instead of 
> int code points. CharTokenizer should operate on code points while preserving 
> backwards compatibility. 
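
To illustrate the difference between char primitives and int code points, here 
is a minimal sketch in plain Java. It does not use Lucene and is not the code 
from the patch; the class and method names (CodePointSketch, tokenizeByChar, 
tokenizeByCodePoint) are hypothetical, and "letters form a token" is an assumed 
rule chosen only for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone sketch, not Lucene's CharTokenizer.
public class CodePointSketch {

    /** Splits on non-letters, testing one UTF-16 char at a time. */
    public static List<String> tokenizeByChar(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            // A lone surrogate is never a letter, so a supplementary
            // character always ends the current token here.
            if (Character.isLetter(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    /** Splits on non-letters, testing one int code point at a time. */
    public static List<String> tokenizeByCodePoint(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            // isLetter(int) sees the full supplementary character.
            if (Character.isLetter(cp)) {
                current.appendCodePoint(cp);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
            // Advance by 1 char for BMP, 2 chars for a surrogate pair.
            i += Character.charCount(cp);
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // 'a', U+10400 (DESERET CAPITAL LETTER LONG I), 'b'
        String text = "a\uD801\uDC00b";
        System.out.println(tokenizeByChar(text).size());      // 2 tokens
        System.out.println(tokenizeByCodePoint(text).size()); // 1 token
    }
}
```

With the char-based loop the supplementary letter is seen as two surrogates, 
neither of which is a letter, so the input is split into "a" and "b". The 
code point loop keeps all three letters in a single token.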

