[ https://issues.apache.org/jira/browse/LUCENE-2183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Simon Willnauer updated LUCENE-2183: ------------------------------------ Attachment: LUCENE-2183.patch I updated the patch to make use of the nice reflection utils and ported all subclasses of CharTokenizer to the int based API. Due to the addition of Version to CharTokenizer ctors this patch creates a lot of usage of deprecated API. Yet, I haven't changed all the usage of the deprecated ctors, this should be done in another issue IMO. > Supplementary Character Handling in CharTokenizer > ------------------------------------------------- > > Key: LUCENE-2183 > URL: https://issues.apache.org/jira/browse/LUCENE-2183 > Project: Lucene - Java > Issue Type: Improvement > Components: Analysis > Reporter: Simon Willnauer > Fix For: 3.1 > > Attachments: LUCENE-2183.patch, LUCENE-2183.patch > > > CharTokenizer is an abstract base class for all Tokenizers operating on a > character level. Yet, those tokenizers still use char primitives instead of > int codepoints. CharTokenizer should operate on codepoints and preserve bw > compatibility. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. --------------------------------------------------------------------- To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org