[ 
https://issues.apache.org/jira/browse/LUCENE-1215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12577312#action_12577312
 ] 

Andrew Lynch commented on LUCENE-1215:
--------------------------------------

This will be quite useful. I used the Normalizer to implement my own custom 
analyzer for https://issues.apache.org/jira/browse/LUCENE-1032. 
Older versions of the Sun JDK actually ship an equivalent, 
sun.text.Normalizer, but it is an internal Sun class, so relying on it would 
not be portable across VMs. 

I ended up using reflection to check for java.text.Normalizer, falling back 
to sun.text.Normalizer if it was absent, and finally performing no 
normalization if neither could be found. This preserves compatibility with 
JDKs that are neither Java 6 nor from Sun.
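
For anyone who wants to do something similar, here is a rough sketch of that 
fallback chain. It is not the code from my analyzer. The class name 
NormalizerCompat, the choice of NFKC, and above all the legacy 
sun.text.Normalizer signature normalize(String, Mode, int) with a 
COMPOSE_COMPAT constant are assumptions, so check them against the JDK you 
actually target.

```java
import java.lang.reflect.Method;

// Hypothetical helper: resolves a normalizer once, via reflection only,
// so the class still loads on VMs that have neither implementation.
final class NormalizerCompat {
    private static final Method METHOD;      // resolved normalize(...), or null
    private static final Object[] TAIL_ARGS; // arguments after the input text

    static {
        Method method = null;
        Object[] tail = null;
        try {
            // Preferred: the public API added in Java 6.
            Class<?> form = Class.forName("java.text.Normalizer$Form");
            Method m = Class.forName("java.text.Normalizer")
                    .getMethod("normalize", CharSequence.class, form);
            // NFKC is an assumption; pick whichever form your analyzer needs.
            tail = new Object[] { form.getField("NFKC").get(null) };
            method = m;
        } catch (Exception | LinkageError notJava6) {
            try {
                // Fallback: internal Sun class. The signature and constant
                // name below are ASSUMED and may differ between JDK releases.
                Class<?> legacy = Class.forName("sun.text.Normalizer");
                Class<?> mode = Class.forName("sun.text.Normalizer$Mode");
                Method m = legacy.getMethod(
                        "normalize", String.class, mode, int.class);
                tail = new Object[] {
                    legacy.getField("COMPOSE_COMPAT").get(null),
                    Integer.valueOf(0)
                };
                method = m;
            } catch (Exception | LinkageError noNormalizer) {
                // Neither class is usable: leave METHOD null (no-op mode).
                method = null;
                tail = null;
            }
        }
        METHOD = method;
        TAIL_ARGS = tail;
    }

    private NormalizerCompat() {}

    // Returns the normalized text, or the input unchanged if no
    // normalizer was found or the reflective call fails.
    static String normalize(String text) {
        if (METHOD == null || text == null) {
            return text;
        }
        try {
            Object[] args = new Object[TAIL_ARGS.length + 1];
            args[0] = text;
            System.arraycopy(TAIL_ARGS, 0, args, 1, TAIL_ARGS.length);
            return (String) METHOD.invoke(null, args);
        } catch (Exception e) {
            return text;
        }
    }
}
```

Doing the lookup once in a static initializer keeps the reflection cost out 
of the per-token path. A filter would then just call 
NormalizerCompat.normalize(termText) on each token.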

> Support of Unicode Collation
> ----------------------------
>
>                 Key: LUCENE-1215
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1215
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Analysis
>            Reporter: Hiroaki Kawai
>         Attachments: NormalizerTokenFilter.java
>
>
> New in Java 6, we have java.text.Normalizer, which supports Unicode Standard 
> Annex #15 normalization.
> http://java.sun.com/javase/6/docs/api/java/text/Normalizer.html
> http://www.unicode.org/unicode/reports/tr15/
> The normalization defined there has four variants: C, D, KC, and KD. 
> Canonical Decomposition or Compatibility Decomposition will normalize the 
> representation of a String, which should improve search results.
> I'd like to submit TokenFilter code supporting this feature! :-)
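
To make the four variants above concrete, here is a small sketch using only 
java.text.Normalizer from Java 6. The class name NormalizationForms and its 
helper methods are made up for illustration and are not taken from the 
attached NormalizerTokenFilter.java.

```java
import java.text.Normalizer;

// Hypothetical demo class showing how the four UAX #15 forms differ.
final class NormalizationForms {
    private NormalizationForms() {}

    // Normalizes text with the form named "NFC", "NFD", "NFKC" or "NFKD".
    static String normalize(String text, String formName) {
        return Normalizer.normalize(text, Normalizer.Form.valueOf(formName));
    }

    // Renders a string as space-separated code points, e.g. "U+0065 U+0301".
    static String codePoints(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String eAcute = "\u00E9";   // precomposed e with acute accent
        String ligature = "\uFB01"; // LATIN SMALL LIGATURE FI
        for (String form : new String[] { "NFC", "NFD", "NFKC", "NFKD" }) {
            System.out.println(form
                    + "  e-acute: " + codePoints(normalize(eAcute, form))
                    + "  fi-ligature: " + codePoints(normalize(ligature, form)));
        }
    }
}
```

The canonical forms (C, D) only compose or decompose accents, so the 
ligature survives them. The compatibility forms (KC, KD) also fold it to the 
plain letters "fi", which is usually what a search index wants.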

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


