[ https://issues.apache.org/jira/browse/LANG-285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12570701#action_12570701 ]

Cédrik LIME commented on LANG-285:
----------------------------------

Here is a pure Unicode version, which complements the "big list of accented
characters" described earlier.
One could use reflection to detect at runtime which implementation is
available, and fall back to the "search and replace" method if none of them is.
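A minimal sketch of that reflection-based selection, written in modern Java for readability (a Java 1.3-compatible version would have to drop the generics and StringBuilder). It only probes java.text.Normalizer; the sun.text or ICU4J variants could be added to the same lookup. The class name UnaccentReflection and the tiny Latin-1 table are hypothetical stand-ins for the full "search and replace" list, not part of any existing patch.

```java
import java.lang.reflect.Method;
import java.util.regex.Pattern;

final class UnaccentReflection {
    private static final Pattern COMBINING_MARKS =
            Pattern.compile("\\p{InCombiningDiacriticalMarks}+");

    // Hypothetical fallback table (small Latin-1 subset): ACCENTED[i] maps to UNACCENTED[i].
    private static final String ACCENTED = "àâäéèêëîïôöùûüçÀÂÄÉÈÊËÎÏÔÖÙÛÜÇ";
    private static final String UNACCENTED = "aaaeeeeiioouuucAAAEEEEIIOOUUUC";

    // Resolved once; both stay null when java.text.Normalizer (Java 6+) cannot be found.
    private static final Method NORMALIZE;
    private static final Object FORM_NFD;

    static {
        Method method = null;
        Object nfd = null;
        try {
            Class<?> normalizer = Class.forName("java.text.Normalizer");
            Class<?> form = Class.forName("java.text.Normalizer$Form");
            method = normalizer.getMethod("normalize", CharSequence.class, form);
            nfd = form.getField("NFD").get(null);
        } catch (Exception e) {
            method = null; // older JVM: use the table-based fallback
            nfd = null;
        }
        NORMALIZE = method;
        FORM_NFD = nfd;
    }

    private UnaccentReflection() {
    }

    static String unaccent(String text) {
        if (NORMALIZE != null) {
            try {
                String decomposed = (String) NORMALIZE.invoke(null, text, FORM_NFD);
                return COMBINING_MARKS.matcher(decomposed).replaceAll("");
            } catch (Exception e) {
                // Reflective call failed: fall through to the table below.
            }
        }
        return searchAndReplace(text);
    }

    // "Search and replace" fallback: only characters listed in ACCENTED are changed.
    private static String searchAndReplace(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            int idx = ACCENTED.indexOf(c);
            sb.append(idx >= 0 ? UNACCENTED.charAt(idx) : c);
        }
        return sb.toString();
    }
}
```

Resolving the Method once in the static initializer keeps the reflection lookup out of the per-call path; each call only pays for Method.invoke.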


Java 6+ (beware: a java.text.Normalizer class exists in Java 1.3, but its API is incompatible!):

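import java.text.Normalizer;
import java.util.regex.Pattern;

private static final Pattern COMBINING_MARKS =
        Pattern.compile("\\p{InCombiningDiacriticalMarks}+");//$NON-NLS-1$

public static String unaccent(String text) {
    String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD); // "é" -> "e" + U+0301
    return COMBINING_MARKS.matcher(decomposed).replaceAll("");//$NON-NLS-1$
}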
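As a self-contained check of the Java 6+ approach, here is a runnable sketch. The class name UnaccentJava6 and the null handling are assumptions (they are not part of the snippet above), and the sample sentence is adapted from the issue's example.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

final class UnaccentJava6 {
    // Matches runs of characters from the "Combining Diacritical Marks" block (U+0300..U+036F).
    private static final Pattern COMBINING_MARKS =
            Pattern.compile("\\p{InCombiningDiacriticalMarks}+");

    private UnaccentJava6() {
    }

    static String unaccent(String text) {
        if (text == null) {
            return null; // assumed null-safe behaviour, in the style of StringUtils
        }
        // NFD splits a precomposed letter such as U+00E9 into U+0065 + U+0301 (combining acute).
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        // Dropping the combining marks leaves only the base letters.
        return COMBINING_MARKS.matcher(decomposed).replaceAll("");
    }

    public static void main(String[] args) {
        // Prints: L'ete ou j'ai du aller a l'ile d'Anticosti commenca tot
        System.out.println(unaccent("L'été où j'ai dû aller à l'île d'Anticosti commença tôt"));
    }
}
```

Letters that have no canonical decomposition, such as "ø", "ł" or "æ", pass through unchanged, so the "big list" approach still has a role for those.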



Sun internal API, Java 1.3 to 1.5:

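import java.util.regex.Pattern;

private static final Pattern sunPattern =
        Pattern.compile("\\p{InCombiningDiacriticalMarks}+");//$NON-NLS-1$

public static String unaccent(String text) {
    // Unsupported internal API; false = canonical (not compatibility) decomposition, 0 = no options
    String result = sun.text.Normalizer.decompose(text, false, 0);
    return sunPattern.matcher(result).replaceAll("");//$NON-NLS-1$
}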



IBM ICU4J (http://www.icu-project.org/):

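import com.ibm.icu.text.Transliterator;

// Compound transliterator: decompose (NFD), remove nonspacing marks, recompose (NFC)
private static final Transliterator accentsRemover =
        Transliterator.getInstance("NFD; [:Nonspacing Mark:] Remove; NFC");//$NON-NLS-1$

public static String unaccent(String text) {
    return accentsRemover.transliterate(text);
}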
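The ICU rule removes every nonspacing mark (general category Mn), while the two Normalizer snippets only strip the U+0300..U+036F block. A sketch that roughly mirrors the ICU chain with JDK classes only, assuming java.util.regex's \p{Mn} is an acceptable stand-in for ICU's [:Nonspacing Mark:] (the two may differ by Unicode version); the class and method names are hypothetical.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

final class UnaccentMn {
    // General category Mn, the class targeted by the ICU rule "[:Nonspacing Mark:] Remove".
    private static final Pattern NONSPACING_MARKS = Pattern.compile("\\p{Mn}+");
    // Only the U+0300..U+036F block, as in the java.text / sun.text snippets.
    private static final Pattern BLOCK_MARKS =
            Pattern.compile("\\p{InCombiningDiacriticalMarks}+");

    private UnaccentMn() {
    }

    // Approximates "NFD; [:Nonspacing Mark:] Remove; NFC" without ICU4J.
    static String unaccentMn(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        String stripped = NONSPACING_MARKS.matcher(decomposed).replaceAll("");
        return Normalizer.normalize(stripped, Normalizer.Form.NFC);
    }

    // Block-based variant, for comparison.
    static String unaccentBlock(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        return BLOCK_MARKS.matcher(decomposed).replaceAll("");
    }

    public static void main(String[] args) {
        // 'e' + U+1DC0 COMBINING DOTTED GRAVE ACCENT, which lies outside U+0300..U+036F
        String s = "e\u1DC0";
        System.out.println(unaccentMn(s).length());    // 1: the mark is removed
        System.out.println(unaccentBlock(s).length()); // 2: the mark survives
    }
}
```

The final NFC step, as in the ICU chain, recomposes whatever combining characters were not removed (for example spacing or enclosing marks), so the output is back in the usual composed form.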

> Wish : method unaccent
> ----------------------
>
>                 Key: LANG-285
>                 URL: https://issues.apache.org/jira/browse/LANG-285
>             Project: Commons Lang
>          Issue Type: New Feature
>            Reporter: Guillaume Coté
>            Priority: Minor
>             Fix For: 3.0
>
>         Attachments: MapBuilder.java, unaccent.patch, UnnacentMap.java
>
>
> I would like to add a method that replaces accented characters with unaccented
> ones. For example, with the input String "L'été où j'ai dû aller à l'île
> d'Anticosti commença tôt", the method would return "L'ete ou j'ai du aller a
> l'ile d'Anticosti commenca tot".
> I suggest calling that method unaccent and adding it to StringUtils.
> If we cannot convert all cases, the first version could convert only
> ISO-8859-1 characters.
> If you are willing to go forward with that idea, I am willing to contribute a
> patch.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
