[
https://issues.apache.org/jira/browse/LANG-1406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16604694#comment-16604694
]
Michael Ryan commented on LANG-1406:
------------------------------------
I've been thinking - how do case-insensitive regular expressions handle this?
Theoretically these should do the same thing:
{code}
StringUtils.replaceIgnoreCase("\u0130x", "x", "");
Pattern.compile("x", Pattern.CASE_INSENSITIVE).matcher("\u0130x").replaceAll("");
{code}
The Matcher.replaceAll(String) call does not throw an exception.
So what is the difference? The Pattern.newSingle(int) method is the key thing
to look at. It uses Character.toUpperCase(char) and
Character.toLowerCase(char), which do not behave the same way as
String.toUpperCase() and String.toLowerCase(). The Character methods always
produce a single character, whereas the String methods can change the length
of the string.
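To make the length difference concrete, here is a small sketch. The class and method names are made up for illustration, and Locale.ROOT is an assumption used only to keep the result independent of the default locale (a Turkish locale lower-cases \u0130 differently).

```java
import java.util.Locale;

public class LowerCaseLengthDemo {
    // Length after String-level lower-casing, which may expand one char into several.
    static int stringLowerLength(String s) {
        return s.toLowerCase(Locale.ROOT).length();
    }

    // Length after lower-casing each char on its own, which is always 1:1.
    static int charLowerLength(String s) {
        char[] chars = s.toCharArray();
        for (int i = 0; i < chars.length; i++) {
            chars[i] = Character.toLowerCase(chars[i]);
        }
        return new String(chars).length();
    }

    public static void main(String[] args) {
        String text = "\u0130x";
        System.out.println(text.length());           // 2
        System.out.println(stringLowerLength(text)); // 3
        System.out.println(charLowerLength(text));   // 2
    }
}
```

The String version turns \u0130 into "i" followed by the combining dot U+0307, so any index computed on the lower-cased copy is shifted by one relative to the original text.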
So I think a possible naive solution to this would be to call
Character.toLowerCase() on each character in the String and then append the
characters together into a new String.
{code}
String text = "foo";
char[] chars = text.toCharArray();
// Lower-case char by char so the result has the same length as the input.
for (int i = 0; i < chars.length; i++) {
    chars[i] = Character.toLowerCase(chars[i]);
}
String lowerText = new String(chars);
{code}
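As a rough sketch of how that could plug into a replace method: the class and method names below are hypothetical, and this is not the actual Commons Lang implementation or its eventual fix. It works on char values only, so it does not case-fold supplementary code points.

```java
public class ReplaceIgnoreCaseSketch {
    // Lower-case each char individually so the length never changes.
    static String lowerPerChar(String s) {
        char[] chars = s.toCharArray();
        for (int i = 0; i < chars.length; i++) {
            chars[i] = Character.toLowerCase(chars[i]);
        }
        return new String(chars);
    }

    // Hypothetical replacement routine: search in the lower-cased copies,
    // but copy the untouched parts from the original text.
    static String replaceIgnoreCaseSketch(String text, String search, String replacement) {
        if (text.isEmpty() || search.isEmpty()) {
            return text;
        }
        String lowerText = lowerPerChar(text);
        String lowerSearch = lowerPerChar(search);
        StringBuilder out = new StringBuilder(text.length());
        int start = 0;
        int idx;
        while ((idx = lowerText.indexOf(lowerSearch, start)) >= 0) {
            // Indexes are valid for text because lowerText has the same length.
            out.append(text, start, idx).append(replacement);
            start = idx + lowerSearch.length();
        }
        out.append(text, start, text.length());
        return out.toString();
    }
}
```

With this sketch, replaceIgnoreCaseSketch("\u0130x", "x", "") returns "\u0130" instead of throwing, because the indexes found in the lower-cased copy line up with the original text.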
> StringIndexOutOfBoundsException in StringUtils.replaceIgnoreCase
> ----------------------------------------------------------------
>
> Key: LANG-1406
> URL: https://issues.apache.org/jira/browse/LANG-1406
> Project: Commons Lang
> Issue Type: Bug
> Components: lang.*
> Reporter: Michael Ryan
> Priority: Major
>
> STEPS TO REPRODUCE:
> {code}
> StringUtils.replaceIgnoreCase("\u0130x", "x", "")
> {code}
> EXPECTED: "\u0130" is returned.
> ACTUAL: StringIndexOutOfBoundsException
> This happens because the replace method assumes that text.length() ==
> text.toLowerCase().length(), which is not true for certain characters.