[jira] [Commented] (LANG-1406) StringIndexOutOfBoundsException in StringUtils.replaceIgnoreCase

Michael Ryan (JIRA) Wed, 05 Sep 2018 10:20:13 -0700


    [ 
https://issues.apache.org/jira/browse/LANG-1406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16604694#comment-16604694
 ]


Michael Ryan commented on LANG-1406:
------------------------------------

I've been thinking: how do case-insensitive regular expressions handle this? 
In theory, these two calls should do the same thing:
{code}
StringUtils.replaceIgnoreCase("\u0130x", "x", "");
Pattern.compile("x", Pattern.CASE_INSENSITIVE)
        .matcher("\u0130x")
        .replaceAll("");
{code}
The Matcher.replaceAll(String) method does not throw an exception.

So what is the difference? The Pattern.newSingle(int) method is the key thing 
to look at. It uses Character.toUpperCase(char) and 
Character.toLowerCase(char), which do not behave the same way as 
String.toUpperCase() and String.toLowerCase(). The Character methods always 
produce a single character, whereas the String methods can change the length 
of the string.
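
To make the difference concrete, here is a small sketch comparing the two 
approaches for "\u0130". The class and method names are made up for 
illustration, and Locale.ROOT is my own choice so that the result does not 
depend on the default locale (a Turkish locale maps this character 
differently).

```java
import java.util.Locale;

final class CaseMappingDemo {

    // Length after String.toLowerCase, which applies full case mappings
    // and may therefore expand one char into several.
    static int stringLowerLength(String s) {
        return s.toLowerCase(Locale.ROOT).length();
    }

    // Character.toLowerCase maps one char to exactly one char.
    static char charLower(char c) {
        return Character.toLowerCase(c);
    }

    public static void main(String[] args) {
        String s = "\u0130";
        System.out.println(s.length());                 // 1
        System.out.println(stringLowerLength(s));       // 2 ("i" plus U+0307)
        System.out.println((int) charLower('\u0130'));  // 105, i.e. 'i'
    }
}
```

The lower-cased String is one char longer than the original, so an index 
found in it can point past the end of the original text.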

So I think one naive solution would be to call Character.toLowerCase() on 
each character in the String and then build a new String from the resulting 
characters.
{code}
String text = "foo";
char[] chars = text.toCharArray();
for (int i = 0; i < chars.length; i++) {
    chars[i] = Character.toLowerCase(chars[i]);
}
String lowerText = new String(chars);
{code}
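
As a rough sketch of how that per-character lowering might be used in a 
replace method: this is not the actual Commons Lang implementation, and the 
class and method names below are hypothetical. Because the lower-cased copy 
always has the same length as the original, indexes found in it are valid in 
the original text. It only handles chars in the BMP individually, and its 
matching rules may still differ from Pattern.CASE_INSENSITIVE in other cases.

```java
final class PerCharReplaceSketch {

    // Lower-case each char independently so the length never changes.
    static String lowerPerChar(String s) {
        char[] chars = s.toCharArray();
        for (int i = 0; i < chars.length; i++) {
            chars[i] = Character.toLowerCase(chars[i]);
        }
        return new String(chars);
    }

    // Hypothetical replacement for the failing call; searches in the
    // lower-cased copies and copies unmatched spans from the original text.
    static String replaceIgnoreCase(String text, String search, String replacement) {
        if (text == null || search == null || replacement == null || search.isEmpty()) {
            return text;
        }
        String lowerText = lowerPerChar(text);
        String lowerSearch = lowerPerChar(search);
        StringBuilder out = new StringBuilder(text.length());
        int start = 0;
        int hit;
        while ((hit = lowerText.indexOf(lowerSearch, start)) >= 0) {
            out.append(text, start, hit).append(replacement);
            start = hit + lowerSearch.length();
        }
        return out.append(text, start, text.length()).toString();
    }

    public static void main(String[] args) {
        // The case from the issue: no exception, "\u0130" is returned.
        System.out.println("\u0130".equals(replaceIgnoreCase("\u0130x", "x", "")));
    }
}
```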

> StringIndexOutOfBoundsException in StringUtils.replaceIgnoreCase
> ----------------------------------------------------------------
>
>                 Key: LANG-1406
>                 URL: https://issues.apache.org/jira/browse/LANG-1406
>             Project: Commons Lang
>          Issue Type: Bug
>          Components: lang.*
>            Reporter: Michael Ryan
>            Priority: Major
>
> STEPS TO REPRODUCE:
> {code}
> StringUtils.replaceIgnoreCase("\u0130x", "x", "")
> {code}
> EXPECTED: "\u0130" is returned.
> ACTUAL: StringIndexOutOfBoundsException
> This happens because the replace method is assuming that text.length() == 
> text.toLowerCase().length(), which is not true for certain characters.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
