[
https://issues.apache.org/jira/browse/LANG-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16835215#comment-16835215
]
Gerardo Torres Ontiveros commented on LANG-1453:
------------------------------------------------
I figured out the cause of this bug and found a solution.
In StringUtils.replace, the replacement is performed using the indexes of the
matches found in the lowercase version of the input strings. When we take the
String “İa” ([Dotted uppercase
“I”|https://en.wikipedia.org/wiki/Dotted_and_dotless_I] followed by an “a”) and
convert it to lowercase, something unexpected happens. Because there is no
single-character lowercase form of the dotted “I” (outside Turkish casing
rules), a lowercase “i” followed by a combining
[dot|https://en.wikipedia.org/wiki/Dot_(diacritic)] is returned (“i·a”). The
input is 2 characters long but the output is 3 characters long, so when the
“a” is removed from the original string “İa”, its index is taken as 2, as it
is in the lowercase version “i·a”. The search then continues from index 3,
which is past the end of the 2-character original string, and the exception is
thrown when the rest of the original string is appended. With the parameters
StringUtils.removeIgnoreCase("İash", "a"), the exception does not happen;
instead, the string “İah” is returned, which is wrong (the expected result is
“İsh”) because the original string and its lowercase version have different
lengths.
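The index arithmetic described above can be reproduced with a short Java sketch. `naiveRemoveIgnoreCase` is a hypothetical, simplified loop written only for illustration (it is not the Commons Lang source): it searches a lower-cased copy but slices the original string with the indexes it finds. `Locale.ROOT` is an assumption; the no-argument `toLowerCase()` uses the default locale, and under a Turkish locale “İ” lower-cases to a plain “i”, so the lengths would match there.

```java
import java.util.Locale;

// Hypothetical, simplified re-creation of the buggy approach (not the library source):
// find matches in a lower-cased copy, then slice the ORIGINAL string with those indexes.
public class LowerCaseIndexDemo {

    static String naiveRemoveIgnoreCase(String text, String remove) {
        if (text.isEmpty() || remove.isEmpty()) {
            return text; // nothing to search for; also avoids an endless loop on ""
        }
        // Locale.ROOT is an assumption; a Turkish locale would lower-case U+0130 to a single 'i'.
        String searchText = text.toLowerCase(Locale.ROOT);
        String searchString = remove.toLowerCase(Locale.ROOT);
        StringBuilder buf = new StringBuilder();
        int start = 0;
        int end = searchText.indexOf(searchString, start);
        while (end != -1) {
            buf.append(text, start, end);        // index from the lower-case copy used on the original
            start = end + searchString.length(); // can now point past the end of the original
            end = searchText.indexOf(searchString, start);
        }
        buf.append(text, start, text.length());  // throws IndexOutOfBoundsException if start > length
        return buf.toString();
    }

    public static void main(String[] args) {
        String text = "\u0130a"; // "İa"
        String lower = text.toLowerCase(Locale.ROOT);
        System.out.println(text.length());      // 2
        System.out.println(lower.length());     // 3: 'i', U+0307 COMBINING DOT ABOVE, 'a'
        System.out.println(lower.indexOf('a')); // 2

        // "İah" instead of the expected "İsh"
        System.out.println(naiveRemoveIgnoreCase("\u0130ash", "a"));
        try {
            naiveRemoveIgnoreCase(text, "a");
        } catch (IndexOutOfBoundsException e) {
            System.out.println("IndexOutOfBoundsException: " + e.getMessage());
        }
    }
}
```

For “İa” the final append is asked for the range 3..2 of a 2-character string, the same start/end pair that appears in the stack trace of the issue.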
Of the characters in this list [https://en.wikipedia.org/wiki/Dot_(diacritic)],
“İ” is the only one for which the toLowerCase method returns a longer string
while the toUpperCase method returns a string of the same length.
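The length claim can be checked directly. A minimal sketch comparing lengths for “İ” and a few other dotted capitals picked here for illustration (Ċ, Ė, Ġ, Ż; this is not the full Wikipedia list), again assuming `Locale.ROOT` rather than the default locale:

```java
import java.util.Locale;

// Compares string lengths before and after case conversion for a few dotted capitals.
// Locale.ROOT is an assumption (non-Turkish casing rules).
public class DottedLetterLengths {

    static int lowerLength(String s) {
        return s.toLowerCase(Locale.ROOT).length();
    }

    static int upperLength(String s) {
        return s.toUpperCase(Locale.ROOT).length();
    }

    public static void main(String[] args) {
        // İ (U+0130), Ċ (U+010A), Ė (U+0116), Ġ (U+0120), Ż (U+017B)
        String[] letters = {"\u0130", "\u010A", "\u0116", "\u0120", "\u017B"};
        for (String letter : letters) {
            System.out.printf("U+%04X original=%d lower=%d upper=%d%n",
                    (int) letter.charAt(0), letter.length(),
                    lowerLength(letter), upperLength(letter));
        }
    }
}
```

Under these assumptions only U+0130 reports a lower-case length of 2; every upper-case length stays 1.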
I have already sent a pull request that changes the following lines of the
replace method:
From:
if (ignoreCase) {
searchText = text.toLowerCase();
searchString = searchString.toLowerCase();
}
To:
if (ignoreCase) {
searchText = text.toUpperCase();
searchString = searchString.toUpperCase();
}
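Applied to the same kind of simplified loop, the proposed change looks roughly like the sketch below. `removeIgnoreCaseUpper` is a hypothetical helper, not the patched library method, and `Locale.ROOT` is again an assumption (the patch calls the no-argument `toUpperCase()`). The indexes found in the upper-cased copy are only safe to use on the original because, for the inputs involved, upper-casing preserves the string length, as noted above for “İ”.

```java
import java.util.Locale;

// Hypothetical, simplified remove loop using upper-cased search copies, as in the proposed patch.
// Not the Commons Lang source; Locale.ROOT is an assumption.
public class UpperCaseSearchSketch {

    static String removeIgnoreCaseUpper(String text, String remove) {
        if (text.isEmpty() || remove.isEmpty()) {
            return text; // nothing to search for; also avoids an endless loop on ""
        }
        String searchText = text.toUpperCase(Locale.ROOT);
        String searchString = remove.toUpperCase(Locale.ROOT);
        StringBuilder buf = new StringBuilder(text.length());
        int start = 0;
        int end = searchText.indexOf(searchString, start);
        // Indexes come from searchText but slice text; this is only safe because
        // upper-casing kept the lengths equal for these inputs.
        while (end != -1) {
            buf.append(text, start, end);
            start = end + searchString.length();
            end = searchText.indexOf(searchString, start);
        }
        buf.append(text, start, text.length());
        return buf.toString();
    }

    public static void main(String[] args) {
        System.out.println(removeIgnoreCaseUpper("\u0130a", "a"));   // "İ"
        System.out.println(removeIgnoreCaseUpper("\u0130ash", "a")); // "İsh"
    }
}
```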
I had some problems adding the test: the character “İ” was uploaded as a
question mark, so I asked for help in a GitHub comment on the commit.
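The comment does not say why “İ” turned into a question mark. One plausible cause (an assumption) is that the file or the upload path used a charset that lacks the character, such as ISO-8859-1, where Java's `String.getBytes` substitutes `?` for unmappable characters. Writing the character as the Unicode escape `\u0130` keeps the test source pure ASCII. A minimal sketch:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class DottedIEncodingDemo {

    // The escape keeps this source line pure ASCII, so the file encoding cannot corrupt it.
    static final String DOTTED_CAPITAL_I = "\u0130"; // LATIN CAPITAL LETTER I WITH DOT ABOVE

    public static void main(String[] args) {
        // ISO-8859-1 has no U+0130, so getBytes substitutes the replacement byte '?' (63).
        byte[] latin1 = DOTTED_CAPITAL_I.getBytes(StandardCharsets.ISO_8859_1);
        // UTF-8 encodes U+0130 as the two bytes 0xC4 0xB0.
        byte[] utf8 = DOTTED_CAPITAL_I.getBytes(StandardCharsets.UTF_8);

        System.out.println(Arrays.toString(latin1)); // [63]
        System.out.println(Arrays.toString(utf8));   // [-60, -80]
        System.out.println(Character.getName(DOTTED_CAPITAL_I.codePointAt(0)));
    }
}
```

In a regression test the same escape can be used for both the input and the expected value, for example an input of `"\u0130a"` with an expected result of `"\u0130"`.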
> StringUtils.removeIgnoreCase("İa", "a") throws IndexOutOfBoundsException
> ------------------------------------------------------------------------
>
> Key: LANG-1453
> URL: https://issues.apache.org/jira/browse/LANG-1453
> Project: Commons Lang
> Issue Type: Bug
> Components: lang.text.*
> Affects Versions: 3.8.1
> Reporter: Thomas Neerup
> Priority: Critical
> Time Spent: 10m
> Remaining Estimate: 0h
>
> try {
>     String s = StringUtils.removeIgnoreCase("İa", "a");
> } catch (Exception e) {
>     e.printStackTrace();
> }
> output
> java.lang.IndexOutOfBoundsException: start 3, end 2, s.length() 2
> at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:539)
>