[ 
https://issues.apache.org/jira/browse/LANG-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16835215#comment-16835215
 ] 

Gerardo Torres Ontiveros commented on LANG-1453:
------------------------------------------------

I figured out the cause of this bug and found a solution.

In StringUtils.replace, the replacement is performed using the indexes of the 
substrings in the lowercase versions of the input strings. When the input is 
“İa” ([dotted uppercase 
“I”|https://en.wikipedia.org/wiki/Dotted_and_dotless_I] followed by an “a”) and 
we take its lowercase version, something odd happens. Rather than a single 
character, toLowerCase returns a lowercase “i” followed by a separate 
[dot|https://en.wikipedia.org/wiki/Dot_(diacritic)] character (“i·a”). The 
input is 2 characters long but the output is 3 characters long, so when the “a” 
is to be removed from the original string “İa”, its index is taken as 2, as in 
the lowercase version “i·a”. That index is out of bounds for the original 
string, and the exception is thrown. With the parameters 
StringUtils.removeIgnoreCase("İash", "a") no exception is thrown; instead the 
string “İah” is returned, which is wrong, again because the original string and 
its lowercase version have different lengths.
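
The length mismatch described above can be reproduced with a short sketch. The 
class and method names are hypothetical, and Locale.ROOT is an assumption made 
here so that the result does not depend on the default locale (under a Turkish 
locale the lowercase mapping is different); the character is written as a 
Unicode escape to avoid source-encoding problems.

```java
import java.util.Locale;

class DottedILengthDemo {

    // Returns { original length, lower-cased length, index of search in the lower-cased text }.
    static int[] measure(String text, String search) {
        String lowerText = text.toLowerCase(Locale.ROOT);
        String lowerSearch = search.toLowerCase(Locale.ROOT);
        return new int[] { text.length(), lowerText.length(), lowerText.indexOf(lowerSearch) };
    }

    public static void main(String[] args) {
        // U+0130 is the dotted uppercase I; the string below is that character followed by "a".
        int[] m = measure("\u0130a", "a");
        System.out.println("original length    = " + m[0]); // 2
        System.out.println("lower-cased length = " + m[1]); // 3
        System.out.println("index of \"a\"       = " + m[2]); // 2, not a valid index in the original
    }
}
```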

“İ” is the only character in this list 
[https://en.wikipedia.org/wiki/Dot_(diacritic)] for which the toLowerCase 
method returns a longer string while the toUpperCase method returns a string of 
the same length.

I have already sent a pull request changing the following lines of the replace 
method:

From:

if (ignoreCase) {
    searchText = text.toLowerCase();
    searchString = searchString.toLowerCase();
}

To:

if (ignoreCase) {
    searchText = text.toUpperCase();
    searchString = searchString.toUpperCase();
}
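
The effect of this change can be illustrated with a simplified stand-in for the 
index-based loop. This is only a sketch, not the Commons Lang source: the class 
name, the method name and the useUpperCase flag are hypothetical, and 
Locale.ROOT is assumed so that the behaviour does not depend on the default 
locale.

```java
import java.util.Locale;

class RemoveIgnoreCaseSketch {

    // Hypothetical, simplified case-insensitive removal: indexes are found in the
    // case-converted copy (searchText) but applied to the original text.
    static String remove(String text, String search, boolean useUpperCase) {
        if (text.isEmpty() || search.isEmpty()) {
            return text;
        }
        String searchText = useUpperCase
                ? text.toUpperCase(Locale.ROOT) : text.toLowerCase(Locale.ROOT);
        String searchString = useUpperCase
                ? search.toUpperCase(Locale.ROOT) : search.toLowerCase(Locale.ROOT);

        StringBuilder buf = new StringBuilder(text.length());
        int start = 0;
        int end = searchText.indexOf(searchString, start);
        while (end != -1) {
            buf.append(text, start, end);          // copy the part before the match
            start = end + searchString.length();   // skip over the match
            end = searchText.indexOf(searchString, start);
        }
        buf.append(text, start, text.length());    // copy the tail; throws if start > text.length()
        return buf.toString();
    }

    public static void main(String[] args) {
        String input = "\u0130a"; // dotted uppercase I (U+0130) followed by "a"
        System.out.println(remove(input, "a", true).length()); // 1: only the dotted I remains
        try {
            remove(input, "a", false);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("lower-casing variant failed: " + e.getMessage());
        }
    }
}
```

With upper-casing, the inputs “İa” and “İash” give “İ” and “İsh”; with 
lower-casing, “İa” throws and “İash” gives “İah”, matching the behaviour 
described above.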

I had some problems adding the test: the character “İ” was uploaded as a 
question mark. I asked for help in a GitHub comment on the commit.

 

> StringUtils.removeIgnoreCase("İa", "a") throws IndexOutOfBoundsException
> ------------------------------------------------------------------------
>
>                 Key: LANG-1453
>                 URL: https://issues.apache.org/jira/browse/LANG-1453
>             Project: Commons Lang
>          Issue Type: Bug
>          Components: lang.text.*
>    Affects Versions: 3.8.1
>            Reporter: Thomas Neerup
>            Priority: Critical
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> try {
>     String s = StringUtils.removeIgnoreCase("İa", "a");
> } catch (Exception e) {
>     e.printStackTrace();
> }
> output
> java.lang.IndexOutOfBoundsException: start 3, end 2, s.length() 2
> at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:539)
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
