[ 
https://issues.apache.org/jira/browse/LANG-1406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16573108#comment-16573108
 ] 

HiuFung Kwok commented on LANG-1406:
------------------------------------

Hi all,

After a bit of research, this seems to be a known issue: when a String 
contains certain Unicode characters, String.toLowerCase() can produce an 
unexpected result 
([ref|https://www.quora.com/Is-Javas-toLowercase-string-method-reliable-for-Unicode]).

In this case "\u0130x" becomes a String of three chars, [ i,  ̇, x] (that is, 
'i', U+0307 COMBINING DOT ABOVE, 'x'), instead of the two chars [ İ, x].
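The length change can be checked with a small sketch. The class and method names below are made up for illustration, and passing Locale.ROOT is my assumption to keep the result independent of the JVM's default locale (the original code calls toLowerCase() with no argument).

```java
import java.util.Locale;

// Hypothetical demo class, not part of Commons Lang.
class LowerCaseLengthDemo {

    // Lower-cases with Locale.ROOT (an assumption) so the result does not
    // depend on the default locale of the JVM running the sketch.
    static String lower(String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String text = "\u0130x";
        String lowered = lower(text);

        System.out.println("original length: " + text.length());
        System.out.println("lowered length:  " + lowered.length());

        // Print each char of the lower-cased string as a code unit.
        for (char c : lowered.toCharArray()) {
            System.out.printf("U+%04X%n", (int) c);
        }
    }
}
```
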

Given that result from .toLowerCase(), StringUtils.replaceIgnoreCase ends up 
trying to access a segment of the original string that does not exist: index 
3 in this case, while str.length() is 2.
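A simplified sketch of that mismatch follows. It is not the actual Commons Lang code: the helper name is hypothetical, and it only mirrors the pattern of searching in a lower-cased copy and then applying the resulting offset to the original string. Locale.ROOT is again my assumption.

```java
import java.util.Locale;

// Hypothetical demo class, not the real StringUtils implementation.
class IndexMismatchDemo {

    // Returns the offset just past the first case-insensitive match,
    // computed against the lower-cased copy of the text (or -1 if none).
    static int offsetAfterMatch(String text, String search) {
        String loweredText = text.toLowerCase(Locale.ROOT);
        String loweredSearch = search.toLowerCase(Locale.ROOT);
        int found = loweredText.indexOf(loweredSearch);
        return found < 0 ? -1 : found + loweredSearch.length();
    }

    public static void main(String[] args) {
        String text = "\u0130x";
        int start = offsetAfterMatch(text, "x");

        System.out.println("offset from lowered copy: " + start);
        System.out.println("original length:          " + text.length());

        try {
            // The offset is valid for the lowered copy only, not for text.
            text.substring(start);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("out of bounds: " + e.getMessage());
        }
    }
}
```
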

The fix I came up with is to replace .toLowerCase() with .toUpperCase(), in 
order to avoid the length change introduced by .toLowerCase() while 
performing case-insensitive comparisons.
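As a rough check of the idea for the reported input, the sketch below upper-cases "\u0130x" and confirms that the length, and therefore the match offset, lines up with the original string. The names are hypothetical and Locale.ROOT is my assumption. It only exercises this one input and does not show that upper-casing preserves length for every string.

```java
import java.util.Locale;

// Hypothetical demo class, not the code from the linked commit.
class UpperCaseCheck {

    // True if upper-casing (with Locale.ROOT, an assumption) keeps the length.
    static boolean sameLengthWhenUpperCased(String s) {
        return s.toUpperCase(Locale.ROOT).length() == s.length();
    }

    // Index of the first case-insensitive match, found via upper-cased copies.
    static int indexOfIgnoreCaseViaUpper(String text, String search) {
        return text.toUpperCase(Locale.ROOT).indexOf(search.toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        String text = "\u0130x";
        System.out.println("same length: " + sameLengthWhenUpperCased(text));
        System.out.println("match index: " + indexOfIgnoreCaseViaUpper(text, "x"));
    }
}
```
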

Fix: 
[https://github.com/HiuKwok/commons-lang/commit/e0f6c7802b5e721019a602bf30b31c79dbf6d233]

Testcase: 
https://github.com/HiuKwok/commons-lang/commit/590f90889bf61a5570bd98b78e73410a07d7410b

> StringIndexOutOfBoundsException in StringUtils.replaceIgnoreCase
> ----------------------------------------------------------------
>
>                 Key: LANG-1406
>                 URL: https://issues.apache.org/jira/browse/LANG-1406
>             Project: Commons Lang
>          Issue Type: Bug
>          Components: lang.*
>            Reporter: Michael Ryan
>            Priority: Major
>
> STEPS TO REPRODUCE:
> {code}
> StringUtils.replaceIgnoreCase("\u0130x", "x", "")
> {code}
> EXPECTED: "\u0130" is returned.
> ACTUAL: StringIndexOutOfBoundsException
> This happens because the replace method is assuming that text.length() == 
> text.toLowerCase().length(), which is not true for certain characters.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
