[
https://issues.apache.org/jira/browse/LANG-1406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16573108#comment-16573108
]
HiuFung Kwok commented on LANG-1406:
------------------------------------
Hi all,
After a bit of research, this seems to be a known issue when a String contains
certain Unicode characters
([ref|https://www.quora.com/Is-Javas-toLowercase-string-method-reliable-for-Unicode]):
String.toLowerCase() can return a result whose length differs from the original.
In this case "\u0130x" becomes a String of three chars, [i, ̇, x], instead of
the two chars [İ, x].
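To make the length change concrete, here is a minimal sketch. The class and
method names are made up for illustration, and Locale.ROOT is passed explicitly
(an assumption on my side) so that the result does not depend on the JVM's
default locale.

```java
import java.util.Locale;

public class LowerCaseLengthDemo {

    // Hypothetical helper: length of the lower-cased copy of the input.
    // Locale.ROOT keeps the result independent of the default locale.
    static int loweredLength(String text) {
        return text.toLowerCase(Locale.ROOT).length();
    }

    public static void main(String[] args) {
        String text = "\u0130x";
        String lowered = text.toLowerCase(Locale.ROOT);

        System.out.println(text.length());    // prints 2
        System.out.println(lowered.length()); // prints 3

        // Prints U+0069, U+0307, U+0078: 'i', a combining dot above, 'x'.
        for (char c : lowered.toCharArray()) {
            System.out.printf("U+%04X%n", (int) c);
        }
    }
}
```

Any index computed on the lower-cased copy can therefore point past the end of
the original text.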
Because of this unexpected result from .toLowerCase(),
StringUtils.replaceIgnoreCase ends up trying to access a position that does not
exist in the original string: index 3 in this case, while str.length() is 2.
The fix I came up with is to replace .toLowerCase() with .toUpperCase(), to
avoid the length change from .toLowerCase() when performing case-insensitive
comparisons.
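As a rough sketch of the idea (not the code from the linked commit, and not the
actual Commons Lang implementation), the search can run on an upper-cased copy
while the output is built from the original text. The class name, the method
name replaceIgnoreCaseSketch and the explicit length guard are my own
assumptions for illustration.

```java
import java.util.Locale;

public class UpperCaseReplaceSketch {

    // Hypothetical helper, for illustration only.
    static String replaceIgnoreCaseSketch(String text, String search, String replacement) {
        if (text.isEmpty() || search.isEmpty()) {
            return text;
        }
        // Search on upper-cased copies, copy segments from the original text.
        String searchText = text.toUpperCase(Locale.ROOT);
        String searchUpper = search.toUpperCase(Locale.ROOT);

        // Assumption: indexes in searchText are only valid for text if the
        // case conversion did not change the length. Fail loudly otherwise.
        if (searchText.length() != text.length()) {
            throw new IllegalArgumentException("case conversion changed the length");
        }

        StringBuilder buf = new StringBuilder(text.length());
        int start = 0;
        int end = searchText.indexOf(searchUpper, start);
        while (end != -1) {
            // Append the untouched segment, then the replacement.
            buf.append(text, start, end).append(replacement);
            start = end + searchUpper.length();
            end = searchText.indexOf(searchUpper, start);
        }
        // Append whatever follows the last match.
        buf.append(text, start, text.length());
        return buf.toString();
    }

    public static void main(String[] args) {
        String result = replaceIgnoreCaseSketch("\u0130x", "x", "");
        System.out.println(result.equals("\u0130")); // prints true
    }
}
```

With the input from the report, the upper-cased copy keeps the same length as
the original, so every index stays in range.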
Fix:
[https://github.com/HiuKwok/commons-lang/commit/e0f6c7802b5e721019a602bf30b31c79dbf6d233]
Test case:
https://github.com/HiuKwok/commons-lang/commit/590f90889bf61a5570bd98b78e73410a07d7410b
> StringIndexOutOfBoundsException in StringUtils.replaceIgnoreCase
> ----------------------------------------------------------------
>
> Key: LANG-1406
> URL: https://issues.apache.org/jira/browse/LANG-1406
> Project: Commons Lang
> Issue Type: Bug
> Components: lang.*
> Reporter: Michael Ryan
> Priority: Major
>
> STEPS TO REPRODUCE:
> {code}
> StringUtils.replaceIgnoreCase("\u0130x", "x", "")
> {code}
> EXPECTED: "\u0130" is returned.
> ACTUAL: StringIndexOutOfBoundsException
> This happens because the replace method is assuming that text.length() ==
> text.toLowerCase().length(), which is not true for certain characters.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)