Kazuki Hamasaki created LANG-857:
------------------------------------
Summary: Bad surrogate pair handling in the CharSequenceTranslator
Key: LANG-857
URL: https://issues.apache.org/jira/browse/LANG-857
Project: Commons Lang
Issue Type: Bug
Components: lang.text.translate.*
Affects Versions: 3.x
Reporter: Kazuki Hamasaki
Priority: Minor
Attachments: CharSequenceTranslator_translate.patch
CharSequenceTranslator handles surrogate pairs incorrectly.
The following simple test case reproduces the problem.
\uD83D\uDE30 is a surrogate pair (it encodes the single code point U+1F630).
{code:java}
@Test
public void testEscapeSurrogatePairs() throws Exception {
    assertEquals("\uD83D\uDE30", StringEscapeUtils.escapeCsv("\uD83D\uDE30"));
}
{code}
The test fails with the following exception:
{code}
java.lang.StringIndexOutOfBoundsException: String index out of range: 2
at java.lang.String.charAt(String.java:658)
at java.lang.Character.codePointAt(Character.java:4668)
at org.apache.commons.lang3.text.translate.CharSequenceTranslator.translate(CharSequenceTranslator.java:95)
at org.apache.commons.lang3.text.translate.CharSequenceTranslator.translate(CharSequenceTranslator.java:59)
at org.apache.commons.lang3.StringEscapeUtils.escapeCsv(StringEscapeUtils.java:556)
{code}
A patch is attached. The affected method is:
# public final void translate(CharSequence input, Writer out) throws IOException
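
For illustration only, here is a minimal sketch of walking a CharSequence one code point at a time. It is neither the attached patch nor the actual CharSequenceTranslator code, and the class and method names are hypothetical. The idea is that the index advances by Character.charCount(...) for each code point and is checked against length() before every Character.codePointAt call, so it cannot step past the end of a surrogate pair.

```java
// Hypothetical helper, not part of Commons Lang.
final class SurrogatePairWalk {

    // Copies the input to a new String, one code point at a time.
    static String copyByCodePoint(CharSequence input) {
        StringBuilder out = new StringBuilder();
        int pos = 0;                 // index in chars (UTF-16 code units)
        int len = input.length();    // length in chars, not code points
        while (pos < len) {
            int codePoint = Character.codePointAt(input, pos);
            out.appendCodePoint(codePoint);
            // A supplementary code point occupies 2 chars, all others 1.
            pos += Character.charCount(codePoint);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String pair = "\uD83D\uDE30";
        System.out.println(pair.length());                          // 2 chars
        System.out.println(pair.codePointCount(0, pair.length()));  // 1 code point
        System.out.println(copyByCodePoint(pair).equals(pair));     // true
    }
}
```

The two counts printed in main differ for this input, which is why a loop that mixes char indexes with code point counts can end up calling Character.codePointAt at index 2 of a 2-char string.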