Kazuki Hamasaki created LANG-857:
------------------------------------

             Summary: Bad surrogate pair handling in the CharSequenceTranslator
                 Key: LANG-857
                 URL: https://issues.apache.org/jira/browse/LANG-857
             Project: Commons Lang
          Issue Type: Bug
          Components: lang.text.translate.*
    Affects Versions: 3.x
            Reporter: Kazuki Hamasaki
            Priority: Minor
         Attachments: CharSequenceTranslator_translate.patch

CharSequenceTranslator handles surrogate pairs incorrectly.

The following simple test case reproduces the problem.
\uD83D\uDE30 is a surrogate pair: two char values that together encode the single code point U+1F630.

{code:java}
@Test
public void testEscapeSurrogatePairs() throws Exception {
    assertEquals("\uD83D\uDE30", StringEscapeUtils.escapeCsv("\uD83D\uDE30"));
}
{code}

Running the test throws the following exception:

{code}
java.lang.StringIndexOutOfBoundsException: String index out of range: 2
        at java.lang.String.charAt(String.java:658)
        at java.lang.Character.codePointAt(Character.java:4668)
        at org.apache.commons.lang3.text.translate.CharSequenceTranslator.translate(CharSequenceTranslator.java:95)
        at org.apache.commons.lang3.text.translate.CharSequenceTranslator.translate(CharSequenceTranslator.java:59)
        at org.apache.commons.lang3.StringEscapeUtils.escapeCsv(StringEscapeUtils.java:556)
{code}

A patch is attached. The affected method is:
# public final void translate(CharSequence input, Writer out) throws IOException
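
For illustration only, here is a minimal sketch of walking a CharSequence by code point without reading past its end. It is not the Commons Lang source and not the attached patch; the class SurrogatePairSketch and its methods are hypothetical names. It shows the distinction behind the exception above: the index into the input is counted in char units, so after a surrogate pair it must advance by two, and it must never be passed to Character.codePointAt once it reaches input.length().

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

// Hypothetical demo class; not part of Commons Lang.
public class SurrogatePairSketch {

    // Copies input to out one code point at a time.
    // pos is an index in char units; len is also in char units.
    static void copyByCodePoint(CharSequence input, Writer out) throws IOException {
        int pos = 0;
        final int len = input.length();
        while (pos < len) {
            int codePoint = Character.codePointAt(input, pos);
            out.write(Character.toChars(codePoint));
            // charCount is 2 for a supplementary code point, 1 otherwise.
            pos += Character.charCount(codePoint);
        }
    }

    // Convenience wrapper that returns the copied text as a String.
    static String copy(String input) {
        StringWriter out = new StringWriter();
        try {
            copyByCodePoint(input, out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String s = "\uD83D\uDE30";
        System.out.println(s.length());                      // 2 (char units)
        System.out.println(s.codePointCount(0, s.length())); // 1 (code point)
        System.out.println(copy(s).equals(s));               // true

        try {
            // Index 2 is one past the end of a two-char string.
            Character.codePointAt(s, 2);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("out of range at index 2");
        }
    }
}
```

The loop bound and the step are both expressed in char units, which is the invariant that keeps Character.codePointAt inside the input. Mixing a count of code points with a count of chars is one way to end up with the index 2 seen in the stack trace.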

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
