[
https://issues.apache.org/jira/browse/LANG-857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13502022#comment-13502022
]
Kazuki Hamasaki commented on LANG-857:
--------------------------------------
I have written additional test cases for surrogate pairs.
However, the tests for {{escapeJava}} and {{escapeEcmaScript}} currently fail because of [LANG-858].
{code:java}
import static org.junit.Assert.assertEquals;

import org.apache.commons.lang3.StringEscapeUtils;
import org.junit.Test;

public class StringEscapeUtilsTest {

    @Test
    public void testEscapeSurrogatePairs() throws Exception {
        assertEquals("\uD83D\uDE30", StringEscapeUtils.escapeCsv("\uD83D\uDE30"));
        // Examples from https://en.wikipedia.org/wiki/UTF-16
        assertEquals("\uD800\uDC00", StringEscapeUtils.escapeCsv("\uD800\uDC00"));
        assertEquals("\uD834\uDD1E", StringEscapeUtils.escapeCsv("\uD834\uDD1E"));
        assertEquals("\uDBFF\uDFFD", StringEscapeUtils.escapeCsv("\uDBFF\uDFFD"));
        assertEquals("\uDBFF\uDFFD", StringEscapeUtils.escapeHtml3("\uDBFF\uDFFD"));
        assertEquals("\uDBFF\uDFFD", StringEscapeUtils.escapeHtml4("\uDBFF\uDFFD"));
        // The next two assertions fail until LANG-858 is fixed
        assertEquals("\\uDBFF\\uDFFD", StringEscapeUtils.escapeJava("\uDBFF\uDFFD"));
        assertEquals("\\uDBFF\\uDFFD", StringEscapeUtils.escapeEcmaScript("\uDBFF\uDFFD"));
        assertEquals("\uDBFF\uDFFD", StringEscapeUtils.escapeXml("\uDBFF\uDFFD"));
    }

    @Test
    public void testUnEscapeSurrogatePairs() throws Exception {
        assertEquals("\uD83D\uDE30", StringEscapeUtils.unescapeCsv("\uD83D\uDE30"));
        // Examples from https://en.wikipedia.org/wiki/UTF-16
        assertEquals("\uD800\uDC00", StringEscapeUtils.unescapeCsv("\uD800\uDC00"));
        assertEquals("\uD834\uDD1E", StringEscapeUtils.unescapeCsv("\uD834\uDD1E"));
        assertEquals("\uDBFF\uDFFD", StringEscapeUtils.unescapeCsv("\uDBFF\uDFFD"));
        assertEquals("\uDBFF\uDFFD", StringEscapeUtils.unescapeHtml3("\uDBFF\uDFFD"));
        assertEquals("\uDBFF\uDFFD", StringEscapeUtils.unescapeHtml4("\uDBFF\uDFFD"));
        assertEquals("\uDBFF\uDFFD", StringEscapeUtils.unescapeJava("\\uDBFF\\uDFFD"));
        assertEquals("\uDBFF\uDFFD", StringEscapeUtils.unescapeEcmaScript("\\uDBFF\\uDFFD"));
        assertEquals("\uDBFF\uDFFD", StringEscapeUtils.unescapeXml("\uDBFF\uDFFD"));
    }
}
{code}
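For reference, the sketch below decodes the surrogate pairs used in these tests and shows the escape form that the {{escapeJava}} and {{escapeEcmaScript}} assertions expect: one {{\uXXXX}} escape per UTF-16 code unit, so the single code point U+10FFFD becomes {{\uDBFF\uDFFD}}. {{SurrogatePairDemo}} and {{toJavaUnicodeEscapes}} are hypothetical names for illustration only; this is not part of Commons Lang and not how {{StringEscapeUtils}} is implemented.

{code:java}
import java.util.Locale;

/**
 * Hypothetical helper for illustration only (not part of Commons Lang).
 * Decodes the surrogate pairs used in the tests and renders each UTF-16
 * code unit in the Java escape form expected by the escapeJava and
 * escapeEcmaScript assertions.
 */
public class SurrogatePairDemo {

    // Escapes every UTF-16 code unit of s as backslash, 'u', and 4 uppercase hex digits.
    static String toJavaUnicodeEscapes(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            sb.append(String.format(Locale.ROOT, "\\u%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The pair from the issue, followed by the Wikipedia UTF-16 examples.
        String[] pairs = {"\uD83D\uDE30", "\uD800\uDC00", "\uD834\uDD1E", "\uDBFF\uDFFD"};
        for (String p : pairs) {
            // A supplementary code point is stored as two chars (high + low surrogate).
            int codePoint = Character.toCodePoint(p.charAt(0), p.charAt(1));
            System.out.printf(Locale.ROOT, "U+%X chars=%d codePoints=%d escaped=%s%n",
                    codePoint, p.length(), p.codePointCount(0, p.length()),
                    toJavaUnicodeEscapes(p));
        }
    }
}
{code}

Running {{main}} should print {{U+1F630 chars=2 codePoints=1 escaped=\uD83D\uDE30}} as its first line and {{U+10FFFD chars=2 codePoints=1 escaped=\uDBFF\uDFFD}} as its last: each input is one code point but two {{char}}s, which is exactly the distinction the translators have to get right.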
> StringIndexOutOfBoundsException in CharSequenceTranslator
> ---------------------------------------------------------
>
> Key: LANG-857
> URL: https://issues.apache.org/jira/browse/LANG-857
> Project: Commons Lang
> Issue Type: Bug
> Components: lang.text.translate.*
> Affects Versions: 3.x
> Reporter: Kazuki Hamasaki
> Priority: Minor
> Labels: patch
> Fix For: 3.2
>
> Attachments: CharSequenceTranslator_translate.patch
>
>
> I found that {{CharSequenceTranslator}} handles surrogate pairs incorrectly.
> Here is a simple test case that reproduces the problem.
> \uD83D\uDE30 is a surrogate pair.
> {code:java}
> @Test
> public void testEscapeSurrogatePairs() throws Exception {
>     assertEquals("\uD83D\uDE30", StringEscapeUtils.escapeCsv("\uD83D\uDE30"));
> }
> {code}
> Running it throws the following exception:
> {code}
> java.lang.StringIndexOutOfBoundsException: String index out of range: 2
>     at java.lang.String.charAt(String.java:658)
>     at java.lang.Character.codePointAt(Character.java:4668)
>     at org.apache.commons.lang3.text.translate.CharSequenceTranslator.translate(CharSequenceTranslator.java:95)
>     at org.apache.commons.lang3.text.translate.CharSequenceTranslator.translate(CharSequenceTranslator.java:59)
>     at org.apache.commons.lang3.StringEscapeUtils.escapeCsv(StringEscapeUtils.java:556)
> {code}
> A patch is attached; the affected method is:
> # {{public final void translate(CharSequence input, Writer out) throws IOException}}
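One plausible reading of the stack trace above is a units mismatch: a translator reports how much input it consumed as a number of {{char}}s (2 for the pair), while the caller treats that number as a count of code points and steps over each one, so the second step calls {{Character.codePointAt}} at index 2 of a 2-char string. The sketch below reproduces that failure with a simplified, hypothetical loop ({{advanceByCodePoints}}) and contrasts it with the case where the units agree. This is an assumption about the cause for illustration only; it is not the {{CharSequenceTranslator}} source, and the attached patch may address the problem differently.

{code:java}
/**
 * Hypothetical, simplified loop for illustration only; this is NOT the
 * CharSequenceTranslator source or the attached patch. It assumes the caller
 * treats "consumed" as a number of code points to skip.
 */
public class SurrogateAdvanceDemo {

    // Skips `consumed` code points starting at pos and returns the new index.
    // Passing a char count instead of a code point count overruns a surrogate pair.
    static int advanceByCodePoints(CharSequence input, int pos, int consumed) {
        for (int pt = 0; pt < consumed; pt++) {
            pos += Character.charCount(Character.codePointAt(input, pos));
        }
        return pos;
    }

    public static void main(String[] args) {
        String input = "\uD83D\uDE30"; // one code point stored as two chars
        int consumedChars = input.length();                                // 2
        int consumedCodePoints = input.codePointCount(0, input.length()); // 1

        // Units agree: one code point spans both chars, so pos ends at 2.
        System.out.println("advanced to " + advanceByCodePoints(input, 0, consumedCodePoints));

        // Units disagree: the first step already reaches index 2, and the second
        // step calls Character.codePointAt(input, 2) on a 2-char string.
        try {
            advanceByCodePoints(input, 0, consumedChars);
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("overran the surrogate pair: " + e.getClass().getSimpleName());
        }
    }
}
{code}

Running {{main}} should print {{advanced to 2}} followed by {{overran the surrogate pair: StringIndexOutOfBoundsException}}, matching the exception type and the out-of-range index 2 in the reported trace.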
--
This message is automatically generated by JIRA.