Kazuki Hamasaki created LANG-858:
------------------------------------

             Summary: StringEscapeUtils.escapeJava() does not output escaped 
surrogate pairs that are Java-parsable
                 Key: LANG-858
                 URL: https://issues.apache.org/jira/browse/LANG-858
             Project: Commons Lang
          Issue Type: Bug
          Components: lang.*, lang.text.translate.*
    Affects Versions: 3.x
            Reporter: Kazuki Hamasaki
            Priority: Minor
         Attachments: JavaUnicodeEscape.patch

In Java and ECMAScript, a Unicode escape of the form {{'\uxxxxxx'}} (more 
than four hex digits) is not accepted. A supplementary character must be 
escaped as two separate escapes: a high surrogate followed by a low surrogate.

For example, given the surrogate pair
{code:java}
"\uDBFF\uDFFD"
{code}
the output must be
{code:java}
"\\uDBFF\\uDFFD"
{code}
However, the actual output is
{code:java}
"\\u10FFFD"
{code}
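
As a rough illustration of the expected behaviour, the sketch below escapes a code point one UTF-16 code unit at a time, using only the JDK. It is not the attached patch and not the Commons Lang implementation; the class {{SurrogateEscapeSketch}} and the helpers {{escapeUnit}} and {{escapeCodePoint}} are hypothetical names introduced here.

```java
// Illustrative sketch only; these names are hypothetical and not part of Commons Lang.
class SurrogateEscapeSketch {

    // Formats a single UTF-16 code unit as a backslash-u escape with four hex digits.
    static String escapeUnit(char unit) {
        return String.format("\\u%04X", (int) unit);
    }

    // Escapes a code point unit by unit. Character.toChars returns one char for a
    // BMP code point and a high/low surrogate pair for a supplementary one.
    static String escapeCodePoint(int codePoint) {
        StringBuilder out = new StringBuilder();
        for (char unit : Character.toChars(codePoint)) {
            out.append(escapeUnit(unit));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The surrogate pair from the example decodes to a single code point.
        int codePoint = "\uDBFF\uDFFD".codePointAt(0);
        System.out.println(Integer.toHexString(codePoint).toUpperCase()); // 10FFFD
        System.out.println(escapeCodePoint(codePoint)); // prints the two four-digit escapes
    }
}
```

Emitting one escape per {{char}} rather than per code point keeps every escape at exactly four hex digits, which is the form the Java and ECMAScript parsers accept.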

Test case here:
{code:java}
@Test
public void testEscapeSurrogatePairs() throws Exception {
    assertEquals("\\uDBFF\\uDFFD", StringEscapeUtils.escapeJava("\uDBFF\uDFFD"));
    assertEquals("\\uDBFF\\uDFFD", StringEscapeUtils.escapeEcmaScript("\uDBFF\uDFFD"));
}
{code}

I have attached a patch that implements a simple solution.
However, I think UnicodeEscaper.java should not be specific to Java, so we 
need to discuss this.

This issue does not occur in the unescape methods.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
