Kazuki Hamasaki created LANG-858:
------------------------------------

Summary: StringEscapeUtils.escapeJava() does not output escaped surrogate pairs that are Java-parsable
Key: LANG-858
URL: https://issues.apache.org/jira/browse/LANG-858
Project: Commons Lang
Issue Type: Bug
Components: lang.*, lang.text.translate.*
Affects Versions: 3.x
Reporter: Kazuki Hamasaki
Priority: Minor
Attachments: JavaUnicodeEscape.patch
In Java and ECMAScript, a Unicode escape with more than four hex digits, such as {{'\uxxxxxx'}}, is not accepted. A supplementary character must instead be written as two separate escapes: the high surrogate followed by the low surrogate.

For example, if you pass the surrogate pair
{code:java}
'\uDBFF\uDFFD'
{code}
the output should be
{code:java}
"\\uDBFF\\uDFFD"
{code}
However, you get
{code:java}
"\\u10FFFD"
{code}

Test case:
{code:java}
import static org.junit.Assert.assertEquals;

import org.apache.commons.lang3.StringEscapeUtils;
import org.junit.Test;

@Test
public void testEscapeSurrogatePairs() throws Exception {
    assertEquals("\\uDBFF\\uDFFD", StringEscapeUtils.escapeJava("\uDBFF\uDFFD"));
    assertEquals("\\uDBFF\\uDFFD", StringEscapeUtils.escapeEcmaScript("\uDBFF\uDFFD"));
}
{code}

I have attached a patch that implements a simple solution. However, I think UnicodeEscaper.java should not be made specific to Java, so we need to discuss this.

The unescape methods are not affected by this issue.
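A minimal sketch of the per-code-unit approach the report asks for, assuming the only goal is output that Java and ECMAScript can parse. The class {{SurrogateEscapeSketch}} and its methods are hypothetical illustrations, not the Commons Lang API and not the attached patch. Iterating over {{char}} values (UTF-16 code units) instead of code points emits the high and low surrogates as separate four-digit escapes. {{escapeByCodePoint}} reproduces the six-digit form from the report, and {{toSurrogatePair}} shows the arithmetic: for U+10FFFD the offset from 0x10000 is 0xFFFFD, whose top 10 bits (0x3FF) give 0xD800 + 0x3FF = 0xDBFF and whose bottom 10 bits (0x3FD) give 0xDC00 + 0x3FD = 0xDFFD.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch, not the Commons Lang implementation: escapes a string
 * for a Java or ECMAScript string literal, one escape per UTF-16 code unit.
 */
public final class SurrogateEscapeSketch {

    /** Escapes quotes, backslashes and every char outside printable ASCII. */
    public static String escapeJavaStyle(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '"' || c == '\\') {
                out.append('\\').append(c);   // backslash-escape quote and backslash
            } else if (c >= 0x20 && c < 0x7F) {
                out.append(c);                // printable ASCII passes through
            } else {
                // Looping over chars, not code points, means a supplementary
                // character becomes two escapes: high surrogate, then low.
                out.append(String.format("\\u%04X", (int) c));
            }
        }
        return out.toString();
    }

    /** Escapes one whole code point; mimics the six-digit output in the report. */
    static String escapeByCodePoint(int codePoint) {
        return String.format("\\u%04X", codePoint);
    }

    /** Splits a supplementary code point into its surrogate pair by hand. */
    static char[] toSurrogatePair(int codePoint) {
        int offset = codePoint - 0x10000;              // 20-bit offset
        char high = (char) (0xD800 + (offset >>> 10)); // top 10 bits
        char low = (char) (0xDC00 + (offset & 0x3FF)); // bottom 10 bits
        return new char[] {high, low};
    }

    public static void main(String[] args) {
        String pair = new String(Character.toChars(0x10FFFD)); // U+10FFFD
        System.out.println(escapeJavaStyle(pair));                   // per-code-unit form
        System.out.println(escapeByCodePoint(pair.codePointAt(0)));  // six-digit form
        System.out.println(Arrays.equals(toSurrogatePair(0x10FFFD), Character.toChars(0x10FFFD)));
    }
}
```

Running {{main}} prints {{\uDBFF\uDFFD}}, then {{\u10FFFD}}, then {{true}}. The hand-written surrogate arithmetic is only there to show where DBFF and DFFD come from; in practice {{Character.toChars}} already does this split.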