StringEscapeUtils.escapeHtml incorrectly converts unicode characters above
U+00FFFF into 2 characters
-----------------------------------------------------------------------------------------------------
Key: LANG-480
URL: https://issues.apache.org/jira/browse/LANG-480
Project: Commons Lang
Issue Type: Bug
Affects Versions: 2.4
Environment: doesn't matter
Reporter: Alexander Kjäll
Priority: Minor
Characters that Java represents internally as two chars (a surrogate pair) are
incorrectly converted by the function. The following test demonstrates the
problem:
import org.apache.commons.lang.StringEscapeUtils;

public class J2 {
    public static void main(String[] args) throws Exception {
        // UTF-8 encoding of the Unicode character
        // COUNTING ROD UNIT DIGIT THREE (code point U+1D362)
        byte[] data = new byte[] { (byte)0xF0, (byte)0x9D, (byte)0x8D, (byte)0xA2 };

        // output is: &#55348;&#57186;
        // should be: &#119650;
        System.out.println("'" + StringEscapeUtils.escapeHtml(new String(data, "UTF-8")) + "'");
    }
}
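To see why two entities come out, it may help to look at the string without Commons Lang at all. The sketch below uses only the JDK; the class and method names (SurrogateDemo, codeUnits) are made up for illustration. It prints the UTF-16 code units that a char-by-char escaper would see next to the single code point that should be escaped.

```java
public class SurrogateDemo {
    // Hypothetical helper: decimal values of the UTF-16 code units in s,
    // separated by spaces.
    static String codeUnits(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append((int) s.charAt(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+1D362 lies outside the Basic Multilingual Plane, so Java stores
        // it as a surrogate pair (two chars).
        String s = new String(Character.toChars(0x1D362));

        System.out.println(s.length());        // 2
        System.out.println(codeUnits(s));      // 55348 57186
        System.out.println(s.codePointAt(0));  // 119650
    }
}
```

An escaper that loops over charAt(i) handles the two surrogates separately, whereas codePointAt returns the one value that belongs in the numeric entity.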
Should be very quick to fix, feel free to drop me an email if you want a patch.
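For reference, a minimal sketch of the kind of change involved, assuming the escaper walks the input by code point instead of by char. This is not the Commons Lang implementation or an actual patch. The class and method names (CodePointEscapeSketch, escapeNonAscii) are hypothetical, and it only covers the numeric-entity path, not named entities or markup characters such as < and &.

```java
public class CodePointEscapeSketch {
    // Hypothetical helper, not part of Commons Lang: writes every code point
    // above 0x7F as a decimal numeric entity and copies ASCII through.
    static String escapeNonAscii(String input) {
        StringBuilder out = new StringBuilder(input.length() * 2);
        int i = 0;
        while (i < input.length()) {
            // Reads a full code point, combining a surrogate pair if present.
            int cp = input.codePointAt(i);
            if (cp > 0x7F) {
                out.append("&#").append(cp).append(';');
            } else {
                out.append((char) cp);
            }
            // Advance by 1 char for BMP characters, 2 for supplementary ones.
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String s = new String(Character.toChars(0x1D362));
        System.out.println(escapeNonAscii("a" + s + "b")); // a&#119650;b
    }
}
```

Both codePointAt and Character.charCount have been in the JDK since Java 5, so this approach would depend on the minimum Java version the library targets.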