StringEscapeUtils.escapeHtml incorrectly converts unicode characters above 
U+00FFFF into 2 characters
-----------------------------------------------------------------------------------------------------

                 Key: LANG-480
                 URL: https://issues.apache.org/jira/browse/LANG-480
             Project: Commons Lang
          Issue Type: Bug
    Affects Versions: 2.4
         Environment: doesn't matter
            Reporter: Alexander Kjäll
            Priority: Minor


Characters that Java represents internally as two chars (a UTF-16 surrogate 
pair) are incorrectly converted by the function. The following test 
demonstrates the problem:

import org.apache.commons.lang.StringEscapeUtils;

public class J2 {
    public static void main(String[] args) throws Exception {
        // UTF-8 encoding of the Unicode character
        // COUNTING ROD UNIT DIGIT THREE (code point U+1D362)
        byte[] data = new byte[] { (byte)0xF0, (byte)0x9D, (byte)0x8D, (byte)0xA2 };

        // output is: '&#55348;&#57186;' (each UTF-16 surrogate escaped separately)
        // should be: '&#119650;'
        System.out.println("'" + StringEscapeUtils.escapeHtml(new String(data, "UTF-8")) + "'");
    }
}

This should be quick to fix. Feel free to email me if you want a patch.
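
For illustration, here is a minimal sketch of the kind of change that could 
address this. It is not the actual Commons Lang code or a proposed patch. It 
walks the string by code point instead of by char, so a surrogate pair yields 
a single numeric entity. The class and method names (CodePointEscape, 
escapeNonAscii) are hypothetical. It assumes Java 5 or later for 
String.codePointAt and Character.charCount, and it ignores named entities and 
the escaping of &, < and > that a real escapeHtml needs.

```java
public class CodePointEscape {

    // Hypothetical helper: escapes every code point above U+007F as a
    // decimal numeric character reference, leaving ASCII untouched.
    static String escapeNonAscii(String input) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < input.length(); ) {
            // codePointAt combines a high/low surrogate pair into one value
            int cp = input.codePointAt(i);
            if (cp > 0x7F) {
                out.append("&#").append(cp).append(';');
            } else {
                out.append((char) cp);
            }
            // advance by 2 chars for supplementary code points, 1 otherwise
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // COUNTING ROD UNIT DIGIT THREE, U+1D362
        String s = new String(Character.toChars(0x1D362));
        System.out.println("'" + escapeNonAscii(s) + "'"); // prints '&#119650;'
    }
}
```

Iterating by code point keeps the loop correct for both BMP and supplementary 
characters without any special-casing of surrogate values.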

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.