[ 
https://issues.apache.org/jira/browse/LANG-480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12666155#action_12666155
 ] 

Alexander Kjäll commented on LANG-480:
--------------------------------------

Just my 2 cents: I don't need a release that fixes this bug. I stumbled on it 
by chance and wrote a patch so that the next person who has the same problem 
won't have to dig through the library to understand what's 
going on.

I'm mainly interested in fixing this because I don't like buggy software, but I 
totally agree that building in reflection stuff leads to more problems than it 
solves in the long run.

My opinion on how to fix this is to either push for the JDK 1.5 dependency or 
write some code that parses the format in which the strings are stored in 
memory. The latter might sound complicated, but I think it's quite 
straightforward.
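
To illustrate the second option, here is a minimal sketch of decoding the 
in-memory format by hand, assuming only that Java strings are UTF-16 (a code 
point above U+FFFF is stored as a high surrogate followed by a low surrogate). 
The class SurrogateSketch and the method escapeNonAscii are hypothetical names 
for illustration; this is not the attached patch and not the Commons Lang 
implementation, and it skips the named entities that escapeHtml also handles.

```java
// Hypothetical sketch, not part of Commons Lang and not the attached patch.
// It uses no JDK 1.5 API, so it would also compile on older JDKs.
class SurrogateSketch {

    // Escapes every code point above 0x7F as a decimal numeric entity,
    // combining UTF-16 surrogate pairs into a single code point first.
    public static String escapeNonAscii(String s) {
        StringBuffer out = new StringBuffer();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            int codePoint = c;
            // High surrogate range is 0xD800..0xDBFF, low is 0xDC00..0xDFFF.
            if (c >= 0xD800 && c <= 0xDBFF && i + 1 < s.length()) {
                char low = s.charAt(i + 1);
                if (low >= 0xDC00 && low <= 0xDFFF) {
                    codePoint = 0x10000 + ((c - 0xD800) << 10) + (low - 0xDC00);
                    i++; // the low surrogate has been consumed
                }
            }
            if (codePoint > 0x7F) {
                out.append("&#").append(codePoint).append(';');
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+1D362 COUNTING ROD UNIT DIGIT THREE as a UTF-16 surrogate pair
        System.out.println(escapeNonAscii("\uD834\uDF62")); // prints &#119650;
    }
}
```

With the JDK 1.5 dependency, the manual arithmetic could be replaced by 
Character.codePointAt(s, i) and Character.charCount(codePoint), which exist 
since Java 5.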

> StringEscapeUtils.escapeHtml incorrectly converts unicode characters above 
> U+00FFFF into 2 characters
> -----------------------------------------------------------------------------------------------------
>
>                 Key: LANG-480
>                 URL: https://issues.apache.org/jira/browse/LANG-480
>             Project: Commons Lang
>          Issue Type: Bug
>    Affects Versions: 2.4
>         Environment: doesn't matter
>            Reporter: Alexander Kjäll
>            Priority: Minor
>         Attachments: lang-480.patch
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> Characters that are represented internally by Java as 2 characters are 
> incorrectly converted by the function. The following test displays the 
> problem quite nicely:
> import org.apache.commons.lang.*;
> public class J2 {
>     public static void main(String[] args) throws Exception {
>         // this is the utf8 representation of the character:
>         // COUNTING ROD UNIT DIGIT THREE
>         // in unicode
>         // codepoint: U+1D362
>         byte[] data = new byte[] { (byte)0xF0, (byte)0x9D, (byte)0x8D, (byte)0xA2 };
>         // output is: &#55348;&#57186;
>         // should be: &#119650;
>         System.out.println("'"
>             + StringEscapeUtils.escapeHtml(new String(data, "UTF8")) + "'");
>     }
> }
> Should be very quick to fix, feel free to drop me an email if you want a 
> patch.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
