[ 
https://issues.apache.org/jira/browse/LANG-617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henri Yandell closed LANG-617.
------------------------------

    Resolution: Fixed

Marking this as closed. Feel free to reopen if you find the current codebase is 
still problematic David. Things seem good with the data you provided (many 
thanks for that by the way).

> StringEscapeUtils.escapeXML() can't process UTF-16 supplementary characters
> ---------------------------------------------------------------------------
>
>                 Key: LANG-617
>                 URL: https://issues.apache.org/jira/browse/LANG-617
>             Project: Commons Lang
>          Issue Type: Bug
>          Components: lang.*
>    Affects Versions: 2.4
>            Reporter: David Garcia
>            Priority: Minor
>             Fix For: 3.0
>
>         Attachments: utf8-fragment.txt, xml-escaped-fragment.txt
>
>
> Supplementary characters in UTF-16 are those whose code points are above 
> 0xffff, that is, require more than 1 Java char to be encoded, as explained 
> here: http://java.sun.com/developer/technicalArticles/Intl/Supplementary/
> Currently, StringEscapeUtils.escapeXML() isn't aware of this coding scheme 
> and treats each char as one character, which is not always right.
> A possible solution in class Entities would be:
>     public void escape(Writer writer, String str) throws IOException {
>         int len = str.length();
>         for (int i = 0; i < len; i++) {
>             int code = str.codePointAt(i);
>             String entityName = this.entityName(code);
>             if (entityName != null) {
>                 writer.write('&');
>                 writer.write(entityName);
>                 writer.write(';');
>             } else if (code > 0x7F) {
>                     writer.write("&#");
>                     writer.write(code);
>                     writer.write(';');
>             } else {
>                     writer.write((char) code);
>             }
>             if (code > 0xffff) {
>                     i++;
>             }
>         }
>     }
> Besides fixing escapeXML(), this will also affect HTML escaping functions. I 
> guess that's a good thing, but please remember I have only tested escapeXML().

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to