[ 
https://issues.apache.org/jira/browse/XALANJ-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16486869#comment-16486869
 ] 

Jesper Steen Møller commented on XALANJ-2593:
---------------------------------------------

This is also fixed in the patch in XALANJ-2419.

> Incorrect showing of supplementary characters in attributes
> -----------------------------------------------------------
>
>                 Key: XALANJ-2593
>                 URL: https://issues.apache.org/jira/browse/XALANJ-2593
>             Project: XalanJ2
>          Issue Type: Bug
>      Security Level: No security risk; visible to anyone(Ordinary problems in 
> Xalan projects.  Anybody can view the issue.) 
>          Components: Serialization
>    Affects Versions: 2.7.2
>         Environment: Win 7 x64, Java 1.6 
>            Reporter: Eugene Shkel
>            Assignee: Steven J. Hathaway
>            Priority: Major
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> In Xalan 2.7.2, supplementary characters (see 
> http://www.oracle.com/technetwork/articles/javase/supplementary-142654.html 
> for details) are shown incorrectly in attributes.
> For example, I need to show the symbols 𣎴 (& # 144308 ; ) or 𠘨 (& # 132648 ; ) in 
> attribute "y" of element "x".
> Expected result: {code}<?xml version="1.0" encoding="UTF-8"?><x y="&#144308; - &#132648;"/>{code}
> Actual result for Xalan 2.7.2: {code}<?xml version="1.0" encoding="UTF-8"?><x y="&#55372;&#57268; - &#55361;&#56872;"/>{code}
> Code snippet for test:
> {code}
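> import java.io.StringReader;
> import javax.xml.transform.*;
> import javax.xml.transform.stream.*;
>
> // The class name is illustrative; the original snippet showed only main().
> public class SupplementaryAttrTest {
>     public static void main(String[] argv) throws Exception {
>         TransformerFactory tFactory = TransformerFactory.newInstance();
>         // Stylesheet that copies value1 into attribute "y" of element "x"
>         StreamSource stylesource = new StreamSource(new StringReader(
>             "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
>             + "<xsl:stylesheet xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\" version=\"1.0\">"
>             + "<xsl:template match=\"/\"><x y=\"{xslt/search/value1}\"/></xsl:template>"
>             + "</xsl:stylesheet>"));
>         Transformer transformer = tFactory.newTransformer(stylesource);
>         // Input document containing two supplementary characters
>         StreamSource source = new StreamSource(new StringReader(
>             "<?xml version=\"1.0\"?><xslt><search><value1>𣎴 - 𠘨</value1></search></xslt>"));
>         Result result = new StreamResult(System.out);
>         transformer.transform(source, result);
>     }
> }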
> {code}
> The problem relates to the method 
> org.apache.xml.serializer.ToStream.writeAttrString(Writer, String, String). 
> {code}
>             if (m_charInfo.shouldMapAttrChar(ch)) {
>                 // The character is supposed to be replaced by a String
>                 // e.g.   '&'  -->  "&amp;"
>                 // e.g.   '<'  -->  "&lt;"
>                 accumDefaultEscape(writer, ch, i, stringChars, len, false, true);
>             }
> {code}
> This branch does not handle multi-char sequences such as supplementary 
> characters, which the Java platform represents as surrogate pairs, so 
> execution falls through to the next part of the same method:
> {code}
>             else {
>                     // This is a fallback plan, we should never get here
>                     // but if the character wasn't previously handled
>                     // (i.e. isn't in the encoding, etc.) then what
>                     // should we do?  We choose to write out a character ref
>                     writer.write("&#");
>                     writer.write(Integer.toString(ch));
>                     writer.write(';');
>                 }
> {code}
> PS: I can't attach a patch file, so I am including the patch here.
> {code}
> --- src\org\apache\xml\serializer\ToStream.java       2014-03-26 17:21:30 +0200
> +++ src\org\apache\xml\serializer\ToStream.java       2014-09-09 19:09:30 +0300
> @@ -2112,8 +2112,13 @@
>                  // e.g.   '&'  -->  "&amp;"
>                  // e.g.   '<'  -->  "&lt;"
>                  accumDefaultEscape(writer, ch, i, stringChars, len, false, true);
> -            }
> -            else {
> +            } else if (Encodings.isHighUTF16Surrogate(ch)) {
> +                // more than single input character can be processed
> +                // within accumDefaultEscape()
> +                // so we set appropriate value for loop for().
> +                i = accumDefaultEscape(writer, ch, i, stringChars, len, false, true);
> +
> +            } else {
>                  if (0x0 <= ch && ch <= 0x1F) {
>                      // Range 0x00 through 0x1F inclusive
>                      // This covers the non-whitespace control characters
> {code}
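
The behaviour described in the quoted report can be reproduced outside Xalan. The following is a minimal sketch, not Xalan's serializer code: the class and method names are hypothetical, and it assumes for illustration that every non-ASCII character is written as a numeric character reference. It contrasts escaping each UTF-16 char separately (which yields the two surrogate values seen in the actual result) with combining a surrogate pair into one code point and advancing the loop index past the low surrogate, which is the idea behind the proposed patch.

```java
// Hypothetical illustration only; this is not the Xalan implementation.
class SurrogateEscapeSketch {

    // Escapes each UTF-16 char on its own, so a surrogate pair becomes
    // two separate character references (the behaviour reported above).
    static String escapePerChar(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            if (ch < 0x80) {
                out.append(ch);
            } else {
                out.append("&#").append((int) ch).append(';');
            }
        }
        return out.toString();
    }

    // Combines a high/low surrogate pair into a single code point and
    // skips the low surrogate, so one reference is written per character.
    static String escapePerCodePoint(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            if (ch < 0x80) {
                out.append(ch);
            } else if (Character.isHighSurrogate(ch)
                    && i + 1 < s.length()
                    && Character.isLowSurrogate(s.charAt(i + 1))) {
                int codePoint = Character.toCodePoint(ch, s.charAt(i + 1));
                out.append("&#").append(codePoint).append(';');
                i++; // the low surrogate has already been consumed
            } else {
                out.append("&#").append((int) ch).append(';');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The two characters from the report, built from their code points
        String value = new String(Character.toChars(144308))
                + " - " + new String(Character.toChars(132648));
        System.out.println(escapePerChar(value));      // &#55372;&#57268; - &#55361;&#56872;
        System.out.println(escapePerCodePoint(value)); // &#144308; - &#132648;
    }
}
```

The first line of output matches the actual result in the report and the second matches the expected result, which suggests the defect is the per-char handling of surrogate pairs rather than anything specific to the stylesheet.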


