[ https://issues.apache.org/jira/browse/XALANJ-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16486869#comment-16486869 ]
Jesper Steen Møller commented on XALANJ-2593:
---------------------------------------------

This is also fixed in the patch in XALANJ-2419.

> Incorrect showing of supplementary characters in attributes
> -----------------------------------------------------------
>
>                 Key: XALANJ-2593
>                 URL: https://issues.apache.org/jira/browse/XALANJ-2593
>             Project: XalanJ2
>          Issue Type: Bug
>      Security Level: No security risk; visible to anyone (Ordinary problems in Xalan projects. Anybody can view the issue.)
>          Components: Serialization
>    Affects Versions: 2.7.2
>         Environment: Win 7 x64, Java 1.6
>            Reporter: Eugene Shkel
>            Assignee: Steven J. Hathaway
>            Priority: Major
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> In Xalan 2.7.2, supplementary characters (see
> http://www.oracle.com/technetwork/articles/javase/supplementary-142654.html
> for details) are shown incorrectly in attributes.
> For example, I need to show the symbols 𣎴 (& # 144308 ; ) or 𠘨 (& # 132648 ; ) in
> attribute "y" of element "x".
> Expected result:
> {code}
> <?xml version="1.0" encoding="UTF-8"?><x y="𣎴 - 𠘨"/>
> {code}
> Actual result for Xalan 2.7.2:
> {code}
> <?xml version="1.0" encoding="UTF-8"?><x y="�� - ��"/>
> {code}
> Code snippet for test:
> {code}
> public static void main(String[] argv) throws Exception {
>     TransformerFactory tFactory = TransformerFactory.newInstance();
>     // Stylesheet that copies the text of value1 into attribute "y"
>     StreamSource stylesource = new StreamSource(new StringReader(
>         "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
>         + "<xsl:stylesheet xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\" version=\"1.0\">"
>         + "<xsl:template match=\"/\"><x y=\"{xslt/search/value1}\" /></xsl:template>"
>         + "</xsl:stylesheet>"));
>     Transformer transformer = tFactory.newTransformer(stylesource);
>     // Input document containing two supplementary characters
>     StreamSource source = new StreamSource(new StringReader(
>         "<?xml version=\"1.0\"?>"
>         + "<xslt><search><value1>𣎴 - 𠘨</value1></search></xslt>"));
>     Result result = new StreamResult(System.out);
>     transformer.transform(source, result);
> }
> {code}
> The problem is in the method
> org.apache.xml.serializer.ToStream.writeAttrString(Writer, String, String).
> {code}
> if (m_charInfo.shouldMapAttrChar(ch)) {
>     // The character is supposed to be replaced by a String
>     // e.g. '&' --> "&amp;"
>     // e.g. '<' --> "&lt;"
>     accumDefaultEscape(writer, ch, i, stringChars, len, false, true);
> }
> {code}
> This branch does not handle multi-character sequences such as supplementary
> characters (which the Java platform represents as a surrogate pair), so
> execution falls through to the next part of the same method:
> {code}
> else {
>     // This is a fallback plan, we should never get here
>     // but if the character wasn't previously handled
>     // (i.e. isn't in the encoding, etc.) then what
>     // should we do? We choose to write out a character ref
>     writer.write("!13&#");
>     writer.write(Integer.toString(ch));
>     writer.write(';');
> }
> {code}
> PS: I can't attach a patch file, so I am including it here.
> {code}
> --- src\org\apache\xml\serializer\ToStream.java 2014-03-26 17:21:30 +0200
> +++ src\org\apache\xml\serializer\ToStream.java 2014-09-09 19:09:30 +0300
> @@ -2112,8 +2112,13 @@
>                      // e.g. '&' --> "&amp;"
>                      // e.g. '<' --> "&lt;"
>                      accumDefaultEscape(writer, ch, i, stringChars, len, false, true);
> -                }
> -                else {
> +                } else if (Encodings.isHighUTF16Surrogate(ch)) {
> +                    // more than single input character can be processed
> +                    // within accumDefaultEscape()
> +                    // so we set appropriate value for loop for().
> +                    i = accumDefaultEscape(writer, ch, i, stringChars, len, false, true);
> +
> +                } else {
>                      if (0x0 <= ch && ch <= 0x1F) {
>                          // Range 0x00 through 0x1F inclusive
>                          // This covers the non-whitespace control characters
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
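
For readers following the thread, here is a small standalone sketch of the difference between escaping per UTF-16 code unit and escaping per code point, which is the distinction the report turns on. It uses only java.lang and is not Xalan code; the class and method names (SupplementaryCharRef, escapePerCodePoint, escapePerChar) are made up for illustration, and it does not claim to reproduce what ToStream or accumDefaultEscape do internally.

```java
public class SupplementaryCharRef {

    /** Escapes each non-ASCII code point as one decimal character reference. */
    static String escapePerCodePoint(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            // codePointAt joins a valid surrogate pair into a single code point
            int cp = s.codePointAt(i);
            if (cp < 0x80) {
                out.append((char) cp);
            } else {
                out.append("&#").append(cp).append(';');
            }
            // advance by 2 for a supplementary character, otherwise by 1
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    /** Naive variant: treats every UTF-16 code unit as a character of its own. */
    static String escapePerChar(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            if (ch < 0x80) {
                out.append(ch);
            } else {
                // each half of a surrogate pair gets its own reference
                out.append("&#").append((int) ch).append(';');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The two code points from the report, built without non-ASCII source text
        String value = new String(Character.toChars(144308))
                + " - " + new String(Character.toChars(132648));
        System.out.println(escapePerCodePoint(value)); // &#144308; - &#132648;
        System.out.println(escapePerChar(value));      // &#55372;&#57268; - &#55361;&#56872;
    }
}
```

The second line of output contains references to lone surrogate halves, which are not legal XML characters, so a serializer has to consume both halves of the pair in one step, as the proposed patch does by updating the loop index.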