[ https://issues.apache.org/jira/browse/XALANJ-2560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17805304#comment-17805304 ]
Joe Kesselman commented on XALANJ-2560:
---------------------------------------

Comments from the mailing list:

[~ericjs] : Might be worth posting your analysis of what the JDK fork of Xalan is doing differently that avoids this issue.

[~martin.honnen] (I hope that's the right one) suggests {{[https://stackoverflow.com/a/74245232/252228]}} might be relevant.

[~stanio] cites the following as possibly relevant places to look, along with ToXMLStream. Remember that this change should not affect anything but XML and HTML serialization; other representations generally do not use SGML-syntax Numeric Character References and should generally output the appropriate bytes for that encoding (assuming the character is present in that encoding).

{{[https://github.com/openjdk/jdk/blob/jdk-21-ga/src/java.xml/share/classes/com/sun/org/apache/xml/internal/serializer/ToStream.java]
[https://github.com/openjdk/jdk/blob/jdk-21-ga/src/java.xml/share/classes/com/sun/org/apache/xml/internal/serializer/ToHTMLStream.java]}}

> ToXMLStream does not support unicode supplementary characters
> -------------------------------------------------------------
>
>                 Key: XALANJ-2560
>                 URL: https://issues.apache.org/jira/browse/XALANJ-2560
>             Project: XalanJ2
>          Issue Type: Bug
>      Security Level: No security risk; visible to anyone(Ordinary problems in
> Xalan projects. Anybody can view the issue.)
>          Components: Serialization
>    Affects Versions: 2.7.1
>         Environment: Xalan 2.7.1 serializer.
> Tested on Ubuntu 12.04 with Oracle JDK 1.7.0_05.
>            Reporter: Damien Guillaume
>            Assignee: Joe Kesselman
>            Priority: Major
>              Labels: serialization, unicode
>
> org.apache.xml.serializer.ToXMLStream (which extends ToStream) does not
> support serialization of unicode supplementary characters such as U+1D49C. It
> creates invalid characters entities like "&#55349;&#56476;" instead of
> "&#119964;" (or F0 9D 92 9C with UTF-8). ToXMLStream is used by LSSerializer
> when Xalan's serializer is on the classpath.
> org.apache.xml.serialize.DOMSerializerImpl (included in Xerces) does not have
> this problem, but it is deprecated since Xerces 2.9.0, so this is a
> regression.
> See
> http://stackoverflow.com/questions/11952289/serializing-supplementary-unicode-characters-into-xml-documents-with-java
> for more details.
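
For anyone following along, here is a minimal Java sketch of the behaviour the report asks for: a UTF-16 surrogate pair should be combined into one code point before a Numeric Character Reference is written, so U+1D49C comes out as a single reference rather than one per surrogate. This is not the Xalan or JDK serializer code. The class name SupplementaryNcrSketch and the helper escapeNonAscii are hypothetical, and the sketch assumes well-formed input (it does not handle unpaired surrogates or escape markup characters such as & and <).

```java
final class SupplementaryNcrSketch {

    // Hypothetical helper, not part of Xalan: writes every non-ASCII
    // character as a decimal numeric character reference.
    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            // codePointAt joins a valid high/low surrogate pair into one code point.
            int codePoint = text.codePointAt(i);
            if (codePoint < 0x80) {
                out.append((char) codePoint);
            } else {
                out.append("&#").append(codePoint).append(';');
            }
            // Advance by 2 chars for a supplementary character, 1 otherwise.
            i += Character.charCount(codePoint);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+1D49C is stored in a Java String as the surrogate pair D835 DC9C.
        String scriptA = new String(Character.toChars(0x1D49C));
        System.out.println(escapeNonAscii("A" + scriptA)); // prints A&#119964;
    }
}
```

Iterating by char instead of by code point would emit the two surrogates separately, which is the invalid output described in the report.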