[ 
https://issues.apache.org/jira/browse/XALANJ-2560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17805304#comment-17805304
 ] 

Joe Kesselman commented on XALANJ-2560:
---------------------------------------

Comments from the mailing list:

[~ericjs] : Might be worth posting your analysis of what the JDK fork of Xalan 
is doing differently that avoids this issue.

[~martin.honnen] (I hope that's the right one) suggests 
{{[https://stackoverflow.com/a/74245232/252228]}} might be relevant.

[~stanio] cites the following as possibly relevant places to look, along with 
ToXMLStream. Remember that this change should not affect anything but XML and 
HTML serialization; other representations generally do not use SGML-syntax 
Numeric Character References and should generally output the appropriate bytes 
for that encoding (assuming the character is present in that encoding).

{{[https://github.com/openjdk/jdk/blob/jdk-21-ga/src/java.xml/share/classes/com/sun/org/apache/xml/internal/serializer/ToStream.java]

[https://github.com/openjdk/jdk/blob/jdk-21-ga/src/java.xml/share/classes/com/sun/org/apache/xml/internal/serializer/ToHTMLStream.java]}}
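To make the XML/HTML-versus-other-output distinction concrete, here is a minimal Java sketch. It is not taken from Xalan or the JDK sources: the class name OutputChoiceSketch, the method render, and the markupOutput flag are all hypothetical, and throwing an exception for an unencodable character in non-markup output is only an assumption about what "appropriate" behavior might be. The idea is to write the character itself whenever the target encoding can represent it, and to fall back to a numeric character reference only for markup output.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical illustration only; none of these names exist in Xalan.
final class OutputChoiceSketch {
    private OutputChoiceSketch() {}

    /**
     * Returns the text to emit for one code point.
     * markupOutput is true for XML/HTML output, false for other output methods.
     */
    static String render(int codePoint, Charset charset, boolean markupOutput) {
        // toChars yields a surrogate pair for supplementary code points.
        String raw = new String(Character.toChars(codePoint));
        CharsetEncoder encoder = charset.newEncoder();
        if (encoder.canEncode(raw)) {
            // The encoding can represent the character: let the writer emit its bytes.
            return raw;
        }
        if (markupOutput) {
            // One reference for the whole code point, never one per surrogate.
            return "&#" + codePoint + ";";
        }
        // Assumption: non-markup output has no escape syntax to fall back on.
        throw new IllegalArgumentException(
                "U+" + Integer.toHexString(codePoint).toUpperCase()
                + " is not representable in " + charset.name());
    }
}
```

With these assumptions, render(0x1D49C, StandardCharsets.US_ASCII, true) yields the reference for code point 119964, while the same call with StandardCharsets.UTF_8 returns the character itself, which the encoder then writes as the four bytes F0 9D 92 9C.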

> ToXMLStream does not support unicode supplementary characters
> -------------------------------------------------------------
>
>                 Key: XALANJ-2560
>                 URL: https://issues.apache.org/jira/browse/XALANJ-2560
>             Project: XalanJ2
>          Issue Type: Bug
>      Security Level: No security risk; visible to anyone(Ordinary problems in 
> Xalan projects.  Anybody can view the issue.) 
>          Components: Serialization
>    Affects Versions: 2.7.1
>         Environment: Xalan 2.7.1 serializer.
> Tested on Ubuntu 12.04 with Oracle JDK 1.7.0_05.
>            Reporter: Damien Guillaume
>            Assignee: Joe Kesselman
>            Priority: Major
>              Labels: serialization, unicode
>
> org.apache.xml.serializer.ToXMLStream (which extends ToStream) does not 
> support serialization of Unicode supplementary characters such as U+1D49C. It 
> creates invalid character entities like "&#55349;&#56476;" instead of 
> "&#119964;" (or F0 9D 92 9C with UTF-8). ToXMLStream is used by LSSerializer 
> when Xalan's serializer is on the classpath.
> org.apache.xml.serialize.DOMSerializerImpl (included in Xerces) does not have 
> this problem, but it has been deprecated since Xerces 2.9.0, so this is a 
> regression.
> See 
> http://stackoverflow.com/questions/11952289/serializing-supplementary-unicode-characters-into-xml-documents-with-java
>  for more details.
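For reference, the behavior the report asks for can be sketched in a few lines of Java. This is only an illustration of the technique (iterating by code point rather than by UTF-16 char), not the actual ToStream code; the class name NcrSketch and the method escapeNonAscii are hypothetical, and escaping of markup characters such as & and < is left out for brevity.

```java
// Hypothetical illustration only; not the Xalan serializer implementation.
final class NcrSketch {
    private NcrSketch() {}

    /** Writes every non-ASCII code point as a decimal numeric character reference. */
    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            // codePointAt combines a valid high/low surrogate pair into one code point.
            // A lone surrogate is returned as-is; a real serializer would have to
            // treat that case as an error instead of emitting it.
            int cp = text.codePointAt(i);
            if (cp < 0x80) {
                out.append((char) cp);
            } else {
                out.append("&#").append(cp).append(';');
            }
            // Advance by 2 chars for a supplementary code point, otherwise by 1.
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String scriptA = new String(Character.toChars(0x1D49C)); // U+1D49C
        System.out.println(escapeNonAscii("x" + scriptA));       // prints x&#119964;
    }
}
```

Looping over individual chars and escaping each one separately is what produces the two surrogate values 55349 and 56476 described in the report.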



