Mon, 8 Jan 2024 16:33:39 +0100, /Martin Honnen/:
> On 08/01/2024 16:28, Eric J. Schwarzenbach wrote:
> > Does anybody have a patch for
> > https://issues.apache.org/jira/browse/XALANJ-2560
> > That Xalan produces invalid XML with some supplementary (non-BMP)
> > characters seems rather serious. I find that putting a character
> > reference for 💻 (U+1F4BB), or the literal character itself, into an
> > XML document and running it through any XML-to-XML transform results
> > in it being replaced in the output by two numeric character
> > references, one for each half of its UTF-16 surrogate pair (U+D83D,
> > U+DCBB), which makes the XML invalid. I tried applying the
> > ToStream.java change from
> > https://issues.apache.org/jira/browse/XALANJ-2419 to the source of
> > Xalan 2.7.3, but it did not help.
> Use Saxon, perhaps, or see whether
> https://stackoverflow.com/a/74245232/252228 helps for patching Xalan.
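That two references to surrogate halves make the output invalid follows from the XML 1.0 `Char` production, which excludes the surrogate range D800 to DFFF, while a single reference to the full code point (128187 for U+1F4BB) is legal. A minimal sketch to check this with the JDK's own parser; the class and method names are made up for illustration:

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class WellFormedCheck {

    /** Returns true if the JDK's built-in parser accepts the string as well-formed XML. */
    static boolean isWellFormed(String xml) {
        try {
            // newDefaultInstance() (Java 9+) always returns the JDK's own parser
            DocumentBuilder builder = DocumentBuilderFactory.newDefaultInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // not well-formed
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // One reference to the whole code point U+1F4BB
        System.out.println(isWellFormed("<doc>&#128187;</doc>"));      // true
        // Two references to the surrogate halves, as in the broken output
        System.out.println(isWellFormed("<doc>&#55357;&#56507;</doc>")); // false
    }
}
```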
One may also use just the JDK-supplied provider (a Xalan fork):
* https://lists.apache.org/thread/3hzpj1gt1ql38d17dcfxrgss872v50l6 "XML Entities"
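In a JAXP application, one way to get the JDK-supplied provider even when a Xalan jar sits on the classpath is `TransformerFactory.newDefaultInstance()`, which (since Java 9) skips the usual system-property and service-loader lookup. A minimal sketch, assuming Java 9 or later; the class and method names are hypothetical:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class JdkIdentityTransform {

    /** Identity-transforms the given XML with the JDK's built-in JAXP implementation. */
    static String identity(String xml) {
        // Built-in provider only; a Xalan TransformerFactory on the classpath is ignored
        TransformerFactory factory = TransformerFactory.newDefaultInstance();
        try {
            Transformer transformer = factory.newTransformer(); // no stylesheet: identity copy
            transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // U+1F4BB given as a character reference; the parser turns it into one surrogate pair
        System.out.println(identity("<doc>&#x1F4BB;</doc>"));
    }
}
```

On Java 8, where `newDefaultInstance()` does not exist, the same effect can be had by passing the internal class name `com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl` to `TransformerFactory.newInstance(String, ClassLoader)`.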
Related to the patch referenced in the Stack Overflow answer, one may
compare with the JDK sources as well:
* https://github.com/openjdk/jdk/blob/jdk-21-ga/src/java.xml/share/classes/com/sun/org/apache/xml/internal/serializer/ToStream.java
* https://github.com/openjdk/jdk/blob/jdk-21-ga/src/java.xml/share/classes/com/sun/org/apache/xml/internal/serializer/ToHTMLStream.java
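As I understand it, the essence of such a serializer patch is to escape per Unicode code point rather than per UTF-16 `char`, so that a surrogate pair becomes one character reference. The sketch below only illustrates that idea; it is not the code of Xalan's or the JDK's `ToStream`, and both method names are hypothetical:

```java
public class SurrogateEscaper {

    /** Naive escaping per UTF-16 char: reproduces the symptom (one reference per surrogate half). */
    static String escapePerChar(String text) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (c < 0x80) sb.append(c);
            else sb.append("&#").append((int) c).append(';');
        }
        return sb.toString();
    }

    /** Escapes markup and all non-ASCII characters, one reference per code point. */
    static String escapeCodePoints(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            // codePointAt combines a valid high/low surrogate pair into one code point
            int cp = text.codePointAt(i);
            if (cp >= Character.MIN_SURROGATE && cp <= Character.MAX_SURROGATE) {
                // An unpaired surrogate can never be written as legal XML
                throw new IllegalArgumentException("Unpaired surrogate at index " + i);
            }
            switch (cp) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                default:
                    if (cp < 0x80) sb.appendCodePoint(cp); // ASCII as-is (control chars not filtered here)
                    else sb.append("&#").append(cp).append(';');
            }
            i += Character.charCount(cp); // advance by 2 for a surrogate pair
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String laptop = new String(Character.toChars(0x1F4BB)); // U+1F4BB
        System.out.println(escapePerChar(laptop));    // &#55357;&#56507;
        System.out.println(escapeCodePoints(laptop)); // &#128187;
    }
}
```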
--
Stanimir