Daniel Kec created XALANJ-2617:
----------------------------------
Summary: Serializer produces separately escaped surrogate pair
instead of codepoint
Key: XALANJ-2617
URL: https://issues.apache.org/jira/browse/XALANJ-2617
Project: XalanJ2
Issue Type: Bug
Security Level: No security risk; visible to anyone (Ordinary problems in
Xalan projects. Anybody can view the issue.)
Components: Serialization, Xalan
Affects Versions: 2.7.2, 2.7.1
Reporter: Daniel Kec
Assignee: Steven J. Hathaway
Attachments: JI9053942.java
When serializing XML whose text contains a character represented by a Unicode
surrogate pair, such as "\uD840\uDC0B" (code point U+2000B), none of the several
approaches I tried worked. The XML Transformer escapes each half of the surrogate
pair as a separate character reference, which makes the XML unparseable, e.g.:
SAXParseException; Character reference "&#55360;" is an invalid XML character.
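
For illustration, a minimal sketch of the escaping the summary asks for: one numeric character reference per code point rather than one per UTF-16 code unit. The class and method names (SurrogateEscape, escapeNonAscii) are hypothetical and this is not Xalan's serializer code. It only handles non-ASCII characters and assumes the input contains no unpaired surrogates and no markup characters such as & or <.

```java
final class SurrogateEscape {
    /** Escapes each non-ASCII character as a single reference per code point. */
    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            // codePointAt combines a high/low surrogate pair into one code point
            int cp = text.codePointAt(i);
            if (cp < 0x80) {
                out.appendCodePoint(cp);
            } else {
                out.append("&#").append(cp).append(';');
            }
            // advance by 2 chars for a supplementary code point, 1 otherwise
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeNonAscii("\uD840\uDC0B")); // prints &#131083;
    }
}
```

Iterating by code point instead of by char is what keeps the two halves of the pair together; a per-char loop would produce the two separate references seen in the report.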
{code:java|title=Output}
kec@phoebe:~/Downloads$ java -version
java version "1.8.0_171"
Java(TM) SE Runtime Environment (build 1.8.0_171-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.171-b11, mixed mode)
kec@phoebe:~/Downloads$ java -cp /home/kec/.m2/repository/xml-apis/xml-apis/1.4.01/xml-apis-1.4.01.jar:/home/kec/.m2/repository/xalan/xalan/2.7.2/xalan-2.7.2.jar:/home/kec/.m2/repository/xalan/serializer/2.7.2/serializer-2.7.2.jar:. JI9053942
Character: 𠀋
EXPECTED: <?xml version="1.0" encoding="UTF-8"?><a>&#131083;</a>
ACTUAL: <?xml version="1.0" encoding="UTF-8"?><a>&#55360;&#56331;</a>
[Fatal Error] :1:50: Character reference "&#
{code}
{code:java|title=Test}
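import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class JI9053942 {
    public static void main(String[] args) throws Exception {
        // U+2000B, which Java stores as the surrogate pair D840 DC0B
        String value = "\uD840\uDC0B";
        System.out.println("Character: " + value);
        System.out.println("EXPECTED: <?xml version=\"1.0\" encoding=\"UTF-8\"?><a>&#"
                + value.codePointAt(0) + ";</a>");
        StringWriter writer = new StringWriter();
        // Build a DOM whose root element contains only the test character
        DocumentBuilder documentBuilder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document dom = documentBuilder.newDocument();
        Element rootEl = dom.createElement("a");
        rootEl.setTextContent(value);
        dom.appendChild(rootEl);
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.transform(new DOMSource(dom), new StreamResult(writer));
        // Set after transform(), so it does not change the output captured above
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-16");
        String xml = writer.toString();
        System.out.println(" ACTUAL: " + xml);
        // Parse the serialized output back; this is the step that fails
        InputSource inputSource = new InputSource();
        inputSource.setCharacterStream(new StringReader(xml));
        System.out.println("ACTUAL PARSED CHAR "
                + documentBuilder.parse(inputSource).getDocumentElement().getTextContent());
    }
}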
{code}
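
To isolate the parsing half of the problem from the serializer, the following sketch feeds both forms of the reference straight to the JAXP parser. The class and method names (ReferenceParseCheck, parseText) are hypothetical, and the two input strings are written by hand from the code unit and code point values of "\uD840\uDC0B" rather than taken from Xalan.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

final class ReferenceParseCheck {
    /** Returns the root element's text, or null if the parser rejects the input. */
    static String parseText(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr
            builder.setErrorHandler(new DefaultHandler());
            return builder.parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement().getTextContent();
        } catch (SAXParseException e) {
            return null; // not well-formed, e.g. an invalid character reference
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // One reference for the whole code point: accepted
        String ok = parseText("<a>&#131083;</a>");
        System.out.println("\uD840\uDC0B".equals(ok));
        // One reference per surrogate half: rejected
        System.out.println(parseText("<a>&#55360;&#56331;</a>") == null);
    }
}
```

A character reference must name a legal XML character, and surrogate code units on their own are not legal, so only the single-reference form can round-trip.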
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)