[ https://issues.apache.org/jira/browse/XALANJ-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Kec updated XALANJ-2617:
-------------------------------
    Description: 
When serializing XML that contains a character represented by a Unicode surrogate pair, e.g. "\uD840\uDC0B" (U+2000B), I have tried several variants and none worked. The XML Transformer escapes each half of the surrogate pair separately (&#55360;&#56331;) instead of emitting a single character reference for the code point (&#131083;), which makes the XML unparseable, e.g.:
SAXParseException; Character reference "&#55360" is an invalid XML character.
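For reference, UTF-16 decodes a surrogate pair to one code point as 0x10000 + (high - 0xD800) * 0x400 + (low - 0xDC00). For D840/DC0B that is 0x10000 + 0x40 * 0x400 + 0x0B = 131083 (U+2000B), the value in the EXPECTED line of the output below. A minimal Java sketch of escaping per code point rather than per char follows; `SurrogateEscape` and `escapeNonAscii` are hypothetical names for illustration, not Xalan's serializer API or its actual fix.

```java
// Hypothetical helper for illustration only; not Xalan's serializer code.
public class SurrogateEscape {

    // Escapes markup characters and every non-ASCII code point in a text node.
    // Iterates by code point, not by char, so a surrogate pair (e.g. D840 DC0B)
    // becomes one reference (&#131083;) rather than two.
    public static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (cp >= Character.MIN_SURROGATE && cp <= Character.MAX_SURROGATE) {
                // An unpaired surrogate has no legal XML representation at all.
                throw new IllegalArgumentException("Unpaired surrogate at index " + i);
            } else if (cp == '&') {
                out.append("&amp;");
            } else if (cp == '<') {
                out.append("&lt;");
            } else if (cp == '>') {
                out.append("&gt;");
            } else if (cp < 0x80) {
                out.append((char) cp);
            } else {
                out.append("&#").append(cp).append(';');
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        char high = '\uD840';
        char low = '\uDC0B';
        // Surrogate-pair decoding: 0x10000 + (high - 0xD800) * 0x400 + (low - 0xDC00)
        int manual = 0x10000 + (high - 0xD800) * 0x400 + (low - 0xDC00);
        System.out.println(manual);                                       // 131083
        System.out.println(Character.toCodePoint(high, low));             // 131083
        // What the report observes: one reference per UTF-16 code unit
        System.out.println("&#" + (int) high + ";&#" + (int) low + ";");  // &#55360;&#56331;
        // What the report expects: one reference per code point
        System.out.println(escapeNonAscii(String.valueOf(new char[] {high, low}))); // &#131083;
    }
}
```

Iterating with codePointAt and Character.charCount is what keeps the pair together; a loop over charAt that escapes each non-ASCII char would reproduce the two separate references seen in this report.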

 
{code:java|title=Output}
kec@phoebe:~/Downloads$ java -version
java version "1.8.0_171"
Java(TM) SE Runtime Environment (build 1.8.0_171-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.171-b11, mixed mode)

kec@phoebe:~/Downloads$ java -cp /home/kec/.m2/repository/xml-apis/xml-apis/1.4.01/xml-apis-1.4.01.jar:/home/kec/.m2/repository/xalan/xalan/2.7.2/xalan-2.7.2.jar:/home/kec/.m2/repository/xalan/serializer/2.7.2/serializer-2.7.2.jar:. JI9053942
Character: 𠀋
EXPECTED: <?xml version="1.0" encoding="UTF-8"?><a>&#131083;</a>
 ACTUAL: <?xml version="1.0" encoding="UTF-8"?><a>&#55360;&#56331;</a>
[Fatal Error] :1:50: Character reference "&#
{code}
{code:java|title=Test}
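import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class JI9053942 {
    public static void main(String[] args) throws Exception {
        // U+2000B, held in a Java String as the surrogate pair D840 DC0B
        String value = "\uD840\uDC0B";
        System.out.println("Character: " + value);
        System.out.println("EXPECTED: <?xml version=\"1.0\" encoding=\"UTF-8\"?><a>&#"
                + value.codePointAt(0) + ";</a>");
        StringWriter writer = new StringWriter();

        // Build <a>value</a> in memory
        final DocumentBuilder documentBuilder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document dom = documentBuilder.newDocument();
        final Element rootEl = dom.createElement("a");
        rootEl.setTextContent(value);
        dom.appendChild(rootEl);

        // Identity transform; with xalan-2.7.2.jar on the classpath this should be Xalan's
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.transform(new DOMSource(dom), new StreamResult(writer));
        String xml = writer.toString();
        System.out.println(" ACTUAL: " + xml);

        // Re-parse the serialized string; this is where the SAXParseException is thrown
        InputSource inputSource = new InputSource();
        inputSource.setCharacterStream(new StringReader(xml));
        System.out.println("ACTUAL PARSED CHAR " +
                documentBuilder.parse(inputSource).getDocumentElement().getTextContent());
    }
}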
{code}
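Until the serializer is fixed, one possible stopgap (a hypothetical workaround, not something proposed in this issue) is to post-process the serialized string and merge each separately escaped high/low pair into one reference. `SurrogateRefRepair` and `repair` are illustrative names, and the sketch only handles decimal references like those in the ACTUAL output. Because high surrogates are 55296 to 56319 and low surrogates 56320 to 57343 in decimal, the pattern matches only a genuine high/low pair.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stopgap for illustration only; not part of Xalan.
// Merges a high/low surrogate pair escaped as two decimal references
// (e.g. "&#55360;&#56331;") into a single reference ("&#131083;").
// Hexadecimal references such as "&#xD840;" are not handled.
public class SurrogateRefRepair {

    // 55296..56319 (0xD800..0xDBFF) in decimal
    private static final String HIGH = "(5529[6-9]|55[3-9]\\d\\d|56[0-2]\\d\\d|563[01]\\d)";
    // 56320..57343 (0xDC00..0xDFFF) in decimal
    private static final String LOW = "(563[2-9]\\d|56[4-9]\\d\\d|57[0-2]\\d\\d|573[0-3]\\d|5734[0-3])";
    private static final Pattern PAIR = Pattern.compile("&#" + HIGH + ";&#" + LOW + ";");

    public static String repair(String xml) {
        Matcher m = PAIR.matcher(xml);
        // StringBuffer, because appendReplacement(StringBuilder, ...) needs Java 9+
        StringBuffer out = new StringBuffer(xml.length());
        while (m.find()) {
            char high = (char) Integer.parseInt(m.group(1));
            char low = (char) Integer.parseInt(m.group(2));
            String merged = "&#" + Character.toCodePoint(high, low) + ";";
            m.appendReplacement(out, Matcher.quoteReplacement(merged));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String broken = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><a>&#55360;&#56331;</a>";
        // Prints: <?xml version="1.0" encoding="UTF-8"?><a>&#131083;</a>
        System.out.println(repair(broken));
    }
}
```

Applied to the ACTUAL string from the report, this yields the EXPECTED form; references to ordinary characters are left untouched.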

  was:
When serializing XML that contains a character represented by a Unicode surrogate pair, e.g. "\uD840\uDC0B" (U+2000B), I have tried several variants and none worked. The XML Transformer escapes each half of the surrogate pair separately (&#55360;&#56331;) instead of emitting a single character reference for the code point (&#131083;), which makes the XML unparseable, e.g.:
SAXParseException; Character reference "&#55360" is an invalid XML character.

 
{code:java|title=Output}
kec@phoebe:~/Downloads$ java -version
java version "1.8.0_171"
Java(TM) SE Runtime Environment (build 1.8.0_171-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.171-b11, mixed mode)

kec@phoebe:~/Downloads$ java -cp /home/kec/.m2/repository/xml-apis/xml-apis/1.4.01/xml-apis-1.4.01.jar:/home/kec/.m2/repository/xalan/xalan/2.7.2/xalan-2.7.2.jar:/home/kec/.m2/repository/xalan/serializer/2.7.2/serializer-2.7.2.jar:. JI9053942
Character: 𠀋
EXPECTED: <?xml version="1.0" encoding="UTF-8"?><a>&#131083;</a>
 ACTUAL: <?xml version="1.0" encoding="UTF-8"?><a>&#55360;&#56331;</a>
[Fatal Error] :1:50: Character reference "&#
{code}
{code:java|title=Test}
String value = "\uD840\uDC0B"; 
System.out.println("Character: " + value); 
System.out.println("EXPECTED: <?xml version=\"1.0\" encoding=\"UTF-8\"?><a>&#" 
+ value.codePointAt(0) + ";</a>"); 
StringWriter writer = new StringWriter(); 

final DocumentBuilder documentBuilder = 
DocumentBuilderFactory.newInstance().newDocumentBuilder(); 
Document dom = documentBuilder.newDocument(); 
final Element rootEl = dom.createElement("a"); 
rootEl.setTextContent(value); 
dom.appendChild(rootEl); 

Transformer transformer = TransformerFactory.newInstance().newTransformer(); 
transformer.transform(new DOMSource(dom), new 
javax.xml.transform.stream.StreamResult(writer)); 
// Note: an output property set after transform() does not affect that transform
transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-16"); 
String xml = writer.toString(); 
System.out.println(" ACTUAL: " + xml); 

InputSource inputSource = new InputSource(); 
inputSource.setCharacterStream(new StringReader(xml)); 
System.out.println("ACTUAL PARSED CHAR " + 
documentBuilder.parse(inputSource).getDocumentElement().getTextContent()); 
{code}


> Serializer produces separately escaped surrogate pair instead of codepoint
> --------------------------------------------------------------------------
>
>                 Key: XALANJ-2617
>                 URL: https://issues.apache.org/jira/browse/XALANJ-2617
>             Project: XalanJ2
>          Issue Type: Bug
>      Security Level: No security risk; visible to anyone (Ordinary problems in Xalan projects. Anybody can view the issue.)
>          Components: Serialization, Xalan
>    Affects Versions: 2.7.1, 2.7.2
>            Reporter: Daniel Kec
>            Assignee: Steven J. Hathaway
>            Priority: Major
>         Attachments: JI9053942.java



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@xalan.apache.org
For additional commands, e-mail: dev-h...@xalan.apache.org
