[
https://issues.apache.org/jira/browse/XALANJ-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16597890#comment-16597890
]
Peter De Maeyer edited comment on XALANJ-2617 at 9/13/18 6:46 AM:
------------------------------------------------------------------
[~danielkec], I reviewed your patch, but I wonder if the 'if' statement you
added shouldn't have been an 'if else' statement. The reason I wonder is that
the 'else if' and 'else' branches that immediately follow your changes now get
a different meaning. Especially the 'else' branch that has a code comment "//
This is a fallback plan, we should never get here" unsettles me. I can imagine
that some _new_ scenario is indeed fixed, but I'm worried that some _existing_
scenarios (which were supposed to trigger this "fallback plan") might be
broken. The fact that I did not find any unit tests to illustrate those
scenarios does not help take away my concern either. I'm not familiar with the
Xalan codebase (I am very familiar with Java code in general though), so maybe
I misunderstand a couple of things, I would really appreciate it if you could
reassure me this patch is indeed the right fix.
Then as a cosmetic side note, I noticed that your patch does not respect the
code style of the surrounding code. '{' should not be on a new line, and there
are 2 instances of a ',' that aren't followed by a space as they should. Call
me pedantic, but I'm just being thorough here.
was (Author: peterdm):
[~danielkec], I reviewed your patch, but I wonder if the 'if' statement you
added shouldn't have been an 'if else' statement. The reason I wonder is that
the 'else if' and 'else' branches that immediately follow your changes now get
a different meaning. Especially the 'else' branch that has a code comment "//
This is a fallback plan, we should never get here" unsettles me. I can imagine
that some _new_ scenario is indeed fixed, but I'm worried that some _existing_
scenarios (which were supposed to trigger this "fallback plan") might be
broken. The fact that I did not find any unit tests to illustrate those
scenarios does not help take away my concern either. I'm not familiar with the
Xalan codebase (I am very familiar with Java code in general though), so maybe
I misunderstand a couple of things, it would really appreciate it if you could
reassure me this patch is indeed the right fix.
Then as a cosmetic side note, I noticed that your patch does not respect the
code style of the surrounding code. '{' should not be on a new line, and there
are 2 instances of a ',' that aren't followed by a space as they should. Call
me pedantic, but I'm just being thorough here.
> Serializer produces separately escaped surrogate pair instead of codepoint
> --------------------------------------------------------------------------
>
> Key: XALANJ-2617
> URL: https://issues.apache.org/jira/browse/XALANJ-2617
> Project: XalanJ2
> Issue Type: Bug
> Security Level: No security risk; visible to anyone(Ordinary problems in
> Xalan projects. Anybody can view the issue.)
> Components: Serialization, Xalan
> Affects Versions: 2.7.1, 2.7.2
> Reporter: Daniel Kec
> Assignee: Steven J. Hathaway
> Priority: Major
> Attachments: JI9053942.java,
> XALANJ-2617_Fix_missing_surrogate_pairs_support.patch,
> XALANJ-2617_Fix_missing_surrogate_pairs_support_new.patch
>
>
> When trying to serialize XML with char consisting of unicode surogate char
> "\uD840\uDC0B" I have tried several and non worked. XML Transformer creates
> XML string with escaped surogate pair separately, which makes XML
> unparseable. eg.: SAXParseException; Character reference "�" is an
> invalid XML character. It looks like a bug introduced in the XALANJ-2271 fix.
>
> {code:java|title=Output of Xalan ver. 2.7.2}
> kec@phoebe:~/Downloads$ java -version
> java version "1.8.0_171"
> Java(TM) SE Runtime Environment (build 1.8.0_171-b11)
> Java HotSpot(TM) 64-Bit Server VM (build 25.171-b11, mixed mode)
> kec@phoebe:~/Downloads$ java -cp
> /home/kec/.m2/repository/xml-apis/xml-apis/1.4.01/xml-apis-1.4.01.jar:/home/kec/.m2/repository/xalan/xalan/2.7.2/xalan-2.7.2.jar:/home/kec/.m2/repository/xalan/serializer/2.7.2/serializer-2.7.2.jar:.
> JI9053942
> Character: 𠀋
> EXPECTED: <?xml version="1.0" encoding="UTF-8"?><a>𠀋</a>
> ACTUAL: <?xml version="1.0" encoding="UTF-8"?><a>��</a>
> [Fatal Error] :1:50: Character reference "&#
> {code}
> {code:java|title=But Xalan ver. 2.7.0 works OK}
> kec@phoebe:~/Downloads$ java -cp
> /home/kec/.m2/repository/xml-apis/xml-apis/1.4.01/xml-apis-1.4.01.jar:/home/kec/.m2/repository/xalan/xalan/2.7.0/xalan-2.7.0.jar:/home/kec/.m2/repository/xalan/serializer/2.7.0/serializer-2.7.0.jar:.
> JI9053942
> Character: 𠀋
> EXPECTED: <?xml version="1.0" encoding="UTF-8"?><a>𠀋</a>
> ACTUAL: <?xml version="1.0" encoding="UTF-8"?><a>𠀋</a>
> ACTUAL PARSED CHAR 𠀋
> {code}
> {code:java|title=Test}
> String value = "\uD840\uDC0B";
> System.out.println("Character: " + value);
> System.out.println("EXPECTED: <?xml version=\"1.0\"
> encoding=\"UTF-8\"?><a>&#" + value.codePointAt(0) + ";</a>");
> StringWriter writer = new StringWriter();
> final DocumentBuilder documentBuilder =
> DocumentBuilderFactory.newInstance().newDocumentBuilder();
> Document dom = documentBuilder.newDocument();
> final Element rootEl = dom.createElement("a");
> rootEl.setTextContent(value);
> dom.appendChild(rootEl);
> Transformer transformer = TransformerFactory.newInstance().newTransformer();
> transformer.transform(new DOMSource(dom), new
> javax.xml.transform.stream.StreamResult(writer));
> String xml = writer.toString();
> System.out.println(" ACTUAL: " + xml);
> InputSource inputSource = new InputSource();
> inputSource.setCharacterStream(new StringReader(xml));
> System.out.println("ACTUAL PARSED CHAR " +
> documentBuilder.parse(inputSource).getDocumentElement().getTextContent());
> {code}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]