[
https://issues.apache.org/jira/browse/XALANJ-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16615312#comment-16615312
]
Peter De Maeyer edited comment on XALANJ-2617 at 9/14/18 8:00 PM:
------------------------------------------------------------------
Pull request created. Unfortunately, it only contains the fix in production
code and not the tests, because there is no repository on GitHub for the test
code. This confuses me a bit. If anyone has a recommendation on what to do
with the tests, I'd be happy to follow it.
What I did:
* (/) http://svn.apache.org/repos/asf/xalan/java/trunk is the (ancient)
authoritative repository for the production code. I created my production
code patch against it.
* (/) http://svn.apache.org/repos/asf/xalan/test/trunk is the (ancient)
authoritative repository for the test code. I created my test code patch
against it.
* (/) The repository for the production code is mirrored on GitHub:
https://github.com/apache/xalan-j. I created a pull request against it for
my production code patch.
* (x) I did not find an equivalent GitHub mirror of the repository for the
test code, so I can't create a pull request for my test code patch.
To complete the story: I successfully ran the minitest and smoketest in the
test repository before and after my fix. In order to be able to do this, I
recreated an ancient Windows 2000 32-bit system in a VM, capable of running the
ancient test harness. Being on a modern Ubuntu Linux 64-bit system, and being
spoiled with JUnit, it took some effort to take a step back in time:
# Install Windows 2000 32-bit in a VirtualBox VM.
# Install 32-bit JDK 1.3, because the Xalan-J sources are -target 1.3. (I know
I could have compiled with JDK 1.6 as well, but -target only governs the
bytecode version; it doesn't prevent use of APIs marked @since > 1.3.)
# Familiarize myself with the really clunky and ancient test harness (being
used to JUnit).
Forgive me if this explanation is overly verbose, but I'm trying to illustrate
that I didn't make this patch in a hurry; I was being thorough.
> Serializer produces separately escaped surrogate pair instead of codepoint
> --------------------------------------------------------------------------
>
> Key: XALANJ-2617
> URL: https://issues.apache.org/jira/browse/XALANJ-2617
> Project: XalanJ2
> Issue Type: Bug
> Security Level: No security risk; visible to anyone(Ordinary problems in
> Xalan projects. Anybody can view the issue.)
> Components: Serialization, Xalan
> Affects Versions: 2.7.1, 2.7.2
> Reporter: Daniel Kec
> Assignee: Steven J. Hathaway
> Priority: Major
> Attachments: JI9053942.java,
> XALANJ-2617_Fix_missing_surrogate_pairs_support.patch,
> XALANJ-2617_java.patch, XALANJ-2617_test.patch
>
>
> When trying to serialize XML containing a character that consists of a
> Unicode surrogate pair, "\uD840\uDC0B", I have tried several and none worked.
> The XML Transformer creates an XML string in which each half of the surrogate
> pair is escaped separately, which makes the XML unparseable, e.g.:
> SAXParseException; Character reference "&#55360;" is an invalid XML
> character. It looks like a bug introduced in the XALANJ-2271 fix.
>
> {code:java|title=Output of Xalan ver. 2.7.2}
> kec@phoebe:~/Downloads$ java -version
> java version "1.8.0_171"
> Java(TM) SE Runtime Environment (build 1.8.0_171-b11)
> Java HotSpot(TM) 64-Bit Server VM (build 25.171-b11, mixed mode)
> kec@phoebe:~/Downloads$ java -cp
> /home/kec/.m2/repository/xml-apis/xml-apis/1.4.01/xml-apis-1.4.01.jar:/home/kec/.m2/repository/xalan/xalan/2.7.2/xalan-2.7.2.jar:/home/kec/.m2/repository/xalan/serializer/2.7.2/serializer-2.7.2.jar:.
> JI9053942
> Character: 𠀋
> EXPECTED: <?xml version="1.0" encoding="UTF-8"?><a>𠀋</a>
> ACTUAL: <?xml version="1.0" encoding="UTF-8"?><a>&#55360;&#56331;</a>
> [Fatal Error] :1:50: Character reference "&#
> {code}
> {code:java|title=But Xalan ver. 2.7.0 works OK}
> kec@phoebe:~/Downloads$ java -cp
> /home/kec/.m2/repository/xml-apis/xml-apis/1.4.01/xml-apis-1.4.01.jar:/home/kec/.m2/repository/xalan/xalan/2.7.0/xalan-2.7.0.jar:/home/kec/.m2/repository/xalan/serializer/2.7.0/serializer-2.7.0.jar:.
> JI9053942
> Character: 𠀋
> EXPECTED: <?xml version="1.0" encoding="UTF-8"?><a>𠀋</a>
> ACTUAL: <?xml version="1.0" encoding="UTF-8"?><a>𠀋</a>
> ACTUAL PARSED CHAR 𠀋
> {code}
> {code:java|title=Test}
> String value = "\uD840\uDC0B";
> System.out.println("Character: " + value);
> System.out.println("EXPECTED: <?xml version=\"1.0\" encoding=\"UTF-8\"?><a>&#"
> + value.codePointAt(0) + ";</a>");
> StringWriter writer = new StringWriter();
> final DocumentBuilder documentBuilder =
> DocumentBuilderFactory.newInstance().newDocumentBuilder();
> Document dom = documentBuilder.newDocument();
> final Element rootEl = dom.createElement("a");
> rootEl.setTextContent(value);
> dom.appendChild(rootEl);
> Transformer transformer = TransformerFactory.newInstance().newTransformer();
> transformer.transform(new DOMSource(dom), new
> javax.xml.transform.stream.StreamResult(writer));
> String xml = writer.toString();
> System.out.println(" ACTUAL: " + xml);
> InputSource inputSource = new InputSource();
> inputSource.setCharacterStream(new StringReader(xml));
> System.out.println("ACTUAL PARSED CHAR " +
> documentBuilder.parse(inputSource).getDocumentElement().getTextContent());
> {code}
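For readers unfamiliar with surrogate pairs, the difference between the expected and the actual output can be illustrated with a minimal sketch. This is not the Xalan serializer code and not the attached patch; the class `SurrogateEscapeSketch` and the helper `escapeNonAscii` are hypothetical names used only for illustration. The sketch walks a string by code point instead of by `char`, so a valid surrogate pair is joined and written as one numeric character reference.

```java
// Illustrative sketch only; hypothetical names, not Xalan's implementation.
class SurrogateEscapeSketch {

    // Escapes every non-ASCII character as a numeric character reference,
    // emitting one reference per code point rather than one per char.
    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            // codePointAt joins a high and low surrogate into one code point
            int codePoint = text.codePointAt(i);
            if (codePoint < 0x80) {
                out.append((char) codePoint);
            } else {
                out.append("&#").append(codePoint).append(';');
            }
            // advance by 2 chars for a surrogate pair, otherwise by 1
            i += Character.charCount(codePoint);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String value = "\uD840\uDC0B";
        // The two UTF-16 code units combine into U+2000B
        System.out.println(Character.toCodePoint(value.charAt(0), value.charAt(1))); // prints 131083
        System.out.println(escapeNonAscii(value)); // prints &#131083;
    }
}
```

Escaping each `char` on its own would instead yield two references to surrogate code units, which an XML parser rejects because surrogates are not valid XML characters by themselves.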