[ https://issues.apache.org/jira/browse/XALANJ-2419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16774677#comment-16774677 ]
Jason Harrop commented on XALANJ-2419:
--------------------------------------

It works under Java 11 if I change makeStream("ISO-8859-1") to makeStream("ISO8859_1").

With makeStream("ISO-8859-1"), s.getBytes(encoding) throws UnsupportedEncodingException for encoding 8859-1 at

{code:java}
EncodingInfo.inEncoding(char, String) line: 438
EncodingInfo$EncodingImpl.isInEncoding(char) line: 226
EncodingInfo$EncodingImpl.isInEncoding(char) line: 215
EncodingInfo.isInEncoding(char) line: 113
ToXMLStream(ToStream).characters(char[], int, int) line: 1597
ToXMLStream(ToStream).characters(String) line: 1774
ToXMLStreamTest(ToStreamTest).outputCharacters(ToStream, String) line: 88
ToXMLStreamTest.testCase2() line: 114
NativeMethodAccessorImpl.invoke0(Method, Object, Object[]) line: not available [native method]
NativeMethodAccessorImpl.invoke(Object, Object[]) line: 62
DelegatingMethodAccessorImpl.invoke(Object, Object[]) line: 43
Method.invoke(Object, Object...) line: 566
Reporter.executeTests(Test, int, Object) line: 787
ToXMLStreamTest(FileBasedTest).runTestCases(Properties) line: 339
ToXMLStreamTest(TestImpl).runTest(Properties) line: 205
ToXMLStreamTest(FileBasedTest).doMain(String[]) line: 833
ToXMLStreamTest.main(String[]) line: 196
{code}

Not related to 2419, but FYI there is one other test which fails, due to date formatting and http://openjdk.java.net/jeps/252

I've put the test code on GitHub; for Java 11 I am using https://github.com/plutext/xalan-test/tree/Plutext_Java11_xalan-j_2_7_x

> Astral characters written as a pair of NCRs with the surrogate scalar values
> when using UTF-8
> ---------------------------------------------------------------------------------------------
>
>                 Key: XALANJ-2419
>                 URL: https://issues.apache.org/jira/browse/XALANJ-2419
>             Project: XalanJ2
>          Issue Type: Bug
>          Components: Serialization
>    Affects Versions: 2.7.1
>            Reporter: Henri Sivonen
>            Priority: Major
>         Attachments: XALANJ-2419-fix-v3.txt, XALANJ-2419-tests-v3.txt
>
>
> org.apache.xml.serializer.ToStream contains the following code:
>     else if (m_encodingInfo.isInEncoding(ch)) {
>         // If the character is in the encoding, and
>         // not in the normal ASCII range, we also
>         // just leave it get added on to the clean characters
>
>     }
>     else {
>         // This is a fallback plan, we should never get here
>         // but if the character wasn't previously handled
>         // (i.e. isn't in the encoding, etc.) then what
>         // should we do? We choose to write out an entity
>         writeOutCleanChars(chars, i, lastDirtyCharProcessed);
>         writer.write("&#");
>         writer.write(Integer.toString(ch));
>         writer.write(';');
>         lastDirtyCharProcessed = i;
>     }
> This leads to the wrong (latter) if branch running for surrogates, because
> isInEncoding() for UTF-8 returns false for surrogates. It is always wrong
> (regardless of encoding) to escape a surrogate as an NCR.
> The practical effect of this bug is that any document with astral characters
> in it ends up in an ill-formed serialization and does not parse back using an
> XML parser.
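
For readers following the issue description above, here is a minimal, self-contained sketch of the behaviour it asks for: when a high surrogate is followed by a low surrogate, combine the pair into one code point and write a single NCR, rather than one NCR per surrogate. This is not the attached patch and not Xalan code. The class name, the escape() helper, and the ASCII-only isInEncoding() stand-in are all hypothetical, and replacing a lone surrogate with U+FFFD is an assumption made only for this sketch.

```java
class SurrogateNcrSketch {

    // Hypothetical stand-in for EncodingInfo.isInEncoding(): for this sketch,
    // only ASCII is treated as directly representable in the output encoding.
    static boolean isInEncoding(int codePoint) {
        return codePoint < 0x80;
    }

    // Escapes characters that are not in the (stand-in) encoding as numeric
    // character references, treating a surrogate pair as ONE code point.
    static String escape(String s) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            char ch = s.charAt(i);
            if (Character.isHighSurrogate(ch)
                    && i + 1 < s.length()
                    && Character.isLowSurrogate(s.charAt(i + 1))) {
                // Well-formed pair: emit a single NCR for the astral code point.
                int codePoint = Character.toCodePoint(ch, s.charAt(i + 1));
                out.append("&#").append(codePoint).append(';');
                i += 2;
            } else if (Character.isSurrogate(ch)) {
                // Lone surrogate: never legal as an NCR. Assumption for this
                // sketch only: substitute U+FFFD REPLACEMENT CHARACTER.
                out.append("&#65533;");
                i++;
            } else if (isInEncoding(ch)) {
                // Representable character: copy through unchanged.
                out.append(ch);
                i++;
            } else {
                // BMP character outside the encoding: ordinary NCR.
                out.append("&#").append((int) ch).append(';');
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+1F600 is stored as the surrogate pair D83D DE00 in a Java String.
        System.out.println(escape("a\uD83D\uDE00\u00E9")); // prints a&#128512;&#233;
    }
}
```

With a real UTF-8 target the astral character could simply be written out directly. The ASCII-only stand-in is used here just to force the NCR path so the difference from the two-NCR output is visible.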