Daniel van den Ouden created XALANJ-2625:
--------------------------------------------

             Summary: Text output in ISO-8859-1 in Java 11
                 Key: XALANJ-2625
                 URL: https://issues.apache.org/jira/browse/XALANJ-2625
             Project: XalanJ2
          Issue Type: Bug
      Security Level: No security risk; visible to anyone (Ordinary problems in Xalan projects. Anybody can view the issue.)
          Components: Xalan
    Affects Versions: 2.7.2
            Reporter: Daniel van den Ouden
            Assignee: Gary Gregory

We're currently in the process of upgrading our builds from Java 8 to Java 11 and we've run into the following issue:

Given the following XML
{noformat}
<?xml version="1.0" encoding="UTF-8"?>
<Settings xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="../xsd/DBSettings.xsd">
  <Database>
    <Type value="Oracle"/>
    <Database value="UTF8"/>
    <User name="fgi_user" password="fgi"/>
    <Owner name="fgi_owner" password="fgi"/>
  </Database>
</Settings>
{noformat}
and the following XSL
{noformat}
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <xsl:output method="text" version="1.0" encoding="ISO-8859-1" indent="yes"/>
  <xsl:template match="/">
    <xsl:text>db://</xsl:text>
    <xsl:value-of select="/Settings/Database/Type/@value"> </xsl:value-of>
    <xsl:text>:</xsl:text>
    <xsl:value-of select="/Settings/Database/User/@name" />
    <xsl:text>/</xsl:text>
    <xsl:value-of select="/Settings/Database/User/@password" />
    <xsl:text>@</xsl:text>
    <xsl:value-of select="/Settings/Database/Database/@value" />
  </xsl:template>
</xsl:stylesheet>
{noformat}
We would expect the output to be
{noformat}
db://Oracle:fgi_user/fgi@UTF8
{noformat}
But with Java 11, the output becomes
{noformat}
db://Oracle:fgi_user/fgi@UTF8
{noformat}
And the console gets flooded with messages like
{noformat}
Attempt to output character of integral value 100 that is not represented in specified output encoding of ISO-8859-1.
Attempt to output character of integral value 98 that is not represented in specified output encoding of ISO-8859-1.
Attempt to output character of integral value 58 that is not represented in specified output encoding of ISO-8859-1.
Attempt to output character of integral value 47 that is not represented in specified output encoding of ISO-8859-1.
Attempt to output character of integral value 47 that is not represented in specified output encoding of ISO-8859-1.
{noformat}
The problem seems to be caused by org.apache.xml.serializer.Encodings.java. In loadEncodingInfo(), a properties file is read (org.apache.xml.serializer.Encodings.properties) containing a Java encoding name and the associated MIME name that may appear in a stylesheet. For ISO-8859-1, it contains the following entries in this order:
{noformat}
ISO8859-1 ISO-8859-1 0x00FF
ISO8859_1 ISO-8859-1 0x00FF
8859-1    ISO-8859-1 0x00FF
8859_1    ISO-8859-1 0x00FF
{noformat}
The loadEncodingInfo() method iterates over these entries, but the iteration order differs between Java 8 and Java 11.

Java 8:
{noformat}
ISO8859-1
8859_1
8859-1
ISO8859_1
{noformat}
Java 11:
{noformat}
ISO8859-1
ISO8859_1
8859_1
8859-1
{noformat}
Every entry is put in the _encodingTableKeyJava map using the Java name as key, and in the _encodingTableKeyMime hashtable using the MIME name as key.

In our case, the method getEncodingInfo(String encoding) is called with "encoding" having the value "ISO-8859-1". First the _encodingTableKeyJava map is checked; it doesn't contain the key "ISO-8859-1". Then the _encodingTableKeyMime map is checked, which contains the last entry that was processed from the properties file with a matching MIME name. Then the Java name of that entry is used to build a new EncodingInfo object and perform the actual encoding using the String class.

The problem here is that with Java 11, the last entry processed from the properties file is "8859-1". This is NOT an alias for the actual ISO-8859-1 encoding.
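
The overwrite behaviour described above can be sketched in a few lines. This is only an illustration, not the Xalan code: the class LastEntryWins and the method javaNameForMime are hypothetical names, and a plain HashMap stands in for the MIME-keyed table. It shows that when several Java names share one MIME name, whichever name is processed last is the one that remains.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LastEntryWins {

    // Hypothetical stand-in for a MIME-keyed table: every put() for the same
    // MIME name replaces the Java name stored by the previous put().
    public static String javaNameForMime(List<String> javaNamesInIterationOrder, String mimeName) {
        Map<String, String> byMime = new HashMap<>();
        for (String javaName : javaNamesInIterationOrder) {
            byMime.put(mimeName, javaName);
        }
        return byMime.get(mimeName);
    }

    public static void main(String[] args) {
        // Iteration orders as reported in this issue.
        List<String> java8Order = Arrays.asList("ISO8859-1", "8859_1", "8859-1", "ISO8859_1");
        List<String> java11Order = Arrays.asList("ISO8859-1", "ISO8859_1", "8859_1", "8859-1");

        System.out.println("Java 8 order keeps:  " + javaNameForMime(java8Order, "ISO-8859-1"));
        System.out.println("Java 11 order keeps: " + javaNameForMime(java11Order, "ISO-8859-1"));
    }
}
```
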
With Java 8, the last entry would be "ISO8859_1", which IS an alias for ISO-8859-1. The aliases as I found them are:
{noformat}
ISO-8859-1
819
ISO8859-1
l1
ISO_8859-1:1987
ISO_8859-1
8859_1
iso-ir-100
latin1
cp819
ISO8859_1
IBM819
ISO_8859_1
IBM-819
csISOLatin1
{noformat}
Long story short: org.apache.xml.serializer.Encodings.properties contains entries that are not valid encoding aliases. Removing 8859-1 through 8859-9 should fix it. Changing _encodingTableKeyMime to contain multiple encodings per MIME would be an option as well.
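
As a rough sketch of the second option (several Java names per MIME name), something along these lines might work. Everything here is hypothetical and not taken from Xalan: the class MimeToJavaNames and its methods are made-up names, and using java.nio.charset.Charset.isSupported() to decide whether a name is usable is an assumption on my part.

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MimeToJavaNames {

    // Hypothetical multi-valued table: keeps every Java name seen for a MIME name
    // instead of letting later entries overwrite earlier ones.
    private final Map<String, List<String>> byMime = new HashMap<>();

    public void add(String javaName, String mimeName) {
        byMime.computeIfAbsent(mimeName, k -> new ArrayList<>()).add(javaName);
    }

    // Returns the first registered Java name that the running JVM accepts, or null.
    public String firstSupported(String mimeName) {
        for (String javaName : byMime.getOrDefault(mimeName, Collections.<String>emptyList())) {
            if (isSupported(javaName)) {
                return javaName;
            }
        }
        return null;
    }

    // Assumption: Charset.isSupported() is a reasonable test for "valid alias".
    public static boolean isSupported(String name) {
        try {
            return Charset.isSupported(name);
        } catch (IllegalArgumentException e) {
            // Thrown for names that are syntactically illegal (or null).
            return false;
        }
    }

    public static void main(String[] args) {
        MimeToJavaNames table = new MimeToJavaNames();
        // Deliberately register the problematic name first.
        table.add("8859-1", "ISO-8859-1");
        table.add("8859_1", "ISO-8859-1");
        table.add("ISO8859_1", "ISO-8859-1");

        System.out.println("8859-1 supported: " + isSupported("8859-1"));
        System.out.println("Chosen Java name: " + table.firstSupported("ISO-8859-1"));
    }
}
```

With a table like this, the iteration order of the properties file would no longer decide whether a usable name is found, only which of the usable names is picked.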