Daniel van den Ouden created XALANJ-2625:
--------------------------------------------

             Summary: Text output in ISO-8859-1 in Java 11 
                 Key: XALANJ-2625
                 URL: https://issues.apache.org/jira/browse/XALANJ-2625
             Project: XalanJ2
          Issue Type: Bug
      Security Level: No security risk; visible to anyone (Ordinary problems in 
Xalan projects.  Anybody can view the issue.)
          Components: Xalan
    Affects Versions: 2.7.2
            Reporter: Daniel van den Ouden
            Assignee: Gary Gregory


We're currently in the process of upgrading our builds from Java 8 to Java 11 
and we've run into the following issue:

Given the following XML
{noformat}
<?xml version="1.0" encoding="UTF-8"?>
<Settings xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="../xsd/DBSettings.xsd">
        <Database>
                <Type value="Oracle"/>
                <Database value="UTF8"/>
                <User name="fgi_user" password="fgi"/>
                <Owner name="fgi_owner" password="fgi"/>
        </Database>
</Settings>
{noformat}

and the following XSL
{noformat}
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:fo="http://www.w3.org/1999/XSL/Format">
        <xsl:output method="text" version="1.0" encoding="ISO-8859-1" indent="yes"/>
        <xsl:template match="/">
                <xsl:text>db://</xsl:text>
                <xsl:value-of select="/Settings/Database/Type/@value" />
                <xsl:text>:</xsl:text>
                <xsl:value-of select="/Settings/Database/User/@name" />
                <xsl:text>/</xsl:text>
                <xsl:value-of select="/Settings/Database/User/@password" />
                <xsl:text>@</xsl:text>
                <xsl:value-of select="/Settings/Database/Database/@value" />
        </xsl:template>
</xsl:stylesheet>
{noformat}

We would expect the output to be 
{noformat}
db://Oracle:fgi_user/fgi@UTF8
{noformat}

But with Java 11, the output becomes
{noformat}
&#100;&#98;&#58;&#47;&#47;&#79;&#114;&#97;&#99;&#108;&#101;&#58;&#102;&#103;&#105;&#95;&#117;&#115;&#101;&#114;&#47;&#102;&#103;&#105;&#64;&#85;&#84;&#70;&#56;
{noformat}

And the console gets flooded with messages like
{noformat}
Attempt to output character of integral value 100 that is not represented in 
specified output encoding of ISO-8859-1.
Attempt to output character of integral value 98 that is not represented in 
specified output encoding of ISO-8859-1.
Attempt to output character of integral value 58 that is not represented in 
specified output encoding of ISO-8859-1.
Attempt to output character of integral value 47 that is not represented in 
specified output encoding of ISO-8859-1.
Attempt to output character of integral value 47 that is not represented in 
specified output encoding of ISO-8859-1.
{noformat}
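
A small harness along these lines may help reproduce the behaviour described above. It is only a sketch: the class name TextOutputRepro is made up for illustration, the XML is trimmed of its xsi attributes, and it assumes Xalan 2.7.2 is on the classpath so that TransformerFactory.newInstance() selects it. Without Xalan on the classpath the JDK's built-in transformer is used and the result may differ.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class TextOutputRepro {
    // Input document from the report (schema attributes omitted for brevity).
    static final String XML =
          "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
        + "<Settings><Database>"
        + "<Type value=\"Oracle\"/>"
        + "<Database value=\"UTF8\"/>"
        + "<User name=\"fgi_user\" password=\"fgi\"/>"
        + "<Owner name=\"fgi_owner\" password=\"fgi\"/>"
        + "</Database></Settings>";

    // Stylesheet from the report: text output, ISO-8859-1 encoding.
    static final String XSL =
          "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"text\" encoding=\"ISO-8859-1\"/>"
        + "<xsl:template match=\"/\">"
        + "<xsl:text>db://</xsl:text>"
        + "<xsl:value-of select=\"/Settings/Database/Type/@value\"/>"
        + "<xsl:text>:</xsl:text>"
        + "<xsl:value-of select=\"/Settings/Database/User/@name\"/>"
        + "<xsl:text>/</xsl:text>"
        + "<xsl:value-of select=\"/Settings/Database/User/@password\"/>"
        + "<xsl:text>@</xsl:text>"
        + "<xsl:value-of select=\"/Settings/Database/Database/@value\"/>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Runs the stylesheet over the document and returns the serialized bytes as a String. */
    static String transform(String xml, String xsl) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            // Decode with the same encoding the stylesheet asked for.
            return new String(out.toByteArray(), StandardCharsets.ISO_8859_1);
        } catch (Exception e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // Prints the name of the transformer implementation actually in use,
        // then the transformation result.
        System.out.println(TransformerFactory.newInstance().getClass().getName());
        System.out.println(transform(XML, XSL));
    }
}
```

Printing the factory class name first makes it easy to confirm whether Xalan or the JDK's built-in implementation handled the transformation.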

The problem seems to be caused by org.apache.xml.serializer.Encodings.java. In 
loadEncodingInfo(), a properties file is read 
(org.apache.xml.serializer.Encodings.properties) containing a Java encoding 
name and the associated MIME name that may appear in a stylesheet. For 
ISO-8859-1, it contains the following entries in this order:
{noformat}
ISO8859-1  ISO-8859-1                             0x00FF
ISO8859_1  ISO-8859-1                             0x00FF
8859-1     ISO-8859-1                             0x00FF
8859_1     ISO-8859-1                             0x00FF
{noformat}

The loadEncodingInfo() method iterates over these entries, but the order 
in which it visits them differs between Java 8 and Java 11.
Java 8:
{noformat}
ISO8859-1
8859_1
8859-1
ISO8859_1
{noformat}

Java 11:
{noformat}
ISO8859-1
ISO8859_1
8859_1
8859-1
{noformat}

Every entry is put in the _encodingTableKeyJava map using the Java name as key, 
and in the _encodingTableKeyMime hashtable using the MIME name as key.

In our case, the method getEncodingInfo(String encoding) is called with 
"encoding" having the value "ISO-8859-1". First the _encodingTableKeyJava map 
is checked; it doesn't contain the key "ISO-8859-1". Then the 
_encodingTableKeyMime map is checked, which contains the last entry that was 
processed from the properties file with a matching MIME name. The Java name of 
that entry is then used to build a new EncodingInfo object and perform the 
actual encoding using the String class.
The problem here is that with Java 11, the last entry processed from the 
properties file is "8859-1". This is NOT an alias for the actual ISO-8859-1 
encoding. 
With Java 8, the last entry processed is "ISO8859_1", which IS an alias for 
ISO-8859-1.
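
To illustrate why the iteration order matters, here is a minimal sketch of the "last entry wins" effect. It does not use the real Encodings class; the class name MimeTableOrderDemo and the plain HashMap standing in for _encodingTableKeyMime are assumptions made for illustration. The two orders are the ones listed above.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MimeTableOrderDemo {
    /**
     * Inserts every Java name under the same MIME key, in the given order,
     * and returns whichever Java name is left in the map afterwards.
     */
    static String lastJavaNameFor(List<String> javaNamesInOrder, String mimeName) {
        Map<String, String> byMime = new HashMap<>();
        for (String javaName : javaNamesInOrder) {
            byMime.put(mimeName, javaName); // a later entry overwrites an earlier one
        }
        return byMime.get(mimeName);
    }

    public static void main(String[] args) {
        List<String> java8Order =
                Arrays.asList("ISO8859-1", "8859_1", "8859-1", "ISO8859_1");
        List<String> java11Order =
                Arrays.asList("ISO8859-1", "ISO8859_1", "8859_1", "8859-1");

        System.out.println(lastJavaNameFor(java8Order, "ISO-8859-1"));  // ISO8859_1
        System.out.println(lastJavaNameFor(java11Order, "ISO-8859-1")); // 8859-1
    }
}
```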

The aliases as I found them are:

{noformat}
ISO-8859-1
        819
        ISO8859-1
        l1
        ISO_8859-1:1987
        ISO_8859-1
        8859_1
        iso-ir-100
        latin1
        cp819
        ISO8859_1
        IBM819
        ISO_8859_1
        IBM-819
        csISOLatin1
{noformat}
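
For anyone who wants to check the four names from the properties file against their own JVM, a short sketch using java.nio.charset.Charset follows. The class name AliasCheck and the helper resolvesToLatin1 are made up for illustration.

```java
import java.nio.charset.Charset;

public class AliasCheck {
    /** True if the JVM accepts the name and maps it to the canonical ISO-8859-1 charset. */
    static boolean resolvesToLatin1(String name) {
        return Charset.isSupported(name)
                && "ISO-8859-1".equals(Charset.forName(name).name());
    }

    public static void main(String[] args) {
        // The four Java names listed for ISO-8859-1 in Encodings.properties.
        String[] names = {"ISO8859-1", "ISO8859_1", "8859-1", "8859_1"};
        for (String name : names) {
            System.out.println(name + " -> " + resolvesToLatin1(name));
        }
    }
}
```

Going by the alias list above, "8859-1" is the only one of the four expected to report false.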


Long story short: org.apache.xml.serializer.Encodings.properties contains 
entries that are not valid Encoding aliases. Removing 8859-1 through 8859-9 
should fix it.
Changing _encodingTableKeyMime to hold multiple encodings per MIME name would 
be an option as well.
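
A rough sketch of what the second option could look like is below. This is not a patch against Encodings.java: the class MultiNameMimeTable and its methods are hypothetical, and picking the first Java name that Charset.isSupported() accepts is just one possible selection rule.

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class MultiNameMimeTable {
    // MIME name (upper-cased) -> all Java names registered for it, in insertion order.
    private final Map<String, List<String>> javaNamesByMime = new HashMap<>();

    void add(String javaName, String mimeName) {
        javaNamesByMime
                .computeIfAbsent(mimeName.toUpperCase(Locale.ROOT), k -> new ArrayList<>())
                .add(javaName);
    }

    /** Returns the first registered Java name the JVM accepts, or null if none is usable. */
    String resolve(String mimeName) {
        List<String> candidates = javaNamesByMime.getOrDefault(
                mimeName.toUpperCase(Locale.ROOT), Collections.emptyList());
        for (String javaName : candidates) {
            if (isUsable(javaName)) {
                return javaName;
            }
        }
        return null;
    }

    private static boolean isUsable(String javaName) {
        try {
            return Charset.isSupported(javaName);
        } catch (IllegalArgumentException e) {
            return false; // syntactically illegal charset name
        }
    }

    public static void main(String[] args) {
        MultiNameMimeTable table = new MultiNameMimeTable();
        // Java 11 iteration order from the report.
        for (String name : new String[] {"ISO8859-1", "ISO8859_1", "8859_1", "8859-1"}) {
            table.add(name, "ISO-8859-1");
        }
        System.out.println(table.resolve("ISO-8859-1"));
    }
}
```

With this shape the result no longer depends on which entry happens to be processed last, only on whether at least one registered name is usable.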


