[ 
https://issues.apache.org/jira/browse/XALANJ-2625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17811569#comment-17811569
 ] 

Cédric Damioli commented on XALANJ-2625:
----------------------------------------

[~kesh...@alum.mit.edu] this one should be resolved as a duplicate of XALANJ-2618.

> Text output in ISO-8859-1 in Java 11 
> -------------------------------------
>
>                 Key: XALANJ-2625
>                 URL: https://issues.apache.org/jira/browse/XALANJ-2625
>             Project: XalanJ2
>          Issue Type: Bug
>      Security Level: No security risk; visible to anyone(Ordinary problems in 
> Xalan projects.  Anybody can view the issue.) 
>          Components: Xalan
>    Affects Versions: 2.7.2
>            Reporter: Daniel van den Ouden
>            Assignee: Gary D. Gregory
>            Priority: Minor
>
> We're currently in the process of upgrading our builds from Java 8 to Java 11 
> and we've run into the following issue:
> Given the following XML
> {noformat}
> <?xml version="1.0" encoding="UTF-8"?>
> <Settings xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
> xsi:noNamespaceSchemaLocation="../xsd/DBSettings.xsd">
>       <Database>
>               <Type value="Oracle"/>
>               <Database value="UTF8"/>
>               <User name="fgi_user" password="fgi"/>
>               <Owner name="fgi_owner" password="fgi"/>
>       </Database>
> </Settings>
>  {noformat}
> and the following XSL
> {noformat}
> <?xml version="1.0" encoding="UTF-8"?>
> <xsl:stylesheet version="1.0" 
> xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
> xmlns:fo="http://www.w3.org/1999/XSL/Format">
>       <xsl:output method="text" version="1.0" encoding="ISO-8859-1" 
> indent="yes"/>
>       <xsl:template match="/">
>       <xsl:text>db://</xsl:text>
>       <xsl:value-of select="/Settings/Database/Type/@value">
>       </xsl:value-of>
>                       <xsl:text>:</xsl:text>
>                       <xsl:value-of select="/Settings/Database/User/@name" />
>       <xsl:text>/</xsl:text>
>                       <xsl:value-of 
> select="/Settings/Database/User/@password" />
>       <xsl:text>@</xsl:text>
>                       <xsl:value-of 
> select="/Settings/Database/Database/@value" />
>       </xsl:template>
> </xsl:stylesheet>
> {noformat}
> We would expect the output to be 
> {noformat}
> db://Oracle:fgi_user/fgi@UTF8
> {noformat}
> But with Java 11, the output becomes
> {noformat}
> &#100;&#98;&#58;&#47;&#47;&#79;&#114;&#97;&#99;&#108;&#101;&#58;&#102;&#103;&#105;&#95;&#117;&#115;&#101;&#114;&#47;&#102;&#103;&#105;&#64;&#85;&#84;&#70;&#56;
> {noformat}
> And the console gets flooded with messages like
> {noformat}
> Attempt to output character of integral value 100 that is not represented in 
> specified output encoding of ISO-8859-1.
> Attempt to output character of integral value 98 that is not represented in 
> specified output encoding of ISO-8859-1.
> Attempt to output character of integral value 58 that is not represented in 
> specified output encoding of ISO-8859-1.
> Attempt to output character of integral value 47 that is not represented in 
> specified output encoding of ISO-8859-1.
> Attempt to output character of integral value 47 that is not represented in 
> specified output encoding of ISO-8859-1.
> {noformat}
> The problem seems to be caused by org.apache.xml.serializer.Encodings.java. 
> In loadEncodingInfo(), a properties file is read 
> (org.apache.xml.serializer.Encodings.properties) containing a Java encoding 
> name and the associated MIME name that may appear in a stylesheet. For 
> ISO-8859-1, it contains the following entries in this order:
> {noformat}
> ISO8859-1  ISO-8859-1                             0x00FF
> ISO8859_1  ISO-8859-1                             0x00FF
> 8859-1     ISO-8859-1                             0x00FF
> 8859_1     ISO-8859-1                             0x00FF
> {noformat}
> The loadEncodingInfo() method iterates over these entries, but the iteration 
> order differs between Java 8 and Java 11.
> Java 8:
> {noformat}
> ISO8859-1
> 8859_1
> 8859-1
> ISO8859_1
> {noformat}
> Java 11:
> {noformat}
> ISO8859-1
> ISO8859_1
> 8859_1
> 8859-1
> {noformat}
> Every entry is put in the _encodingTableKeyJava map using the Java name as 
> key, and in the _encodingTableKeyMime hashtable using the MIME name as key.
> In our case, the method getEncodingInfo(String encoding) is called with 
> "encoding" having the value "ISO-8859-1". First the _encodingTableKeyJava map 
> is checked; it doesn't contain the key "ISO-8859-1". Then the 
> _encodingTableKeyMime map is checked, which contains the last entry that was 
> processed from the properties file with a matching MIME name. Then the Java 
> name of that entry is used to build a new EncodingInfo object and perform the 
> actual encoding using the String class.
> The problem here is that with Java 11, the last entry from the properties 
> file is "8859-1". This is NOT an alias for the actual ISO-8859-1 encoding. 
> With Java 8, the last entry would be "ISO8859_1" which IS an alias for 
> ISO-8859-1.
> The aliases as I found them are:
> {noformat}
> ISO-8859-1
>       819
>       ISO8859-1
>       l1
>       ISO_8859-1:1987
>       ISO_8859-1
>       8859_1
>       iso-ir-100
>       latin1
>       cp819
>       ISO8859_1
>       IBM819
>       ISO_8859_1
>       IBM-819
>       csISOLatin1
> {noformat}
> Long story short: org.apache.xml.serializer.Encodings.properties contains 
> entries that are not valid Encoding aliases. Removing 8859-1 through 8859-9 
> should fix it.
> Changing _encodingTableKeyMime to contain multiple encodings per MIME would 
> be an option as well.
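
For anyone following the quoted report, the two effects it describes can be reproduced outside Xalan with a short sketch. This is not Xalan code: the class name EncodingAliasCheck and both helper methods are hypothetical, and the MIME-keyed map only imitates the "last entry wins" behaviour the reporter attributes to _encodingTableKeyMime. The two iteration orders are copied from the report. Only java.nio.charset.Charset from the JDK is used to ask whether a Java name resolves to ISO-8859-1.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class EncodingAliasCheck {

    /** Returns true if the running JVM resolves javaName to the ISO-8859-1 charset. */
    static boolean isLatin1Alias(String javaName) {
        try {
            return Charset.isSupported(javaName)
                    && Charset.forName(javaName).equals(StandardCharsets.ISO_8859_1);
        } catch (IllegalArgumentException e) {
            // Illegal or null charset names are treated as "not an alias".
            return false;
        }
    }

    /**
     * Imitates a map keyed by MIME name where every put overwrites the
     * previous value, so the last Java name in iteration order wins.
     */
    static String lastJavaNameForMime(List<String> javaNamesInIterationOrder) {
        Map<String, String> mimeToJava = new HashMap<>();
        for (String javaName : javaNamesInIterationOrder) {
            mimeToJava.put("ISO-8859-1", javaName);
        }
        return mimeToJava.get("ISO-8859-1");
    }

    public static void main(String[] args) {
        // Iteration orders as listed in the report.
        List<String> java8Order = Arrays.asList("ISO8859-1", "8859_1", "8859-1", "ISO8859_1");
        List<String> java11Order = Arrays.asList("ISO8859-1", "ISO8859_1", "8859_1", "8859-1");

        for (List<String> order : Arrays.asList(java8Order, java11Order)) {
            String winner = lastJavaNameForMime(order);
            System.out.println(order + " -> " + winner
                    + " (resolves to ISO-8859-1: " + isLatin1Alias(winner) + ")");
        }
    }
}
```

If the report's analysis holds, the Java 11 order should end on "8859-1", which the JVM is not expected to recognise as an ISO-8859-1 alias, while the Java 8 order ends on "ISO8859_1", which it does recognise.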



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@xalan.apache.org
For additional commands, e-mail: dev-h...@xalan.apache.org
