[jira] [Created] (XALANJ-2618) Error in org/apache/xml/serializer/Encodings.properties

Simon Schaarschmidt (JIRA) Mon, 24 Sep 2018 07:14:12 -0700

Simon Schaarschmidt created XALANJ-2618:
-------------------------------------------


             Summary: Error in org/apache/xml/serializer/Encodings.properties
                 Key: XALANJ-2618
                 URL: https://issues.apache.org/jira/browse/XALANJ-2618
             Project: XalanJ2
          Issue Type: Bug
      Security Level: No security risk; visible to anyone (Ordinary problems in 
Xalan projects.  Anybody can view the issue.)
          Components: Serialization, transformation
    Affects Versions: 2.7.2
         Environment: Java 11
            Reporter: Simon Schaarschmidt
            Assignee: Steven J. Hathaway


We transform and serialize using the encoding ISO-8859-1. With JDK 1.8 
everything works, but with OpenJDK 11 the output is written (by class 
ToTextStream) as character references, e.g. "*&#105;&#100;&#61;&#49;*" instead 
of "*id=1*".

org/apache/xml/serializer/Encodings.properties (in serializer.jar) defines 
various encodings, e.g.

{{ISO8859-1  ISO-8859-1  0x00FF}}
{{ISO8859_1  ISO-8859-1  0x00FF}}
{{{color:#FF0000}8859-1{color}     ISO-8859-1  0x00FF}}
{{{color:#FF0000}8859_1{color}     ISO-8859-1  0x00FF}}

First value: Java encoding name

Second value: comma-separated preferred MIME names.

The class org.apache.xml.serializer.Encodings reads this file into a Properties 
object, creates an EncodingInfo object for each definition, and puts them (see 
method loadEncodingInfo()) into the member fields 
__encodingTableKeyJava_ and __encodingTableKeyMime_ (both Hashtable). 
Putting elements into _encodingTableKeyMime is the critical part: the mapping 
is not 1:1, because several Java encoding names share one MIME name, so the 
last element returned by Properties.keys() replaces the previous EncodingInfo 
object.
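
The overwrite can be reproduced in isolation. The following is a minimal 
sketch, not the actual Xalan code: the class MimeTableDemo and the method 
buildMimeTable are hypothetical names, and a plain list of rows stands in for 
the Properties enumeration so that the iteration order can be controlled.

```java
import java.util.Arrays;
import java.util.Hashtable;
import java.util.List;
import java.util.Map;

public class MimeTableDemo {

    /**
     * Builds a MIME-name -> Java-name table by visiting the rows in the
     * given order. Each row is {javaName, mimeName}. A later row silently
     * replaces an earlier one with the same MIME name, because put()
     * overwrites the existing value.
     */
    static Map<String, String> buildMimeTable(List<String[]> rows) {
        Map<String, String> table = new Hashtable<>();
        for (String[] row : rows) {
            table.put(row[1], row[0]);
        }
        return table;
    }

    public static void main(String[] args) {
        String[] valid = {"ISO8859-1", "ISO-8859-1"};
        String[] questionable = {"8859-1", "ISO-8859-1"};

        // Same rows, different enumeration order, different winner.
        System.out.println(
            buildMimeTable(Arrays.asList(questionable, valid)).get("ISO-8859-1"));
        System.out.println(
            buildMimeTable(Arrays.asList(valid, questionable)).get("ISO-8859-1"));
    }
}
```

The first call prints ISO8859-1 and the second prints 8859-1: whichever row is 
visited last determines the Java encoding name that a MIME-name lookup returns.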

Up to Java 1.8, the first line above is the last entry in the Enumeration; 
therefore _encodingTableKeyMime returns the EncodingInfo object with Java 
encoding "{color:#14892c}ISO8859-1{color}" for encoding "ISO-8859-1". With Java 
11, the elements of the Enumeration returned by Properties.keys() have a 
different order: the third line above is the last entry! Therefore 
_encodingTableKeyMime returns the EncodingInfo object with Java encoding 
"*{color:#FF0000}8859-1{color}*" when asked for encoding "ISO-8859-1". But 
"8859-1" is not a valid Java encoding name! Method 
EncodingInfo.inEncoding(char,String) fails internally with an 
*UnsupportedEncodingException* and returns false, which would explain why 
every character is written as a character reference.
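
Whether a given JVM accepts these names can be checked directly. This is a 
small sketch under assumptions: the class CharsetNameCheck and the method 
isUsable are hypothetical, and Charset.isSupported() is used here as a stand-in 
for the lookup that EncodingInfo performs internally. The results depend on the 
JDK's alias tables, so no output is given.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;

public class CharsetNameCheck {

    /** Returns true if the running JVM resolves the given charset name or alias. */
    static boolean isUsable(String name) {
        try {
            return Charset.isSupported(name);
        } catch (IllegalCharsetNameException e) {
            // The name contains characters that are not legal in a charset name.
            return false;
        }
    }

    public static void main(String[] args) {
        String[] names = {"ISO8859-1", "ISO8859_1", "8859-1", "8859_1", "ISO-8859-1"};
        for (String name : names) {
            System.out.println(name + " -> " + isUsable(name));
        }
    }
}
```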

The methods in class Encodings first search for the EncodingInfo object in 
_encodingTableKeyJava and use the elements from _encodingTableKeyMime as a 
fallback.

I suggest extending the definitions in Encodings.properties with 
additional lines, e.g.

*{color:#14892c}ISO-8859-1{color}* ISO-8859-1  0x00FF

The same should be done for the encodings ISO-8859-2 through ISO-8859-9. 
Alternatively, all entries with Java encoding name "8859*" should be removed. 
(They are not valid Java encoding names: UnsupportedEncodingException!)

Finally, I think the current mechanism of collecting the EncodingInfo objects 
in two Hashtables is problematic, because the result depends on the 
enumeration order.
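
One way to make the MIME table independent of the enumeration order is 
sketched below. It is only an illustration, not a proposed patch: the class 
StableMimeTable and its methods are hypothetical, the value layout 
"<MIME names> <high char>" is assumed from the lines quoted above, and 
Charset.isSupported() is assumed to be an acceptable validity check. Keys are 
visited in sorted order, Java names the JVM does not know are skipped, and the 
first valid entry for a MIME name is kept.

```java
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;
import java.util.TreeSet;

public class StableMimeTable {

    /** Returns true if the running JVM resolves the given charset name or alias. */
    static boolean isUsable(String name) {
        try {
            return Charset.isSupported(name);
        } catch (IllegalArgumentException e) {
            // Covers illegal charset names (IllegalCharsetNameException).
            return false;
        }
    }

    /** Maps each MIME name to exactly one Java encoding name, deterministically. */
    static Map<String, String> build(Properties props) {
        Map<String, String> mimeToJava = new LinkedHashMap<>();
        // Sorted keys: the outcome no longer depends on Properties.keys() order.
        for (String javaName : new TreeSet<>(props.stringPropertyNames())) {
            if (!isUsable(javaName)) {
                continue; // never let an unknown Java name win a MIME slot
            }
            // Assumed value layout: "<comma-separated MIME names> <high char>"
            String[] fields = props.getProperty(javaName).trim().split("\\s+");
            if (fields[0].isEmpty()) {
                continue;
            }
            for (String mime : fields[0].split(",")) {
                mimeToJava.putIfAbsent(mime, javaName); // first valid entry wins
            }
        }
        return mimeToJava;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("ISO8859-1", "ISO-8859-1  0x00FF");
        props.setProperty("8859-1", "ISO-8859-1  0x00FF");
        System.out.println(build(props));
    }
}
```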



