[jira] [Created] (XERCESJ-1574) Problem with detected encoding for UTF-16 encoded as Unicode Little

Radu Coravu (JIRA) Thu, 19 Jul 2012 00:49:42 -0700

Radu Coravu created XERCESJ-1574:
------------------------------------

             Summary: Problem with detected encoding for UTF-16 encoded as Unicode Little
                 Key: XERCESJ-1574
                 URL: https://issues.apache.org/jira/browse/XERCESJ-1574
             Project: Xerces2-J
          Issue Type: Bug
          Components: DOM (Level 3 Core)
    Affects Versions: 2.11.0
            Reporter: Radu Coravu



I have the following test case:

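    // "UnicodeLittle" is the JDK alias for UTF-16 little-endian with a BOM (FF FE)
    ByteArrayInputStream bis = new ByteArrayInputStream(
          "<?xml version=\"1.0\" encoding=\"UTF-16\"?> <a/>".getBytes("UnicodeLittle"));
    InputSource is = new InputSource(bis);
    DOMParser dp = new DOMParser();
    dp.parse(is);
    // Expected: the byte order detected from the BOM is reflected in the name
    assertEquals("UTF-16LE", dp.getDocument().getInputEncoding());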

The input stream is encoded as "UnicodeLittle", so
"dp.getDocument().getInputEncoding()" should return "UTF-16LE" (at least it did
in the previous Xerces version). Now it returns "UTF-16" regardless of the byte
order mark in the input stream.
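To show why the reported name matters, the following sketch (class and method names are hypothetical) prints the first two bytes produced by three JDK encoders for the same document: "UnicodeLittle" writes the little-endian BOM FF FE, Java's own "UTF-16" encoder writes the big-endian BOM FE FF, and "UTF-16LE" writes no BOM at all.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class BomDemo {
    // Returns the first two bytes of s, encoded with cs, as hex ("XX XX").
    static String leadingBytes(String s, Charset cs) {
        byte[] b = s.getBytes(cs);
        return String.format("%02X %02X", b[0] & 0xFF, b[1] & 0xFF);
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-16\"?> <a/>";
        // "UnicodeLittle": little-endian with a byte order mark
        System.out.println(leadingBytes(xml, Charset.forName("UnicodeLittle"))); // FF FE
        // Java's "UTF-16" encoder always writes a big-endian BOM
        System.out.println(leadingBytes(xml, StandardCharsets.UTF_16));          // FE FF
        // "UTF-16LE" writes no BOM; the first bytes are '<' followed by 0x00
        System.out.println(leadingBytes(xml, StandardCharsets.UTF_16LE));        // 3C 00
    }
}
```

So a document that arrived with FF FE and is written back out under the plain name "UTF-16" ends up with a different BOM.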

As a result, a developer relying on "dp.getDocument().getInputEncoding()" cannot
tell how to save the document so that the same BOM is preserved.
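The saving problem can be illustrated with a hypothetical helper (not part of Xerces or the DOM API) that picks the output byte order from the name returned by getInputEncoding() and, by assumption, always writes a BOM. Given "UTF-16LE" it can reproduce the original FF FE mark; given plain "UTF-16" it has nothing to go on and can only fall back to Java's big-endian default.

```java
import java.nio.charset.StandardCharsets;

public class Utf16Saver {
    private static final byte[] LE_BOM = {(byte) 0xFF, (byte) 0xFE};
    private static final byte[] BE_BOM = {(byte) 0xFE, (byte) 0xFF};

    // Prepends a byte order mark to an already-encoded body.
    private static byte[] concat(byte[] bom, byte[] body) {
        byte[] out = new byte[bom.length + body.length];
        System.arraycopy(bom, 0, out, 0, bom.length);
        System.arraycopy(body, 0, out, bom.length, body.length);
        return out;
    }

    /**
     * Hypothetical helper: re-encodes text using the encoding name reported by
     * getInputEncoding(). Assumes that name is "UTF-16LE", "UTF-16BE" or "UTF-16".
     */
    static byte[] encode(String text, String inputEncoding) {
        if ("UTF-16LE".equals(inputEncoding)) {
            return concat(LE_BOM, text.getBytes(StandardCharsets.UTF_16LE));
        }
        if ("UTF-16BE".equals(inputEncoding)) {
            return concat(BE_BOM, text.getBytes(StandardCharsets.UTF_16BE));
        }
        // Plain "UTF-16" carries no byte-order information, so the only option
        // is Java's default: big-endian with an FE FF BOM.
        return text.getBytes(StandardCharsets.UTF_16);
    }

    public static void main(String[] args) {
        byte[] le = encode("<a/>", "UTF-16LE");
        byte[] generic = encode("<a/>", "UTF-16");
        System.out.printf("%02X %02X%n", le[0] & 0xFF, le[1] & 0xFF);           // FF FE
        System.out.printf("%02X %02X%n", generic[0] & 0xFF, generic[1] & 0xFF); // FE FF
    }
}
```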

The problem is related to the changes made to encoding detection in
XMLEntityManager.
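For context, byte-order detection from a BOM can be sketched as below. This is a hypothetical, simplified illustration in the spirit of Appendix F of the XML 1.0 specification, not the actual XMLEntityManager code; it returns a nullable Boolean (TRUE for big-endian, FALSE for little-endian, null when unknown), matching the way the isBigEndian variable is checked for null in the proposal.

```java
import java.nio.charset.Charset;

public class Utf16ByteOrder {
    /**
     * Hypothetical sketch: infers UTF-16 byte order from the byte order mark.
     * Returns TRUE (big-endian), FALSE (little-endian) or null (no UTF-16 BOM).
     */
    static Boolean detectBigEndian(byte[] head) {
        if (head.length < 2) {
            return null;
        }
        int b0 = head[0] & 0xFF;
        int b1 = head[1] & 0xFF;
        if (b0 == 0xFE && b1 == 0xFF) {
            return Boolean.TRUE; // FE FF: UTF-16 big-endian BOM
        }
        if (b0 == 0xFF && b1 == 0xFE) {
            // FF FE 00 00 is the UTF-32LE BOM, so do not report that as UTF-16.
            boolean utf32le = head.length >= 4 && head[2] == 0 && head[3] == 0;
            return utf32le ? null : Boolean.FALSE; // FF FE: UTF-16 little-endian BOM
        }
        return null;
    }

    public static void main(String[] args) {
        byte[] doc = "<a/>".getBytes(Charset.forName("UnicodeLittle"));
        System.out.println(detectBigEndian(doc)); // false
    }
}
```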

As a proposed modification, in the method:

org.apache.xerces.impl.XMLEntityManager.setupCurrentEntity(String, XMLInputSource, boolean, boolean)

before the code:

fCurrentEntity = new ScannedEntity(name,....

we could add the following code:

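        // Replace the generic "UTF-16" name with the byte order detected from
        // the BOM; isBigEndian is a Boolean that stays null when it is unknown.
        if ("UTF-16".equals(encoding)) {
          if (isBigEndian != null) {
            if (isBigEndian.booleanValue()) {
              encoding = "UTF-16BE";
            } else {
              encoding = "UTF-16LE";
            }
          }
        }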
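The proposed mapping can also be exercised on its own. The sketch below moves the same logic into a standalone static method; the class and method names are hypothetical, since in XMLEntityManager the check would run inline before the ScannedEntity is created.

```java
public class Utf16Naming {
    /**
     * Hypothetical standalone version of the proposed mapping: replaces the
     * generic "UTF-16" with the byte-order-specific name when the order is known.
     */
    static String refineEncoding(String encoding, Boolean isBigEndian) {
        if ("UTF-16".equals(encoding) && isBigEndian != null) {
            return isBigEndian.booleanValue() ? "UTF-16BE" : "UTF-16LE";
        }
        return encoding; // unknown byte order or a non-UTF-16 name: unchanged
    }

    public static void main(String[] args) {
        System.out.println(refineEncoding("UTF-16", Boolean.FALSE)); // UTF-16LE
        System.out.println(refineEncoding("UTF-16", Boolean.TRUE));  // UTF-16BE
        System.out.println(refineEncoding("UTF-16", null));          // UTF-16
        System.out.println(refineEncoding("UTF-8", Boolean.TRUE));   // UTF-8
    }
}
```

Keeping "UTF-16" when no BOM was seen avoids guessing a byte order the parser never observed.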
