[ https://issues.apache.org/jira/browse/XERCESJ-1574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Costanzo updated XERCESJ-1574:
------------------------------------
    Fix Version/s: 2.12.0

> Problem with detected encoding for UTF-16 encoded as Unicode Little
> -------------------------------------------------------------------
>
>                 Key: XERCESJ-1574
>                 URL: https://issues.apache.org/jira/browse/XERCESJ-1574
>             Project: Xerces2-J
>          Issue Type: Bug
>          Components: DOM (Level 3 Core)
>    Affects Versions: 2.11.0
>            Reporter: Radu Coravu
>            Assignee: Michael Glavassevich
>             Fix For: 2.12.0
>
>         Attachments: patch.txt
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> I have the following test case:
>     ByteArrayInputStream bis = new ByteArrayInputStream(
>           "<?xml version=\"1.0\" encoding=\"UTF-16\"?><a/>".getBytes("UnicodeLittle"));
>     InputSource is = new InputSource(bis);
>     DOMParser dp = new DOMParser();
>     dp.parse(is);
>     assertEquals("UTF-16LE", dp.getDocument().getInputEncoding());
> The input stream is encoded as "UnicodeLittle", so 
> "dp.getDocument().getInputEncoding()" should return "UTF-16LE" (at least it 
> did so in the previous Xerces version). Right now it returns "UTF-16" 
> regardless of the byte order mark in the input stream.
> So a developer relying on the value returned by 
> "dp.getDocument().getInputEncoding()" does not know how to save 
> the document in order to preserve the same BOM.
> This problem is related to the encoding detection changes that were made 
> in XMLEntityManager.
> As a proposed modification, in the method:
> org.apache.xerces.impl.XMLEntityManager.setupCurrentEntity(String, 
> XMLInputSource, boolean, boolean)
> before the code:
> fCurrentEntity = new ScannedEntity(name,....
> we could add the following code:
>         if("UTF-16".equals(encoding)) {
>           if(isBigEndian != null) {
>             if(isBigEndian) {
>               encoding = "UTF-16BE"; 
>             } else {
>               encoding = "UTF-16LE";
>             }
>           }
>         }
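
The proposed refinement can be illustrated outside of Xerces with a minimal, self-contained sketch. It is not the actual XMLEntityManager code: the class name Utf16EncodingSketch and the helpers detectBigEndian and refineEncoding are hypothetical names introduced here for illustration. The sketch assumes that isBigEndian is a Boolean that is null when no UTF-16 byte order mark was seen, and it only inspects the first two bytes of the stream.

```java
public class Utf16EncodingSketch {

    /**
     * Hypothetical helper: inspects the first two bytes of a stream.
     * Returns TRUE for a big-endian BOM (FE FF), FALSE for a little-endian
     * BOM (FF FE), and null when no UTF-16 BOM is present.
     * Assumption: UTF-32 BOMs (which also start with FF FE) are ignored here.
     */
    static Boolean detectBigEndian(byte[] head) {
        if (head == null || head.length < 2) {
            return null;
        }
        int b0 = head[0] & 0xFF;
        int b1 = head[1] & 0xFF;
        if (b0 == 0xFE && b1 == 0xFF) {
            return Boolean.TRUE;
        }
        if (b0 == 0xFF && b1 == 0xFE) {
            return Boolean.FALSE;
        }
        return null;
    }

    /**
     * Mirrors the modification proposed in the issue: a generic "UTF-16"
     * is narrowed to "UTF-16BE" or "UTF-16LE" when the byte order is known.
     * Any other encoding name, or an unknown byte order, is returned as-is.
     */
    static String refineEncoding(String encoding, Boolean isBigEndian) {
        if ("UTF-16".equals(encoding) && isBigEndian != null) {
            return isBigEndian.booleanValue() ? "UTF-16BE" : "UTF-16LE";
        }
        return encoding;
    }

    public static void main(String[] args) {
        // FF FE followed by '<' encoded in little-endian UTF-16.
        byte[] littleEndianHead = {(byte) 0xFF, (byte) 0xFE, '<', 0};
        // prints "UTF-16LE"
        System.out.println(refineEncoding("UTF-16", detectBigEndian(littleEndianHead)));
    }
}
```

With a value such as "UTF-16LE" available, a caller could pick a matching byte order when writing the document back out. Whether the BOM itself is written still depends on the charset or serializer used for saving.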



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
