[ https://issues.apache.org/jira/browse/DAFFODIL-2128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Lawrence resolved DAFFODIL-2128.
--------------------------------------
    Resolution: Fixed

Fixed in commit 8bb1aa2b1c9d2b2e00cf07de6b449805a2fd2d37

> XML preamble encoding ignored when CLI unparsing with "xml" infoset type 
> -------------------------------------------------------------------------
>
>                 Key: DAFFODIL-2128
>                 URL: https://issues.apache.org/jira/browse/DAFFODIL-2128
>             Project: Daffodil
>          Issue Type: Bug
>          Components: CLI
>    Affects Versions: 2.3.0
>            Reporter: Steve Lawrence
>            Assignee: Steve Lawrence
>            Priority: Major
>             Fix For: 2.4.0
>
>
> When using the CLI to unparse XML using the "xml" infoset type, we have the 
> following code:
> {code:scala}
> case "xml" => {
>   val rdr = new BufferedReader(new InputStreamReader(new ByteArrayInputStream(anyRef.asInstanceOf[Array[Byte]])))
>   new XMLTextInfosetInputter(rdr)
> }
> {code}
> In order to create the XMLTextInfosetInputter, we create an
> InputStreamReader, but we do not specify an encoding. This means the Java
> "file.encoding" system property will be used to decode this XML. So on
> machines where that property isn't UTF-8 (e.g. Windows), UTF-8 data in the
> XML may not be decoded correctly, which leads to incorrect unparsed data.
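
A minimal Scala sketch of the decoding problem described above, written for illustration and not taken from the Daffodil sources. It decodes the same UTF-8 bytes twice: once with ISO-8859-1, which stands in (as an assumption) for a non-UTF-8 platform default, and once with an explicit UTF-8 charset. `CharsetDemo` and `decode` are hypothetical names.

```scala
import java.io.{BufferedReader, ByteArrayInputStream, InputStreamReader}
import java.nio.charset.{Charset, StandardCharsets}

object CharsetDemo {
  // Hypothetical helper: decode every character from `bytes` using charset `cs`.
  def decode(bytes: Array[Byte], cs: Charset): String = {
    val rdr = new BufferedReader(new InputStreamReader(new ByteArrayInputStream(bytes), cs))
    try {
      val sb = new StringBuilder
      var c = rdr.read()
      while (c != -1) {
        sb.append(c.toChar)
        c = rdr.read()
      }
      sb.toString
    } finally rdr.close()
  }

  def main(args: Array[String]): Unit = {
    val bytes = "<e>caf\u00e9</e>".getBytes(StandardCharsets.UTF_8)
    // ISO-8859-1 stands in for a non-UTF-8 default: each UTF-8 byte becomes
    // its own character, so "é" (0xC3 0xA9) turns into "Ã©".
    println(decode(bytes, StandardCharsets.ISO_8859_1))
    // An explicit UTF-8 charset decodes the text as intended.
    println(decode(bytes, StandardCharsets.UTF_8))
  }
}
```

Passing an explicit charset to the InputStreamReader constructor removes the dependency on the platform default, but it still ignores any encoding declared in the XML preamble. Since Java 18 (JEP 400) the default charset is UTF-8 unless overridden, so the problem mainly shows up on older JVMs.
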
> I believe Woodstox can inspect the XML and determine the encoding from the
> preamble, so we should take advantage of that. To do so, we should change
> the XMLTextInfosetInputter to accept an InputStream in its constructor
> instead of a Reader, and deprecate the Reader constructor.
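
The proposed InputStream-based approach can be sketched with the standard StAX API (`javax.xml.stream`), of which Woodstox is one implementation. This is an illustrative sketch under that assumption, not Daffodil's actual XMLTextInfosetInputter: the hypothetical helper `readText` takes an InputStream, so the parser sees raw bytes and picks the decoding from the XML declaration, assuming UTF-8 when no declaration or byte order mark is present.

```scala
import java.io.{ByteArrayInputStream, InputStream}
import java.nio.charset.StandardCharsets
import javax.xml.stream.{XMLInputFactory, XMLStreamConstants}

object PreambleEncodingDemo {
  // Hypothetical helper: concatenate all character data in an XML document.
  // Because it receives raw bytes (an InputStream, not a Reader), the StAX
  // parser chooses the decoding from the BOM / XML declaration itself.
  def readText(is: InputStream): String = {
    val xsr = XMLInputFactory.newInstance().createXMLStreamReader(is)
    try {
      val sb = new StringBuilder
      while (xsr.hasNext()) {
        // Character data may arrive in several CHARACTERS events; keep them all.
        if (xsr.next() == XMLStreamConstants.CHARACTERS) sb.append(xsr.getText)
      }
      sb.toString
    } finally xsr.close()
  }

  def main(args: Array[String]): Unit = {
    // ISO-8859-1 bytes whose preamble declares that encoding.
    val latin1Xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><e>caf\u00e9</e>"
    val latin1 = latin1Xml.getBytes(StandardCharsets.ISO_8859_1)
    // UTF-8 bytes with no declaration, so the parser assumes UTF-8.
    val utf8 = "<e>caf\u00e9</e>".getBytes(StandardCharsets.UTF_8)
    println(readText(new ByteArrayInputStream(latin1)))
    println(readText(new ByteArrayInputStream(utf8)))
  }
}
```

XMLInputFactory.newInstance() typically picks up Woodstox through the service-loader mechanism when it is on the classpath and otherwise falls back to the JDK's built-in parser. When a parser is handed a Reader instead, it receives already-decoded characters, so the encoding declared in the preamble cannot take effect.
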



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
