[
https://issues.apache.org/jira/browse/DAFFODIL-2128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Steve Lawrence resolved DAFFODIL-2128.
--------------------------------------
Resolution: Fixed
Fixed in commit 8bb1aa2b1c9d2b2e00cf07de6b449805a2fd2d37
> XML preamble encoding ignored when CLI unparsing with "xml" infoset type
> -------------------------------------------------------------------------
>
> Key: DAFFODIL-2128
> URL: https://issues.apache.org/jira/browse/DAFFODIL-2128
> Project: Daffodil
> Issue Type: Bug
> Components: CLI
> Affects Versions: 2.3.0
> Reporter: Steve Lawrence
> Assignee: Steve Lawrence
> Priority: Major
> Fix For: 2.4.0
>
>
> When the CLI unparses XML with the "xml" infoset type, it runs the
> following code:
> {code:scala}
> case "xml" => {
>   val rdr = new BufferedReader(new InputStreamReader(
>     new ByteArrayInputStream(anyRef.asInstanceOf[Array[Byte]])))
>   new XMLTextInfosetInputter(rdr)
> }
> {code}
> In order to create the XMLTextInfosetInputter, we create an
> InputStreamReader, but we do not specify an encoding. This means the JVM
> default charset, taken from the Java "file.encoding" system property, is
> used to decode this XML. On machines where that property isn't UTF-8 (e.g.
> Windows), UTF-8 data in the XML is not decoded correctly, which leads to
> incorrect unparsed data.
> I believe Woodstox can inspect the XML and determine the encoding from the
> preamble, so we should take advantage of that: change the
> XMLTextInfosetInputter to accept an InputStream in the constructor instead
> of a Reader, and deprecate the Reader constructor.
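
To illustrate the difference described above, here is a minimal sketch using only the standard StAX API (javax.xml.stream), which Woodstox implements. It is not the code from the fix commit. The object name PreambleEncodingDemo and the helper textOf are hypothetical, and ISO-8859-1 is an assumption standing in for a non-UTF-8 platform default.

```scala
import java.io.{ByteArrayInputStream, InputStreamReader}
import java.nio.charset.StandardCharsets
import javax.xml.stream.{XMLInputFactory, XMLStreamConstants, XMLStreamReader}

object PreambleEncodingDemo {
  // Hypothetical helper: concatenate all character data seen by a StAX reader.
  def textOf(xsr: XMLStreamReader): String = {
    val sb = new StringBuilder
    while (xsr.hasNext) {
      if (xsr.next() == XMLStreamConstants.CHARACTERS) sb.append(xsr.getText)
    }
    xsr.close()
    sb.toString
  }

  // A UTF-8 encoded document whose preamble declares UTF-8.
  val xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><a>\u00e9</a>"
  val bytes: Array[Byte] = xml.getBytes(StandardCharsets.UTF_8)

  val factory: XMLInputFactory = XMLInputFactory.newInstance()

  // Reader path: the bytes are decoded before the parser sees them, so the
  // preamble cannot help. ISO-8859-1 is an assumed stand-in for a
  // mismatched "file.encoding" default.
  def viaReader: String = textOf(factory.createXMLStreamReader(
    new InputStreamReader(new ByteArrayInputStream(bytes),
      StandardCharsets.ISO_8859_1)))

  // InputStream path: the parser receives raw bytes and works out the
  // encoding itself, using the XML declaration.
  def viaStream: String = textOf(factory.createXMLStreamReader(
    new ByteArrayInputStream(bytes)))

  def main(args: Array[String]): Unit = {
    println(s"stream decoded correctly: ${viaStream == "\u00e9"}")
    println(s"reader decoded correctly: ${viaReader == "\u00e9"}")
  }
}
```

With the InputStream path the element text should round-trip unchanged, while the Reader path yields mis-decoded characters whenever the chosen charset differs from the one the document was written in. That is the motivation for passing an InputStream to the inputter rather than a pre-built Reader.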