Steve Lawrence created DAFFODIL-2128:
----------------------------------------
Summary: XML preamble encoding ignored when CLI unparsing with
"xml" infoset type
Key: DAFFODIL-2128
URL: https://issues.apache.org/jira/browse/DAFFODIL-2128
Project: Daffodil
Issue Type: Bug
Components: CLI
Affects Versions: 2.3.0
Reporter: Steve Lawrence
Fix For: 2.4.0
When using the CLI to unparse XML using the "xml" infoset type, we have the
following code:
{code:scala}
case "xml" => {
  val rdr = new BufferedReader(new InputStreamReader(
    new ByteArrayInputStream(anyRef.asInstanceOf[Array[Byte]])))
  new XMLTextInfosetInputter(rdr)
}
{code}
To create the XMLTextInfosetInputter, we create an InputStreamReader but do
not specify an encoding. This means the JVM default charset, taken from the
Java "file.encoding" system property, is used to decode the XML. On machines
where that property isn't UTF-8 (e.g. Windows), UTF-8 data in the XML can be
decoded incorrectly, which leads to incorrect unparsed data.
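To illustrate the failure mode, here is a minimal standalone sketch (not
Daffodil code). The object and method names are hypothetical, and ISO-8859-1
is passed explicitly as a stand-in for a non-UTF-8 platform default, so the
result does not depend on the machine it runs on.

```scala
import java.io.{BufferedReader, ByteArrayInputStream, InputStreamReader}
import java.nio.charset.{Charset, StandardCharsets}

object DefaultCharsetDemo {
  // Decode the bytes through an InputStreamReader using the given charset
  // and return the first line. The CLI code above omits the charset argument,
  // so it gets the JVM default instead.
  def decodeFirstLine(bytes: Array[Byte], cs: Charset): String = {
    val rdr = new BufferedReader(
      new InputStreamReader(new ByteArrayInputStream(bytes), cs))
    try rdr.readLine() finally rdr.close()
  }

  // U+00E9 (e with acute accent) is the two bytes 0xC3 0xA9 in UTF-8.
  val xmlBytes: Array[Byte] =
    "<data>\u00e9</data>".getBytes(StandardCharsets.UTF_8)

  // Decoded with the charset the data was written in: round-trips cleanly.
  val asUtf8: String = decodeFirstLine(xmlBytes, StandardCharsets.UTF_8)

  // Decoded with a single-byte charset: each UTF-8 byte becomes its own
  // character, so one character turns into two wrong ones.
  val asLatin1: String = decodeFirstLine(xmlBytes, StandardCharsets.ISO_8859_1)
}
```

The second decode yields two characters (U+00C3, U+00A9) where the original
had one, which is the kind of corruption that would then be unparsed.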
I believe Woodstox can inspect the XML and determine the encoding from the
preamble, so we should take advantage of that: change the
XMLTextInfosetInputter to accept an InputStream in the constructor instead of
a Reader, and deprecate the Reader constructor.
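As a rough sketch of the idea, and not the actual XMLTextInfosetInputter
change, the following uses the standard javax.xml.stream (StAX) API, which
Woodstox also implements. It assumes the StAX implementation on the classpath
detects the encoding from the XML declaration when given a raw InputStream.
The object and method names are hypothetical.

```scala
import java.io.ByteArrayInputStream
import java.nio.charset.StandardCharsets
import javax.xml.stream.{XMLInputFactory, XMLStreamConstants}

object PreambleEncodingSketch {
  // Return the text of the root element. The parser is handed raw bytes, not
  // a Reader, so it (not the JVM default charset) decides how to decode them.
  def rootText(bytes: Array[Byte]): String = {
    val factory = XMLInputFactory.newInstance()
    val xsr = factory.createXMLStreamReader(new ByteArrayInputStream(bytes))
    try {
      // Skip ahead to the first START_ELEMENT event (the root element).
      while (xsr.next() != XMLStreamConstants.START_ELEMENT) {}
      xsr.getElementText
    } finally xsr.close()
  }

  // The same logical document, written in two different encodings, each
  // declaring its encoding in the preamble.
  val latin1Doc: Array[Byte] =
    "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><data>\u00e9</data>"
      .getBytes(StandardCharsets.ISO_8859_1)
  val utf8Doc: Array[Byte] =
    "<?xml version=\"1.0\" encoding=\"UTF-8\"?><data>\u00e9</data>"
      .getBytes(StandardCharsets.UTF_8)
}
```

Both documents should decode to the same text regardless of the platform's
"file.encoding" value, which is the behavior the proposed InputStream
constructor is meant to provide.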