Steve Lawrence created DAFFODIL-2128:
----------------------------------------

             Summary: XML preamble encoding ignored when CLI unparsing with "xml" infoset type
                 Key: DAFFODIL-2128
                 URL: https://issues.apache.org/jira/browse/DAFFODIL-2128
             Project: Daffodil
          Issue Type: Bug
          Components: CLI
    Affects Versions: 2.3.0
            Reporter: Steve Lawrence
             Fix For: 2.4.0


When using the CLI to unparse XML using the "xml" infoset type, we have the 
following code:
{code:scala}
case "xml" => {
  // No charset argument: InputStreamReader falls back to the platform default charset
  val rdr = new BufferedReader(new InputStreamReader(new ByteArrayInputStream(anyRef.asInstanceOf[Array[Byte]])))
  new XMLTextInfosetInputter(rdr)
}
{code}
To create the XMLTextInfosetInputter, we wrap the bytes in an InputStreamReader
but do not specify an encoding. This means the platform default charset, which
comes from the Java "file.encoding" system property, is used to decode the XML.
On machines where that property isn't UTF-8 (e.g. Windows), UTF-8 data in the
XML can be decoded incorrectly, which leads to incorrect unparsed data.
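The following is a minimal, self-contained sketch of the failure mode, not the CLI code itself. It decodes the same UTF-8 bytes through an InputStreamReader twice: once as UTF-8 and once as ISO-8859-1, which stands in here (as an assumption) for a non-UTF-8 platform default. The object and method names are hypothetical.

{code:scala}
import java.io.{BufferedReader, ByteArrayInputStream, InputStreamReader}
import java.nio.charset.{Charset, StandardCharsets}

object DefaultCharsetDemo {
  // Decode bytes via a Reader, like the CLI does, but with the charset made explicit
  def decode(bytes: Array[Byte], cs: Charset): String = {
    val rdr = new BufferedReader(new InputStreamReader(new ByteArrayInputStream(bytes), cs))
    try {
      val sb = new StringBuilder
      var c = rdr.read()
      while (c != -1) { sb.append(c.toChar); c = rdr.read() }
      sb.toString
    } finally rdr.close()
  }

  def main(args: Array[String]): Unit = {
    val utf8Bytes = "<a>caf\u00e9</a>".getBytes(StandardCharsets.UTF_8)
    // Correct: decoded with the encoding the bytes were written in
    println(decode(utf8Bytes, StandardCharsets.UTF_8))
    // Mojibake: the two UTF-8 bytes of U+00E9 become two separate characters
    println(decode(utf8Bytes, StandardCharsets.ISO_8859_1))
  }
}
{code}

With the wrong charset, the text content "caf\u00e9" becomes "caf\u00c3\u00a9", and that corrupted string is what would be unparsed.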

I believe Woodstox can inspect the XML and determine the encoding from the
preamble, so we should take advantage of that. To do so, we should change the
XMLTextInfosetInputter to accept an InputStream in its constructor instead of a
Reader, and deprecate the Reader constructor.
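A rough sketch of why an InputStream is the right input: when a StAX parser is handed raw bytes via XMLInputFactory.createXMLStreamReader(InputStream), it picks the decoding itself from the XML declaration (falling back to UTF-8), rather than relying on the platform default. This uses whatever StAX implementation XMLInputFactory.newInstance() resolves to (the JDK built-in, or Woodstox if it is on the classpath); it is an illustration, not the XMLTextInfosetInputter code, and PreambleDetectionDemo/textOf are hypothetical names.

{code:scala}
import java.io.{ByteArrayInputStream, InputStream}
import java.nio.charset.StandardCharsets
import javax.xml.stream.{XMLInputFactory, XMLStreamConstants}

object PreambleDetectionDemo {
  private val factory = XMLInputFactory.newInstance()

  // Collect all character data; the parser, not the caller, chooses the decoding
  def textOf(in: InputStream): String = {
    val xsr = factory.createXMLStreamReader(in)
    try {
      val sb = new StringBuilder
      while (xsr.hasNext) {
        // Character data may arrive in several chunks, so append each one
        if (xsr.next() == XMLStreamConstants.CHARACTERS) sb.append(xsr.getText)
      }
      sb.toString
    } finally xsr.close()
  }

  def main(args: Array[String]): Unit = {
    val latin1 = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><a>caf\u00e9</a>"
      .getBytes(StandardCharsets.ISO_8859_1)
    val utf8 = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><a>caf\u00e9</a>"
      .getBytes(StandardCharsets.UTF_8)
    // Both decode to the same text, because each declaration matches its bytes
    println(textOf(new ByteArrayInputStream(latin1)))
    println(textOf(new ByteArrayInputStream(utf8)))
  }
}
{code}

Assuming the proposed InputStream constructor exists, the CLI "xml" case could pass its ByteArrayInputStream straight to XMLTextInfosetInputter and drop the BufferedReader/InputStreamReader wrapping entirely.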



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
