Steve Lawrence created DAFFODIL-3074:
----------------------------------------
Summary: stringAsXML always creates empty elements
Key: DAFFODIL-3074
URL: https://issues.apache.org/jira/browse/DAFFODIL-3074
Project: Daffodil
Issue Type: Bug
Components: Back End
Reporter: Steve Lawrence
When using the stringAsXml feature, if the input XML contains something like
<foo></foo>, it is written to the infoset as <foo />, i.e. a self-closing
empty element. Although stringAsXml does not guarantee that the XML will be
preserved unchanged, we try to keep it exactly the same where possible. It
would be nice if we could also preserve the distinction between empty and
non-empty elements.
The issue seems to be that when the XMLStreamReader sees START_ELEMENT and
END_ELEMENT events, we call the writeStartElement() and writeEndElement()
functions, respectively:
https://github.com/apache/daffodil/blob/main/daffodil-core/src/main/scala/org/apache/daffodil/runtime1/infoset/XMLTextInfosetInputter.scala#L145-L159
Woodstox then appears to automatically collapse elements with no content into
self-closing empty elements (e.g. <foo />).
The standard XMLStreamReader and XMLStreamWriter APIs do not seem to offer a
way to control this: the XMLStreamReader reports the same START_ELEMENT and
END_ELEMENT events regardless of whether the element is empty. And although
XMLStreamWriter has writeEmptyElement(), it has no way to force a separate
closing tag, so an element started with writeStartElement() that receives no
content can still be collapsed.
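The reader side of this can be checked with only the JDK's built-in StAX
implementation. The small program below (the object and method names are
illustrative, not Daffodil code) records the element events reported for
<foo></foo> and <foo/>; the sequences should be identical, which is why the
standard reader API alone cannot tell the two forms apart.

```scala
import java.io.StringReader
import javax.xml.stream.{XMLInputFactory, XMLStreamConstants}
import scala.collection.mutable.ListBuffer

object EventSequence {
  // Returns the START/END element events the StAX reader reports for `xml`,
  // ignoring every other event type.
  def events(xml: String): List[String] = {
    val xsr = XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml))
    val out = ListBuffer.empty[String]
    try {
      while (xsr.hasNext) {
        xsr.next() match {
          case XMLStreamConstants.START_ELEMENT => out += s"START(${xsr.getLocalName})"
          case XMLStreamConstants.END_ELEMENT   => out += s"END(${xsr.getLocalName})"
          case _                                => ()
        }
      }
    } finally {
      xsr.close()
    }
    out.toList
  }

  def main(args: Array[String]): Unit = {
    println(events("<foo></foo>")) // prints List(START(foo), END(foo))
    println(events("<foo/>"))      // prints List(START(foo), END(foo))
  }
}
```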
However, I think the Stax2 XMLStreamReader2 and XMLStreamWriter2 APIs, which
Woodstox implements, do provide enough control. Based on the API and a quick
skim of the code, I think these are the changes that need to be made, though
they haven't been tested:
1. In the XMLTextInfosetInputter and XMLTextInfosetOutputter, cast the results
of createXMLStreamReader and createXMLStreamWriter to the XMLStreamReader2 and
XMLStreamWriter2 interfaces. This gives us access to the additional API
functions we need.
2. Modify writeXMLStreamEvent in XMLTextInfosetInputter so that the
START_ELEMENT logic looks something like this:
{code:scala}
// Check once, while the reader is still positioned on START_ELEMENT
val isEmpty = xsr.isEmptyElement()
if (isEmpty) {
  xsw.writeEmptyElement(...)
} else {
  xsw.writeStartElement(...)
}
... // existing namespace/attribute code
if (isEmpty) {
  // skip the END_ELEMENT event, since writeEmptyElement already ends the element
  xsr.next()
}
{code}
So we call writeEmptyElement or writeStartElement depending on whether the
element is empty. If the element is empty, we also call xsr.next() to skip its
END_ELEMENT event, so that the END_ELEMENT logic does not write a closing tag
for it.
3. Modify the END_ELEMENT logic so it calls xsw.writeFullEndElement() instead
of xsw.writeEndElement(). This forces the writer to emit an explicit closing
tag (e.g. <foo></foo>) instead of collapsing an element with no content into
<foo />.
With these changes, END_ELEMENT always produces separate opening and closing
tags, while START_ELEMENT handles empty elements and ensures their END_ELEMENT
events are skipped.
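The three steps can be tried together in a small standalone copy loop. This is
a minimal, untested-in-Daffodil sketch, assuming woodstox-core (which brings in
the stax2-api interfaces) is on the classpath. The PreserveEmptyElements object
and copy method are hypothetical names; it is not the actual
XMLTextInfosetInputter code, and it deliberately ignores namespaces, comments,
CDATA, and processing instructions so the empty/non-empty handling stays
visible.

```scala
import java.io.{StringReader, StringWriter}
import javax.xml.stream.XMLStreamConstants
import com.ctc.wstx.stax.{WstxInputFactory, WstxOutputFactory}
import org.codehaus.stax2.{XMLStreamReader2, XMLStreamWriter2}

// Hypothetical helper: copies elements, attributes, and text, keeping
// <a/> and <a></a> distinct. Namespaces, CDATA, comments, PIs are ignored.
object PreserveEmptyElements {
  def copy(xml: String): String = {
    // Step 1: cast to the Stax2 interfaces, which Woodstox implements.
    val xsr = new WstxInputFactory()
      .createXMLStreamReader(new StringReader(xml))
      .asInstanceOf[XMLStreamReader2]
    val out = new StringWriter()
    val xsw = new WstxOutputFactory()
      .createXMLStreamWriter(out)
      .asInstanceOf[XMLStreamWriter2]

    while (xsr.hasNext) {
      xsr.next() match {
        case XMLStreamConstants.START_ELEMENT =>
          // Step 2: ask while the reader is still on START_ELEMENT.
          val isEmpty = xsr.isEmptyElement()
          if (isEmpty) xsw.writeEmptyElement(xsr.getLocalName)
          else xsw.writeStartElement(xsr.getLocalName)
          for (i <- 0 until xsr.getAttributeCount)
            xsw.writeAttribute(xsr.getAttributeLocalName(i), xsr.getAttributeValue(i))
          // writeEmptyElement already ends the element, so consume its END_ELEMENT.
          if (isEmpty) xsr.next()
        case XMLStreamConstants.END_ELEMENT =>
          // Step 3: always write an explicit closing tag.
          xsw.writeFullEndElement()
        case XMLStreamConstants.CHARACTERS =>
          xsw.writeCharacters(xsr.getText)
        case _ => ()
      }
    }
    xsw.flush()
    xsr.close()
    out.toString
  }

  def main(args: Array[String]): Unit =
    println(copy("<a><b></b><c/></a>"))
}
```

If the Stax2 methods behave as described above, <b></b> should come out with a
separate closing tag while <c/> stays self-closing; the exact form of the empty
tag (<c/> or <c />) may depend on how the writer is configured.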
--
This message was sent by Atlassian Jira
(v8.20.10#820010)