Steve Lawrence created DAFFODIL-3074:
----------------------------------------

             Summary: stringAsXML always creates empty elements
                 Key: DAFFODIL-3074
                 URL: https://issues.apache.org/jira/browse/DAFFODIL-3074
             Project: Daffodil
          Issue Type: Bug
          Components: Back End
            Reporter: Steve Lawrence


When using the stringAsXml feature, if the input XML contains something like 
<foo></foo>, this will be written to the infoset as <foo />, i.e. a 
self-closing empty element. Although stringAsXml does not guarantee that it 
will not change the XML in some way, where possible we try to keep the XML 
exactly the same. It would be nice if we could maintain empty vs non-empty 
elements as well.

The issue seems to be that when the XMLStreamReader sees START_ELEMENT and 
END_ELEMENT events, we call the writeStartElement() and writeEndElement() 
functions, respectively:

https://github.com/apache/daffodil/blob/main/daffodil-core/src/main/scala/org/apache/daffodil/runtime1/infoset/XMLTextInfosetInputter.scala#L145-L159

And it seems Woodstox automatically collapses elements with no content into 
empty elements (e.g. <foo />).
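
A minimal sketch of how this could be checked in isolation, assuming Woodstox 
is on the classpath and its default output settings are used. The CollapseDemo 
object and startThenEnd name are only for illustration and are not part of 
Daffodil:

```scala
import java.io.StringWriter
import com.ctc.wstx.stax.WstxOutputFactory

object CollapseDemo {
  // Writes a start tag immediately followed by an end tag, with no content
  // in between, and returns whatever Woodstox serialized.
  def startThenEnd(): String = {
    val out = new StringWriter()
    val xsw = new WstxOutputFactory().createXMLStreamWriter(out)
    xsw.writeStartElement("foo")
    xsw.writeEndElement() // no content written between start and end
    xsw.close()
    out.toString
  }
}
```

If the behavior described above holds, the returned string should not contain 
a separate </foo> closing tag.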

The standard XMLStreamReader and XMLStreamWriter APIs do not seem to have a 
way to control this: the XMLStreamReader sees the same START/END events 
regardless of whether the element is empty, and the XMLStreamWriter does not 
have a way to specify whether an element should be written as empty.

But I think the Woodstox XMLStreamReader2 and XMLStreamWriter2 APIs do provide 
enough information. Based on the API and skimming the code, I think these are 
the changes that need to be made, though they haven't been tested:

1. In the XMLTextInfosetInputter and XMLTextInfosetOutputter, when we call 
createXMLStreamReader or createXMLStreamWriter, we cast the result to the 
Woodstox XMLStreamReader2 and XMLStreamWriter2 interfaces. This gives us access 
to the additional API functions we need.
2. Modify the writeXMLStreamEvent in XMLTextInfosetInputter so that the 
START_ELEMENT logic is something like this:
{code:scala}
if (xsr.isEmptyElement()) {
  xsw.writeEmptyElement(...)
} else {
  xsw.writeStartElement(...)
}
... // existing namespace/attribute code
if (xsr.isEmptyElement()) {
  // skip the END_ELEMENT event since writeEmptyElement ends the element
  xsr.next()
}
{code}
So we call writeEmptyElement or writeStartElement depending on whether the 
element is empty. If the element is empty, we also call xsr.next() to skip the 
END_ELEMENT event and avoid calling writeEndElement for it.
3. Modify the END_ELEMENT logic so it calls xsw.writeFullEndElement() instead 
of xsw.writeEndElement(). This forces it to write both the opening and closing 
tags.

So now END_ELEMENT ensures we always write an opening and closing tag, and 
START_ELEMENT handles the case where the element is empty and ensures 
END_ELEMENT is skipped.
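
Putting the three steps together, a standalone and untested sketch of the 
copy loop might look like the following. It assumes Woodstox is on the 
classpath, ignores namespaces and all event types other than elements and 
character data, and uses a hypothetical EmptyElementSketch.copy helper rather 
than the actual writeXMLStreamEvent code in Daffodil:

```scala
import java.io.{StringReader, StringWriter}
import javax.xml.stream.XMLStreamConstants
import com.ctc.wstx.stax.{WstxInputFactory, WstxOutputFactory}
import org.codehaus.stax2.{XMLStreamReader2, XMLStreamWriter2}

object EmptyElementSketch {
  // Copies XML from a string to a string, trying to preserve whether each
  // element used the self-closing form (<foo/>) or the full form (<foo></foo>).
  def copy(xml: String): String = {
    // Step 1: cast to the Woodstox stax2 interfaces to get the extra methods
    val xsr = new WstxInputFactory()
      .createXMLStreamReader(new StringReader(xml))
      .asInstanceOf[XMLStreamReader2]
    val out = new StringWriter()
    val xsw = new WstxOutputFactory()
      .createXMLStreamWriter(out)
      .asInstanceOf[XMLStreamWriter2]

    while (xsr.hasNext) {
      xsr.next() match {
        case XMLStreamConstants.START_ELEMENT =>
          // Step 2: choose the write call based on the syntax the reader saw
          val empty = xsr.isEmptyElement()
          if (empty) xsw.writeEmptyElement(xsr.getLocalName)
          else xsw.writeStartElement(xsr.getLocalName)
          // simplified attribute copy; namespace handling is omitted here
          for (i <- 0 until xsr.getAttributeCount)
            xsw.writeAttribute(xsr.getAttributeLocalName(i), xsr.getAttributeValue(i))
          // writeEmptyElement already ends the element, so consume the
          // END_ELEMENT event that the reader still reports for it
          if (empty) xsr.next()
        case XMLStreamConstants.END_ELEMENT =>
          // Step 3: always write an explicit closing tag
          xsw.writeFullEndElement()
        case XMLStreamConstants.CHARACTERS =>
          xsw.writeCharacters(xsr.getText)
        case _ => () // comments, PIs, etc. are skipped in this sketch
      }
    }
    xsw.close()
    xsr.close()
    out.toString
  }
}
```

For example, EmptyElementSketch.copy("<a><foo></foo><bar/></a>") is intended 
to keep <foo></foo> as a start/end pair and bar as a self-closing element.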
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)