[ https://issues.apache.org/jira/browse/DAFFODIL-3074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Lawrence updated DAFFODIL-3074:
-------------------------------------
    Labels: beginner  (was: )

> stringAsXML always creates empty elements
> -----------------------------------------
>
>                 Key: DAFFODIL-3074
>                 URL: https://issues.apache.org/jira/browse/DAFFODIL-3074
>             Project: Daffodil
>          Issue Type: Bug
>          Components: Back End
>            Reporter: Steve Lawrence
>            Priority: Major
>              Labels: beginner
>
> When using the stringAsXml feature, if the input XML contains something like 
> <foo></foo>, it is written to the infoset as <foo />, i.e. a self-closing 
> empty-element tag. Although stringAsXml does not guarantee that the XML is 
> preserved exactly, where possible we try to keep it unchanged. It would be 
> nice to also preserve whether an element was written as an empty-element tag 
> (<foo />) or as a start/end tag pair (<foo></foo>).
> The issue seems to be that when the XMLStreamReader sees START_ELEMENT and 
> END_ELEMENT events, we call the writeStartElement() and writeEndElement() 
> methods, respectively:
> https://github.com/apache/daffodil/blob/main/daffodil-core/src/main/scala/org/apache/daffodil/runtime1/infoset/XMLTextInfosetInputter.scala#L145-L159
> And Woodstox appears to automatically collapse an element with no content 
> into an empty-element tag (e.g. <foo />).
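>
> A quick way to check the collapsing behavior in isolation is a small sketch 
> like the following. It assumes woodstox-core (which pulls in stax2-api) is 
> on the classpath, and CollapseDemo/writeFoo are hypothetical names. It 
> writes a start tag immediately followed by an end tag, once with the 
> standard writeEndElement() and once with the Stax2 writeFullEndElement():
> {code:scala}
> import java.io.StringWriter
> import com.ctc.wstx.stax.WstxOutputFactory
> import org.codehaus.stax2.XMLStreamWriter2
>
> object CollapseDemo {
>   // Writes <foo> and then immediately ends it, either with the standard
>   // StAX writeEndElement() or with the Stax2 writeFullEndElement()
>   def writeFoo(fullEnd: Boolean): String = {
>     val out = new StringWriter()
>     val xsw = new WstxOutputFactory()
>       .createXMLStreamWriter(out)
>       .asInstanceOf[XMLStreamWriter2]
>     xsw.writeStartElement("foo")
>     if (fullEnd) xsw.writeFullEndElement() else xsw.writeEndElement()
>     xsw.flush()
>     out.toString
>   }
>
>   def main(args: Array[String]): Unit = {
>     // no content between start and end, so Woodstox collapses the element
>     println(writeFoo(fullEnd = false))
>     // a separate end tag is forced
>     println(writeFoo(fullEnd = true))
>   }
> }
> {code}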
> The standard XMLStreamReader and XMLStreamWriter APIs do not seem to offer a 
> way to control this. The XMLStreamReader reports the same START_ELEMENT and 
> END_ELEMENT events regardless of whether the input was <foo /> or 
> <foo></foo>. And although XMLStreamWriter has writeEmptyElement(), it has no 
> standard way to force a separate end tag when writeStartElement() is 
> followed directly by writeEndElement().
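>
> The reader side of this can be checked with just the JDK's built-in StAX 
> support. This is a minimal sketch (SameEvents/eventTypes are hypothetical 
> names) that records the event types a plain XMLStreamReader reports for 
> <foo/> and for <foo></foo>:
> {code:scala}
> import java.io.StringReader
> import javax.xml.stream.XMLInputFactory
> import scala.collection.mutable.ListBuffer
>
> object SameEvents {
>   // Collect every event type a plain StAX reader reports for a document
>   def eventTypes(xml: String): List[Int] = {
>     val xsr = XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml))
>     val events = ListBuffer[Int]()
>     while (xsr.hasNext) events += xsr.next()
>     xsr.close()
>     events.toList
>   }
>
>   def main(args: Array[String]): Unit = {
>     // Both forms produce START_ELEMENT, END_ELEMENT, END_DOCUMENT
>     println(eventTypes("<foo/>") == eventTypes("<foo></foo>")) // prints true
>   }
> }
> {code}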
> But I think the Stax2 XMLStreamReader2 and XMLStreamWriter2 interfaces, 
> which Woodstox implements, do provide enough information. Based on the API 
> docs and a skim of the code, I think the following changes are needed, 
> though they have not been tested:
> 1. In the XMLTextInfosetInputter and XMLTextInfosetOutputter, wherever we 
> call createXMLStreamReader or createXMLStreamWriter, cast the result to the 
> Woodstox XMLStreamReader2 or XMLStreamWriter2 interface. This gives us 
> access to the additional API functions we need.
> 2. Modify writeXMLStreamEvent in XMLTextInfosetInputter so that the 
> START_ELEMENT logic is something like this:
> {code:scala}
> // query once, while the reader is still positioned on START_ELEMENT
> val isEmpty = xsr.isEmptyElement()
> if (isEmpty) {
>   xsw.writeEmptyElement(...)
> } else {
>   xsw.writeStartElement(...)
> }
> ... // existing namespace/attribute code
> if (isEmpty) {
>   // writeEmptyElement already ends the element, so skip the END_ELEMENT
>   // event rather than calling writeEndElement for it
>   xsr.next()
> }
> {code}
> So we call writeEmptyElement or writeStartElement depending on whether the 
> element is empty. If it is empty, we also call xsr.next() to skip its 
> END_ELEMENT event, so that writeEndElement is never called for it.
> 3. Modify the END_ELEMENT logic so it calls xsw.writeFullEndElement() 
> instead of xsw.writeEndElement(). This forces the writer to emit a separate 
> end tag (e.g. </foo>) instead of collapsing a content-free element into 
> <foo />.
> So END_ELEMENT now always produces a separate start and end tag, while 
> START_ELEMENT handles elements that were empty-element tags in the input and 
> ensures their END_ELEMENT is skipped.
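>
> Putting steps 2 and 3 together, a standalone copier could look roughly like 
> the sketch below. It is not Daffodil's actual XMLTextInfosetInputter code: 
> it assumes woodstox-core is on the classpath, only handles elements without 
> namespaces, attributes, and character data, and PreserveEmptyTags and 
> copyPreservingEmptyTags are hypothetical names.
> {code:scala}
> import java.io.{StringReader, StringWriter}
> import javax.xml.stream.XMLStreamConstants
> import com.ctc.wstx.stax.{WstxInputFactory, WstxOutputFactory}
> import org.codehaus.stax2.{XMLStreamReader2, XMLStreamWriter2}
>
> object PreserveEmptyTags {
>   // Hypothetical helper: copy no-namespace XML through a Woodstox reader and
>   // writer, keeping <x/> versus <x></x> as written in the input
>   def copyPreservingEmptyTags(xml: String): String = {
>     val xsr = new WstxInputFactory()
>       .createXMLStreamReader(new StringReader(xml))
>       .asInstanceOf[XMLStreamReader2]
>     val out = new StringWriter()
>     val xsw = new WstxOutputFactory()
>       .createXMLStreamWriter(out)
>       .asInstanceOf[XMLStreamWriter2]
>
>     while (xsr.hasNext) {
>       xsr.next() match {
>         case XMLStreamConstants.START_ELEMENT =>
>           // Query once, before the reader advances past this START_ELEMENT
>           val isEmpty = xsr.isEmptyElement()
>           if (isEmpty) xsw.writeEmptyElement(xsr.getLocalName)
>           else xsw.writeStartElement(xsr.getLocalName)
>           // Attributes may still be written after writeEmptyElement
>           for (i <- 0 until xsr.getAttributeCount)
>             xsw.writeAttribute(xsr.getAttributeLocalName(i), xsr.getAttributeValue(i))
>           // writeEmptyElement already closed the element, so consume the
>           // matching END_ELEMENT instead of writing an end tag for it
>           if (isEmpty) xsr.next()
>         case XMLStreamConstants.END_ELEMENT =>
>           // Always emit a separate end tag, even if the element had no content
>           xsw.writeFullEndElement()
>         case XMLStreamConstants.CHARACTERS =>
>           xsw.writeCharacters(xsr.getText)
>         case _ => () // comments, PIs, END_DOCUMENT, etc. are ignored here
>       }
>     }
>     xsw.flush()
>     xsr.close()
>     out.toString
>   }
>
>   def main(args: Array[String]): Unit =
>     println(copyPreservingEmptyTags("<a x=\"1\"><b/><c></c>text</a>"))
> }
> {code}
> In Daffodil the existing namespace/attribute code would take the place of the 
> simple attribute loop; the important parts are checking isEmptyElement() 
> before advancing the reader and always ending elements with 
> writeFullEndElement().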
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
