Mike Beckerle created DAFFODIL-2708:
---------------------------------------
Summary: XML String feature in XMLText Infoset Inputter/Outputter
Key: DAFFODIL-2708
URL: https://issues.apache.org/jira/browse/DAFFODIL-2708
Project: Daffodil
Issue Type: New Feature
Components: Back End
Affects Versions: 3.3.0
Reporter: Mike Beckerle
Several users need a way to embed a string that is known to be XML text
directly in the XML output from parsing, without escaping it.
Symmetrically, for unparsing, a string element identified as XML text should
result in a series of XML "events" being absorbed and converted to a string
which is the ultimate value of the string element.
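
To make the two directions concrete, here is a minimal sketch in Python using
only the standard library. It is not Daffodil code; the element name "payload",
the sample value, and all function names are hypothetical and chosen only for
illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical value of a string element whose content is itself XML text.
payload = '<note id="1"><to>Ann</to></note>'


def output_escaped(name, value):
    """Ordinary string element: markup characters in the value are escaped."""
    return f"<{name}>{escape(value)}</{name}>"


def output_embedded(name, value):
    """XML-string element (parse direction): the value is parsed and
    embedded as a child element instead of being escaped."""
    wrapper = ET.Element(name)
    wrapper.append(ET.fromstring(value))
    return ET.tostring(wrapper, encoding="unicode")


def absorb_to_string(element_xml):
    """Unparse direction: the embedded child is serialized back into a
    string, which becomes the value of the string element."""
    wrapper = ET.fromstring(element_xml)
    (child,) = list(wrapper)  # assume exactly one embedded root element
    return ET.tostring(child, encoding="unicode")


if __name__ == "__main__":
    print(output_escaped("payload", payload))
    print(output_embedded("payload", payload))
    print(absorb_to_string(output_embedded("payload", payload)))
```

The round trip returns the original string here only because the sample is
already in the form the serializer produces; in general the result is a
re-serialized equivalent, not the original characters.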
Note that for any given popular data format (XML, JSON, etc.) where Daffodil
supports output of infosets in that representation, the same issue can arise
where data contains a string which is already in that representation and users
desire for it to be directly embedded, not escaped as a string.
For the purposes of this ticket, let's focus on XML only. Other
representations could be added subsequently.
Notes:
1) on canonicalization - I see no way to avoid strong canonicalization of
this XML. If byte-for-byte preservation of characters is needed, such as a
character entity for a space or CRLFs, there's just no way to do that (at
least that I know of).
2) XML declaration (the initial "<?xml ...?>" slug line/processing
instruction) - a way to strip this if present in the XML string may be needed.
An option to generate it as part of the string when unparsing may also be
needed.
3) An ASCII-only or iso-8859-1 only option may be needed where any character
outside of those and standard whitespaces is converted to a character entity.
4) This breaks the idea that the DFDL schema IS the XML Schema of the output
Infoset XML from parsing. Rather, to create an XML schema for the resulting
data, one would have to replace the DFDL element declaration for the string
with an appropriate DFDL element reference to the schema of the XML being embedded
at that place.
It is highly recommended that such a DFDL schema contain comments describing
the exact element reference (namespace + name) that the XML String
corresponds to.
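
Notes 1 through 3 can be illustrated with a short sketch, again in plain
Python with the standard library rather than Daffodil code. The function names
are hypothetical, and the regular expression for the XML declaration is a
simplification that assumes the declaration, if present, is the first thing in
the string.

```python
import re
import xml.etree.ElementTree as ET


def canonical_roundtrip(text):
    """Note 1: parse and re-serialize. The parser resolves character
    references and normalizes line endings, so the original spelling of
    the text is not preserved."""
    return ET.tostring(ET.fromstring(text), encoding="unicode")


def strip_xml_declaration(text):
    """Note 2: drop a leading <?xml ...?> declaration if one is present."""
    return re.sub(r"^\s*<\?xml\s[^>]*\?>\s*", "", text, count=1)


def ascii_only(text):
    """Note 3: re-serialize as ASCII; characters outside ASCII are written
    as numeric character references."""
    return ET.tostring(ET.fromstring(text), encoding="us-ascii").decode("ascii")


if __name__ == "__main__":
    print(repr(canonical_roundtrip("<a>x&#x20;y\r\nz</a>")))
    print(strip_xml_declaration('<?xml version="1.0" encoding="UTF-8"?>\n<a/>'))
    print(ascii_only("<a>caf\u00e9</a>"))
```

In the first call the character reference and the CRLF do not survive the
round trip, which is the loss described in note 1. In the last call the "é"
comes back as the character reference &#233;.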
w.r.t. implementation...
There's some pseudocode for this in the "Example Implementation" section of
the Runtime Properties proposal:
[https://cwiki.apache.org/confluence/display/DAFFODIL/Proposal%3A+Runtime+Properties#Proposal%3ARuntimeProperties-ExampleImplementation]
This pseudocode uses the ScalaXML InfosetInputter/Outputter as a base for
simplicity, but we should base the actual one on the
XMLTextInfosetInputter/Outputter since that's what most people use.