Mike Beckerle created DAFFODIL-2708:
---------------------------------------

             Summary: XML String feature in XMLText Infoset Inputter/Outputter
                 Key: DAFFODIL-2708
                 URL: https://issues.apache.org/jira/browse/DAFFODIL-2708
             Project: Daffodil
          Issue Type: New Feature
          Components: Back End
    Affects Versions: 3.3.0
            Reporter: Mike Beckerle


Several users need the following feature.

For XML output from parsing, a string whose value is known to itself be XML 
text should be embedded in the XML output as XML, without escaping it.
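A minimal sketch of the parse direction follows. It assumes the standard Java StAX API (javax.xml.stream), not Daffodil's actual XMLTextInfosetOutputter internals. The XML string is parsed, and its events are replayed into the output writer, so the string appears as real elements instead of escaped text. EmbedXmlString, embed, and demo are hypothetical names used only for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

public class EmbedXmlString {

    // StAX may report "no prefix" / "no namespace" as null; treat that as "".
    private static String nz(String s) {
        return s == null ? "" : s;
    }

    // Hypothetical helper: parses xmlText and replays its events into out, so the
    // string becomes real child elements instead of escaped text. START_DOCUMENT,
    // END_DOCUMENT (and with them any XML declaration) and DTD events are skipped.
    public static void embed(XMLStreamWriter out, String xmlText) throws XMLStreamException {
        XMLStreamReader in = XMLInputFactory.newInstance()
            .createXMLStreamReader(new StringReader(xmlText));
        while (in.hasNext()) {
            switch (in.next()) {
                case XMLStreamConstants.START_ELEMENT: {
                    String ns = nz(in.getNamespaceURI());
                    if (ns.isEmpty()) {
                        out.writeStartElement(in.getLocalName());
                    } else {
                        out.writeStartElement(nz(in.getPrefix()), in.getLocalName(), ns);
                    }
                    // Re-emit the namespace declarations made on this element.
                    for (int i = 0; i < in.getNamespaceCount(); i++) {
                        out.writeNamespace(nz(in.getNamespacePrefix(i)), nz(in.getNamespaceURI(i)));
                    }
                    for (int i = 0; i < in.getAttributeCount(); i++) {
                        String ans = nz(in.getAttributeNamespace(i));
                        if (ans.isEmpty()) {
                            out.writeAttribute(in.getAttributeLocalName(i), in.getAttributeValue(i));
                        } else {
                            out.writeAttribute(nz(in.getAttributePrefix(i)), ans,
                                in.getAttributeLocalName(i), in.getAttributeValue(i));
                        }
                    }
                    break;
                }
                case XMLStreamConstants.END_ELEMENT:
                    out.writeEndElement();
                    break;
                case XMLStreamConstants.CHARACTERS:
                case XMLStreamConstants.CDATA:
                case XMLStreamConstants.SPACE:
                    // Text is re-escaped by the writer (so CDATA becomes plain text).
                    out.writeCharacters(in.getText());
                    break;
                case XMLStreamConstants.COMMENT:
                    out.writeComment(in.getText());
                    break;
                case XMLStreamConstants.PROCESSING_INSTRUCTION: {
                    String data = in.getPIData();
                    if (data == null || data.isEmpty()) {
                        out.writeProcessingInstruction(in.getPITarget());
                    } else {
                        out.writeProcessingInstruction(in.getPITarget(), data);
                    }
                    break;
                }
                default:
                    break; // document start/end, DTD, etc.
            }
        }
        in.close();
    }

    // Writes <record><payload>...</payload></record> with xmlText embedded unescaped.
    public static String demo(String xmlText) throws XMLStreamException {
        StringWriter sw = new StringWriter();
        XMLStreamWriter out = XMLOutputFactory.newInstance().createXMLStreamWriter(sw);
        out.writeStartElement("record");
        out.writeStartElement("payload");
        embed(out, xmlText);
        out.writeEndElement();
        out.writeEndElement();
        out.flush();
        return sw.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(demo("<?xml version=\"1.0\"?><inner a=\"1\">hi &amp; bye</inner>"));
    }
}
```

Because the string is re-parsed, it must be well-formed XML. The output is a re-serialization of that XML, not its original characters. Any XML declaration in the string is dropped because START_DOCUMENT is not replayed.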

Symmetrically, for unparsing, a string element identified as XML text should 
consume a series of XML "events" from the infoset and convert them to a string, 
which becomes the value of that string element. 
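The unparse direction can be sketched the same way. This sketch also assumes plain StAX, here an XMLEventReader over the infoset, not Daffodil's XMLTextInfosetInputter API. It starts just after the start tag of the XML-string element and absorbs events with a depth counter until the matching end tag. Everything in between is then serialized into a single string. AbsorbXmlString, absorb, and demo are hypothetical names.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

public class AbsorbXmlString {

    // Assumes the START_ELEMENT of the XML-string element was just consumed from
    // `events`. Consumes everything up to and including that element's END_ELEMENT
    // and returns the content in between, serialized as one XML string.
    public static String absorb(XMLEventReader events) throws XMLStreamException {
        StringWriter sw = new StringWriter();
        XMLEventWriter out = XMLOutputFactory.newInstance().createXMLEventWriter(sw);
        int depth = 0; // element nesting depth below the XML-string element
        while (events.hasNext()) {
            XMLEvent e = events.nextEvent();
            if (e.isStartElement()) {
                depth++;
            } else if (e.isEndElement()) {
                if (depth == 0) {
                    break; // closing tag of the XML-string element itself: not copied
                }
                depth--;
            }
            out.add(e);
        }
        out.flush();
        out.close();
        return sw.toString();
    }

    // Scans an infoset for the first <payload> element and absorbs its content.
    public static String demo(String infoset) throws XMLStreamException {
        XMLEventReader events = XMLInputFactory.newInstance()
            .createXMLEventReader(new StringReader(infoset));
        while (events.hasNext()) {
            XMLEvent e = events.nextEvent();
            if (e.isStartElement()
                    && e.asStartElement().getName().getLocalPart().equals("payload")) {
                return absorb(events);
            }
        }
        return null; // no <payload> element found
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(demo("<record><payload><inner><b>x</b>hi</inner></payload></record>"));
    }
}
```

The depth counter tracks start and end tags rather than names. This lets nested elements inside the XML string be absorbed without ending the value early, even ones that share the outer element's name.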

Note that for any popular data format (XML, JSON, etc.) in which Daffodil 
can output infosets, the same issue can arise: the data contains a string that 
is already in that representation, and users want it embedded directly rather 
than escaped as a string. 

For the purposes of this ticket, let's focus on XML only. Other 
representations could be added later. 

Notes:

1) on canonicalization - I see no way to avoid strong canonicalization of 
this XML. If byte-for-byte preservation is needed, for example of character 
references such as &#x20; (a space) or of CRLF line endings, there's just no 
way to do that (at least that I know of). 

2) XML declaration (the initial "slug" line, <?xml ...?>, which looks like a 
processing instruction) - a way to strip it, if present in the XML string, may 
be needed. An option to generate it as part of the string when unparsing may 
also be needed. 

3) An ASCII-only or ISO-8859-1-only option may be needed, where any character 
outside that set (standard whitespace aside) is converted to a character entity. 

4) This breaks the idea that the DFDL schema IS the XML Schema of the infoset 
XML output from parsing. Rather, to create an XML schema for the resulting 
data, one would have to replace the DFDL element declaration for the string 
with an element reference to the appropriate element in the schema of the XML 
being embedded at that place. 

It is highly recommended that such a DFDL schema contain comments describing 
this exact element reference, i.e., the namespace and name of the element 
that the XML string corresponds to. 
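Regarding note 1 (canonicalization), a small sketch shows why byte-for-byte preservation is out of reach with a conforming parser. Character references are resolved, and CRLF is normalized to LF, before the application ever sees the text, so re-serializing cannot recover them. The sketch uses plain StAX and handles only simple un-namespaced elements and text. CanonicalizationDemo and roundTrip are hypothetical names.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

public class CanonicalizationDemo {

    // Parses xmlText, then re-serializes its elements and text
    // (simple, un-namespaced XML without attributes only).
    public static String roundTrip(String xmlText) throws XMLStreamException {
        XMLStreamReader in = XMLInputFactory.newInstance()
            .createXMLStreamReader(new StringReader(xmlText));
        StringWriter sw = new StringWriter();
        XMLStreamWriter out = XMLOutputFactory.newInstance().createXMLStreamWriter(sw);
        while (in.hasNext()) {
            int event = in.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                out.writeStartElement(in.getLocalName());
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                out.writeEndElement();
            } else if (event == XMLStreamConstants.CHARACTERS) {
                // The parser hands us decoded text: references are already resolved
                // and CRLF has already been normalized to LF.
                out.writeCharacters(in.getText());
            }
        }
        in.close();
        out.flush();
        return sw.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(roundTrip("<a>x&#x20;y\r\nz</a>"));
    }
}
```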
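For note 2, one hedged option is to handle the XML declaration at the string level. XML only allows the declaration at the very start of a document (after an optional byte order mark). Stripping can therefore look for a leading "<?xml" followed by whitespace, which also avoids mistaking PIs such as <?xml-stylesheet ...?> for a declaration. XmlDeclaration, strip, withDeclaration, and the emit/encoding parameters are hypothetical, not existing Daffodil options.

```java
public class XmlDeclaration {

    private static final String DECL_START = "<?xml";

    // XML only permits the declaration at the very start of a document (after an
    // optional byte order mark), and "<?xml" must be followed by whitespace;
    // that excludes other PIs such as <?xml-stylesheet ...?>.
    public static boolean hasDeclaration(String text) {
        String t = stripBom(text);
        return t.length() > DECL_START.length()
            && t.startsWith(DECL_START)
            && isXmlSpace(t.charAt(DECL_START.length()));
    }

    // Parse direction: drop a leading declaration and the prolog whitespace after it.
    public static String strip(String text) {
        String t = stripBom(text);
        if (!hasDeclaration(t)) {
            return t;
        }
        // Declaration values cannot contain "?", so the first "?>" ends it.
        int end = t.indexOf("?>");
        if (end < 0) {
            throw new IllegalArgumentException("unterminated XML declaration");
        }
        int i = end + 2;
        while (i < t.length() && isXmlSpace(t.charAt(i))) {
            i++;
        }
        return t.substring(i);
    }

    // Unparse direction: optionally prepend a declaration to the absorbed string.
    public static String withDeclaration(String body, boolean emit, String encoding) {
        if (!emit) {
            return body;
        }
        return "<?xml version=\"1.0\" encoding=\"" + encoding + "\"?>" + body;
    }

    private static String stripBom(String s) {
        return s.startsWith("\uFEFF") ? s.substring(1) : s;
    }

    private static boolean isXmlSpace(char c) {
        return c == ' ' || c == '\t' || c == '\r' || c == '\n';
    }

    public static void main(String[] args) {
        System.out.println(strip("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<a/>"));
        System.out.println(withDeclaration("<a/>", true, "UTF-8"));
    }
}
```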
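For note 3, here is a minimal sketch of an ASCII-only or ISO-8859-1-only mode. It assumes the escaping is applied to text content and attribute values as they are written. Character references are not recognized in element names, comments, or CDATA sections, so it cannot simply be run over the whole serialized string. RestrictedCharsetEscaper and its constants are hypothetical names.

```java
import java.util.Locale;

public class RestrictedCharsetEscaper {

    public static final int ASCII_MAX = 0x7F;   // ASCII-only option
    public static final int LATIN1_MAX = 0xFF;  // ISO-8859-1-only option

    // Escapes a text or attribute value for XML and additionally replaces every
    // code point above maxCodePoint with a hexadecimal character reference.
    // Tab, LF, CR and space are below both limits, so they pass through unchanged.
    public static String escape(String value, int maxCodePoint) {
        StringBuilder sb = new StringBuilder(value.length());
        // codePoints() keeps surrogate pairs together, so e.g. U+1F600 becomes a
        // single reference rather than two invalid ones.
        value.codePoints().forEach(cp -> {
            switch (cp) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default:
                    if (cp > maxCodePoint) {
                        sb.append("&#x")
                          .append(Integer.toHexString(cp).toUpperCase(Locale.ROOT))
                          .append(';');
                    } else {
                        sb.appendCodePoint(cp);
                    }
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        String v = "caf\u00E9 \u4E2D <&>";
        System.out.println(escape(v, ASCII_MAX));
        System.out.println(escape(v, LATIN1_MAX));
    }
}
```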

w.r.t. implementation...

There's some pseudocode for this in the "Example Implementation" section of
the Runtime Properties proposal:

[https://cwiki.apache.org/confluence/display/DAFFODIL/Proposal%3A+Runtime+Properties#Proposal%3ARuntimeProperties-ExampleImplementation]

This pseudocode uses the ScalaXML InfosetInputter/Outputter as a base for 
simplicity, but we should base the actual one on the 
XMLTextInfosetInputter/Outputter
since that's what most people use.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
