[ https://issues.apache.org/jira/browse/DAFFODIL-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17561188#comment-17561188 ]

Mike Beckerle commented on DAFFODIL-1959:
-----------------------------------------

EXI represents the XML Infoset, but we depend on some features of the XML Text 
representation that are not part of that infoset. 

Examples include character entities used to preserve CR characters, and CDATA 
bracketing used to preserve whitespace. 

The EXI spec, in this section [https://www.w3.org/TR/exi/#InfosetMapping] says:
{quote}As has been prescribed in section [*2. Design 
Principles*|https://www.w3.org/TR/exi/#principles], EXI is designed to be 
compatible with the XML Information Set. While this approach is both legitimate 
and practical for designing a succinct format interoperable with XML family of 
specifications and technologies, it entails that some lexical constructs of XML 
not recognized by the XML Information Set are not represented by EXI, either. 
Examples of such unrepresented lexical constructs of XML include white space 
outside the document element, white space within tags, the kind of quotation 
marks (single or double) used to quote attribute values, and the boundaries of 
CDATA marked sections.
{quote}
Converting Daffodil-created EXI data back to XML text must therefore escape 
whitespace so that the original data is preserved. 

That is, when simple text elements are output from EXI as XML Text, some 
characters must be rendered as character entities (e.g., CR becomes 
"&#xD;"). To avoid whitespace insertion/removal issues, one must escape 
all leading and trailing whitespace, and within the text one must escape any 
whitespace other than single spaces or LF.
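A minimal sketch in Java of the escaping rule above, not taken from Daffodil's code base. It assumes "whitespace" means the four XML whitespace characters (space, tab, LF, CR) and that a "single space" is a space with no adjacent space; it also escapes `&`, `<` and `>` so the result is valid element content. The names `WhitespaceEscaper` and `escape` are hypothetical.

```java
// Hypothetical helper illustrating the whitespace-escaping rule; not Daffodil's API.
final class WhitespaceEscaper {

    // The XML 1.0 whitespace characters (the "S" production).
    private static boolean isXmlWhitespace(char c) {
        return c == ' ' || c == '\t' || c == '\n' || c == '\r';
    }

    // Hex character reference, e.g. CR -> "&#xD;", space -> "&#x20;".
    private static String charRef(char c) {
        return "&#x" + Integer.toHexString(c).toUpperCase() + ";";
    }

    static String escape(String s) {
        // Index of the first and last non-whitespace characters.
        int first = 0;
        while (first < s.length() && isXmlWhitespace(s.charAt(first))) first++;
        int last = s.length() - 1;
        while (last >= first && isXmlWhitespace(s.charAt(last))) last--;

        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '&') {
                out.append("&amp;");
            } else if (c == '<') {
                out.append("&lt;");
            } else if (c == '>') {
                out.append("&gt;");
            } else if (isXmlWhitespace(c)) {
                // Interior whitespace lies strictly between two non-whitespace chars,
                // so i - 1 and i + 1 are in bounds whenever interior is true.
                boolean interior = i > first && i < last;
                boolean loneSpace = interior && c == ' '
                        && s.charAt(i - 1) != ' '
                        && s.charAt(i + 1) != ' ';
                if (interior && (c == '\n' || loneSpace)) {
                    out.append(c);          // safe to leave literal
                } else {
                    out.append(charRef(c)); // leading, trailing, CR, tab, or runs of spaces
                }
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

For example, `escape("a\r\nb")` yields `a&#xD;` followed by a literal LF and `b`, while `escape("  x y  ")` escapes the two leading and two trailing spaces but leaves the single interior space alone. Leaving lone spaces and LF literal keeps the output readable without risking value changes.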

Basically, when Daffodil converts a Daffodil-infoset string into an XML Text 
string, it must do various things to ensure exact value preservation. These 
same things must be done when EXI data is converted into XML Text. Otherwise 
the fungibility of whitespace in XML Text can cause loss of information. 
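The information loss can be seen with the JDK's built-in DOM parser. XML 1.0 end-of-line handling converts a literal CR or CRLF in the input to LF before the application sees it, whereas a `&#xD;` character reference survives as CR. This sketch assumes a standard JAXP `DocumentBuilderFactory`; the names `RoundTrip` and `parseText` are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

// Hypothetical demo: parse "<v>body</v>" and return the element's text value.
final class RoundTrip {
    static String parseText(String body) {
        String xml = "<v>" + body + "</v>";
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement()
                    .getTextContent();
        } catch (Exception e) {
            throw new RuntimeException("failed to parse: " + xml, e);
        }
    }
}
```

Here `parseText("a\r\nb")` returns `"a\nb"` (the CR is lost), while `parseText("a&#xD;\nb")` returns `"a\r\nb"` (the CR is preserved).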

See DAFFODIL-2346

 

> EXIficient Inputter and outputter for XML EXI representation
> ------------------------------------------------------------
>
>                 Key: DAFFODIL-1959
>                 URL: https://issues.apache.org/jira/browse/DAFFODIL-1959
>             Project: Daffodil
>          Issue Type: New Feature
>          Components: Back End
>    Affects Versions: 2.1.0
>            Reporter: Mike Beckerle
>            Assignee: Josh Adams
>            Priority: Major
>              Labels: beginner
>             Fix For: 3.4.0
>
>
> Create EXI (dense binary XML) representation using an EXI-specific 
> InfosetInputter and InfosetOutputter.
> These would be very similar to the XML InfosetInputter and Outputter, as EXI 
> libraries such as EXIficient https://github.com/EXIficient/exificient 
> already provide SAX/StAX APIs, etc. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
