[ https://issues.apache.org/jira/browse/DAFFODIL-1559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17555143#comment-17555143 ]

Mike Beckerle commented on DAFFODIL-1559:
-----------------------------------------

Another option for the PUA characters is this: if you create a PUA character to 
represent an XML-illegal character, then you ALSO convert that PUA character 
into an XML character entity. Hence an ASCII NUL, which becomes the PUA 
character U+E000, would be represented not by the character U+E000 itself 
(which probably displays as a box or something), but literally by the string 
"&#xE000;", so that it's apparent there is a character there.
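A minimal Python sketch of that idea, assuming the remap is a fixed offset (code point c becomes U+E000 + c, matching the NUL example above) and applying it only to the XML-illegal C0 controls. The function names are hypothetical and this is not Daffodil's implementation. An XML parser still decodes "&#xE000;" to the PUA character, so unparsing only has to undo the offset:

```python
import xml.etree.ElementTree as ET

PUA_BASE = 0xE000  # assumption: code point c maps to U+E000 + c (NUL -> U+E000)


def is_xml_char(cp: int) -> bool:
    """True if cp matches the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def escape_text(data: str) -> str:
    """Serialize data as XML element text, spelling remapped controls as entities."""
    out = []
    for ch in data:
        cp = ord(ch)
        if cp < 0x20 and not is_xml_char(cp):
            # XML-illegal C0 control: remap into the PUA and write it as a
            # character reference so a reader sees "&#xE000;", not a box.
            out.append(f"&#x{PUA_BASE + cp:X};")
        elif ch == "&":
            out.append("&amp;")
        elif ch == "<":
            out.append("&lt;")
        elif ch == ">":
            out.append("&gt;")
        else:
            out.append(ch)
    return "".join(out)


def unmap_pua(text: str) -> str:
    """Undo the remap for unparsing: U+E000 + c becomes c again."""
    out = []
    for ch in text:
        cp = ord(ch) - PUA_BASE
        if 0 <= cp < 0x20 and not is_xml_char(cp):
            out.append(chr(cp))
        else:
            out.append(ch)
    return "".join(out)


def round_trip(data: str) -> str:
    """Write data as element text, parse it back, and undo the remap."""
    parsed = ET.fromstring(f"<v>{escape_text(data)}</v>").text or ""
    return unmap_pua(parsed)
```

For example, escape_text("a\x00b") returns "a&#xE000;b", and round_trip("a\x00\x01b") gives back the original string. Note that data which already contains U+E000 through U+E01F would be ambiguous under this simple scheme.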

Yet another option: Unicode has a set of control-character pictures. (For NUL, 
Ctrl-A, and Ctrl-B, these are ␀ ␁ ␂, i.e. U+2400, U+2401, and U+2402.)

For the control characters that have control pictures, our "escaping" of these 
illegal characters could map them to these control-picture characters (and 
back again when unparsing).

This would have to be, again, an option. 
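The Unicode Control Pictures block places SYMBOL FOR NULL at U+2400 through SYMBOL FOR UNIT SEPARATOR at U+241F, so each C0 control c has the picture U+2400 + c. A minimal Python sketch of the mapping and its inverse, with hypothetical function names, leaving TAB, LF, and CR alone because XML already allows them:

```python
CONTROL_PICTURES_BASE = 0x2400  # U+2400 SYMBOL FOR NULL
XML_LEGAL_C0 = {0x9, 0xA, 0xD}  # TAB, LF, CR are legal XML characters


def to_control_picture(data: str) -> str:
    """Parse side: replace XML-illegal C0 controls with their control pictures."""
    out = []
    for ch in data:
        cp = ord(ch)
        if cp < 0x20 and cp not in XML_LEGAL_C0:
            out.append(chr(CONTROL_PICTURES_BASE + cp))
        else:
            out.append(ch)
    return "".join(out)


def from_control_picture(text: str) -> str:
    """Unparse side: turn control pictures back into the original controls."""
    out = []
    for ch in text:
        cp = ord(ch) - CONTROL_PICTURES_BASE
        if 0 <= cp < 0x20 and cp not in XML_LEGAL_C0:
            out.append(chr(cp))
        else:
            out.append(ch)
    return "".join(out)
```

Unlike PUA code points, the control pictures are assigned, visible characters that real data may legitimately contain, and this sketch would turn any such character into a control on unparse. That ambiguity is one reason this mapping fits better as an opt-in than as the default.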

> Add option to disable CRLF to LF XML canonicalization
> -----------------------------------------------------
>
>                 Key: DAFFODIL-1559
>                 URL: https://issues.apache.org/jira/browse/DAFFODIL-1559
>             Project: Daffodil
>          Issue Type: Bug
>          Components: API
>            Reporter: Steve Lawrence
>            Priority: Major
>              Labels: beginner
>
> See the review for more details. The short of it is that when converting parse 
> results to XML, we convert CR to LF, and we convert CRLF to LF. This means 
> that we lose the information that the data used to contain CRLF. This is 
> similar to how we lose that information with delimiters if someone uses NL, 
> but it's slightly different since it is actual data. However, this behavior is 
> the most user-friendly and is consistent with other XML technologies.
> Perhaps we need an option to convert CRLF to somewhere in the PUA so that this 
> information can be maintained if someone needs it.
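The default conversion described in the issue is the same translation XML 1.0 end-of-line handling (section 2.11) performs: CRLF and any lone CR both become LF. A minimal Python sketch contrasting it with the proposed opt-in, assuming a hypothetical preserve_cr flag and a hypothetical choice of U+E00D (U+E000 + 0x0D) as the PUA stand-in for CR; neither is an existing Daffodil setting:

```python
PUA_CR = "\uE00D"  # hypothetical choice: CR (U+000D) remapped to U+E000 + 0x0D


def prepare_text(data: str, preserve_cr: bool = False) -> str:
    """Prepare parsed data for the XML infoset.

    preserve_cr is a hypothetical option name, not an existing Daffodil setting.
    """
    if preserve_cr:
        # Move every CR into the PUA. U+E00D is XML-legal and is not touched by
        # XML end-of-line handling, so CRLF survives as U+E00D followed by LF.
        return data.replace("\r", PUA_CR)
    # Default canonicalization (same as XML 1.0 end-of-line handling):
    # CRLF -> LF first, then any remaining lone CR -> LF.
    return data.replace("\r\n", "\n").replace("\r", "\n")


def restore_cr(text: str) -> str:
    """Unparse side of the option: turn the PUA character back into CR."""
    return text.replace(PUA_CR, "\r")
```

With the default, prepare_text("a\r\nb\rc") yields "a\nb\nc" and the original line endings cannot be recovered; with preserve_cr=True, restore_cr undoes the mapping exactly, at the cost of a less readable infoset.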



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
