[ 
https://issues.apache.org/jira/browse/DAFFODIL-1559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16814411#comment-16814411
 ] 

Steve Lawrence commented on DAFFODIL-1559:
------------------------------------------

And actually, maybe numeric character references (&#...;) are always allowed 
in XML? Perhaps instead of converting to the PUA, we should convert 
XML-disallowed characters to numeric character references? That would make the 
original bytes much more obvious, without needing our complex PUA conversion.
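
As a rough way to check this idea, here is a minimal sketch in Python, using 
the standard library's xml.etree.ElementTree rather than anything from 
Daffodil. The helper name escape_cr and the <field> element are made up for 
illustration. It compares a literal CRLF with a CR written as &#13;, and also 
tries a reference to a control character (&#1;), which an XML 1.0 parser is 
expected to reject, so the references may not cover every disallowed byte.

```python
import xml.etree.ElementTree as ET


def escape_cr(text: str) -> str:
    """Hypothetical helper: write each CR as the character reference &#13;."""
    return text.replace("\r", "&#13;")


# Sample data with no '&' or '<', so no other escaping is needed here.
data = "line1\r\nline2"

# A literal CRLF goes through XML line-end normalization and becomes LF.
literal = ET.fromstring("<field>" + data + "</field>").text

# A CR written as &#13; is not subject to line-end normalization.
escaped = ET.fromstring("<field>" + escape_cr(data) + "</field>").text

# A reference to a control character such as U+0001 is not well-formed
# in XML 1.0, so the parser raises an error instead of returning it.
try:
    ET.fromstring("<field>&#1;</field>")
    control_char_accepted = True
except ET.ParseError:
    control_char_accepted = False

print(repr(literal), repr(escaped), control_char_accepted)
```

If this holds, character references would preserve CR, but characters like 
NUL or U+0001 would still need some other mapping under XML 1.0.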

> Add option to disable CRLF to LF XML canonicalization
> -----------------------------------------------------
>
>                 Key: DAFFODIL-1559
>                 URL: https://issues.apache.org/jira/browse/DAFFODIL-1559
>             Project: Daffodil
>          Issue Type: Bug
>          Components: API
>            Reporter: Steve Lawrence
>            Priority: Major
>              Labels: beginner
>
> See the review for more details. The short of it is that when converting parse 
> results to XML, we convert CR to LF, and we convert CRLF to LF. This means 
> that we lose the information that the data used to contain CRLF. This is 
> similar to how we lose that information with delimiters if someone uses NL, 
> but it's slightly different since it is actual data. However, this behavior 
> is the most user-friendly and is consistent with other XML technologies.
> Perhaps we need an option to convert CRLF to somewhere in the PUA so that 
> this information can be preserved if someone needs it.
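
The information loss described above, and the PUA option, can be sketched in 
a few lines of Python. This is only an illustration under stated assumptions: 
the function names are hypothetical, and the 0xE000 offset (CR becomes U+E00D) 
is an assumed mapping chosen for the example, not a statement of what the 
option would actually do.

```python
# Assumed offset into the Private Use Area, chosen for illustration only.
PUA_OFFSET = 0xE000
PUA_CR = chr(PUA_OFFSET + 0x0D)  # U+E00D stands in for CR


def canonicalize(text: str) -> str:
    """Current behavior as described: CRLF and lone CR both become LF."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


def cr_to_pua(text: str) -> str:
    """Hypothetical option: keep CR visible by mapping it into the PUA."""
    return text.replace("\r", PUA_CR)


def pua_to_cr(text: str) -> str:
    """Reverse mapping, applied when turning the XML back into data."""
    return text.replace(PUA_CR, "\r")


data = "a\r\nb\rc"

# After canonicalization, CRLF, CR, and LF are indistinguishable.
lossy = canonicalize(data)

# With the PUA mapping, the original line endings can be recovered.
round_tripped = pua_to_cr(cr_to_pua(data))

print(repr(lossy), repr(round_tripped))
```

One caveat with any such mapping is that data which already contains U+E00D 
would become ambiguous on the way back.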



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
