[
https://issues.apache.org/jira/browse/DAFFODIL-1559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16814406#comment-16814406
]
Steve Lawrence edited comment on DAFFODIL-1559 at 4/10/19 12:23 PM:
--------------------------------------------------------------------
Rather than using the PUA, it might make sense to use character entities, which
are a little more user friendly. From what I can tell, both {{&#13;}} and
{{&#xD;}} are legal in XML and should not be replaced by an XML parser.
I've confirmed that this works with {{xmllint --format foo.xml}}:
* When foo.xml contains {{CR}} or {{CRLF}}, the resulting XML only contains an
{{LF}}. This matches the Daffodil behavior.
* When foo.xml contains {{&#xD;}}, the resulting XML contains {{&#xD;}}.
* When foo.xml contains {{&#13;}}, the resulting XML contains {{&#xD;}}.
So xmllint does the right thing with {{CR}}, but interestingly always converts
the character entity {{&#13;}} to use the hex version. If character entities
{{&#xD;}} or {{&#13;}} weren't allowed, I would expect it to match the first
behavior.
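The same distinction between a literal {{CR}} and a {{CR}} character entity can be checked outside of xmllint. The following is only a minimal sketch, assuming Python's standard {{xml.etree.ElementTree}} parser rather than xmllint or Daffodil; {{parsed_text}} is a hypothetical helper introduced for illustration.

```python
import xml.etree.ElementTree as ET


def parsed_text(xml_bytes: bytes) -> str:
    """Parse a one-element document and return the element's text content."""
    return ET.fromstring(xml_bytes).text


# Literal line endings in the input are normalized to LF by the parser.
literal_crlf = parsed_text(b"<data>a\r\nb</data>")
literal_cr = parsed_text(b"<data>a\rb</data>")

# A CR written as a character entity survives parsing as a real CR,
# whether it is spelled in hex or in decimal.
hex_ref = parsed_text(b"<data>a&#xD;\nb</data>")
dec_ref = parsed_text(b"<data>a&#13;\nb</data>")

print(repr(literal_crlf))  # 'a\nb'
print(repr(literal_cr))    # 'a\nb'
print(repr(hex_ref))       # 'a\r\nb'
print(repr(dec_ref))       # 'a\r\nb'
```

This only shows the parsing side. Whether a given serializer writes the {{CR}} back out as an entity, as xmllint does, would need to be checked separately for each tool.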
was (Author: slawrence):
Rather than using the PUA, it might make sense to use character entities, which
are a little more user friendly. From what I can tell, both {{&#13;}} and
{{&#xD;}} are legal in XML and should not be replaced by an XML parser.
I've confirmed that this works with {{xmllint --format foo.xml}}:
* When foo.xml contains {{CR}} or {{CRLF}}, the resulting XML only contains an
{{LF}}. This matches the Daffodil behavior.
* When foo.xml contains {{&#xD;}}, the resulting XML contains {{&#xD;}}.
* When foo.xml contains {{&#13;}}, the resulting XML contains {{&#xD;}}.
So xmllint does the right thing with {{CR}}, but interestingly always converts
the character entity to use the hex version. If character entities {{&#xD;}} or
{{&#13;}} weren't allowed, I would expect it to match the first behavior.
> Add option to disable CRLF to LF XML canonicalization
> -----------------------------------------------------
>
> Key: DAFFODIL-1559
> URL: https://issues.apache.org/jira/browse/DAFFODIL-1559
> Project: Daffodil
> Issue Type: Bug
> Components: API
> Reporter: Steve Lawrence
> Priority: Major
> Labels: beginner
>
> See the review for more details. The short of it is that when converting parse
> results to XML, we convert CR to LF, and we convert CRLF to LF. This means
> that we lose the information that the data used to contain CRLF. This is
> similar to how we lose that information with delimiters if someone uses NL,
> but it's slightly different since it is actual data. However, it's most user
> friendly and consistent with other XML technologies to have this behavior.
> Perhaps we need an option to convert CRLF to somewhere in PUA so that this
> information can be maintained if someone needs it.