[jira] [Comment Edited] (DAFFODIL-1559) Add option to disable CRLF to LF XML canonicalization

Mike Beckerle (Jira) Thu, 16 Jun 2022 07:27:06 -0700


    [ 
https://issues.apache.org/jira/browse/DAFFODIL-1559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16815478#comment-16815478
 ]


Mike Beckerle edited comment on DAFFODIL-1559 at 6/16/22 2:26 PM:
------------------------------------------------------------------

All reasonable arguments for doing PUA. But from a usability perspective, I 
would think an entity would be easier to use.

I would think people familiar with XML would rather see:
{code:java}
<foo>This contains&#xD;a CR</foo>
{code}
instead of this:
{code:java}
<foo>This containsa CR</foo>
{code}
It's like saying in JSON, would you rather have "\r" in the data or "\uE00D"? 
Daffodil can treat them both the same, but people familiar with JSON would 
rather see the \r.

Maybe the tunable to control the behavior just allows multiple options and we 
let the user decide how CRs are output?
 # Convert CR and CRLF to LF like we do now
 # Convert CR to PUA (i.e. \uE00D)
 # Convert CR to `&#x0D;`
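
A rough sketch of how the three options above could behave, purely for illustration. The names `CrMode`, `encodeCr`, and `CrOutputSketch` are made up here and are not existing Daffodil API or tunable names. It also assumes the character-reference option is applied while the XML text is being serialized, since a normal XML writer would otherwise escape the `&`.

```java
class CrOutputSketch {
    // Hypothetical option names, one per list item above; not actual Daffodil tunable values.
    enum CrMode { NORMALIZE_TO_LF, MAP_TO_PUA, CHAR_REFERENCE }

    // Returns the text as it would appear in the XML output for the given mode.
    static String encodeCr(String text, CrMode mode) {
        switch (mode) {
            case NORMALIZE_TO_LF:
                // Replace CRLF first so it collapses to one LF, then any lone CR.
                return text.replace("\r\n", "\n").replace("\r", "\n");
            case MAP_TO_PUA:
                // Keep the information by remapping CR into the Private Use Area.
                return text.replace("\r", "\uE00D");
            case CHAR_REFERENCE:
                // Emit a numeric character reference; assumed to be written
                // as-is by the serializer, without escaping the ampersand.
                return text.replace("\r", "&#x0D;");
            default:
                throw new IllegalArgumentException("Unknown mode: " + mode);
        }
    }

    public static void main(String[] args) {
        String data = "This contains\ra CR";
        for (CrMode mode : CrMode.values()) {
            System.out.println(mode + ": <foo>" + encodeCr(data, mode) + "</foo>");
        }
    }
}
```

Only the first mode loses information: with the other two, an unparse step could map U+E00D or the character reference back to CR.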


was (Author: slawrence):
All reasonable arguments for doing PUA. But from a usability perspective, I 
would think an entity would be easier to use.

I would think people familiar with XML would rather see:
{code:java}
<foo>This contains&#xD;a CR</foo>
{code}
instead of this:
{code:java}
<foo>This containsa CR</foo>
{code}
It's like saying in JSON, would you rather have "\r" in the data or "\uE00D"? 
Daffodil can treat them both the same, but people familiar with JSON would 
rather see the \r.

Maybe the tunable to control the behavior just allows multiple options and we 
let the user decide how CRs are output?
 # Convert CR and CRLF to LF like we do now
 # Convert CR to PUA (i.e. \uE00D)
 # Convert CR to `&#x0D;`

> Add option to disable CRLF to LF XML canonicalization
> -----------------------------------------------------
>
>                 Key: DAFFODIL-1559
>                 URL: https://issues.apache.org/jira/browse/DAFFODIL-1559
>             Project: Daffodil
>          Issue Type: Bug
>          Components: API
>            Reporter: Steve Lawrence
>            Priority: Major
>              Labels: beginner
>
> See the review for more details. The short of it is that when converting parse 
> results to XML, we convert CR to LF, and we convert CRLF to LF. This means 
> that we lose the information that the data used to contain CRLF. This is 
> similar to how we lose that information with delimiters if someone uses NL, 
> but it's slightly different since it is actual data. However, it's most user 
> friendly and consistent with other XML technologies to have this behavior.
> Perhaps we need an option to convert CRLF to somewhere in PUA so that this 
> information can be maintained if someone needs it.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
