mbeckerle commented on PR #1244:
URL: https://github.com/apache/daffodil/pull/1244#issuecomment-2122869387

   > Does Daffodil ever really need the CR -> LF canonicalization? If CR is 
part of an actual field and is not used as a delimiter or something, it seems 
we should never do this lossy mapping. Maybe we should just always disable CR 
-> LF canonicalization and convert CR to PUA. It makes infosets messier, but 
ensures we don't lose data.
   
   I agree. We should not, in my view, ever convert CR to LF. We should 
preserve it. I think we could do better than to convert it to the PUA though. 
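
   To make the lossiness concrete, here is a minimal Python sketch. It is illustrative only and is not Daffodil code. `canonicalize_newlines` follows the XML 1.0 end-of-line rule that a parser applies on read. The PUA stand-in U+E00D (U+E000 plus the CR code point) and all helper names are assumptions chosen for illustration.

```python
# Sketch only: shows why CR -> LF canonicalization loses data, and why a
# one-to-one PUA substitution does not. U+E00D as the PUA stand-in for CR
# is an assumption for illustration, not a statement of Daffodil's mapping.

PUA_CR = "\uE00D"  # hypothetical PUA stand-in for CR (U+E000 + 0x0D)


def canonicalize_newlines(text: str) -> str:
    """XML 1.0 end-of-line handling: CRLF -> LF, then any lone CR -> LF."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


def cr_to_pua(text: str) -> str:
    """Parse direction: replace every CR with its PUA stand-in; LF is untouched."""
    return text.replace("\r", PUA_CR)


def pua_to_cr(text: str) -> str:
    """Unparse direction: restore CR from the PUA stand-in."""
    return text.replace(PUA_CR, "\r")


if __name__ == "__main__":
    a, b = "x\r\ny", "x\ny"
    # Two different inputs collapse to the same text, so the CR is lost.
    print(canonicalize_newlines(a) == canonicalize_newlines(b))  # True
    # The PUA mapping keeps them distinct and round-trips exactly.
    print(cr_to_pua(a) != cr_to_pua(b), pua_to_cr(cr_to_pua(a)) == a)  # True True
```

   The PUA round trip is exact only if the data never already contains U+E00D. Any substitution scheme has the same caveat.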
   
   See the last comment on 
[DAFFODIL-1559](https://issues.apache.org/jira/browse/DAFFODIL-1559). I think 
we could map CRLF to U+202B + LF, map an isolated CR to U+2028 (LINE 
SEPARATOR), and map both back when unparsing. This would be more palatable 
for such data, since it would not look polluted by the weird glyphs that get 
assigned to PUA characters. 
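
   Here is a minimal sketch of that mapping in Python. It assumes exactly the code points given above: U+202B inserted before LF for CRLF, and U+2028 for an isolated CR. The function and constant names are hypothetical, not existing Daffodil API.

```python
# Sketch only: a reversible CR mapping for XML infosets, assuming the code
# points proposed in the comment. Names are hypothetical, not Daffodil API.

CRLF_MARK = "\u202B"  # placed before LF to record that a CR preceded it
LONE_CR = "\u2028"    # stands in for a CR that is not followed by LF


def map_for_infoset(text: str) -> str:
    """Parse direction: CRLF -> U+202B LF, then any remaining CR -> U+2028."""
    text = text.replace("\r\n", CRLF_MARK + "\n")
    return text.replace("\r", LONE_CR)


def unmap_for_unparse(text: str) -> str:
    """Unparse direction: undo both substitutions to restore the original CRs."""
    text = text.replace(CRLF_MARK + "\n", "\r\n")
    return text.replace(LONE_CR, "\r")


if __name__ == "__main__":
    raw = "line1\r\nline2\rline3\n"
    mapped = map_for_infoset(raw)
    assert "\r" not in mapped                 # no CR left for an XML parser to normalize
    assert unmap_for_unparse(mapped) == raw   # unparsing restores the original bytes
```

   This sketch has two caveats. First, input that already contains U+2028, or U+202B followed by LF, would not round-trip. Second, it assumes XML 1.0 output. XML 1.1 parsers normalize U+2028 to LF just as they do CR, so the mapping would be lossy there.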
   
   No matter what, we need to supply several variations here, so we need to 
agree on a scheme for how users parameterize the InfosetOutputters and 
InfosetInputters. The default scheme needs to be exactly what we do today. 
Maybe Daffodil v4.0.0 can change the default behavior for XML mappings. 

