mbeckerle commented on PR #1244: URL: https://github.com/apache/daffodil/pull/1244#issuecomment-2122869387
> Does Daffodil ever really need the CRLF -> LF canonicalization? If CR is part of an actual field and is not used as a delimiter or something, it seems we should never do this lossy mapping. Maybe we should just always disable CR -> LF canonicalization and convert CR to PUA. It makes infosets messier, but ensures we don't lose data.

I agree. We should not, in my view, ever convert CR to LF. We should preserve it.

I think we could do better than to convert it to the PUA though. See the last comment on [DAFFODIL-1559](https://issues.apache.org/jira/browse/DAFFODIL-1559). I think we could map CRLF to U+202B + LF, and isolated CR to NEL (U+2028), and back for unparsing. This would be more palatable, as the data would not look polluted by the weird glyphs that get assigned to the PUA characters.

No matter what, we need to supply several variations here, so we need to agree on a scheme for how users parameterize the InfosetOutputters and Inputters.

The default scheme needs to be exactly what we do today. Maybe Daffodil v4.0.0 can change the default behavior for XML mappings.
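
To make the proposed mapping concrete, here is a minimal sketch in Python. It is not Daffodil code: the function and constant names are hypothetical. The code points are taken exactly as written in the comment (U+202B and U+2028). Unicode names U+2028 LINE SEPARATOR and assigns NEL to U+0085, so the code point actually intended may differ. The round trip is lossless only if the original data contains no U+2028 and no U+202B immediately followed by LF.

```python
# Hypothetical illustration of the mapping described in the comment.
# These names are not part of any Daffodil API.
CRLF_MARK = "\u202B"     # replaces the CR of a CRLF pair (code point as given in the comment)
LONE_CR_MARK = "\u2028"  # replaces an isolated CR (code point as given in the comment)


def encode_cr(text: str) -> str:
    """Parse direction: remove every CR from the text without losing it."""
    # Handle CRLF first so that its CR is not mistaken for an isolated CR.
    text = text.replace("\r\n", CRLF_MARK + "\n")
    return text.replace("\r", LONE_CR_MARK)


def decode_cr(text: str) -> str:
    """Unparse direction: restore the original CR and CRLF sequences."""
    text = text.replace(CRLF_MARK + "\n", "\r\n")
    return text.replace(LONE_CR_MARK, "\r")


sample = "line1\r\nline2\rline3\nline4"
encoded = encode_cr(sample)
assert "\r" not in encoded            # no CR is left for an XML parser to normalize
assert decode_cr(encoded) == sample   # the original bytes come back on unparse
```

Unlike CR -> LF canonicalization, this keeps CRLF, isolated CR, and plain LF distinguishable in the infoset, which is the property the comment asks for.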
