mbeckerle commented on code in PR #799:
URL: https://github.com/apache/daffodil/pull/799#discussion_r899381289
##########
daffodil-tdml-lib/src/main/scala/org/apache/daffodil/tdml/TDMLRunner.scala:
##########
@@ -2360,6 +2354,45 @@ sealed abstract class DocumentPart(part: Node, parent: Document) {
}
+object CanonData {
+ private lazy val doubleForwardPattern = "//.*".r
+ private lazy val openClosePattern = "(?s)/[*].*?[*]/".r
+ private lazy val noWarnCharsSet = "|()[].Xx \n\r"
+
+ /*
+ * Allow "//" and "/* */" to act as comments.
+ * Any valid XML characters not explicitly allowed are also considered comments and are removed.
+ */
+ def canonicalizeData(validCharactersSet: String, userData: String): String = {
+ var doWarning: Boolean = false
+ Assert.invariant(!userData.contains('\r')) // \r should not exist in userData
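For reference, the comment-stripping behavior described in the doc comment above could be sketched as follows. This is a hypothetical helper (`canonicalizeSketch` in a hypothetical `CanonDataSketch` object) that reuses the two regexes from the hunk; it omits the warning logic (`doWarning`, `noWarnCharsSet`) and is not the PR's implementation, whose body is cut off in the excerpt.

```scala
object CanonDataSketch {
  // Same patterns as in the hunk: "//" to end of line, and "/* ... */"
  // (the (?s) flag lets a block comment span multiple lines).
  private val doubleForwardPattern = "//.*".r
  private val openClosePattern = "(?s)/[*].*?[*]/".r

  // Hypothetical sketch, not the PR's code: strip block comments, then
  // line comments, then drop every character not in validCharactersSet.
  def canonicalizeSketch(validCharactersSet: String, userData: String): String = {
    val noBlockComments = openClosePattern.replaceAllIn(userData, "")
    val noComments = doubleForwardPattern.replaceAllIn(noBlockComments, "")
    noComments.filter(c => validCharactersSet.indexOf(c) >= 0)
  }
}
```

For example, with a hex-digit character set, `"41 42 // AB\n/* comment */ 43"` reduces to `"414243"`: the hex-looking letters inside both comments are removed along with the comments. Stripping `/* */` before `//` is an arbitrary choice in this sketch.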
Review Comment:
Confirming: you can put a CR in an XML document using a character reference (e.g. `&#xD;`). It is only literal CR and CRLF characters in the text that get normalized into LF.
So I believe that when an infoset is created from XML text (as by a TDML test that uses an XML infoset), it can contain a CR via a character reference such as `&#xD;`. The Daffodil InfosetInputter does NOT convert these to LF.
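One way to check the XML side of this is with the JDK's built-in DOM parser; this exercises standard XML end-of-line handling, not Daffodil's own InfosetInputter. The object and value names below are hypothetical.

```scala
import java.io.StringReader
import javax.xml.parsers.DocumentBuilderFactory
import org.xml.sax.InputSource

object XmlCrDemo {
  // Parse an XML string with the JDK DOM parser and return the root's text.
  def rootText(xml: String): String = {
    val builder = DocumentBuilderFactory.newInstance().newDocumentBuilder()
    val doc = builder.parse(new InputSource(new StringReader(xml)))
    doc.getDocumentElement().getTextContent()
  }

  // Literal CRLF in element content: end-of-line handling turns it into LF.
  val literalCrLf: String = rootText("<r>a\r\nb</r>")

  // CR written as a character reference is not line-end normalized,
  // so it survives parsing as a real '\r'.
  val charRefCr: String = rootText("<r>a&#xD;b</r>")
}
```

Here `literalCrLf` comes back as `"a\nb"`, while `charRefCr` still contains the CR (`"a\rb"`).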
Interestingly, the Daffodil XML infoset outputters used when parsing always convert CRLF and CR to LF in the XML they create. So you cannot get a CR in an XML infoset from the Daffodil parser.
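The CRLF/CR to LF conversion described for the outputters amounts to something like the following. This is a hypothetical helper (`NewlineNormalizeSketch.normalizeNewlines`), not Daffodil's outputter code; the one subtlety is that CRLF has to be replaced before lone CR.

```scala
object NewlineNormalizeSketch {
  // Hypothetical helper mirroring the described outputter behavior.
  // Replace CRLF first; doing lone CR first would turn "\r\n" into "\n\n".
  def normalizeNewlines(s: String): String =
    s.replace("\r\n", "\n").replace("\r", "\n")
}
```

Text normalized this way never contains a CR, consistent with the point that the parser cannot produce one.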
--
This is an automated message from the Apache Git Service.