mbeckerle commented on code in PR #799:
URL: https://github.com/apache/daffodil/pull/799#discussion_r899381289


##########
daffodil-tdml-lib/src/main/scala/org/apache/daffodil/tdml/TDMLRunner.scala:
##########
@@ -2360,6 +2354,45 @@ sealed abstract class DocumentPart(part: Node, parent: Document) {
 
 }
 
+object CanonData {
+  private lazy val doubleForwardPattern = "//.*".r
+  private lazy val openClosePattern = "(?s)/[*].*?[*]/".r
+  private lazy val noWarnCharsSet = "|()[].Xx \n\r"
+
+  /*
+  * Allow "//" and "/* */" to act as comments.
+  * Any valid XML characters not explicitly allowed are also considered comments and are removed.
+  */
+  def canonicalizeData(validCharactersSet: String, userData: String): String = {
+    var doWarning: Boolean = false
+    Assert.invariant(!userData.contains('\r'))      // \r should not exist in userData
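The doc comment in this excerpt describes two steps: strip `//` and `/* */` comments, then drop every character that is not in the allowed set. A minimal sketch of those steps, reusing the two regexes from the diff. `CanonDataSketch`, `stripComments`, `keepOnly`, and `canonicalize` are hypothetical names, the step order is an assumption, and the warning logic tied to `noWarnCharsSet` is omitted because the excerpt does not show it.

```scala
import scala.util.matching.Regex

// Hypothetical sketch of the comment-stripping idea; not the PR's implementation.
object CanonDataSketch {
  // Same patterns as in the diff.
  private val doubleForwardPattern: Regex = "//.*".r
  private val openClosePattern: Regex = "(?s)/[*].*?[*]/".r

  def stripComments(userData: String): String = {
    // Block comments are removed first (an assumed ordering; either order
    // mishandles some inputs where the two comment styles overlap).
    // (?s) lets "." cross newlines, and ".*?" stops at the first "*/".
    val noBlock = openClosePattern.replaceAllIn(userData, "")
    // Without (?s), "." does not match a newline, so a // comment ends at end of line.
    doubleForwardPattern.replaceAllIn(noBlock, "")
  }

  // Any character outside the allowed set is dropped.
  def keepOnly(validCharactersSet: String, s: String): String =
    s.filter(c => validCharactersSet.indexOf(c) >= 0)

  def canonicalize(validCharactersSet: String, userData: String): String =
    keepOnly(validCharactersSet, stripComments(userData))

  def main(args: Array[String]): Unit = {
    val data = "DE AD // comment\n/* block\n comment */ BE EF"
    println(canonicalize("0123456789ABCDEF", data))
  }
}
```

With a hex-digit allowed set, the comments and the whitespace in the sample input are both discarded, leaving only the data characters.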

Review Comment:
   Confirming you can put a CR in XML documents using a character reference such as `&#xD;`. It is only literal CR and CRLF, as whitespace characters, that get normalized into LF.
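A small check of this with the JDK's built-in DOM parser, as a Scala sketch. `CrNormalizationDemo` and `rootText` are hypothetical names, and the JDK parser stands in for whatever parser the TDML runner actually uses. Per XML 1.0 end-of-line handling (section 2.11), literal CRLF and lone CR in element text should come back as LF, while the character reference `&#xD;` should yield a real CR.

```scala
import java.io.ByteArrayInputStream
import java.nio.charset.StandardCharsets
import javax.xml.parsers.DocumentBuilderFactory

object CrNormalizationDemo {
  // Parse an XML string with the JDK's DOM parser and return the root element's text.
  def rootText(xml: String): String = {
    val builder = DocumentBuilderFactory.newInstance().newDocumentBuilder()
    val doc = builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
    doc.getDocumentElement.getTextContent
  }

  def main(args: Array[String]): Unit = {
    // Literal CRLF and lone CR: normalized to LF by the parser.
    val literal = rootText("<a>x\r\ny\rz</a>")
    // Character reference for CR: not subject to end-of-line handling.
    val escaped = rootText("<a>x&#xD;y</a>")
    // Print code points so CR (13) and LF (10) are visible.
    println(literal.map(_.toInt).mkString(","))
    println(escaped.map(_.toInt).mkString(","))
  }
}
```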
   
   So I believe that when an Infoset is created from XML text (as by a TDML test that uses an XML infoset), it can contain CR via a character reference such as `&#xD;`. The Daffodil InfosetInputter does NOT convert these to LF.
   
   Interestingly, the Daffodil XML infoset outputters used by parsing, which create XML, always convert CRLF and CR to LF. So you cannot get a CR in an XML infoset from the Daffodil parser.
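The conversion described here can be sketched as a plain string transform. This is a hypothetical helper (`EolNormalizationSketch.normalizeEol`), not Daffodil's outputter code. It shows that CRLF must be collapsed before lone CR is replaced; doing it the other way round turns each CRLF into two LFs. Applied to data before the `\r` invariant in `canonicalizeData`, a step like this would let the invariant hold even for text that arrived via a character reference, though this comment does not settle whether TDML data should be normalized that way.

```scala
object EolNormalizationSketch {
  // Hypothetical helper: collapse CRLF to LF first, then turn any lone CR into LF.
  // Reversing the two steps would turn each CRLF into two LFs.
  def normalizeEol(s: String): String =
    s.replace("\r\n", "\n").replace('\r', '\n')

  def main(args: Array[String]): Unit = {
    // e.g. text that arrived via &#xD; character references
    val fromCharRef = "line1\rline2\r\nline3"
    val normalized = normalizeEol(fromCharRef)
    // Same condition as the Assert.invariant in canonicalizeData.
    require(!normalized.contains('\r'))
    println(normalized.split("\n").length)
  }
}
```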
   



-- 
This is an automated message from the Apache Git Service.