stevedlawrence commented on code in PR #799:
URL: https://github.com/apache/daffodil/pull/799#discussion_r892397448


##########
daffodil-tdml-lib/src/main/scala/org/apache/daffodil/tdml/TDMLRunner.scala:
##########
@@ -2364,6 +2358,53 @@ sealed abstract class DocumentPart(part: Node, parent: 
Document) {
     }
   }.trim.toUpperCase()
 
+  private lazy val doubleForwardPattern = "//.*".r
+  private lazy val openClosePattern = "/[*](.|\n)*?[*]/".r
+
+  //Accessor for doubleForwardPattern regex
+  def getDoubleForwardPattern() : scala.util.matching.Regex = {
+    return doubleForwardPattern
+  }
+
+  //Accessor for openClosePattern regex
+  def getOpenClosePattern() : scala.util.matching.Regex = {
+    return openClosePattern
+  }
+
+  /*
+  * Allow "//" and "/* */" (inline) to act as comments.
+  * Any valid XML characters not explicitly allowed are also considered 
comments and are removed.
+  */
+  def canonicalizeData(validCharactersSet : String, userData : String) : 
String = {
+    val noWarnCharsSet = "|()[]."
+    var doWarning: Boolean = false
+    Assert.invariant(!userData.contains('\r'))      // \r should not exist in 
userData
+
+    //Remove the comments (//) and (/* */)
+    val noCommentsForward = getDoubleForwardPattern().replaceAllIn(userData,"")
+    val noCommentsBothFormats = 
getOpenClosePattern().replaceAllIn(noCommentsForward,"")
+
+    //Throw exception if /* or */ still found. This means user input was not 
formatted correctly.
+    if(noCommentsBothFormats.contains("/*") || 
noCommentsBothFormats.contains("*/")){
+      throw TDMLException("Improper formatting of /* */ style comment", None)
+    }
+
+    //Split on new line and white spaces - concatenate after
+    val noCommentsString = noCommentsBothFormats.split("\\s").mkString
+
+    //Check value of the characters, if invalid character found create log and 
skip over it
+    val validData = noCommentsString.map{ch =>
+      if(validCharactersSet.contains(ch)) ch
+      else { if(!noWarnCharsSet.contains(ch)) doWarning=true; ""}
+    }

Review Comment:
   This doesn't feel very scala-y, especially with the mutable doWarning. 
There's probably a handful of ways to do this, but one approach is something 
like this:
   
   ```scala
   val noWarns = noCommentsString.filter { !noWarnChars.contains(_) } // remove 
ignored characters
   val hasInvalid = noWarns.exists { !validCharacters.contains(_) } // test if 
invalid characters exist
   if (hasInvalid) Logger.log.warn(...)
   val validData = noWarns.filter { validCharacters.contains(_) } // remove all 
invalid characters for backwards compatability
   validData
   ```
   
   This is a bit inefficient since it looks for scans the string multiple 
times, but I think it's a bit easier to read, and I doubt the performance would 
be affected here very much.
   
   Also note that you don't need a "return" statement, those are not really 
used in scala. Whatever is the last line in a function is what is returned, so 
you would just do `validData` as the last line. Also note that validData is 
already a string, so you don't need to call `mkString`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to