stevedlawrence commented on code in PR #799:
URL: https://github.com/apache/daffodil/pull/799#discussion_r892397448
##########
daffodil-tdml-lib/src/main/scala/org/apache/daffodil/tdml/TDMLRunner.scala:
##########
@@ -2364,6 +2358,53 @@ sealed abstract class DocumentPart(part: Node, parent:
Document) {
}
}.trim.toUpperCase()
+ private lazy val doubleForwardPattern = "//.*".r
+ private lazy val openClosePattern = "/[*](.|\n)*?[*]/".r
+
+ //Accessor for doubleForwardPattern regex
+ def getDoubleForwardPattern() : scala.util.matching.Regex = {
+ return doubleForwardPattern
+ }
+
+ //Accessor for openClosePattern regex
+ def getOpenClosePattern() : scala.util.matching.Regex = {
+ return openClosePattern
+ }
+
+ /*
+ * Allow "//" and "/* */" (inline) to act as comments.
+ * Any valid XML characters not explicitly allowed are also considered
comments and are removed.
+ */
+ def canonicalizeData(validCharactersSet : String, userData : String) :
String = {
+ val noWarnCharsSet = "|()[]."
+ var doWarning: Boolean = false
+ Assert.invariant(!userData.contains('\r')) // \r should not exist in
userData
+
+ //Remove the comments (//) and (/* */)
+ val noCommentsForward = getDoubleForwardPattern().replaceAllIn(userData,"")
+ val noCommentsBothFormats =
getOpenClosePattern().replaceAllIn(noCommentsForward,"")
+
+ //Throw exception if /* or */ still found. This means user input was not
formatted correctly.
+ if(noCommentsBothFormats.contains("/*") ||
noCommentsBothFormats.contains("*/")){
+ throw TDMLException("Improper formatting of /* */ style comment", None)
+ }
+
+ //Split on new line and white spaces - concatenate after
+ val noCommentsString = noCommentsBothFormats.split("\\s").mkString
+
+ //Check value of the characters, if invalid character found create log and
skip over it
+ val validData = noCommentsString.map{ch =>
+ if(validCharactersSet.contains(ch)) ch
+ else { if(!noWarnCharsSet.contains(ch)) doWarning=true; ""}
+ }
Review Comment:
This doesn't feel very scala-y, especially with the mutable doWarning.
There's probably a handful of ways to do this, but one approach is something
like this:
```scala
val noWarns = noCommentsString.filter { !noWarnChars.contains(_) } // remove
ignored characters
val hasInvalid = noWarns.exists { !validCharacters.contains(_) } // test if
invalid characters exist
if (hasInvalid) Logger.log.warn(...)
val validData = noWarns.filter { validCharacters.contains(_) } // remove all
invalid characters for backwards compatability
validData
```
This is a bit inefficient since it looks for scans the string multiple
times, but I think it's a bit easier to read, and I doubt the performance would
be affected here very much.
Also note that you don't need a "return" statement, those are not really
used in scala. Whatever is the last line in a function is what is returned, so
you would just do `validData` as the last line. Also note that validData is
already a string, so you don't need to call `mkString`.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]