stevedlawrence commented on code in PR #799:
URL: https://github.com/apache/daffodil/pull/799#discussion_r892332974


##########
daffodil-tdml-lib/src/main/scala/org/apache/daffodil/tdml/TDMLRunner.scala:
##########
@@ -2358,30 +2358,48 @@ sealed abstract class DocumentPart(part: Node, parent: 
Document) {
     }
   }.trim.toUpperCase()
 
+  private lazy val doubleForwardPattern = "//.*".r
+  private lazy val openClosePattern = "/[*](.|\n)*?[*]/".r
+
+  //Accessor for doubleForwardPattern regex

Review Comment:
   Yes, that is a bit similar. I think the difference is that these patterns 
never really changed. They don't depend on any TDML files at all--they are 
essentially constant. Although the function uses those patterns, the result of 
the canonicalizeData function really is just a function on two inputs, the 
valid charset and the user data. We could even move those patterns into the 
function itself to make that more clear.
   
   Note, an alternative approach that might make this distinction more clear is 
if this function was a utility that doesn't know anything about where it's 
being called. In that case, these regexs never change and would just live along 
with that utility. But the data to canonicalize depends on the caller and so 
that should be passed in. For example, if we had something like this:
   
   ``` scala
   object Misc {
     private lazy val doubleForwardPattern = ...
     private lazy val openClosePattern = ...
     private lazy val noWarnChars = ...
   
     def canonicalizeTestData(validCharacterSet: String, userData: String): 
String = {
       ...
     }
   }
   ```
   
   So the patterns and no warn chars are all defined in this object and never 
change. The canonicalization function will use those to canonicalize, but 
really it's a just function of the two input parameters.
   
   And this would be called like:
   
   ```scala
   val bitsDigits = Misc.canonicalizeTestData(validBits, partRawContent)
   ```
   
   Note that something like this has a benefit that the pattern regexs are only 
compiled once, rather than every time we create a new DocumentPart, so it's 
going to be a bit more efficient. Though I'm not sure how much that would 
really matter.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to