mbeckerle commented on a change in pull request #640:
URL: https://github.com/apache/daffodil/pull/640#discussion_r713201269



##########
File path: 
daffodil-lib/src/main/scala/org/apache/daffodil/cookers/EntityReplacer.scala
##########
@@ -128,6 +128,25 @@ final class EntityReplacer {
   def hasHexCodePoint(input: String): Boolean = isMatched(input, hexPattern)
   def hasByteCodePoint(input: String): Boolean = isMatched(input, bytePattern)
 
+  private def replaceEntityWithChar(input: String, entity: String, newChar: 
Char): String = {
+    val replacement =
+      if (newChar == '%') {
+        // Some character entities are not replaced in this EntityReplacer,
+        // such as double percents or character classes (NL, WSP, etc.). If
+        // this character entity results in a percent character (e.g. %#x25;),
+        // we must replace it with an escaped percent to be handled later.
+        "%%"
+      } else {
+        // This character might mean something special to the replaceAll method
+        // we are about to use (e.g. a dollar sign for regex group references).
+        // To be safe, call quoteReplaement which escapes any characters that
+        // mean something special in a replacement string so they are replaced
+        // with the literal value.
+        Matcher.quoteReplacement(newChar.toString)

Review comment:
       I had to look this up. 
   
   So replaceString actually interprets various characters in the replacement 
text, as having significance regex-matcher-wise. I.e., you can put back-refs to 
capture groups in the replacement text, and they are supposed to get 
substituted?
   
   Bizarre. I'm sure it came in handy somewhere, but it's very weird behavior 
to me. The regex syntax and any significant regex characters should *only* be 
interpreted in the argument that IS a regex, to me. Not in the input, and not 
in the replacement. 
   
   Suggests to me I've never used replaceAll properly ever, and all calls to it 
in Daffodil should be revisited for this same sort of error. It would seem to 
me I need to always use str.replaceAll(regex, Matcher.quoteReplacement(repl)) 
every time to get what I expect. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to