stevedlawrence commented on code in PR #1247:
URL: https://github.com/apache/daffodil/pull/1247#discussion_r1618842941


##########
daffodil-lib/src/main/scala/org/apache/daffodil/lib/xml/XMLUtils.scala:
##########
@@ -153,11 +153,20 @@ object XMLUtils {
     list
   }
 
-  // FIXME: DAFFODIL-2883 - this needs checkForExistingPUA to be false so that 
data
-  //  which contains unicode PUA characters doesn't cause an SDE. Needs to be 
either
-  //  accepted or optionally cause a ParseError.
   private val remapXMLToPUA =
-    new RemapXMLIllegalCharToPUA(checkForExistingPUA = true, replaceCRWithLF = 
true)
+    new RemapXMLIllegalCharToPUA(
+      // To fix DAFFODIL-2883 - changed to tolerate existing PUA by default. 
It just
+      // ignores them. If they happen to be ones we use like U+E000 for NUL, 
then
+      // a round trip of the data will not preserve them.
+      //
+      // Note that fuzz testing that just permutes bytes along with unicode 
charset data
+      // can run into this fairly easily. Since this remap is called in the
+      // InfosetOutputter converting the DFDL infoset to XML, you cannot, at 
that point,
+      // fail in any useful way.
+      //
+      checkForExistingPUA = false,
+      replaceCRWithLF = true
+    )
 
   def remapXMLIllegalCharactersToPUA(s: String): String = 
remapXMLToPUA.remap(s)

Review Comment:
   The ticket mentions possibly having a switch to make this configurable. I 
think that remapping in infoset outputters should probably never care about 
existing PUA so this change makes alot of sense to me.
   
   But maybe a switch is wanted in some internal function (e.g. getSomeString) 
that checks for PUA characters and creates a PE/SDE? Should we create a ticket 
for that, or maybe it's not necessary at all and we should just always allow 
PUA characters with the caveat that they probably won't round trip. In which 
case maybe a new ticket should be added to remove the checkForExistingPUA 
feature? I'm not sure if there's overhead in that.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to