stevedlawrence commented on code in PR #1250:
URL: https://github.com/apache/daffodil/pull/1250#discussion_r1621080114


##########
daffodil-tdml-processor/src/main/scala/org/apache/daffodil/processor/tdml/TDMLInfosetInputter.scala:
##########
@@ -81,7 +82,12 @@ class TDMLInfosetInputter(
         case _ if (firstVersion eq null) => ""
         // the json infoset inputter maintains CRLF/CR, but XML converts 
CRLF/CR to
         // LF. So if this is Json, then we want the CRLF/CR converted to LF
-        case jsonii: JsonInfosetInputter => 
firstVersion.replaceAll("(\r\n|\r)", "\n")
+        // Furthermore, the json infoset inputter maintains Unicode PUA 
characters, the XML
+        // infoset inputters have to remap some characters from the PUA back 
to the
+        // XML-illegal characters. For sake of comparison with other XML 
infoset inputters
+        // we also remap AS IF it were XML.
+        case jsonii: JsonInfosetInputter =>
+          
XMLUtils.remapPUAToXMLIllegalCharacters(firstVersion.replaceAll("(\r\n|\r)", 
"\n"))

Review Comment:
   Can we remove the replaceAll? I think remapPUAToXMLIllegalCharacters also 
remaps CRLF/CR to LF?
   
   Which makes sense. XML infoset outputters remap characters so we need to do 
the same for JSON for TDML tests to verify it's all the same.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to