Re: [PR] Added test to show preexisting PUA chars are preserved. [daffodil]

via GitHub Fri, 31 May 2024 07:33:03 -0700


mbeckerle commented on code in PR #1250:
URL: https://github.com/apache/daffodil/pull/1250#discussion_r1622514929



##########
daffodil-tdml-processor/src/main/scala/org/apache/daffodil/processor/tdml/TDMLInfosetInputter.scala:
##########
@@ -81,7 +82,12 @@ class TDMLInfosetInputter(
         case _ if (firstVersion eq null) => ""
         // the json infoset inputter maintains CRLF/CR, but XML converts 
CRLF/CR to
         // LF. So if this is Json, then we want the CRLF/CR converted to LF
-        case jsonii: JsonInfosetInputter => 
firstVersion.replaceAll("(\r\n|\r)", "\n")
+        // Furthermore, the json infoset inputter maintains Unicode PUA 
characters, the XML
+        // infoset inputters have to remap some characters from the PUA back 
to the
+        // XML-illegal characters. For sake of comparison with other XML 
infoset inputters
+        // we also remap AS IF it were XML.
+        case jsonii: JsonInfosetInputter =>
+          
XMLUtils.remapPUAToXMLIllegalCharacters(firstVersion.replaceAll("(\r\n|\r)", 
"\n"))

Review Comment:
   > Can we remove the replaceAll? I think remapPUAToXMLIllegalCharacters also 
remaps CRLF/CR to LF?
   > 
   > Which makes sense. XML infoset outputters remap characters so we need to 
do the same for JSON for TDML tests to verify it's all the same.
   
   No this is required. 
   
   Remapping XML illegal characters to PUA also maps CRLF and CR to LF,
   
   But this is the opposite direction. from PUA back to "native" string. We 
need the additional CR/CRLF => LF because it didn't happen during the parse 
into a JSON infoset. 
   
   I'm adding clarification to the code comments about this. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] Added test to show preexisting PUA chars are preserved. [daffodil]

Reply via email to