mbeckerle commented on code in PR #1250:
URL: https://github.com/apache/daffodil/pull/1250#discussion_r1622514929
##########
daffodil-tdml-processor/src/main/scala/org/apache/daffodil/processor/tdml/TDMLInfosetInputter.scala:
##########
@@ -81,7 +82,12 @@ class TDMLInfosetInputter(
case _ if (firstVersion eq null) => ""
// the json infoset inputter maintains CRLF/CR, but XML converts
CRLF/CR to
// LF. So if this is Json, then we want the CRLF/CR converted to LF
- case jsonii: JsonInfosetInputter =>
firstVersion.replaceAll("(\r\n|\r)", "\n")
+ // Furthermore, the json infoset inputter maintains Unicode PUA
characters, the XML
+ // infoset inputters have to remap some characters from the PUA back
to the
+ // XML-illegal characters. For sake of comparison with other XML
infoset inputters
+ // we also remap AS IF it were XML.
+ case jsonii: JsonInfosetInputter =>
+
XMLUtils.remapPUAToXMLIllegalCharacters(firstVersion.replaceAll("(\r\n|\r)",
"\n"))
Review Comment:
> Can we remove the replaceAll? I think remapPUAToXMLIllegalCharacters also
remaps CRLF/CR to LF?
>
> Which makes sense. XML infoset outputters remap characters so we need to
do the same for JSON for TDML tests to verify it's all the same.
No this is required.
Remapping XML illegal characters to PUA also maps CRLF and CR to LF,
But this is the opposite direction. from PUA back to "native" string. We
need the additional CR/CRLF => LF because it didn't happen during the parse
into a JSON infoset.
I'm adding clarification to the code comments about this.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]