stevedlawrence commented on code in PR #821:
URL: https://github.com/apache/daffodil/pull/821#discussion_r944740002


##########
daffodil-runtime1/src/main/scala/org/apache/daffodil/infoset/InfosetOutputter.scala:
##########
@@ -54,11 +54,12 @@ trait InfosetOutputter {
    * @param diSimple the simple element that is started. Various fields of
    *                 DISimple can be accessed to determine things like the
    *                 value, nil, name, namespace, etc.
+   * @param xmlOutputStyle the string that determines how to print data.
    * @return true on sucess, false if there was an error and Daffodil should 
stop all
    *         future calls to the InfosetOutputter
    */
 
-  def startSimple(diSimple: DISimple): Boolean
+  def startSimple(diSimple: DISimple, xmlOutputStyle: String): Boolean

Review Comment:
   I'm not sure I like this change to the `InfosetOutputter`.
   
   For one, it breaks API backwards compatibility, which isn't a huge deal 
since we don't know of other implementations outside of Daffodil, but it's 
still not great.
   
   Though, my main gripe with this is that this CDATA stuff is so specific to 
the `XMLTextInfosetOutputter`. The `JSONInfosetOutputter` definitely doesn't 
need it, and other outputters create actual java objects, and so whitespace 
isn't really an issue for them unless they are pretty printed. But in that case 
the pretty printers can figure out this logic which is outside the scope of 
Daffodil.
   
   Because this is specific to the `XMLTextInfosetOutputter`, I would suggest 
that this should instead be a parameter passed to its constructor, and it's up 
to the caller how to figure out how that gets in there. For API users that's 
easy since they can just supply the new parameter. For CLI users, it's a bit 
more tricky since we don't yet have a way for the CLI to provide options to 
infoset outputters. This is a similar issue as 
[DAFFODIL-2234](https://issues.apache.org/jira/browse/DAFFODIL-2234), so maybe 
we figure that out separately, and this is only a feature for API users until 
then?
   
   This would also man that you don't have to pass this xmlOutputStyle around 
all over the place just to get it to the one outputter that actually cares 
about it. It should reduce the changes quite a bit.



##########
daffodil-runtime1/src/main/scala/org/apache/daffodil/infoset/JDOMInfosetOutputter.scala:
##########
@@ -48,14 +48,57 @@ class JDOMInfosetOutputter extends InfosetOutputter
     true
   }
 
-  def startSimple(diSimple: DISimple): Boolean = {
+  def startSimple(diSimple: DISimple, xmlOutputStyle: String): Boolean = {
 
     val elem = createElement(diSimple)
+    val correctFormat = new StringBuilder("");
+    val cdataIntro = "<![CDATA["
+    val cdataOutro = "]]>"
+    var charEntMode: Boolean = false
 
     if (diSimple.hasValue) {
       val text =
         if (diSimple.erd.optPrimType.get.isInstanceOf[NodeInfo.String.Kind]) {
-          remapped(diSimple.dataValueAsString)
+          val s = remapped(diSimple.dataValueAsString)
+
+            if(xmlOutputStyle == "prettyPrintSafe"){
+              val newData = s.replaceAll(">","&gt;").replace("\\r\\n", 
"&#xE00D;")
+              val readyForCDATA = newData.replaceAll("0x","&#x")
+              //Figure out what mode you have to be in
+              if(readyForCDATA(0) == '&') {
+                charEntMode = true
+              } else {
+                charEntMode = false
+                correctFormat.append(cdataIntro)
+              }
+
+              //Traverse over the string surrounding correct areas with CDATA 
info
+              for(c <- readyForCDATA) {
+                if(charEntMode) {
+                  correctFormat.append(c)
+                  if(c == ';'){
+                    correctFormat.append(cdataIntro)
+                    charEntMode = false
+                  }
+                } else {
+                  if(c == '&'){
+                    correctFormat.append(cdataOutro)
+                    charEntMode = true
+                  }
+                  correctFormat.append(c)
+                }
+              }
+
+              //You are done with the string. If you are still a non
+                   //char ent then close and finish up.
+              if(!charEntMode){
+                correctFormat.append(cdataOutro)
+              }
+

Review Comment:
   There's a lot of special logic and edge cases here needed to convert a 
string for CDATA. Mike found some potential edge cases we are missing.
   
   I wonder if we can use one of the XML library dependencies we have to 
perform this logic. I would be surprised if one of them doesn't already have 
logic for something like this. Let's not reinvent the wheel if possible.



##########
daffodil-sapi/src/test/scala/org/apache/daffodil/example/TestScalaAPI.scala:
##########
@@ -1167,4 +1168,80 @@ class TestScalaAPI {
       assertFalse(ur.isError)
       assertEquals(expectedData, bos.toString())
     }
+
+    @Test
+    def testScalaAPICDATA1(): Unit = {
+      val c = 
Daffodil.compiler().withTunable("xmlOutputStyle","prettyPrintSafe")
+
+      val schemaFile = getResource("/test/sapi/mySchemaCDATA1.dfdl.xsd")
+      val pf = c.compileFile(schemaFile)
+      var dp = pf.onPath("/")
+      dp = reserializeDataProcessor(dp)
+
+      val file = getResource("/test/sapi/myDataCDATA1.dat")

Review Comment:
   One thing to I try to remember is to also specify a charset when converting 
a string to bytes. We've had issues where systems with different languages get 
a different character set and cause test failures. So this would instead be 
something like:
   ```scala
   new ByteArrayInputStream(dataString.getBytes("ascii", 
StandardCharsets.UTF_8))
   ```



##########
daffodil-runtime1/src/main/scala/org/apache/daffodil/infoset/W3CDOMInfosetOutputter.scala:
##########
@@ -56,14 +56,56 @@ class W3CDOMInfosetOutputter extends InfosetOutputter
     true
   }
 
-  def startSimple(diSimple: DISimple): Boolean = {
+  def startSimple(diSimple: DISimple, xmlOutputStyle: String): Boolean = {
 
     val elem = createElement(diSimple)
 
+    val correctFormat = new StringBuilder("");
+    val cdataIntro = "<![CDATA["
+    val cdataOutro = "]]>"
+    var charEntMode: Boolean = false
+
     if (diSimple.hasValue) {
       val text =
         if (diSimple.erd.optPrimType.get.isInstanceOf[NodeInfo.String.Kind]) {
-          remapped(diSimple.dataValueAsString)
+          val s = remapped(diSimple.dataValueAsString)
+
+          if(xmlOutputStyle == "prettyPrintSafe"){

Review Comment:
   Do those even need this logic? Since those are all Java objects there really 
isn't a need to convert things to CDATA. The only case where CDATA would be 
need is if those are pretty printed to strings, but that seems unlikely. The 
whole reason to use those objects is to avoid issues with pretty printed XML.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to