stevedlawrence commented on code in PR #821:
URL: https://github.com/apache/daffodil/pull/821#discussion_r944740002
##########
daffodil-runtime1/src/main/scala/org/apache/daffodil/infoset/InfosetOutputter.scala:
##########
@@ -54,11 +54,12 @@ trait InfosetOutputter {
* @param diSimple the simple element that is started. Various fields of
* DISimple can be accessed to determine things like the
* value, nil, name, namespace, etc.
+ * @param xmlOutputStyle the string that determines how to print data.
* @return true on sucess, false if there was an error and Daffodil should
stop all
* future calls to the InfosetOutputter
*/
- def startSimple(diSimple: DISimple): Boolean
+ def startSimple(diSimple: DISimple, xmlOutputStyle: String): Boolean
Review Comment:
I'm not sure I like this change to the `InfosetOutputter`.
For one, it breaks API backwards compatibility, which isn't a huge deal
since we don't know of other implementations outside of Daffodil, but it's
still not great.
Though, my main gripe with this is that this CDATA stuff is so specific to
the `XMLTextInfosetOutputter`. The `JSONInfosetOutputter` definitely doesn't
need it, and other outputters create actual java objects, and so whitespace
isn't really an issue for them unless they are pretty printed. But in that case
the pretty printers can figure out this logic which is outside the scope of
Daffodil.
Because this is specific to the `XMLTextInfosetOutputter`, I would suggest
that this should instead be a parameter passed to its constructor, and it's up
to the caller how to figure out how that gets in there. For API users that's
easy since they can just supply the new parameter. For CLI users, it's a bit
more tricky since we don't yet have a way for the CLI to provide options to
infoset outputters. This is a similar issue as
[DAFFODIL-2234](https://issues.apache.org/jira/browse/DAFFODIL-2234), so maybe
we figure that out separately, and this is only a feature for API users until
then?
This would also man that you don't have to pass this xmlOutputStyle around
all over the place just to get it to the one outputter that actually cares
about it. It should reduce the changes quite a bit.
##########
daffodil-runtime1/src/main/scala/org/apache/daffodil/infoset/JDOMInfosetOutputter.scala:
##########
@@ -48,14 +48,57 @@ class JDOMInfosetOutputter extends InfosetOutputter
true
}
- def startSimple(diSimple: DISimple): Boolean = {
+ def startSimple(diSimple: DISimple, xmlOutputStyle: String): Boolean = {
val elem = createElement(diSimple)
+ val correctFormat = new StringBuilder("");
+ val cdataIntro = "<![CDATA["
+ val cdataOutro = "]]>"
+ var charEntMode: Boolean = false
if (diSimple.hasValue) {
val text =
if (diSimple.erd.optPrimType.get.isInstanceOf[NodeInfo.String.Kind]) {
- remapped(diSimple.dataValueAsString)
+ val s = remapped(diSimple.dataValueAsString)
+
+ if(xmlOutputStyle == "prettyPrintSafe"){
+ val newData = s.replaceAll(">",">").replace("\\r\\n",
"")
+ val readyForCDATA = newData.replaceAll("0x","&#x")
+ //Figure out what mode you have to be in
+ if(readyForCDATA(0) == '&') {
+ charEntMode = true
+ } else {
+ charEntMode = false
+ correctFormat.append(cdataIntro)
+ }
+
+ //Traverse over the string surrounding correct areas with CDATA
info
+ for(c <- readyForCDATA) {
+ if(charEntMode) {
+ correctFormat.append(c)
+ if(c == ';'){
+ correctFormat.append(cdataIntro)
+ charEntMode = false
+ }
+ } else {
+ if(c == '&'){
+ correctFormat.append(cdataOutro)
+ charEntMode = true
+ }
+ correctFormat.append(c)
+ }
+ }
+
+ //You are done with the string. If you are still a non
+ //char ent then close and finish up.
+ if(!charEntMode){
+ correctFormat.append(cdataOutro)
+ }
+
Review Comment:
There's a lot of special logic and edge cases here needed to convert a
string for CDATA. Mike found some potential edge cases we are missing.
I wonder if we can use one of the XML library dependencies we have to
perform this logic. I would be surprised if one of them doesn't already have
logic for something like this. Let's not reinvent the wheel if possible.
##########
daffodil-sapi/src/test/scala/org/apache/daffodil/example/TestScalaAPI.scala:
##########
@@ -1167,4 +1168,80 @@ class TestScalaAPI {
assertFalse(ur.isError)
assertEquals(expectedData, bos.toString())
}
+
+ @Test
+ def testScalaAPICDATA1(): Unit = {
+ val c =
Daffodil.compiler().withTunable("xmlOutputStyle","prettyPrintSafe")
+
+ val schemaFile = getResource("/test/sapi/mySchemaCDATA1.dfdl.xsd")
+ val pf = c.compileFile(schemaFile)
+ var dp = pf.onPath("/")
+ dp = reserializeDataProcessor(dp)
+
+ val file = getResource("/test/sapi/myDataCDATA1.dat")
Review Comment:
One thing to I try to remember is to also specify a charset when converting
a string to bytes. We've had issues where systems with different languages get
a different character set and cause test failures. So this would instead be
something like:
```scala
new ByteArrayInputStream(dataString.getBytes("ascii",
StandardCharsets.UTF_8))
```
##########
daffodil-runtime1/src/main/scala/org/apache/daffodil/infoset/W3CDOMInfosetOutputter.scala:
##########
@@ -56,14 +56,56 @@ class W3CDOMInfosetOutputter extends InfosetOutputter
true
}
- def startSimple(diSimple: DISimple): Boolean = {
+ def startSimple(diSimple: DISimple, xmlOutputStyle: String): Boolean = {
val elem = createElement(diSimple)
+ val correctFormat = new StringBuilder("");
+ val cdataIntro = "<![CDATA["
+ val cdataOutro = "]]>"
+ var charEntMode: Boolean = false
+
if (diSimple.hasValue) {
val text =
if (diSimple.erd.optPrimType.get.isInstanceOf[NodeInfo.String.Kind]) {
- remapped(diSimple.dataValueAsString)
+ val s = remapped(diSimple.dataValueAsString)
+
+ if(xmlOutputStyle == "prettyPrintSafe"){
Review Comment:
Do those even need this logic? Since those are all Java objects there really
isn't a need to convert things to CDATA. The only case where CDATA would be
need is if those are pretty printed to strings, but that seems unlikely. The
whole reason to use those objects is to avoid issues with pretty printed XML.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]