[GitHub] [daffodil] stevedlawrence commented on a change in pull request #560: Verify that we get errors if the files contain a DOCTYPE

GitBox Mon, 24 May 2021 05:28:03 -0700


stevedlawrence commented on a change in pull request #560:
URL: https://github.com/apache/daffodil/pull/560#discussion_r637899469




##########
File path: 
daffodil-runtime1/src/main/scala/org/apache/daffodil/processors/DataProcessor.scala
##########
@@ -347,7 +346,11 @@ class DataProcessor private (
 
   override def getDiagnostics = ssrd.diagnostics
 
-  override def newXMLReaderInstance: DFDL.DaffodilParseXMLReader = new 
DaffodilParseXMLReader(this)
+  override def newXMLReaderInstance: DFDL.DaffodilParseXMLReader = {
+    val xrdr = new DaffodilParseXMLReader(this)
+    XMLUtils.setSecureDefaults(xrdr)

Review comment:
       Thoughts on just removing this call? We aren't getting a random 
XMLReader here based on some factory that happens to be on the classpath. We 
know it's a Daffodil specific XML reader, and we know it's secure from these 
XML issues. Removing this would potentially mean we could remove the new logic 
added in set/getFeature to accept features that we don't really support. 

##########
File path: 
daffodil-core/src/main/scala/org/apache/daffodil/runtime1/SchemaSetRuntime1Mixin.scala
##########
@@ -72,37 +74,61 @@ trait SchemaSetRuntime1Mixin { self : SchemaSet =>
   }.value
 
   def onPath(xpath: String): DFDL.DataProcessor = {
-      Assert.usage(!isError)
-      if (xpath != "/") root.notYetImplemented("""Path must be "/". Other path 
support is not yet implemented.""")
-      val rootERD = root.elementRuntimeData
-      root.schemaDefinitionUnless(
-        rootERD.outputValueCalcExpr.isEmpty,
-        "The root element cannot have the dfdl:outputValueCalc property.")
-      val validationMode = ValidationMode.Off
-      val p = if (!root.isError) parser else null
-      val u = if (!root.isError) unparser else null
-      val ssrd = new SchemaSetRuntimeData(
-        p,
-        u,
-        this.diagnostics,
-        rootERD,
-        variableMap,
-        typeCalcMap)
-      if (root.numComponents > root.numUniqueComponents)
-        log(LogLevel.Info, "Compiler: component counts: unique %s, actual %s.",
-          root.numUniqueComponents, root.numComponents)
-      val dataProc = new DataProcessor(ssrd, tunable, 
self.compilerExternalVarSettings)
-      if (dataProc.isError) {
-        // NO longer printing anything here. Callers must do this.
-        //        val diags = dataProc.getDiagnostics
-        //        log(LogLevel.Error,"Compilation (DataProcessor) reports %s 
compile errors/warnings.", diags.length)
-        //        diags.foreach { diag => log(LogLevel.Error, diag.toString()) 
}
-      } else {
-        log(LogLevel.Compile, "Parser = %s.", ssrd.parser.toString)
-        log(LogLevel.Compile, "Unparser = %s.", ssrd.unparser.toString)
-        log(LogLevel.Compile, "Compilation (DataProcesor) completed with no 
errors.")
-      }
-      dataProc
+    Assert.usage(!isError)
+    if (xpath != "/") root.notYetImplemented("""Path must be "/". Other path 
support is not yet implemented.""")
+    val rootERD = root.elementRuntimeData
+    root.schemaDefinitionUnless(
+      rootERD.outputValueCalcExpr.isEmpty,
+      "The root element cannot have the dfdl:outputValueCalc property.")
+    val validationMode = ValidationMode.Off
+    val p = if (!root.isError) parser else null
+    val u = if (!root.isError) unparser else null
+    val ssrd = new SchemaSetRuntimeData(
+      p,
+      u,
+      this.diagnostics,
+      rootERD,
+      variableMap,
+      typeCalcMap)
+    if (root.numComponents > root.numUniqueComponents)
+      log(LogLevel.Info, "Compiler: component counts: unique %s, actual %s.",
+        root.numUniqueComponents, root.numComponents)
+    val dataProc = new DataProcessor(ssrd, tunable, 
self.compilerExternalVarSettings)
+    //
+    // now we fake serialize to a dev/null-type output stream which forces
+    // any lazy evaluation that hasn't completed to complete.
+    // Those things could signal errors, so we do this before we check for 
errors.
+    //
+    // Note that calling preSerialization is not sufficient, since that's only 
mixed into
+    // objects with lazy evaluation. A SSRD is just a tuple-like object, does 
not mixin
+    // preSerialization, and shouldn't need to. We need to
+    // serialize all its substructure to insure all preSerializations, that 
force
+    // all lazy evaluations, are done.
+    //
+    // Overhead-wise, this is costly, if the caller is about to save the 
processor themselves
+    // But as there have been cases of Runtime1 processors which end up doing 
lazy evaluation
+    // that ends up happening late, this eliminates a source of bugs, albeit, 
by masking them
+    // so they are not detectable.
+    //
+    // Best to address this for real when we refactor Runtime1 to fully 
separate it from
+    // the schema compiler. At that point we can draw a firmer line about the 
compiler's output
+    // being fully realized before runtime objects are constructed.
+    //
+    // We don't call save() here, because that does a few other things than 
just serialize.
+    val oos = new 
ObjectOutputStream(org.apache.commons.io.output.NullOutputStream.NULL_OUTPUT_STREAM)
+    oos.writeObject(dataProc)
+
+    if (dataProc.isError) {
+      // NO longer printing anything here. Callers must do this.
+      //        val diags = dataProc.getDiagnostics
+      //        log(LogLevel.Error,"Compilation (DataProcessor) reports %s 
compile errors/warnings.", diags.length)
+      //        diags.foreach { diag => log(LogLevel.Error, diag.toString()) }
+    } else {
+      log(LogLevel.Compile, "Parser = %s.", ssrd.parser.toString)
+      log(LogLevel.Compile, "Unparser = %s.", ssrd.unparser.toString)
+      log(LogLevel.Compile, "Compilation (DataProcesor) completed with no 
errors.")
+    }
+    dataProc
   }
 
-}
+}

Review comment:
       This file is missing a new line. Not sure why that would be removed. Did 
you IDE settings change?

##########
File path: 
daffodil-core/src/main/scala/org/apache/daffodil/runtime1/SchemaSetRuntime1Mixin.scala
##########
@@ -72,37 +74,61 @@ trait SchemaSetRuntime1Mixin { self : SchemaSet =>
   }.value
 
   def onPath(xpath: String): DFDL.DataProcessor = {
-      Assert.usage(!isError)
-      if (xpath != "/") root.notYetImplemented("""Path must be "/". Other path 
support is not yet implemented.""")
-      val rootERD = root.elementRuntimeData
-      root.schemaDefinitionUnless(
-        rootERD.outputValueCalcExpr.isEmpty,
-        "The root element cannot have the dfdl:outputValueCalc property.")
-      val validationMode = ValidationMode.Off
-      val p = if (!root.isError) parser else null
-      val u = if (!root.isError) unparser else null
-      val ssrd = new SchemaSetRuntimeData(
-        p,
-        u,
-        this.diagnostics,
-        rootERD,
-        variableMap,
-        typeCalcMap)
-      if (root.numComponents > root.numUniqueComponents)
-        log(LogLevel.Info, "Compiler: component counts: unique %s, actual %s.",
-          root.numUniqueComponents, root.numComponents)
-      val dataProc = new DataProcessor(ssrd, tunable, 
self.compilerExternalVarSettings)
-      if (dataProc.isError) {
-        // NO longer printing anything here. Callers must do this.
-        //        val diags = dataProc.getDiagnostics
-        //        log(LogLevel.Error,"Compilation (DataProcessor) reports %s 
compile errors/warnings.", diags.length)
-        //        diags.foreach { diag => log(LogLevel.Error, diag.toString()) 
}
-      } else {
-        log(LogLevel.Compile, "Parser = %s.", ssrd.parser.toString)
-        log(LogLevel.Compile, "Unparser = %s.", ssrd.unparser.toString)
-        log(LogLevel.Compile, "Compilation (DataProcesor) completed with no 
errors.")
-      }
-      dataProc
+    Assert.usage(!isError)
+    if (xpath != "/") root.notYetImplemented("""Path must be "/". Other path 
support is not yet implemented.""")
+    val rootERD = root.elementRuntimeData
+    root.schemaDefinitionUnless(
+      rootERD.outputValueCalcExpr.isEmpty,
+      "The root element cannot have the dfdl:outputValueCalc property.")
+    val validationMode = ValidationMode.Off
+    val p = if (!root.isError) parser else null
+    val u = if (!root.isError) unparser else null
+    val ssrd = new SchemaSetRuntimeData(
+      p,
+      u,
+      this.diagnostics,
+      rootERD,
+      variableMap,
+      typeCalcMap)
+    if (root.numComponents > root.numUniqueComponents)
+      log(LogLevel.Info, "Compiler: component counts: unique %s, actual %s.",
+        root.numUniqueComponents, root.numComponents)
+    val dataProc = new DataProcessor(ssrd, tunable, 
self.compilerExternalVarSettings)
+    //
+    // now we fake serialize to a dev/null-type output stream which forces
+    // any lazy evaluation that hasn't completed to complete.
+    // Those things could signal errors, so we do this before we check for 
errors.
+    //
+    // Note that calling preSerialization is not sufficient, since that's only 
mixed into
+    // objects with lazy evaluation. A SSRD is just a tuple-like object, does 
not mixin
+    // preSerialization, and shouldn't need to. We need to
+    // serialize all its substructure to insure all preSerializations, that 
force
+    // all lazy evaluations, are done.
+    //
+    // Overhead-wise, this is costly, if the caller is about to save the 
processor themselves
+    // But as there have been cases of Runtime1 processors which end up doing 
lazy evaluation
+    // that ends up happening late, this eliminates a source of bugs, albeit, 
by masking them
+    // so they are not detectable.

Review comment:
       Agreed with this comment. And even if a user doesn't call save(), 
there's still going to be some overhead to serialize the objects, even if they 
aren't written.
   
   Can we create a bug so we remember to fix this and remove this 
serialization? Feels like something we might want to have fixed before the next 
release.

##########
File path: 
daffodil-cli/src/it/scala/org/apache/daffodil/debugger/TestCLIDebugger.scala
##########
@@ -1265,20 +1265,29 @@ class TestCLIdebugger {
       shell.sendLine("step")
       shell.sendLine("step")
       shell.sendLine("step")
-      shell.sendLine("step")
-      shell.sendLine("step")
+      //
+      // NOTE FOR REVIEWER
+      //
+      // This test had to be modified to work with this change set.
+      // I have no idea why this change set would affect behavior
+      // of this kind.
+      //

Review comment:
       Maybe related to the serilazation stuff? Maybe forcing serialization is 
cause different code paths to be triggers? Or maybe thing are getting evaluated 
in different orders, which somehow changes thigns? Doesn't seem likely, but the 
only thing I can think of. Does this behavior switch back if we remove your 
serialization changes?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [daffodil] stevedlawrence commented on a change in pull request #560: Verify that we get errors if the files contain a DOCTYPE

Reply via email to