tuxji commented on a change in pull request #560:
URL: https://github.com/apache/daffodil/pull/560#discussion_r638167992



##########
File path: 
daffodil-japi/src/test/java/org/apache/daffodil/example/TestJavaAPI.java
##########
@@ -48,9 +49,31 @@
 import org.apache.daffodil.japi.logger.ConsoleLogWriter;
 import org.apache.daffodil.japi.logger.LogLevel;
 import org.apache.daffodil.japi.io.InputSourceDataInputStream;
+import org.xml.sax.SAXNotRecognizedException;
+import org.xml.sax.SAXNotSupportedException;
+import org.xml.sax.XMLReader;
+
+import javax.xml.XMLConstants;
 
 public class TestJavaAPI {
 
+    /**
+     * Best practices for XML loading are to turn off anything that could lead 
to
+     * insecurity.
+     *
+     * This is probably unnecessary in the case of these tests, but as these 
tests
+     * are also used to illustrate API usage, this exemplifies best practice.
+     */
+    public static void setSecureDefaults(XMLReader xmlReader)
+            throws SAXNotSupportedException, SAXNotRecognizedException {
+        xmlReader.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
+        // since we're not really sure what they mean by secure processing
+        // we make doubly sure by setting these ourselves also.
+        
xmlReader.setFeature("http://apache.org/xml/features/disallow-doctype-decl";, 
true);
+        
xmlReader.setFeature("http://xml.org/sax/features/external-parameter-entities";, 
false);
+        
xmlReader.setFeature("http://xml.org/sax/features/external-general-entities";, 
false);
+    }
+

Review comment:
       Should Java programs call XMLUtils.setSecureDefaults instead of defining 
their own setSecureDefaults function?

##########
File path: 
daffodil-japi/src/test/java/org/apache/daffodil/example/TestJavaAPI.java
##########
@@ -40,6 +40,7 @@
 import org.apache.commons.io.IOUtils;
 import org.apache.daffodil.japi.*;
 import org.apache.daffodil.japi.infoset.XMLTextInfosetOutputter;
+import org.apache.daffodil.xml.XMLUtils;

Review comment:
       Why import XMLUtils but still define a setSecureDefaults function below? 
 Remove the import if Java programs aren't supposed to call 
XMLUtils.setSecureDefaults, or keep the import and show how Java programs can 
call XMLUtils.setSecureDefaults.

##########
File path: 
daffodil-japi/src/test/java/org/apache/daffodil/example/TestJavaAPI.java
##########
@@ -48,9 +49,31 @@
 import org.apache.daffodil.japi.logger.ConsoleLogWriter;
 import org.apache.daffodil.japi.logger.LogLevel;
 import org.apache.daffodil.japi.io.InputSourceDataInputStream;
+import org.xml.sax.SAXNotRecognizedException;
+import org.xml.sax.SAXNotSupportedException;
+import org.xml.sax.XMLReader;
+
+import javax.xml.XMLConstants;

Review comment:
       If you call XMLUtils.setSecureDefaults, you won't need these imports 
anymore.

##########
File path: 
daffodil-japi/src/test/java/org/apache/daffodil/example/TestJavaAPI.java
##########
@@ -1008,6 +1031,7 @@ public void testJavaAPI20() throws IOException, 
ClassNotFoundException {
         try {
             org.xml.sax.XMLReader unparseXMLReader = 
javax.xml.parsers.SAXParserFactory.newInstance()
                     .newSAXParser().getXMLReader();
+            setSecureDefaults(unparseXMLReader);

Review comment:
       Can Java programs call DaffodilSAXParserFactory.newInstance (which you 
would need to define) or should they call XMLUtils.setSecureDefaults?

##########
File path: 
daffodil-core/src/test/scala/org/apache/daffodil/processor/TestSAXParseUnparseAPI.scala
##########
@@ -48,8 +47,7 @@ class TestSAXParseUnparseAPI {
     val pr = 
parseXMLReader.getProperty(XMLUtils.DAFFODIL_SAX_URN_PARSERESULT).asInstanceOf[ParseResult]
     assertTrue(!pr.isError)
     assertEquals(expectedInfoset, scala.xml.XML.loadString(baosParse.toString))
-
-    val unparseXMLReader = 
SAXParserFactory.newInstance().newSAXParser().getXMLReader
+    val unparseXMLReader = 
DaffodilSAXParserFactory().newSAXParser().getXMLReader

Review comment:
       Please check remaining calls to SAXParserFactory.  I see:
   
   ```text
   daffodil-japi/src/main/java/org/apache/daffodil/japi/package-info.java
   260: *   XMLReader xmlReader = 
SAXParserFactory().newSAXParser().getXMLReader;
   
   daffodil-japi/src/test/java/org/apache/daffodil/example/TestJavaAPI.java
   1032:            org.xml.sax.XMLReader unparseXMLReader = 
javax.xml.parsers.SAXParserFactory.newInstance()
   1168:            org.xml.sax.XMLReader unparseXMLReader = 
javax.xml.parsers.SAXParserFactory.newInstance()
   
   
daffodil-lib/src/test/scala/org/apache/daffodil/util/TestXMLCatalogAndValidate.scala
   33:import javax.xml.parsers.SAXParserFactory
   410:  val f: SAXParserFactory = DaffodilSAXParserFactory()
   
   daffodil-sapi/src/main/scala/org/apache/daffodil/sapi/package.scala
   233: *  val xmlReader = 
SAXParserFactory.newInstance.newSAXParser.getXMLReader
   
   daffodil-sapi/src/test/scala/org/apache/daffodil/example/TestScalaAPI.scala
   1066:    val unparseXMLReader = 
javax.xml.parsers.SAXParserFactory.newInstance.newSAXParser.getXMLReader
   1147:    val unparseXMLReader = 
javax.xml.parsers.SAXParserFactory.newInstance.newSAXParser.getXMLReader
   
   
daffodil-schematron/src/main/scala/org/apache/daffodil/validation/schematron/Schematron.scala
   23:import javax.xml.parsers.SAXParserFactory
   54:      val f = try featuress.foldLeft(SAXParserFactory.newInstance){ (fac, 
ft) =>
   ```

##########
File path: 
daffodil-lib/src/main/scala/org/apache/daffodil/cookers/EntityReplacer.scala
##########
@@ -493,24 +493,10 @@ sealed abstract class StringLiteralBase(propNameArg: 
String,
   extends AutoPropNameBase(propNameArg)
     with StringLiteralCookerMixin {
 
-  private val xmlEntityPattern = new Regex("""&(quot|amp|apos|lt|gt);""", 
"entity")
-
   /**
    * Thawed means XML entities have been replaced. So & is &, etc.
    */
-  private def thaw(raw: String) = {
-    val thawed: String = {
-      val res = xmlEntityPattern.replaceAllIn(raw, m => {
-        val sb = scala.xml.Utility.unescape(m.group("entity"), new 
StringBuilder())
-        // There really is no possibility for null to come back as we've made
-        // sure to only include valid xml entities in the xmlEntityPattern.
-        Assert.invariant(sb ne null)
-        sb.toString()
-      })
-      res
-    }
-    thawed
-  }
+  private def thaw(raw: String) = XMLUtils.unescape(raw)

Review comment:
       It isn't obvious to me why EntityReplacer.scala had this change made - 
what's the reason?

##########
File path: 
daffodil-core/src/main/scala/org/apache/daffodil/dsom/DFDLSchemaFile.scala
##########
@@ -134,20 +134,17 @@ final class DFDLSchemaFile(
 
   lazy val isDFDLSchemaFile = iiXMLSchemaDocument.isDFDLSchema
 
+  private lazy val loader = new DaffodilXMLLoader(this)
+
   lazy val iiXMLSchemaDocument = LV('iiXMLSchemaDocument) {
     val res = loadXMLSchemaDocument(seenBefore, Some(this))
     if (res.isDFDLSchema && sset.validateDFDLSchemas) {
       //
       // We validate DFDL schemas, only if validation is requested.
       // Some things, tests generally, want to turn this validation off.
       //
-
-      val ldr = new DaffodilXMLLoader(this)
-      ldr.setValidation(true)
-      try {
-        ldr.load(schemaSource) // validate as XML file with XML Schema for 
DFDL Schemas
-        ldr.validateSchema(schemaSource) // validate as XSD (catches UPA 
errors for example)
-      } catch {
+      try loader.validateAsXMLSchema(schemaSource) // validate as XSD (catches 
UPA errors for example)
+      catch {

Review comment:
       Please clarify whether the number of passes has changed.  Steve thinks 
it hasn't changed and I think he's right, but I wanted to make sure.

##########
File path: 
daffodil-tdml-processor/src/main/scala/org/apache/daffodil/tdml/processor/DaffodilTDMLDFDLProcessor.scala
##########
@@ -159,14 +163,18 @@ final class TDMLDFDLProcessorFactory private (
   override def getProcessor(
     schemaSource: DaffodilSchemaSource,
     useSerializedProcessor: Boolean,
-    optRootName: Option[String] = None,
-    optRootNamespace: Option[String] = None): TDML.CompileResult = {
+    optRootName: Option[String],
+    optRootNamespace: Option[String],
+    tunables: Map[String, String]): TDML.CompileResult = {

Review comment:
       I'm agnostic whether to keep the cache a global singleton or make it 
part of the testsuite/runner object, but if you (Mike) think there's some merit 
to the idea, you can create a Jira bug to remember it as a future improvement 
idea.  

##########
File path: 
daffodil-cli/src/it/scala/org/apache/daffodil/debugger/TestCLIDebugger.scala
##########
@@ -1265,20 +1265,29 @@ class TestCLIdebugger {
       shell.sendLine("step")
       shell.sendLine("step")
       shell.sendLine("step")
-      shell.sendLine("step")
-      shell.sendLine("step")
+      //
+      // NOTE FOR REVIEWER
+      //
+      // This test had to be modified to work with this change set.
+      // I have no idea why this change set would affect behavior
+      // of this kind.
+      //

Review comment:
       Good idea to check serialization changes, too bad it wasn't the real 
reason so that we could take these comments out and document the real reason in 
the commit message instead.

##########
File path: 
daffodil-tdml-processor/src/main/scala/org/apache/daffodil/tdml/processor/DaffodilTDMLDFDLProcessor.scala
##########
@@ -337,15 +340,18 @@ class DaffodilTDMLDFDLProcessor private (private var dp: 
DataProcessor) extends
     new DaffodilTDMLParseResult(actual, outputter)
   }
 
-  def doUnparseWithBothApis(dpInputter: TDMLInfosetInputter, saxInputStream: 
java.io.InputStream,
+  def doUnparseWithBothApis(
+    dpInputter: TDMLInfosetInputter,
+    saxInputStream: java.io.InputStream,
     dpOutputStream: java.io.OutputStream): DaffodilTDMLUnparseResult = {
 
     val dpOutputChannel = java.nio.channels.Channels.newChannel(dpOutputStream)
     val saxOutputStream = new ByteArrayOutputStream
     val saxOutputChannel = 
java.nio.channels.Channels.newChannel(saxOutputStream)
     val unparseContentHandler = dp.newContentHandlerInstance(saxOutputChannel)
     unparseContentHandler.enableInputterResolutionOfRelativeInfosetBlobURIs()
-    val xmlReader = SAXParserFactory.newInstance.newSAXParser.getXMLReader
+    val xmlReader = DaffodilSAXParserFactory().newSAXParser.getXMLReader
+    XMLUtils.setSecureDefaults(xmlReader)

Review comment:
       No, this XMLUtils.setSecureDefaults call shouldn't be needed anymore.

##########
File path: 
daffodil-core/src/main/scala/org/apache/daffodil/dsom/DFDLSchemaFile.scala
##########
@@ -117,11 +117,11 @@ final class DFDLSchemaFile(
       val ldr = new DaffodilXMLLoader(this)
       //
       // We do not want to validate here ever, because we have to examine the
-      // root xs:schema eleemnt of a schema to decide if it is a  DFDL schema
+      // root xs:schema element of a schema to decide if it is a  DFDL schema
       // at all that we're even supposed to compile.
       //
-      ldr.setValidation(false)
-      val node = ldr.load(schemaSource)
+      val node = ldr.load(schemaSource, None,
+        addPositionAttributes = true) // need line numbers for diagnostics
       schemaDefinitionUnless(node != null, "Unable to load XML from %s.", 
diagnosticDebugName)

Review comment:
       I suggest renaming the var ldr to var loader, moving it from line 117 to 
line 123 so it follows the big comment section, unwrapping the loader.load call 
back to one line, and moving the "// need line numbers for diagnostics" to a 
new line above it.

##########
File path: daffodil-cli/src/main/scala/org/apache/daffodil/Main.scala
##########
@@ -803,14 +802,28 @@ object Main extends Logging {
           case Left(bytes) => new ByteArrayInputStream(bytes)
           case Right(is) => is
         }
-        scala.xml.XML.load(is)
+        val parser: SAXParser = {
+          val f = DaffodilSAXParserFactory()
+          f.setNamespaceAware(false)
+          val p = f.newSAXParser()
+          val xrdr = p.getXMLReader
+          p
+        }
+        scala.xml.XML.withSAXParser(parser).load(is)

Review comment:
       We don't need the getXMLReader call on line 809 anymore.

##########
File path: 
daffodil-schematron/src/main/scala/org/apache/daffodil/validation/schematron/Schematron.scala
##########
@@ -59,7 +59,9 @@ object Schematron {
           throw ValidatorInitializationException(s"Error setting feature on 
parser: ${ex.getMessage}")
       }
       f.setValidating(false)
-      f.newSAXParser.getXMLReader
+      val xr = f.newSAXParser.getXMLReader
+      XMLUtils.setSecureDefaults(xr)
+      xr

Review comment:
       Can we call DaffodilSAXParserFactory().newInstance instead of 
SAXParserFactory.newInstance on line 54 and revert the change replacing line 62 
with lines 62-64?

##########
File path: 
daffodil-lib/src/main/scala/org/apache/daffodil/xml/DaffodilXMLLoader.scala
##########
@@ -538,47 +398,223 @@ trait SchemaAwareLoaderMixin {
  * do any validation either), however, once Daffodil starts processing the
  * DFDL schema nodes, it resolves references using the same one true XML 
catalog resolver.
  */
-class DaffodilXMLLoader(val errorHandler: org.xml.sax.ErrorHandler) {
+class DaffodilXMLLoader(val errorHandler: org.xml.sax.ErrorHandler)
+  extends NoBindingFactoryAdapter {
 
   def this() = this(RethrowSchemaErrorHandler)
 
-  def xercesAdapter = new DFDLXercesAdapter(errorHandler)
+  private def resolver = DFDLCatalogResolver.get
 
-  //
-  // Controls whether we setup Xerces for validation or not.
-  //
-  final var doValidation: Boolean = true
+  /**
+   * UPA errors are detected by xerces if the schema-full-checking feature is
+   * turned on, AND if you inform xerces that it is reading an XML Schema
+   * (i.e., xsd).
+   *
+   * Detecting these requires that we do THREE passes
+   * 1) load the DFDL schema as an XML document. This validates it against the 
XML Schema
+   * for DFDL schemas.
+   * 2) load the DFDL schema as an XSD - xerces then does lots of more 
intensive checking
+   * of the schema
+   * 3) load the schema for our own consumption by Daffodil code. This uses the
+   * constructing parser so as to preserve CDATA regions (xerces just does the 
wrong
+   * thing with those,...fatally so). Then our own semantic checking is 
performed
+   * as part of compiling the DFDL schema.
+   *
+   * Checks like UPA are in step (2) above. They are coded algorithmically
+   * right into Xerces. This is accomplished by
+   * using the below SchemaFactory and SchemaFactory.newSchema calls.  The
+   * newSchema call is what forces schema validation to take place.
+   */
+  private lazy val schemaFactory = {
+    val sf = new org.apache.xerces.jaxp.validation.XMLSchemaFactory()
+    sf.setResourceResolver(resolver)
+    sf.setErrorHandler(errorHandler)
+    sf.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true)
+    sf.setFeature(XMLUtils.XML_DISALLOW_DOCTYPE_FEATURE, true)
+    // These are not recognized by a schemaFactory
+    // sf.setFeature("http://xml.org/sax/features/validation";, true)
+    // sf.setFeature("http://apache.org/xml/features/validation/schema";, true)
+    // 
sf.setFeature("http://apache.org/xml/features/validation/schema-full-checking";, 
true)
+    sf
+  }
 
-  def setValidation(flag: Boolean): Unit = {
-    doValidation = flag
+  /**
+   * This loads the DFDL schema as an XML Schema. This will
+   * check many more things (ex: UPA) about the DFDL schema other than
+   * just whether it validates against the XML Schema for DFDL schemas.
+   *
+   * Unfortunately, we don't have control over how Xerces loads up these 
schemas
+   * (other than the resolver anyway), so we can't really redirect the way
+   * it issues error messages so that it properly lays blame at say, the 
schema fragments
+   * inside an embedded schema of a TDML file.
+   *
+   * So if we want good file/line/column info from this, we have to give
+   * it a plain old file or resource, and not try to play games to get it to
+   * pick up the file/line/col information from attributes of the elements.
+   *
+   * Due to limitations in the xerces newSchema() method
+   * this method should be called only after loading
+   * the schema as a regular XML file, which itself insists
+   * on the XMLUtils.setSecureDefaults, so we don't need to
+   * further check that here.

Review comment:
       Is this comment (which mentions XMLUtils.setSecureDefaults) still up to 
date or should it say DaffodilSAXParserFactory instead?

##########
File path: daffodil-cli/src/main/scala/org/apache/daffodil/Main.scala
##########
@@ -1484,7 +1498,7 @@ object Main extends Logging {
   private def unparseWithSAX(
     is: InputStream,
     contentHandler: DFDL.DaffodilUnparseContentHandler): UnparseResult = {
-    val xmlReader = SAXParserFactory.newInstance.newSAXParser.getXMLReader
+    val xmlReader: XMLReader = 
DaffodilSAXParserFactory().newSAXParser.getXMLReader

Review comment:
       Do we still need the explicit XMLReader type?

##########
File path: 
daffodil-japi/src/test/java/org/apache/daffodil/example/TestJavaAPI.java
##########
@@ -1143,6 +1167,7 @@ public void testJavaAPI23() throws IOException, 
ClassNotFoundException {
         try {
             org.xml.sax.XMLReader unparseXMLReader = 
javax.xml.parsers.SAXParserFactory.newInstance()
                     .newSAXParser().getXMLReader();
+            setSecureDefaults(unparseXMLReader);

Review comment:
       Likewise, DaffodilSAXParserFactory.newInstance or 
XMLUtils.setSecureDefaults might be better here.

##########
File path: 
daffodil-lib/src/main/scala/org/apache/daffodil/xml/DaffodilXMLLoader.scala
##########
@@ -538,47 +398,223 @@ trait SchemaAwareLoaderMixin {
  * do any validation either), however, once Daffodil starts processing the
  * DFDL schema nodes, it resolves references using the same one true XML 
catalog resolver.
  */
-class DaffodilXMLLoader(val errorHandler: org.xml.sax.ErrorHandler) {
+class DaffodilXMLLoader(val errorHandler: org.xml.sax.ErrorHandler)
+  extends NoBindingFactoryAdapter {
 
   def this() = this(RethrowSchemaErrorHandler)
 
-  def xercesAdapter = new DFDLXercesAdapter(errorHandler)
+  private def resolver = DFDLCatalogResolver.get
 
-  //
-  // Controls whether we setup Xerces for validation or not.
-  //
-  final var doValidation: Boolean = true
+  /**
+   * UPA errors are detected by xerces if the schema-full-checking feature is
+   * turned on, AND if you inform xerces that it is reading an XML Schema
+   * (i.e., xsd).
+   *
+   * Detecting these requires that we do THREE passes
+   * 1) load the DFDL schema as an XML document. This validates it against the 
XML Schema
+   * for DFDL schemas.
+   * 2) load the DFDL schema as an XSD - xerces then does lots of more 
intensive checking
+   * of the schema
+   * 3) load the schema for our own consumption by Daffodil code. This uses the
+   * constructing parser so as to preserve CDATA regions (xerces just does the 
wrong
+   * thing with those,...fatally so). Then our own semantic checking is 
performed
+   * as part of compiling the DFDL schema.
+   *
+   * Checks like UPA are in step (2) above. They are coded algorithmically
+   * right into Xerces. This is accomplished by
+   * using the below SchemaFactory and SchemaFactory.newSchema calls.  The
+   * newSchema call is what forces schema validation to take place.
+   */
+  private lazy val schemaFactory = {
+    val sf = new org.apache.xerces.jaxp.validation.XMLSchemaFactory()
+    sf.setResourceResolver(resolver)
+    sf.setErrorHandler(errorHandler)
+    sf.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true)
+    sf.setFeature(XMLUtils.XML_DISALLOW_DOCTYPE_FEATURE, true)
+    // These are not recognized by a schemaFactory
+    // sf.setFeature("http://xml.org/sax/features/validation";, true)
+    // sf.setFeature("http://apache.org/xml/features/validation/schema";, true)
+    // 
sf.setFeature("http://apache.org/xml/features/validation/schema-full-checking";, 
true)
+    sf
+  }
 
-  def setValidation(flag: Boolean): Unit = {
-    doValidation = flag
+  /**
+   * This loads the DFDL schema as an XML Schema. This will
+   * check many more things (ex: UPA) about the DFDL schema other than
+   * just whether it validates against the XML Schema for DFDL schemas.
+   *
+   * Unfortunately, we don't have control over how Xerces loads up these 
schemas
+   * (other than the resolver anyway), so we can't really redirect the way
+   * it issues error messages so that it properly lays blame at say, the 
schema fragments
+   * inside an embedded schema of a TDML file.
+   *
+   * So if we want good file/line/column info from this, we have to give
+   * it a plain old file or resource, and not try to play games to get it to
+   * pick up the file/line/col information from attributes of the elements.
+   *
+   * Due to limitations in the xerces newSchema() method
+   * this method should be called only after loading
+   * the schema as a regular XML file, which itself insists
+   * on the XMLUtils.setSecureDefaults, so we don't need to
+   * further check that here.
+   */
+  def validateAsDFDLSchema(source: DaffodilSchemaSource): Unit = {
+    // first we load it, with validation explicitly against the
+    // schema for DFDL Schemas.
+    load(source, Some(XMLUtils.schemaForDFDLSchemas), addPositionAttributes = 
true)
+    //
+    // Then we validate explicitly so Xerces can check things
+    // such as for UPA violations
+    //
+    val inputSource = source.newInputSource()
+    val saxSource = new SAXSource(inputSource)
+    //
+    // We would like this saxSource to be created from an XMLReader
+    // so that we can call XMLUtils.setSecureDefaults on it.
+    // but we get strange errors if I do that, where every symbol
+    // in the schema has an unrecognized namespace prefix.
+    //

Review comment:
       Likewise, is this comment still up to date?

##########
File path: 
daffodil-lib/src/main/scala/org/apache/daffodil/api/DaffodilSchemaSource.scala
##########
@@ -112,7 +118,7 @@ class URISchemaSource protected (val fileOrResource: URI) 
extends DaffodilSchema
 /**
  * For stdin, or other anonymous pipe-like source of schema.
  */
-case class InputStreamSchemaSource(is: java.io.InputStream, tmpDir: 
Option[File], blameName: String, extension: String) extends 
DaffodilSchemaSource {
+class InputStreamSchemaSource(is: java.io.InputStream, tmpDir: Option[File], 
blameName: String, extension: String) extends DaffodilSchemaSource {

Review comment:
       Curious, why was `case` removed?

##########
File path: 
daffodil-lib/src/test/scala/org/apache/daffodil/util/TestXMLCatalogAndValidate.scala
##########
@@ -400,21 +407,14 @@ class SchemaAwareFactoryAdapter()
   //System.err.println("Creating " + getClass().getName())
   val res = new MyResolver()
   //System.err.println("Creating parser")
-  val f = SAXParserFactory.newInstance()
+  val f: SAXParserFactory = DaffodilSAXParserFactory()

Review comment:
       Might not want to import javax.xml.parsers.SAXParserFactory only to use 
it as a type here (inference should work fine).

##########
File path: 
daffodil-lib/src/main/scala/org/apache/daffodil/validation/XercesValidator.scala
##########
@@ -89,28 +82,42 @@ class XercesValidator(schemaFileNames: Seq[String])
 
   private val validator = new ThreadLocal[XercesValidatorImpl] {
     override def initialValue(): XercesValidatorImpl =
-      initializeValidator(schema.newValidator(), resolver)
+      initializeValidator(schema.newValidator, resolver)
   }
 
-  def validateXML(document: java.io.InputStream): ValidationResult = {
+  def validateXML(document: java.io.InputStream) =
+    validateXML(document, new XercesErrorHandler)

Review comment:
       IDEA often tells me that a def method should have an explicit return 
type.

##########
File path: 
daffodil-sapi/src/test/scala/org/apache/daffodil/example/TestScalaAPI.scala
##########
@@ -1027,6 +1064,7 @@ class TestScalaAPI {
     // prep for SAX unparse
     val unparseContentHandler = dp.newContentHandlerInstance(wbc)
     val unparseXMLReader = 
javax.xml.parsers.SAXParserFactory.newInstance.newSAXParser.getXMLReader
+    setSecureDefaults(unparseXMLReader)

Review comment:
       Maybe call DaffodilSAXParserFactory and remove setSecureDefaults?

##########
File path: 
daffodil-sapi/src/test/scala/org/apache/daffodil/example/TestScalaAPI.scala
##########
@@ -47,11 +47,33 @@ import 
org.apache.daffodil.sapi.DaffodilUnhandledSAXException
 import org.apache.daffodil.sapi.DaffodilUnparseErrorSAXException
 import org.apache.daffodil.sapi.SAXErrorHandlerForSAPITest
 import org.apache.daffodil.sapi.infoset.XMLTextInfosetOutputter
+import org.xml.sax.XMLReader
 
 import java.nio.ByteBuffer
+import javax.xml.XMLConstants
+
+object TestScalaAPI {
+  /**
+   * Best practices for XML loading are to turn off anything that could lead to
+   * insecurity.
+   *
+   * This is probably unnecessary in the case of these tests, but as these 
tests
+   * are also used to illustrate API usage, this exemplifies best practice.
+   */
+  def setSecureDefaults(xmlReader: XMLReader): Unit = {
+    xmlReader.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true)
+    // since we're not really sure what they mean by secure processing
+    // we make doubly sure by setting these ourselves also.
+    
xmlReader.setFeature("http://apache.org/xml/features/disallow-doctype-decl";, 
true)
+    
xmlReader.setFeature("http://xml.org/sax/features/external-parameter-entities";, 
false)
+    
xmlReader.setFeature("http://xml.org/sax/features/external-general-entities";, 
false)
+  }
+}

Review comment:
       Same thought here as in Java API test - should we show how to define 
setSecureDefaults or better yet, show how to call DaffodilSAXParserFactory?

##########
File path: 
daffodil-lib/src/main/scala/org/apache/daffodil/xml/DaffodilConstructingLoader.scala
##########
@@ -86,6 +105,34 @@ class DaffodilConstructingLoader(uri: URI, errorHandler: 
org.xml.sax.ErrorHandle
     Source.fromInputStream(is, enc)
   }, true) {
 
+  /**
+   * Public constructor insists on normalizingCRLFtoLF behavior.
+   */
+  def this (uri: URI,
+    errorHandler: org.xml.sax.ErrorHandler,
+    addPositionAttributes: Boolean = false) =
+    this(uri, errorHandler, addPositionAttributes, normalizeCRLFtoLF = true)
+

Review comment:
       Doesn't public constructor have a test calling it?  Codecov says line 
114 isn't covered by tests.

##########
File path: 
daffodil-japi/src/main/java/org/apache/daffodil/japi/package-info.java
##########
@@ -257,7 +257,7 @@
  *  WritableByteChannel wbc = java.nio.channels.Channels.newChannel(os);
  *  DaffodilUnparseContentHandler unparseContentHandler = 
dp.newContentHandlerInstance(wbc);
  *  try {
- *   XMLReader xmlReader = 
SAXParserFactory.newInstance().newSAXParser().getXMLReader();
+ *   XMLReader xmlReader = 
SAXParserFactory.newInstance().newSAXParser().getXMLReader;

Review comment:
       Does Java code still need these parentheses after getXMLReader?

##########
File path: project/Dependencies.scala
##########
@@ -48,7 +48,7 @@ object Dependencies {
   lazy val test = Seq(
     "junit" % "junit" % "4.13.2" % "it,test",
     "com.novocode" % "junit-interface" % "0.11" % "it,test",
-    "org.scalacheck" %% "scalacheck" % "1.15.4" % "it,test"
+    "org.scalacheck" %% "scalacheck" % "1.15.4" % "it,test",
   )

Review comment:
       I don't mind adding a trailing comma to the last item in a Seq (Scala 
explicitly [allows](https://docs.scala-lang.org/sips/trailing-commas.html) it 
for good reasons), but the previous Seq's don't use a trailing comma so we've 
introduced an inconsistency.  Maybe fix all the other Seqs too?

##########
File path: 
daffodil-tdml-lib/src/main/scala/org/apache/daffodil/tdml/TDMLRunner.scala
##########
@@ -2294,29 +2361,60 @@ sealed abstract class DocumentPart(part: Node, parent: 
Document) {
 }
 
 case class Infoset(i: NodeSeq, parent: TestCase) {
-  val Seq(dfdlInfoset) = (i \ "dfdlInfoset").map { node => DFDLInfoset(node, 
this) }
-  val contents = dfdlInfoset.contents
+  lazy val Seq(dfdlInfoset) = (i \ "dfdlInfoset").map { node => 
DFDLInfoset(node, this) }
+  lazy val contents = dfdlInfoset.contents
 }
 
 case class DFDLInfoset(di: Node, parent: Infoset) {
 
-  val infosetNodeSeq = {
-    (di \ "@type").toString match {
-      case "infoset" | "" => di.child.filter {
-        _.isInstanceOf[scala.xml.Elem]
-      }
-      case "file" => {
-        val path = di.text.trim()
-        val maybeURI = parent.parent.parent.findTDMLResource(path)
-        val uri = maybeURI.getOrElse(throw new 
FileNotFoundException("TDMLRunner: infoset file '" + path + "' was not found"))
-        val elem = scala.xml.XML.load(uri.toURL)
-        elem
+  private lazy val infosetNodeSeq = {
+    val testCase: TestCase = parent.parent
+    val loader = testCase.parent.loader
+    val optDataSchema: Option[URI] = {
+      
testCase.optSchemaFileURI.orElse(testCase.optEmbeddedSchema.map{_.uriForLoading})
+    }
+    val src =
+      (di \ "@type").toString match {
+        case "infoset" | "" => {
+          val rawElem = di.child.filter {
+            _.isInstanceOf[scala.xml.Elem]
+          }.head
+          UnitTestSchemaSource(rawElem, testCase.tcName)
+        }
+        case "file" => {
+          val path = di.text.trim()
+          val maybeURI = parent.parent.parent.findTDMLResource(path)
+          val uri = maybeURI.getOrElse(throw new 
FileNotFoundException("TDMLRunner: infoset file '" + path + "' was not found"))
+          URISchemaSource(uri)
+        }
+        case value => Assert.abort("Uknown value for type attribute on 
dfdlInfoset: " + value)
       }
-      case value => Assert.abort("Uknown value for type attribute on 
dfdlInfoset: " + value)
+    //
+    // TODO: DAFFODIL-288 validate the infoset also
+    // You can pass the optDataSchema, which appears to be the correct thing
+    // but in many cases it doesn't seem to be able to resolve things.
+    //
+    val testSuite = testCase.parent
+    val before = testSuite.loadingExceptions.clone()
+    // val elem: Node = loader.load(src, optDataSchema)
+    val elem = loader.load(src, None) // no schema

Review comment:
       Nope, don't do it now, do it later (smaller PR will be easier to review 
next time).

##########
File path: 
daffodil-cli/src/it/scala/org/apache/daffodil/debugger/TestCLIDebugger.scala
##########
@@ -1265,20 +1265,29 @@ class TestCLIdebugger {
       shell.sendLine("step")
       shell.sendLine("step")
       shell.sendLine("step")
-      shell.sendLine("step")
-      shell.sendLine("step")
+      //
+      // NOTE FOR REVIEWER
+      //
+      // This test had to be modified to work with this change set.
+      // I have no idea why this change set would affect behavior
+      // of this kind.
+      //

Review comment:
       Good idea to check serialization changes, too bad it wasn't the real 
reason so that we could take these comments out and document the real reason in 
the commit message instead.

##########
File path: 
daffodil-sapi/src/test/scala/org/apache/daffodil/example/TestScalaAPI.scala
##########
@@ -1107,6 +1145,7 @@ class TestScalaAPI {
     val unparseContentHandler = dp.newContentHandlerInstance(wbc)
     val errorHandler = new SAXErrorHandlerForSAPITest()
     val unparseXMLReader = 
javax.xml.parsers.SAXParserFactory.newInstance.newSAXParser.getXMLReader
+    setSecureDefaults(unparseXMLReader)

Review comment:
       Maybe call DaffodilSAXParserFactory and remove setSecureDefaults?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to