tuxji commented on a change in pull request #560:
URL: https://github.com/apache/daffodil/pull/560#discussion_r638167992
##########
File path:
daffodil-japi/src/test/java/org/apache/daffodil/example/TestJavaAPI.java
##########
@@ -48,9 +49,31 @@
import org.apache.daffodil.japi.logger.ConsoleLogWriter;
import org.apache.daffodil.japi.logger.LogLevel;
import org.apache.daffodil.japi.io.InputSourceDataInputStream;
+import org.xml.sax.SAXNotRecognizedException;
+import org.xml.sax.SAXNotSupportedException;
+import org.xml.sax.XMLReader;
+
+import javax.xml.XMLConstants;
public class TestJavaAPI {
+ /**
+ * Best practices for XML loading are to turn off anything that could lead
to
+ * insecurity.
+ *
+ * This is probably unnecessary in the case of these tests, but as these
tests
+ * are also used to illustrate API usage, this exemplifies best practice.
+ */
+ public static void setSecureDefaults(XMLReader xmlReader)
+ throws SAXNotSupportedException, SAXNotRecognizedException {
+ xmlReader.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
+ // since we're not really sure what they mean by secure processing
+ // we make doubly sure by setting these ourselves also.
+
xmlReader.setFeature("http://apache.org/xml/features/disallow-doctype-decl",
true);
+
xmlReader.setFeature("http://xml.org/sax/features/external-parameter-entities",
false);
+
xmlReader.setFeature("http://xml.org/sax/features/external-general-entities",
false);
+ }
+
Review comment:
Should Java programs call XMLUtils.setSecureDefaults instead of defining
their own setSecureDefaults function?
##########
File path:
daffodil-japi/src/test/java/org/apache/daffodil/example/TestJavaAPI.java
##########
@@ -40,6 +40,7 @@
import org.apache.commons.io.IOUtils;
import org.apache.daffodil.japi.*;
import org.apache.daffodil.japi.infoset.XMLTextInfosetOutputter;
+import org.apache.daffodil.xml.XMLUtils;
Review comment:
Why import XMLUtils but still define a setSecureDefaults function below?
Remove the import if Java programs aren't supposed to call
XMLUtils.setSecureDefaults, or keep the import and show how Java programs can
call XMLUtils.setSecureDefaults.
##########
File path:
daffodil-japi/src/test/java/org/apache/daffodil/example/TestJavaAPI.java
##########
@@ -48,9 +49,31 @@
import org.apache.daffodil.japi.logger.ConsoleLogWriter;
import org.apache.daffodil.japi.logger.LogLevel;
import org.apache.daffodil.japi.io.InputSourceDataInputStream;
+import org.xml.sax.SAXNotRecognizedException;
+import org.xml.sax.SAXNotSupportedException;
+import org.xml.sax.XMLReader;
+
+import javax.xml.XMLConstants;
Review comment:
If you call XMLUtils.setSecureDefaults, you won't need these imports
anymore.
##########
File path:
daffodil-japi/src/test/java/org/apache/daffodil/example/TestJavaAPI.java
##########
@@ -1008,6 +1031,7 @@ public void testJavaAPI20() throws IOException,
ClassNotFoundException {
try {
org.xml.sax.XMLReader unparseXMLReader =
javax.xml.parsers.SAXParserFactory.newInstance()
.newSAXParser().getXMLReader();
+ setSecureDefaults(unparseXMLReader);
Review comment:
Can Java programs call DaffodilSAXParserFactory.newInstance (which you
would need to define) or should they call XMLUtils.setSecureDefaults?
##########
File path:
daffodil-core/src/test/scala/org/apache/daffodil/processor/TestSAXParseUnparseAPI.scala
##########
@@ -48,8 +47,7 @@ class TestSAXParseUnparseAPI {
val pr =
parseXMLReader.getProperty(XMLUtils.DAFFODIL_SAX_URN_PARSERESULT).asInstanceOf[ParseResult]
assertTrue(!pr.isError)
assertEquals(expectedInfoset, scala.xml.XML.loadString(baosParse.toString))
-
- val unparseXMLReader =
SAXParserFactory.newInstance().newSAXParser().getXMLReader
+ val unparseXMLReader =
DaffodilSAXParserFactory().newSAXParser().getXMLReader
Review comment:
Please check remaining calls to SAXParserFactory. I see:
```text
daffodil-japi/src/main/java/org/apache/daffodil/japi/package-info.java
260: * XMLReader xmlReader =
SAXParserFactory().newSAXParser().getXMLReader;
daffodil-japi/src/test/java/org/apache/daffodil/example/TestJavaAPI.java
1032: org.xml.sax.XMLReader unparseXMLReader =
javax.xml.parsers.SAXParserFactory.newInstance()
1168: org.xml.sax.XMLReader unparseXMLReader =
javax.xml.parsers.SAXParserFactory.newInstance()
daffodil-lib/src/test/scala/org/apache/daffodil/util/TestXMLCatalogAndValidate.scala
33:import javax.xml.parsers.SAXParserFactory
410: val f: SAXParserFactory = DaffodilSAXParserFactory()
daffodil-sapi/src/main/scala/org/apache/daffodil/sapi/package.scala
233: * val xmlReader =
SAXParserFactory.newInstance.newSAXParser.getXMLReader
daffodil-sapi/src/test/scala/org/apache/daffodil/example/TestScalaAPI.scala
1066: val unparseXMLReader =
javax.xml.parsers.SAXParserFactory.newInstance.newSAXParser.getXMLReader
1147: val unparseXMLReader =
javax.xml.parsers.SAXParserFactory.newInstance.newSAXParser.getXMLReader
daffodil-schematron/src/main/scala/org/apache/daffodil/validation/schematron/Schematron.scala
23:import javax.xml.parsers.SAXParserFactory
54: val f = try featuress.foldLeft(SAXParserFactory.newInstance){ (fac,
ft) =>
```
##########
File path:
daffodil-lib/src/main/scala/org/apache/daffodil/cookers/EntityReplacer.scala
##########
@@ -493,24 +493,10 @@ sealed abstract class StringLiteralBase(propNameArg:
String,
extends AutoPropNameBase(propNameArg)
with StringLiteralCookerMixin {
- private val xmlEntityPattern = new Regex("""&(quot|amp|apos|lt|gt);""",
"entity")
-
/**
* Thawed means XML entities have been replaced. So & is &, etc.
*/
- private def thaw(raw: String) = {
- val thawed: String = {
- val res = xmlEntityPattern.replaceAllIn(raw, m => {
- val sb = scala.xml.Utility.unescape(m.group("entity"), new
StringBuilder())
- // There really is no possibility for null to come back as we've made
- // sure to only include valid xml entities in the xmlEntityPattern.
- Assert.invariant(sb ne null)
- sb.toString()
- })
- res
- }
- thawed
- }
+ private def thaw(raw: String) = XMLUtils.unescape(raw)
Review comment:
It isn't obvious to me why EntityReplacer.scala had this change made -
what's the reason?
##########
File path:
daffodil-core/src/main/scala/org/apache/daffodil/dsom/DFDLSchemaFile.scala
##########
@@ -134,20 +134,17 @@ final class DFDLSchemaFile(
lazy val isDFDLSchemaFile = iiXMLSchemaDocument.isDFDLSchema
+ private lazy val loader = new DaffodilXMLLoader(this)
+
lazy val iiXMLSchemaDocument = LV('iiXMLSchemaDocument) {
val res = loadXMLSchemaDocument(seenBefore, Some(this))
if (res.isDFDLSchema && sset.validateDFDLSchemas) {
//
// We validate DFDL schemas, only if validation is requested.
// Some things, tests generally, want to turn this validation off.
//
-
- val ldr = new DaffodilXMLLoader(this)
- ldr.setValidation(true)
- try {
- ldr.load(schemaSource) // validate as XML file with XML Schema for
DFDL Schemas
- ldr.validateSchema(schemaSource) // validate as XSD (catches UPA
errors for example)
- } catch {
+ try loader.validateAsXMLSchema(schemaSource) // validate as XSD (catches
UPA errors for example)
+ catch {
Review comment:
Please clarify whether the number of passes has changed. Steve thinks
it hasn't changed and I think he's right, but I wanted to make sure.
##########
File path:
daffodil-tdml-processor/src/main/scala/org/apache/daffodil/tdml/processor/DaffodilTDMLDFDLProcessor.scala
##########
@@ -159,14 +163,18 @@ final class TDMLDFDLProcessorFactory private (
override def getProcessor(
schemaSource: DaffodilSchemaSource,
useSerializedProcessor: Boolean,
- optRootName: Option[String] = None,
- optRootNamespace: Option[String] = None): TDML.CompileResult = {
+ optRootName: Option[String],
+ optRootNamespace: Option[String],
+ tunables: Map[String, String]): TDML.CompileResult = {
Review comment:
I'm agnostic whether to keep the cache a global singleton or make it
part of the testsuite/runner object, but if you (Mike) think there's some merit
to the idea, you can create a Jira bug to remember it as a future improvement
idea.
##########
File path:
daffodil-cli/src/it/scala/org/apache/daffodil/debugger/TestCLIDebugger.scala
##########
@@ -1265,20 +1265,29 @@ class TestCLIdebugger {
shell.sendLine("step")
shell.sendLine("step")
shell.sendLine("step")
- shell.sendLine("step")
- shell.sendLine("step")
+ //
+ // NOTE FOR REVIEWER
+ //
+ // This test had to be modified to work with this change set.
+ // I have no idea why this change set would affect behavior
+ // of this kind.
+ //
Review comment:
Good idea to check serialization changes, too bad it wasn't the real
reason so that we could take these comments out and document the real reason in
the commit message instead.
##########
File path:
daffodil-tdml-processor/src/main/scala/org/apache/daffodil/tdml/processor/DaffodilTDMLDFDLProcessor.scala
##########
@@ -337,15 +340,18 @@ class DaffodilTDMLDFDLProcessor private (private var dp:
DataProcessor) extends
new DaffodilTDMLParseResult(actual, outputter)
}
- def doUnparseWithBothApis(dpInputter: TDMLInfosetInputter, saxInputStream:
java.io.InputStream,
+ def doUnparseWithBothApis(
+ dpInputter: TDMLInfosetInputter,
+ saxInputStream: java.io.InputStream,
dpOutputStream: java.io.OutputStream): DaffodilTDMLUnparseResult = {
val dpOutputChannel = java.nio.channels.Channels.newChannel(dpOutputStream)
val saxOutputStream = new ByteArrayOutputStream
val saxOutputChannel =
java.nio.channels.Channels.newChannel(saxOutputStream)
val unparseContentHandler = dp.newContentHandlerInstance(saxOutputChannel)
unparseContentHandler.enableInputterResolutionOfRelativeInfosetBlobURIs()
- val xmlReader = SAXParserFactory.newInstance.newSAXParser.getXMLReader
+ val xmlReader = DaffodilSAXParserFactory().newSAXParser.getXMLReader
+ XMLUtils.setSecureDefaults(xmlReader)
Review comment:
No, this XMLUtils.setSecureDefaults call shouldn't be needed anymore.
##########
File path:
daffodil-core/src/main/scala/org/apache/daffodil/dsom/DFDLSchemaFile.scala
##########
@@ -117,11 +117,11 @@ final class DFDLSchemaFile(
val ldr = new DaffodilXMLLoader(this)
//
// We do not want to validate here ever, because we have to examine the
- // root xs:schema eleemnt of a schema to decide if it is a DFDL schema
+ // root xs:schema element of a schema to decide if it is a DFDL schema
// at all that we're even supposed to compile.
//
- ldr.setValidation(false)
- val node = ldr.load(schemaSource)
+ val node = ldr.load(schemaSource, None,
+ addPositionAttributes = true) // need line numbers for diagnostics
schemaDefinitionUnless(node != null, "Unable to load XML from %s.",
diagnosticDebugName)
Review comment:
I suggest renaming the var ldr to var loader, moving it from line 117 to
line 123 so it follows the big comment section, unwrapping the loader.load call
back to one line, and moving the "// need line numbers for diagnostics" to a
new line above it.
##########
File path: daffodil-cli/src/main/scala/org/apache/daffodil/Main.scala
##########
@@ -803,14 +802,28 @@ object Main extends Logging {
case Left(bytes) => new ByteArrayInputStream(bytes)
case Right(is) => is
}
- scala.xml.XML.load(is)
+ val parser: SAXParser = {
+ val f = DaffodilSAXParserFactory()
+ f.setNamespaceAware(false)
+ val p = f.newSAXParser()
+ val xrdr = p.getXMLReader
+ p
+ }
+ scala.xml.XML.withSAXParser(parser).load(is)
Review comment:
We don't need the getXMLReader call on line 809 anymore.
##########
File path:
daffodil-schematron/src/main/scala/org/apache/daffodil/validation/schematron/Schematron.scala
##########
@@ -59,7 +59,9 @@ object Schematron {
throw ValidatorInitializationException(s"Error setting feature on
parser: ${ex.getMessage}")
}
f.setValidating(false)
- f.newSAXParser.getXMLReader
+ val xr = f.newSAXParser.getXMLReader
+ XMLUtils.setSecureDefaults(xr)
+ xr
Review comment:
Can we call DaffodilSAXParserFactory().newInstance instead of
SAXParserFactory.newInstance on line 54 and revert the change replacing line 62
with lines 62-64?
##########
File path:
daffodil-lib/src/main/scala/org/apache/daffodil/xml/DaffodilXMLLoader.scala
##########
@@ -538,47 +398,223 @@ trait SchemaAwareLoaderMixin {
* do any validation either), however, once Daffodil starts processing the
* DFDL schema nodes, it resolves references using the same one true XML
catalog resolver.
*/
-class DaffodilXMLLoader(val errorHandler: org.xml.sax.ErrorHandler) {
+class DaffodilXMLLoader(val errorHandler: org.xml.sax.ErrorHandler)
+ extends NoBindingFactoryAdapter {
def this() = this(RethrowSchemaErrorHandler)
- def xercesAdapter = new DFDLXercesAdapter(errorHandler)
+ private def resolver = DFDLCatalogResolver.get
- //
- // Controls whether we setup Xerces for validation or not.
- //
- final var doValidation: Boolean = true
+ /**
+ * UPA errors are detected by xerces if the schema-full-checking feature is
+ * turned on, AND if you inform xerces that it is reading an XML Schema
+ * (i.e., xsd).
+ *
+ * Detecting these requires that we do THREE passes
+ * 1) load the DFDL schema as an XML document. This validates it against the
XML Schema
+ * for DFDL schemas.
+ * 2) load the DFDL schema as an XSD - xerces then does lots of more
intensive checking
+ * of the schema
+ * 3) load the schema for our own consumption by Daffodil code. This uses the
+ * constructing parser so as to preserve CDATA regions (xerces just does the
wrong
+ * thing with those,...fatally so). Then our own semantic checking is
performed
+ * as part of compiling the DFDL schema.
+ *
+ * Checks like UPA are in step (2) above. They are coded algorithmically
+ * right into Xerces. This is accomplished by
+ * using the below SchemaFactory and SchemaFactory.newSchema calls. The
+ * newSchema call is what forces schema validation to take place.
+ */
+ private lazy val schemaFactory = {
+ val sf = new org.apache.xerces.jaxp.validation.XMLSchemaFactory()
+ sf.setResourceResolver(resolver)
+ sf.setErrorHandler(errorHandler)
+ sf.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true)
+ sf.setFeature(XMLUtils.XML_DISALLOW_DOCTYPE_FEATURE, true)
+ // These are not recognized by a schemaFactory
+ // sf.setFeature("http://xml.org/sax/features/validation", true)
+ // sf.setFeature("http://apache.org/xml/features/validation/schema", true)
+ //
sf.setFeature("http://apache.org/xml/features/validation/schema-full-checking",
true)
+ sf
+ }
- def setValidation(flag: Boolean): Unit = {
- doValidation = flag
+ /**
+ * This loads the DFDL schema as an XML Schema. This will
+ * check many more things (ex: UPA) about the DFDL schema other than
+ * just whether it validates against the XML Schema for DFDL schemas.
+ *
+ * Unfortunately, we don't have control over how Xerces loads up these
schemas
+ * (other than the resolver anyway), so we can't really redirect the way
+ * it issues error messages so that it properly lays blame at say, the
schema fragments
+ * inside an embedded schema of a TDML file.
+ *
+ * So if we want good file/line/column info from this, we have to give
+ * it a plain old file or resource, and not try to play games to get it to
+ * pick up the file/line/col information from attributes of the elements.
+ *
+ * Due to limitations in the xerces newSchema() method
+ * this method should be called only after loading
+ * the schema as a regular XML file, which itself insists
+ * on the XMLUtils.setSecureDefaults, so we don't need to
+ * further check that here.
Review comment:
Is this comment (which mentions XMLUtils.setSecureDefaults) still up to
date or should it say DaffodilSAXParserFactory instead?
##########
File path: daffodil-cli/src/main/scala/org/apache/daffodil/Main.scala
##########
@@ -1484,7 +1498,7 @@ object Main extends Logging {
private def unparseWithSAX(
is: InputStream,
contentHandler: DFDL.DaffodilUnparseContentHandler): UnparseResult = {
- val xmlReader = SAXParserFactory.newInstance.newSAXParser.getXMLReader
+ val xmlReader: XMLReader =
DaffodilSAXParserFactory().newSAXParser.getXMLReader
Review comment:
Do we still need the explicit XMLReader type?
##########
File path:
daffodil-japi/src/test/java/org/apache/daffodil/example/TestJavaAPI.java
##########
@@ -1143,6 +1167,7 @@ public void testJavaAPI23() throws IOException,
ClassNotFoundException {
try {
org.xml.sax.XMLReader unparseXMLReader =
javax.xml.parsers.SAXParserFactory.newInstance()
.newSAXParser().getXMLReader();
+ setSecureDefaults(unparseXMLReader);
Review comment:
Likewise, DaffodilSAXParserFactory.newInstance or
XMLUtils.setSecureDefaults might be better here.
##########
File path:
daffodil-lib/src/main/scala/org/apache/daffodil/xml/DaffodilXMLLoader.scala
##########
@@ -538,47 +398,223 @@ trait SchemaAwareLoaderMixin {
* do any validation either), however, once Daffodil starts processing the
* DFDL schema nodes, it resolves references using the same one true XML
catalog resolver.
*/
-class DaffodilXMLLoader(val errorHandler: org.xml.sax.ErrorHandler) {
+class DaffodilXMLLoader(val errorHandler: org.xml.sax.ErrorHandler)
+ extends NoBindingFactoryAdapter {
def this() = this(RethrowSchemaErrorHandler)
- def xercesAdapter = new DFDLXercesAdapter(errorHandler)
+ private def resolver = DFDLCatalogResolver.get
- //
- // Controls whether we setup Xerces for validation or not.
- //
- final var doValidation: Boolean = true
+ /**
+ * UPA errors are detected by xerces if the schema-full-checking feature is
+ * turned on, AND if you inform xerces that it is reading an XML Schema
+ * (i.e., xsd).
+ *
+ * Detecting these requires that we do THREE passes
+ * 1) load the DFDL schema as an XML document. This validates it against the
XML Schema
+ * for DFDL schemas.
+ * 2) load the DFDL schema as an XSD - xerces then does lots of more
intensive checking
+ * of the schema
+ * 3) load the schema for our own consumption by Daffodil code. This uses the
+ * constructing parser so as to preserve CDATA regions (xerces just does the
wrong
+ * thing with those,...fatally so). Then our own semantic checking is
performed
+ * as part of compiling the DFDL schema.
+ *
+ * Checks like UPA are in step (2) above. They are coded algorithmically
+ * right into Xerces. This is accomplished by
+ * using the below SchemaFactory and SchemaFactory.newSchema calls. The
+ * newSchema call is what forces schema validation to take place.
+ */
+ private lazy val schemaFactory = {
+ val sf = new org.apache.xerces.jaxp.validation.XMLSchemaFactory()
+ sf.setResourceResolver(resolver)
+ sf.setErrorHandler(errorHandler)
+ sf.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true)
+ sf.setFeature(XMLUtils.XML_DISALLOW_DOCTYPE_FEATURE, true)
+ // These are not recognized by a schemaFactory
+ // sf.setFeature("http://xml.org/sax/features/validation", true)
+ // sf.setFeature("http://apache.org/xml/features/validation/schema", true)
+ //
sf.setFeature("http://apache.org/xml/features/validation/schema-full-checking",
true)
+ sf
+ }
- def setValidation(flag: Boolean): Unit = {
- doValidation = flag
+ /**
+ * This loads the DFDL schema as an XML Schema. This will
+ * check many more things (ex: UPA) about the DFDL schema other than
+ * just whether it validates against the XML Schema for DFDL schemas.
+ *
+ * Unfortunately, we don't have control over how Xerces loads up these
schemas
+ * (other than the resolver anyway), so we can't really redirect the way
+ * it issues error messages so that it properly lays blame at say, the
schema fragments
+ * inside an embedded schema of a TDML file.
+ *
+ * So if we want good file/line/column info from this, we have to give
+ * it a plain old file or resource, and not try to play games to get it to
+ * pick up the file/line/col information from attributes of the elements.
+ *
+ * Due to limitations in the xerces newSchema() method
+ * this method should be called only after loading
+ * the schema as a regular XML file, which itself insists
+ * on the XMLUtils.setSecureDefaults, so we don't need to
+ * further check that here.
+ */
+ def validateAsDFDLSchema(source: DaffodilSchemaSource): Unit = {
+ // first we load it, with validation explicitly against the
+ // schema for DFDL Schemas.
+ load(source, Some(XMLUtils.schemaForDFDLSchemas), addPositionAttributes =
true)
+ //
+ // Then we validate explicitly so Xerces can check things
+ // such as for UPA violations
+ //
+ val inputSource = source.newInputSource()
+ val saxSource = new SAXSource(inputSource)
+ //
+ // We would like this saxSource to be created from an XMLReader
+ // so that we can call XMLUtils.setSecureDefaults on it.
+ // but we get strange errors if I do that, where every symbol
+ // in the schema has an unrecognized namespace prefix.
+ //
Review comment:
Likewise, is this comment still up to date?
##########
File path:
daffodil-lib/src/main/scala/org/apache/daffodil/api/DaffodilSchemaSource.scala
##########
@@ -112,7 +118,7 @@ class URISchemaSource protected (val fileOrResource: URI)
extends DaffodilSchema
/**
* For stdin, or other anonymous pipe-like source of schema.
*/
-case class InputStreamSchemaSource(is: java.io.InputStream, tmpDir:
Option[File], blameName: String, extension: String) extends
DaffodilSchemaSource {
+class InputStreamSchemaSource(is: java.io.InputStream, tmpDir: Option[File],
blameName: String, extension: String) extends DaffodilSchemaSource {
Review comment:
Curious, why was `case` removed?
##########
File path:
daffodil-lib/src/test/scala/org/apache/daffodil/util/TestXMLCatalogAndValidate.scala
##########
@@ -400,21 +407,14 @@ class SchemaAwareFactoryAdapter()
//System.err.println("Creating " + getClass().getName())
val res = new MyResolver()
//System.err.println("Creating parser")
- val f = SAXParserFactory.newInstance()
+ val f: SAXParserFactory = DaffodilSAXParserFactory()
Review comment:
Might not want to import javax.xml.parsers.SAXParserFactory only to use
it as a type here (inference should work fine).
##########
File path:
daffodil-lib/src/main/scala/org/apache/daffodil/validation/XercesValidator.scala
##########
@@ -89,28 +82,42 @@ class XercesValidator(schemaFileNames: Seq[String])
private val validator = new ThreadLocal[XercesValidatorImpl] {
override def initialValue(): XercesValidatorImpl =
- initializeValidator(schema.newValidator(), resolver)
+ initializeValidator(schema.newValidator, resolver)
}
- def validateXML(document: java.io.InputStream): ValidationResult = {
+ def validateXML(document: java.io.InputStream) =
+ validateXML(document, new XercesErrorHandler)
Review comment:
IDEA often tells me that a def method should have an explicit return
type.
##########
File path:
daffodil-sapi/src/test/scala/org/apache/daffodil/example/TestScalaAPI.scala
##########
@@ -1027,6 +1064,7 @@ class TestScalaAPI {
// prep for SAX unparse
val unparseContentHandler = dp.newContentHandlerInstance(wbc)
val unparseXMLReader =
javax.xml.parsers.SAXParserFactory.newInstance.newSAXParser.getXMLReader
+ setSecureDefaults(unparseXMLReader)
Review comment:
Maybe call DaffodilSAXParserFactory and remove setSecureDefaults?
##########
File path:
daffodil-sapi/src/test/scala/org/apache/daffodil/example/TestScalaAPI.scala
##########
@@ -47,11 +47,33 @@ import
org.apache.daffodil.sapi.DaffodilUnhandledSAXException
import org.apache.daffodil.sapi.DaffodilUnparseErrorSAXException
import org.apache.daffodil.sapi.SAXErrorHandlerForSAPITest
import org.apache.daffodil.sapi.infoset.XMLTextInfosetOutputter
+import org.xml.sax.XMLReader
import java.nio.ByteBuffer
+import javax.xml.XMLConstants
+
+object TestScalaAPI {
+ /**
+ * Best practices for XML loading are to turn off anything that could lead to
+ * insecurity.
+ *
+ * This is probably unnecessary in the case of these tests, but as these
tests
+ * are also used to illustrate API usage, this exemplifies best practice.
+ */
+ def setSecureDefaults(xmlReader: XMLReader): Unit = {
+ xmlReader.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true)
+ // since we're not really sure what they mean by secure processing
+ // we make doubly sure by setting these ourselves also.
+
xmlReader.setFeature("http://apache.org/xml/features/disallow-doctype-decl",
true)
+
xmlReader.setFeature("http://xml.org/sax/features/external-parameter-entities",
false)
+
xmlReader.setFeature("http://xml.org/sax/features/external-general-entities",
false)
+ }
+}
Review comment:
Same thought here as in Java API test - should we show how to define
setSecureDefaults or better yet, show how to call DaffodilSAXParserFactory?
##########
File path:
daffodil-lib/src/main/scala/org/apache/daffodil/xml/DaffodilConstructingLoader.scala
##########
@@ -86,6 +105,34 @@ class DaffodilConstructingLoader(uri: URI, errorHandler:
org.xml.sax.ErrorHandle
Source.fromInputStream(is, enc)
}, true) {
+ /**
+ * Public constructor insists on normalizingCRLFtoLF behavior.
+ */
+ def this (uri: URI,
+ errorHandler: org.xml.sax.ErrorHandler,
+ addPositionAttributes: Boolean = false) =
+ this(uri, errorHandler, addPositionAttributes, normalizeCRLFtoLF = true)
+
Review comment:
Doesn't public constructor have a test calling it? Codecov says line
114 isn't covered by tests.
##########
File path:
daffodil-japi/src/main/java/org/apache/daffodil/japi/package-info.java
##########
@@ -257,7 +257,7 @@
* WritableByteChannel wbc = java.nio.channels.Channels.newChannel(os);
* DaffodilUnparseContentHandler unparseContentHandler =
dp.newContentHandlerInstance(wbc);
* try {
- * XMLReader xmlReader =
SAXParserFactory.newInstance().newSAXParser().getXMLReader();
+ * XMLReader xmlReader =
SAXParserFactory.newInstance().newSAXParser().getXMLReader;
Review comment:
Does Java code still need these parentheses after getXMLReader?
##########
File path: project/Dependencies.scala
##########
@@ -48,7 +48,7 @@ object Dependencies {
lazy val test = Seq(
"junit" % "junit" % "4.13.2" % "it,test",
"com.novocode" % "junit-interface" % "0.11" % "it,test",
- "org.scalacheck" %% "scalacheck" % "1.15.4" % "it,test"
+ "org.scalacheck" %% "scalacheck" % "1.15.4" % "it,test",
)
Review comment:
I don't mind adding a trailing comma to the last item in a Seq (Scala
explicitly [allows](https://docs.scala-lang.org/sips/trailing-commas.html) it
for good reasons), but the previous Seq's don't use a trailing comma so we've
introduced an inconsistency. Maybe fix all the other Seqs too?
##########
File path:
daffodil-tdml-lib/src/main/scala/org/apache/daffodil/tdml/TDMLRunner.scala
##########
@@ -2294,29 +2361,60 @@ sealed abstract class DocumentPart(part: Node, parent:
Document) {
}
case class Infoset(i: NodeSeq, parent: TestCase) {
- val Seq(dfdlInfoset) = (i \ "dfdlInfoset").map { node => DFDLInfoset(node,
this) }
- val contents = dfdlInfoset.contents
+ lazy val Seq(dfdlInfoset) = (i \ "dfdlInfoset").map { node =>
DFDLInfoset(node, this) }
+ lazy val contents = dfdlInfoset.contents
}
case class DFDLInfoset(di: Node, parent: Infoset) {
- val infosetNodeSeq = {
- (di \ "@type").toString match {
- case "infoset" | "" => di.child.filter {
- _.isInstanceOf[scala.xml.Elem]
- }
- case "file" => {
- val path = di.text.trim()
- val maybeURI = parent.parent.parent.findTDMLResource(path)
- val uri = maybeURI.getOrElse(throw new
FileNotFoundException("TDMLRunner: infoset file '" + path + "' was not found"))
- val elem = scala.xml.XML.load(uri.toURL)
- elem
+ private lazy val infosetNodeSeq = {
+ val testCase: TestCase = parent.parent
+ val loader = testCase.parent.loader
+ val optDataSchema: Option[URI] = {
+
testCase.optSchemaFileURI.orElse(testCase.optEmbeddedSchema.map{_.uriForLoading})
+ }
+ val src =
+ (di \ "@type").toString match {
+ case "infoset" | "" => {
+ val rawElem = di.child.filter {
+ _.isInstanceOf[scala.xml.Elem]
+ }.head
+ UnitTestSchemaSource(rawElem, testCase.tcName)
+ }
+ case "file" => {
+ val path = di.text.trim()
+ val maybeURI = parent.parent.parent.findTDMLResource(path)
+ val uri = maybeURI.getOrElse(throw new
FileNotFoundException("TDMLRunner: infoset file '" + path + "' was not found"))
+ URISchemaSource(uri)
+ }
+ case value => Assert.abort("Uknown value for type attribute on
dfdlInfoset: " + value)
}
- case value => Assert.abort("Uknown value for type attribute on
dfdlInfoset: " + value)
+ //
+ // TODO: DAFFODIL-288 validate the infoset also
+ // You can pass the optDataSchema, which appears to be the correct thing
+ // but in many cases it doesn't seem to be able to resolve things.
+ //
+ val testSuite = testCase.parent
+ val before = testSuite.loadingExceptions.clone()
+ // val elem: Node = loader.load(src, optDataSchema)
+ val elem = loader.load(src, None) // no schema
Review comment:
Nope, don't do it now, do it later (smaller PR will be easier to review
next time).
##########
File path:
daffodil-cli/src/it/scala/org/apache/daffodil/debugger/TestCLIDebugger.scala
##########
@@ -1265,20 +1265,29 @@ class TestCLIdebugger {
shell.sendLine("step")
shell.sendLine("step")
shell.sendLine("step")
- shell.sendLine("step")
- shell.sendLine("step")
+ //
+ // NOTE FOR REVIEWER
+ //
+ // This test had to be modified to work with this change set.
+ // I have no idea why this change set would affect behavior
+ // of this kind.
+ //
Review comment:
Good idea to check serialization changes, too bad it wasn't the real
reason so that we could take these comments out and document the real reason in
the commit message instead.
##########
File path:
daffodil-sapi/src/test/scala/org/apache/daffodil/example/TestScalaAPI.scala
##########
@@ -1107,6 +1145,7 @@ class TestScalaAPI {
val unparseContentHandler = dp.newContentHandlerInstance(wbc)
val errorHandler = new SAXErrorHandlerForSAPITest()
val unparseXMLReader =
javax.xml.parsers.SAXParserFactory.newInstance.newSAXParser.getXMLReader
+ setSecureDefaults(unparseXMLReader)
Review comment:
Maybe call DaffodilSAXParserFactory and remove setSecureDefaults?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]