stevedlawrence commented on a change in pull request #560:
URL: https://github.com/apache/daffodil/pull/560#discussion_r634322364
##########
File path:
daffodil-core/src/main/scala/org/apache/daffodil/dsom/DFDLSchemaFile.scala
##########
@@ -134,20 +134,17 @@ final class DFDLSchemaFile(
lazy val isDFDLSchemaFile = iiXMLSchemaDocument.isDFDLSchema
+ private lazy val loader = new DaffodilXMLLoader(this)
+
lazy val iiXMLSchemaDocument = LV('iiXMLSchemaDocument) {
val res = loadXMLSchemaDocument(seenBefore, Some(this))
if (res.isDFDLSchema && sset.validateDFDLSchemas) {
//
// We validate DFDL schemas, only if validation is requested.
// Some things, tests generally, want to turn this validation off.
//
-
- val ldr = new DaffodilXMLLoader(this)
- ldr.setValidation(true)
- try {
- ldr.load(schemaSource) // validate as XML file with XML Schema for
DFDL Schemas
- ldr.validateSchema(schemaSource) // validate as XSD (catches UPA
errors for example)
- } catch {
+ try loader.validateAsXMLSchema(schemaSource) // validate as XSD (catches
UPA errors for example)
+ catch {
Review comment:
Just to be clear, it looks like we load the schema once in
loadXMLSchemaDocument, then we load it again to validate that it's a valid DFDL
and XML schema? I don't think this is any different that what we used to ahve,
but for large schemas I wonder if this could add noticable overhead?
##########
File path:
daffodil-core/src/test/scala/org/apache/daffodil/processor/TestSAXUtils.scala
##########
@@ -217,8 +224,10 @@ object TestSAXUtils {
val msgs = pf.getDiagnostics.map { _.getMessage() }.mkString("\n")
fail("pf compile errors: " + msgs)
}
- pf.sset.root.erd.preSerialization // force evaluation of all compile-time
constructs
+ // pf.sset.root.erd.preSerialization // force evaluation of all
compile-time constructs
Review comment:
Did this line cause issues? Seems like it shouldn't be needed, wonder
why it was here in the first itme? Maybe daffodil is missing a call to
preSerialization at the end of schema compilation to make sure everything is
evaluated?
##########
File path: daffodil-lib/src/main/resources/org/apache/daffodil/xsd/tdml.xsd
##########
@@ -129,6 +129,22 @@
<!-- We want the default to be 'unqualified' in the longer
term but
we leave it 'qualified here to prevent lots of tests from
breaking. -->
<attribute name="elementFormDefault" type="tns:elementFormDefaultType"
default="qualified"/>
+ <!-- we want the default for useDefaultNamespace to be false longer
term but
+ we leave it as true here to prevent lots of tests from breaking.
+ 81 tests to be exact. -->
+ <attribute name="useDefaultNamespace" type="xs:boolean" default="true">
+ <annotation><documentation>
+ Controls whether a xmlns="http://example.com" default namespace prefix
+ definition will be provided for the embedded schema.
+
+ This matters when the defined schema is used to validate the infoset
xml
+ before unparsing, or after parsing, as it affects the interpretation of
+ path expressions in the DFDL schema based on the
unqualifiedPathStepPolicy
+ tunable.
+
+ False is recommended since default namespaces can be confusing.
+ </documentation></annotation>
Review comment:
Is there a bug to fix all the tests so we can change the default
behavior to false?
Also, did this cause any issue for this doctype stuff in this PR?
##########
File path:
daffodil-lib/src/main/scala/org/apache/daffodil/xml/DaffodilConstructingLoader.scala
##########
@@ -179,6 +203,58 @@ class DaffodilConstructingLoader(uri: URI, errorHandler:
org.xml.sax.ErrorHandle
super.elem(pos, pre, local, newAttrs, newScope, empty, nodes)
}
+ /**
+ * To emulate the behavior of Xerces loader (standard scala loader)
+ * we have to normalize CRLF to LF, and solitary CR to LF.
+ *
+ * This is optional controlled by a constructor parameter.
+ */
+ override def text(pos: Int, txt: String): Text = {
+ val newText:String = {
+ if (normalizeCRLFtoLF) {
+ txt.
+ replaceAll("\r\n", "\n").
+ replaceAll("\r", "\n")
+ } else {
+ txt
+ }
+ }
+ //
+ // On MS-Windows it seems somehow that the TDML Runner
+ // loads expected infosets as XML, and they have CRLFs
+ // in them. Not sure how this happens.
+ //
+ if (normalizeCRLFtoLF)
+ Assert.invariant(!newText.contains("\r"))
+ super.text(pos, newText)
+ }
+
+ /**
+ * We override this to force the ConstrutingParser to process CDATA regions
+ * specially with an override-able method named cdata.
+ *
+ * Strangely, if you look at the implementation of this in the MarkupParser
+ * trait, it calls the handler for text, but then it ignores the result of
that
+ * and constructs a PCDATA node from the original text.
+ *
+ * It's possible this is a bug fix.
+ */
+ override def xCharData: NodeSeq = {
+ xToken("[CDATA[")
+ def mkResult(pos: Int, s: String): NodeSeq = {
+ val s1 = cdata(pos, s).text
+ PCData(s1)
+ }
+ xTakeUntil(mkResult, () => pos, "]]>")
+ }
Review comment:
Should this be reported to scala-xml? Would be nice if we don't need to
maintain a bug fix.
##########
File path:
daffodil-lib/src/main/scala/org/apache/daffodil/xml/DaffodilXMLLoader.scala
##########
@@ -538,47 +401,232 @@ trait SchemaAwareLoaderMixin {
* do any validation either), however, once Daffodil starts processing the
* DFDL schema nodes, it resolves references using the same one true XML
catalog resolver.
*/
-class DaffodilXMLLoader(val errorHandler: org.xml.sax.ErrorHandler) {
+class DaffodilXMLLoader(val errorHandler: org.xml.sax.ErrorHandler)
+ extends NoBindingFactoryAdapter {
def this() = this(RethrowSchemaErrorHandler)
- def xercesAdapter = new DFDLXercesAdapter(errorHandler)
+ private def resolver = DFDLCatalogResolver.get
- //
- // Controls whether we setup Xerces for validation or not.
- //
- final var doValidation: Boolean = true
+ /**
+ * UPA errors are detected by xerces if the schema-full-checking feature is
+ * turned on, AND if you inform xerces that it is reading an XML Schema
+ * (i.e., xsd).
+ *
+ * Detecting these requires that we do THREE passes
+ * 1) load the DFDL schema as an XML document. This validates it against the
XML Schema
+ * for DFDL schemas.
+ * 2) load the DFDL schema as an XSD - xerces then does lots of more
intensive checking
+ * of the schema
+ * 3) load the schema for our own consumption by Daffodil code. This uses the
+ * constructing parser so as to preserve CDATA regions (xerces just does the
wrong
+ * thing with those,...fatally so). Then our own semantic checking is
performed
+ * as part of compiling the DFDL schema.
+ *
+ * Checks like UPA are in step (2) above. They are coded algorithmically
+ * right into Xerces. This is accomplished by
+ * using the below SchemaFactory and SchemaFactory.newSchema calls. The
+ * newSchema call is what forces schema validation to take place.
+ */
+ private lazy val schemaFactory = {
+ val sf = new org.apache.xerces.jaxp.validation.XMLSchemaFactory()
+ sf.setResourceResolver(resolver)
+ sf.setErrorHandler(errorHandler)
+ sf.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true)
+ sf.setFeature(XMLUtils.XML_DISALLOW_DOCTYPE_FEATURE, true)
+ // These are not recognized by a schemaFactory
+ // sf.setFeature("http://xml.org/sax/features/validation", true)
+ // sf.setFeature("http://apache.org/xml/features/validation/schema", true)
+ //
sf.setFeature("http://apache.org/xml/features/validation/schema-full-checking",
true)
+ // sf.setFeature(XMLUtils.SAX_NAMESPACES_FEATURE, true)
+ // sf.setFeature(XMLUtils.SAX_NAMESPACE_PREFIXES_FEATURE, true)
+ sf
+ }
+
+ /**
+ * This loads the DFDL schema as an XML Schema. This will
+ * check many more things (ex: UPA) about the DFDL schema other than
+ * just whether it validates against the XML Schema for DFDL schemas.
+ *
+ * Unfortunately, we don't have control over how Xerces loads up these
schemas
+ * (other than the resolver anyway), so we can't really redirect the way
+ * it issues error messages so that it properly lays blame at say, the
schema fragments
+ * inside an embedded schema of a TDML file.
+ *
+ * So if we want good file/line/column info from this, we have to give
+ * it a plain old file or resource, and not try to play games to get it to
+ * pick up the file/line/col information from attributes of the elements.
+ *
+ * Due to limitations in the xerces newSchema() method
+ * this method should be called only after loading
+ * the schema as a regular XML file, which itself insists
+ * on the XMLUtils.setSecureDefaults, so we don't need to
+ * further check that here.
+ */
+ def validateAsXMLSchema(source: DaffodilSchemaSource): Unit = {
Review comment:
Should we rename this to validateAsDFDLSchema? The first load() call
should fail if it uses constructs not allowed in a DFDL Schema, right? The
Xerces load is just to checks for the other DFDL/XSD stuff that load() doesn't
check?
##########
File path:
daffodil-sapi/src/test/scala/org/apache/daffodil/example/TestScalaAPI.scala
##########
@@ -1027,6 +1028,7 @@ class TestScalaAPI {
// prep for SAX unparse
val unparseContentHandler = dp.newContentHandlerInstance(wbc)
val unparseXMLReader =
javax.xml.parsers.SAXParserFactory.newInstance.newSAXParser.getXMLReader
+ XMLUtils.setSecureDefaults(unparseXMLReader)
Review comment:
Same question as for the Java API. XMLUtils is technically not part of
the sapi, so perhaps shouldn't be used here, and we should explicitly set
features that make it secure.
##########
File path:
daffodil-tdml-lib/src/main/scala/org/apache/daffodil/tdml/TDMLRunner.scala
##########
@@ -251,10 +243,17 @@ class DFDLTestSuite private[tdml] (
* our loader here accumulates load-time errors here on the
* test suite object.
*/
- val loader = new DaffodilXMLLoader(errorHandler)
- loader.setValidation(validateTDMLFile)
+ lazy val loader = new DaffodilXMLLoader(errorHandler)
+ lazy val optTDMLSchema =
+ if (validateTDMLFile) {
+ val TDMLSchemaURI =
Misc.getRequiredResource("org/apache/daffodil/xsd/tdml.xsd")
Review comment:
Can this xsd path be a constant in XMLUtils, similar to the dafext.xsd
and schema for dfdl schema xsd?
##########
File path:
daffodil-tdml-lib/src/main/scala/org/apache/daffodil/tdml/TDMLRunner.scala
##########
@@ -264,66 +263,91 @@ class DFDLTestSuite private[tdml] (
tmpDir.mkdirs()
val src = UnitTestSchemaSource(tsNode, "", Some(tmpDir))
- loader.load(src)
+
+ loader.load(src, optTDMLSchema,
+ addPositionAttributes = true) // want line numbers for TDML
//
(tsNode, src.uriForLoading)
}
case tdmlFile: File => {
log(LogLevel.Info, "loading TDML file: %s", tdmlFile)
val uri = tdmlFile.toURI()
- val newNode = loader.load(URISchemaSource(uri))
+ val newNode = loader.load(URISchemaSource(uri), optTDMLSchema,
+ addPositionAttributes = true)
val res = (newNode, uri)
log(LogLevel.Debug, "done loading TDML file: %s", tdmlFile)
res
}
case uri: URI => {
- val newNode = loader.load(URISchemaSource(uri))
+ val newNode = loader.load(URISchemaSource(uri), optTDMLSchema,
+ addPositionAttributes = true)
val res = (newNode, uri)
res
}
case _ => Assert.usageError("not a Node, File, or URL")
} // end match
- lazy val isTDMLFileValid = !this.isLoadingError
+ lazy val ts = {
+ if (tsRaw eq null) {
+ // must have been a loader error
+ reportLoadingErrors()
+ } else {
+ tsRaw
+ }
+ }
+
+ lazy val isTDMLFileValid = {
+ (ts ne null) &&
+ !loadingExceptions.headOption.isDefined
Review comment:
Can this be `loadingExceptions.isEmpty`?
One thing to consider, loadingExceptions is mutable, so we need to be
careful to make sure we only call this after all loadingExceptions could have
been created. I suspect this isn't an actual issue sine loadingExceptions was a
var before, so still mutable. But maybe it would make senese to make this a def
to be safe?
##########
File path:
daffodil-lib/src/main/scala/org/apache/daffodil/xml/DaffodilConstructingLoader.scala
##########
@@ -86,6 +100,16 @@ class DaffodilConstructingLoader(uri: URI, errorHandler:
org.xml.sax.ErrorHandle
Source.fromInputStream(is, enc)
}, true) {
+ /**
+ * Ensures that DOCTYPES aka DTDs, if encountered, are rejected.
+ */
+ // $COVERAGE-OFF$
+ override def parseDTD(): Unit = {
+ val e = makeSAXParseException(pos, "DOCTYPE is disallowed.")
+ throw e
+ }
+ // $COVERAGE-ON$
Review comment:
Why is coverage off? Seems like this should be possible to hit with a
test?
##########
File path:
daffodil-core/src/test/scala/org/apache/daffodil/processor/TestSAXUtils.scala
##########
@@ -217,8 +224,10 @@ object TestSAXUtils {
val msgs = pf.getDiagnostics.map { _.getMessage() }.mkString("\n")
fail("pf compile errors: " + msgs)
}
- pf.sset.root.erd.preSerialization // force evaluation of all compile-time
constructs
+ // pf.sset.root.erd.preSerialization // force evaluation of all
compile-time constructs
val dp = pf.onPath("/").asInstanceOf[DataProcessor]
+ val baos = new ByteArrayOutputStream()
+ dp.save(Channels.newChannel(baos)) // insures everything serialized.
Review comment:
Yeah, feels like a bug somewhere else if we need to serialize the
DataProcessor to avoid some kind of bug?
##########
File path:
daffodil-lib/src/main/scala/org/apache/daffodil/api/DaffodilSchemaSource.scala
##########
@@ -44,6 +51,12 @@ sealed trait DaffodilSchemaSource {
*/
def newInputSource(): InputSource
+ /**
+ * Don't forget to close() these.
+ * @return an input stream which must be closed by the caller.
+ */
+ final def newInputStream(): InputStream = uriForLoading.toURL.openStream()
Review comment:
New function not covered by tests? Should it be removed or tests added?
##########
File path: daffodil-core/src/test/scala/org/apache/daffodil/util/TestUtils.scala
##########
@@ -85,6 +85,17 @@ object TestUtils {
runSchemaOnRBC(testSchema, Misc.stringToReadableByteChannel(data),
areTracing)
}
+ /**
+ * Exposes the data processor object so that you can test its API
conveniently.
+ */
+ def dataProcessorForTestString(testSchema: Node, data: String):
(DataProcessor, InputStream) = {
Review comment:
This function isn't used anywhere, should we remove it?
##########
File path:
daffodil-japi/src/test/java/org/apache/daffodil/example/TestJavaAPI.java
##########
@@ -1008,6 +1009,7 @@ public void testJavaAPI20() throws IOException,
ClassNotFoundException {
try {
org.xml.sax.XMLReader unparseXMLReader =
javax.xml.parsers.SAXParserFactory.newInstance()
.newSAXParser().getXMLReader();
+ XMLUtils.setSecureDefaults(unparseXMLReader);
Review comment:
XMLUtils isn't really part of our public API (at least it's not in our
javadoc). I wonder if we should just manually set the features in here, so that
people using the TestJavaAPI file as correct API usage see exactly what
features are being set to make things secure?
##########
File path:
daffodil-lib/src/main/scala/org/apache/daffodil/xml/DaffodilConstructingLoader.scala
##########
@@ -68,14 +68,28 @@ object Position {
* wrap these with CDATA, and use this loader, not Xerces, then these will be
* preserved properly.
*
- * Also, enhanced to capture file/line/column info for every element and add it
+ * Also, enhanced so that when addPositionAttributes is true, it will capture
+ * file/line/column info for every element and add it
* as attributes onto each XML element.
*
* The way the constructing loader (aka ConstructingParser (for XML))
* gets positions is different. It is given just an offset into the document
file/stream,
* and it therefore must synthesize line number/col number info itself.
+ *
+ * @param uri URI for the XML to be loaded.
+ * @param errorHandler Called back on load errors.
+ * @param addPositionAttributes Use true if you want dafint:file,
+ * dafint:col, and dafint:line attributes.
+ * Defaults to false.
+ * @param normalizeCRLFtoLF Use true to emulate the scala XML load
+ * behavior of normalizing CRLF to LF, and solitary
CR to LF.
+ * Defaults to true. Should only be changed in
special circumstances
+ * as not normalizing CRLFs is non-standard for XML.
*/
-class DaffodilConstructingLoader(uri: URI, errorHandler:
org.xml.sax.ErrorHandler)
+class DaffodilConstructingLoader(uri: URI,
+ errorHandler: org.xml.sax.ErrorHandler,
+ addPositionAttributes: Boolean = false,
+ normalizeCRLFtoLF: Boolean = true)
Review comment:
Are these issues found during development of the doctype stuff? Seems
reasonable, but seems unreleated? Just want to make sure I'm not missing
something?
##########
File path:
daffodil-lib/src/main/scala/org/apache/daffodil/xml/DaffodilConstructingLoader.scala
##########
@@ -179,6 +203,58 @@ class DaffodilConstructingLoader(uri: URI, errorHandler:
org.xml.sax.ErrorHandle
super.elem(pos, pre, local, newAttrs, newScope, empty, nodes)
}
+ /**
+ * To emulate the behavior of Xerces loader (standard scala loader)
+ * we have to normalize CRLF to LF, and solitary CR to LF.
+ *
+ * This is optional controlled by a constructor parameter.
+ */
+ override def text(pos: Int, txt: String): Text = {
+ val newText:String = {
+ if (normalizeCRLFtoLF) {
+ txt.
+ replaceAll("\r\n", "\n").
+ replaceAll("\r", "\n")
+ } else {
+ txt
+ }
+ }
+ //
+ // On MS-Windows it seems somehow that the TDML Runner
+ // loads expected infosets as XML, and they have CRLFs
+ // in them. Not sure how this happens.
Review comment:
This is probably because of the autocrlf issue? Git for windows replaces
unix LF's with windows CRLF by default. So all TDML files are going to have
CRLF's in them on windows.
##########
File path:
daffodil-lib/src/main/scala/org/apache/daffodil/validation/XercesValidator.scala
##########
@@ -89,28 +82,35 @@ class XercesValidator(schemaFileNames: Seq[String])
private val validator = new ThreadLocal[XercesValidatorImpl] {
override def initialValue(): XercesValidatorImpl =
- initializeValidator(schema.newValidator(), resolver)
+ initializeValidator(schema.newValidator, resolver)
}
- def validateXML(document: java.io.InputStream): ValidationResult = {
+ def validateXML(document: java.io.InputStream) =
+ validateXML(document, new XercesErrorHandler)
+
+ def validateXML(
+ document: java.io.InputStream,
+ eh: ErrorHandler): ValidationResult = {
+
val documentSource = new StreamSource(document)
// get the validator instance for this thread
val xv = validator.get()
- // create a new error handler for this execution
- val eh = new XercesErrorHandler
xv.setErrorHandler(eh)
// validate the document
xv.validate(documentSource)
- // error handler contents as daffodil result
- ValidationResult(eh.warnings, eh.errors)
+ eh match {
+ case xeh: XercesErrorHandler => ValidationResult(xeh.warnings,
xeh.errors)
+ case _ => ValidationResult.empty
+ }
Review comment:
Can you explain this change? If we validate with Xerces, we now have the
option to pass in a different ErrorHandler. But if we pass in something other
than a XercesErrorHandler, then we don't get any validation results. Why would
we ever want to pass in something other than a XercesErrorHandler?
##########
File path:
daffodil-tdml-lib/src/main/scala/org/apache/daffodil/tdml/TDMLRunner.scala
##########
@@ -2294,29 +2361,60 @@ sealed abstract class DocumentPart(part: Node, parent:
Document) {
}
case class Infoset(i: NodeSeq, parent: TestCase) {
- val Seq(dfdlInfoset) = (i \ "dfdlInfoset").map { node => DFDLInfoset(node,
this) }
- val contents = dfdlInfoset.contents
+ lazy val Seq(dfdlInfoset) = (i \ "dfdlInfoset").map { node =>
DFDLInfoset(node, this) }
+ lazy val contents = dfdlInfoset.contents
}
case class DFDLInfoset(di: Node, parent: Infoset) {
- val infosetNodeSeq = {
- (di \ "@type").toString match {
- case "infoset" | "" => di.child.filter {
- _.isInstanceOf[scala.xml.Elem]
- }
- case "file" => {
- val path = di.text.trim()
- val maybeURI = parent.parent.parent.findTDMLResource(path)
- val uri = maybeURI.getOrElse(throw new
FileNotFoundException("TDMLRunner: infoset file '" + path + "' was not found"))
- val elem = scala.xml.XML.load(uri.toURL)
- elem
+ private lazy val infosetNodeSeq = {
+ val testCase: TestCase = parent.parent
+ val loader = testCase.parent.loader
+ val optDataSchema: Option[URI] = {
+
testCase.optSchemaFileURI.orElse(testCase.optEmbeddedSchema.map{_.uriForLoading})
+ }
+ val src =
+ (di \ "@type").toString match {
+ case "infoset" | "" => {
+ val rawElem = di.child.filter {
+ _.isInstanceOf[scala.xml.Elem]
+ }.head
+ UnitTestSchemaSource(rawElem, testCase.tcName)
+ }
+ case "file" => {
+ val path = di.text.trim()
+ val maybeURI = parent.parent.parent.findTDMLResource(path)
+ val uri = maybeURI.getOrElse(throw new
FileNotFoundException("TDMLRunner: infoset file '" + path + "' was not found"))
+ URISchemaSource(uri)
+ }
+ case value => Assert.abort("Uknown value for type attribute on
dfdlInfoset: " + value)
}
- case value => Assert.abort("Uknown value for type attribute on
dfdlInfoset: " + value)
+ //
+ // TODO: DAFFODIL-288 validate the infoset also
+ // You can pass the optDataSchema, which appears to be the correct thing
+ // but in many cases it doesn't seem to be able to resolve things.
+ //
+ val testSuite = testCase.parent
+ val before = testSuite.loadingExceptions.clone()
+ // val elem: Node = loader.load(src, optDataSchema)
+ val elem = loader.load(src, None) // no schema
+
+ // The expected infoset is loaded using the normalizeCRLFtoLF mode (which
+ // is the default), so MS-DOS/Windows CRLFs in expected data XML files will
+ // have been converted into LF at this point.
+ if (elem ne null)
+ Assert.invariant(!elem.toString.contains("\r"))
Review comment:
Looks like this is the line that's failing in some windows tests? Yeah,
not clear to me why CR would still exist at this point.
##########
File path:
daffodil-tdml-lib/src/main/scala/org/apache/daffodil/tdml/TDMLRunner.scala
##########
@@ -1751,13 +1796,26 @@ case class DefinedSchema(xml: Node, parent:
DFDLTestSuite) {
* */
final val DEFAULT_ELEMENT_FORM_DEFAULT_VALUE = "qualified"
+ // TODO: We want this default to be false eventually.
+ // The rationale is if people want a default namespace
+ // in their TDML defineSchemas, they should define it. We shouldn't tack
+ // such a sensitive thing on automatically. (81 tests fail in daffodil-test
+ // if you change this however.)
+ //
+ final val DEFAULT_USE_DEFAULT_NAMESPACE_VALUE = true
+
val name = (xml \ "@name").text.toString
val elementFormDefault = {
val value = (xml \ "@elementFormDefault").text.toString
if (value == "") DEFAULT_ELEMENT_FORM_DEFAULT_VALUE
else value
}
+ val useDefaultNamespace = {
+ val value = (xml \ "@useDefaultNamespace").text.toString
+ if (value == "") DEFAULT_USE_DEFAULT_NAMESPACE_VALUE
+ else value.toBoolean
Review comment:
Might be worth adding a test that uses this to make sure it works.
##########
File path:
daffodil-lib/src/main/scala/org/apache/daffodil/xml/DaffodilXMLLoader.scala
##########
@@ -538,47 +401,232 @@ trait SchemaAwareLoaderMixin {
* do any validation either), however, once Daffodil starts processing the
* DFDL schema nodes, it resolves references using the same one true XML
catalog resolver.
*/
-class DaffodilXMLLoader(val errorHandler: org.xml.sax.ErrorHandler) {
+class DaffodilXMLLoader(val errorHandler: org.xml.sax.ErrorHandler)
+ extends NoBindingFactoryAdapter {
def this() = this(RethrowSchemaErrorHandler)
- def xercesAdapter = new DFDLXercesAdapter(errorHandler)
+ private def resolver = DFDLCatalogResolver.get
- //
- // Controls whether we setup Xerces for validation or not.
- //
- final var doValidation: Boolean = true
+ /**
+ * UPA errors are detected by xerces if the schema-full-checking feature is
+ * turned on, AND if you inform xerces that it is reading an XML Schema
+ * (i.e., xsd).
+ *
+ * Detecting these requires that we do THREE passes
+ * 1) load the DFDL schema as an XML document. This validates it against the
XML Schema
+ * for DFDL schemas.
+ * 2) load the DFDL schema as an XSD - xerces then does lots of more
intensive checking
+ * of the schema
+ * 3) load the schema for our own consumption by Daffodil code. This uses the
+ * constructing parser so as to preserve CDATA regions (xerces just does the
wrong
+ * thing with those,...fatally so). Then our own semantic checking is
performed
+ * as part of compiling the DFDL schema.
+ *
+ * Checks like UPA are in step (2) above. They are coded algorithmically
+ * right into Xerces. This is accomplished by
+ * using the below SchemaFactory and SchemaFactory.newSchema calls. The
+ * newSchema call is what forces schema validation to take place.
+ */
+ private lazy val schemaFactory = {
+ val sf = new org.apache.xerces.jaxp.validation.XMLSchemaFactory()
+ sf.setResourceResolver(resolver)
+ sf.setErrorHandler(errorHandler)
+ sf.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true)
+ sf.setFeature(XMLUtils.XML_DISALLOW_DOCTYPE_FEATURE, true)
+ // These are not recognized by a schemaFactory
+ // sf.setFeature("http://xml.org/sax/features/validation", true)
+ // sf.setFeature("http://apache.org/xml/features/validation/schema", true)
+ //
sf.setFeature("http://apache.org/xml/features/validation/schema-full-checking",
true)
+ // sf.setFeature(XMLUtils.SAX_NAMESPACES_FEATURE, true)
+ // sf.setFeature(XMLUtils.SAX_NAMESPACE_PREFIXES_FEATURE, true)
Review comment:
I'm not sure if it matters if these two features are set or not since
the results should be the same regardless of the values. But they I believe
these two are the only two features that all XMLReaders must support. I guess
maybe a SchemaFactory is different from an XMLReader, and so maybe it's not
bound by those requirments?
##########
File path:
daffodil-lib/src/main/scala/org/apache/daffodil/xml/DaffodilXMLLoader.scala
##########
@@ -538,47 +401,232 @@ trait SchemaAwareLoaderMixin {
* do any validation either), however, once Daffodil starts processing the
* DFDL schema nodes, it resolves references using the same one true XML
catalog resolver.
*/
-class DaffodilXMLLoader(val errorHandler: org.xml.sax.ErrorHandler) {
+class DaffodilXMLLoader(val errorHandler: org.xml.sax.ErrorHandler)
+ extends NoBindingFactoryAdapter {
def this() = this(RethrowSchemaErrorHandler)
- def xercesAdapter = new DFDLXercesAdapter(errorHandler)
+ private def resolver = DFDLCatalogResolver.get
- //
- // Controls whether we setup Xerces for validation or not.
- //
- final var doValidation: Boolean = true
+ /**
+ * UPA errors are detected by xerces if the schema-full-checking feature is
+ * turned on, AND if you inform xerces that it is reading an XML Schema
+ * (i.e., xsd).
+ *
+ * Detecting these requires that we do THREE passes
+ * 1) load the DFDL schema as an XML document. This validates it against the
XML Schema
+ * for DFDL schemas.
+ * 2) load the DFDL schema as an XSD - xerces then does lots of more
intensive checking
+ * of the schema
+ * 3) load the schema for our own consumption by Daffodil code. This uses the
+ * constructing parser so as to preserve CDATA regions (xerces just does the
wrong
+ * thing with those,...fatally so). Then our own semantic checking is
performed
+ * as part of compiling the DFDL schema.
+ *
+ * Checks like UPA are in step (2) above. They are coded algorithmically
+ * right into Xerces. This is accomplished by
+ * using the below SchemaFactory and SchemaFactory.newSchema calls. The
+ * newSchema call is what forces schema validation to take place.
+ */
+ private lazy val schemaFactory = {
+ val sf = new org.apache.xerces.jaxp.validation.XMLSchemaFactory()
+ sf.setResourceResolver(resolver)
+ sf.setErrorHandler(errorHandler)
+ sf.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true)
+ sf.setFeature(XMLUtils.XML_DISALLOW_DOCTYPE_FEATURE, true)
+ // These are not recognized by a schemaFactory
+ // sf.setFeature("http://xml.org/sax/features/validation", true)
+ // sf.setFeature("http://apache.org/xml/features/validation/schema", true)
+ //
sf.setFeature("http://apache.org/xml/features/validation/schema-full-checking",
true)
+ // sf.setFeature(XMLUtils.SAX_NAMESPACES_FEATURE, true)
+ // sf.setFeature(XMLUtils.SAX_NAMESPACE_PREFIXES_FEATURE, true)
+ sf
+ }
+
+ /**
+ * This loads the DFDL schema as an XML Schema. This will
+ * check many more things (ex: UPA) about the DFDL schema other than
+ * just whether it validates against the XML Schema for DFDL schemas.
+ *
+ * Unfortunately, we don't have control over how Xerces loads up these
schemas
+ * (other than the resolver anyway), so we can't really redirect the way
+ * it issues error messages so that it properly lays blame at say, the
schema fragments
+ * inside an embedded schema of a TDML file.
+ *
+ * So if we want good file/line/column info from this, we have to give
+ * it a plain old file or resource, and not try to play games to get it to
+ * pick up the file/line/col information from attributes of the elements.
+ *
+ * Due to limitations in the xerces newSchema() method
+ * this method should be called only after loading
+ * the schema as a regular XML file, which itself insists
+ * on the XMLUtils.setSecureDefaults, so we don't need to
+ * further check that here.
+ */
+ def validateAsXMLSchema(source: DaffodilSchemaSource): Unit = {
+ // first we load it, with validation explicitly against the
+ // schema for DFDL Schemas.
+ load(source, Some(XMLUtils.schemaForDFDLSchemas), addPositionAttributes =
true)
+ //
+ // Then we validate explicitly so Xerces can check things
+ // such as for UPA violations
+ //
+ val inputSource = source.newInputSource()
+ val saxSource = new SAXSource(inputSource)
+ //
+ // We would like this saxSource to be created from an XMLReader
+ // so that we can call XMLUtils.setSecureDefaults on it.
+ // but we get strange errors if I do that, where every symbol
+ // in the schema has an unrecognized namespace prefix.
+ //
+ schemaFactory.newSchema(saxSource)
+ inputSource.getByteStream().close()
+ }
- def setValidation(flag: Boolean): Unit = {
- doValidation = flag
+ // $COVERAGE-OFF$
+ override def parser = {
+ Assert.usageError("not to be called.")
}
+ // $COVERAGE-ON$
/**
+ * Obtain and initialize parser which validates the schema is defined.
+ */
+ private def parserFromURI(optSchemaURI: Option[URI]): SAXParser = {
+ if (optSchemaURI.isEmpty) noSchemaParser
+ else {
+ val f = parserFactory()
+ val schema = schemaFromURI(optSchemaURI)
+ f.setSchema(schema)
+ parserFromFactory(f)
+ }
+ }
+
+ private def schemaFromURI(optSchemaURI: Option[URI]): Schema = {
Review comment:
Should this parameter just be a URI instead of an Option[URI]? Seems
like calling this as a None should be an error?
##########
File path:
daffodil-lib/src/main/scala/org/apache/daffodil/xml/DaffodilXMLLoader.scala
##########
@@ -538,47 +401,232 @@ trait SchemaAwareLoaderMixin {
* do any validation either), however, once Daffodil starts processing the
* DFDL schema nodes, it resolves references using the same one true XML
catalog resolver.
*/
-class DaffodilXMLLoader(val errorHandler: org.xml.sax.ErrorHandler) {
+class DaffodilXMLLoader(val errorHandler: org.xml.sax.ErrorHandler)
+ extends NoBindingFactoryAdapter {
def this() = this(RethrowSchemaErrorHandler)
- def xercesAdapter = new DFDLXercesAdapter(errorHandler)
+ private def resolver = DFDLCatalogResolver.get
- //
- // Controls whether we setup Xerces for validation or not.
- //
- final var doValidation: Boolean = true
+ /**
+ * UPA errors are detected by xerces if the schema-full-checking feature is
+ * turned on, AND if you inform xerces that it is reading an XML Schema
+ * (i.e., xsd).
+ *
+ * Detecting these requires that we do THREE passes
+ * 1) load the DFDL schema as an XML document. This validates it against the
XML Schema
+ * for DFDL schemas.
+ * 2) load the DFDL schema as an XSD - xerces then does lots of more
intensive checking
+ * of the schema
+ * 3) load the schema for our own consumption by Daffodil code. This uses the
+ * constructing parser so as to preserve CDATA regions (xerces just does the
wrong
+ * thing with those,...fatally so). Then our own semantic checking is
performed
+ * as part of compiling the DFDL schema.
Review comment:
Are 1 and 3 done by different things? Is 1 loaded by scala-xml, 2 loaded
by xerces, and 3 loade by our ConstructingParser (which is an extension of
scala-xml)? Can we use our custom contstructing loader in 1, and only throw it
away if 2 fails?
##########
File path:
daffodil-tdml-processor/src/main/scala/org/apache/daffodil/tdml/TDMLInfosetInputter.scala
##########
@@ -68,22 +67,20 @@ class TDMLInfosetInputter(val scalaInputter:
ScalaXMLInfosetInputter, others: Se
override def getSimpleText(primType: NodeInfo.Kind): String = {
val res = scalaInputter.getSimpleText(primType)
val resIsEmpty = res == null || res == ""
- val othersmatch = others.forall { i =>
- val st = i.getSimpleText(primType)
+ val otherStrings = others.map { i =>
+ val firstVersion = i.getSimpleText(primType)
+ val finalVersion = i match {
+ case _ if (firstVersion eq null) => ""
+ case jsonii: JsonInfosetInputter =>
firstVersion.replaceAll("(\r\n|\r)", "\n").
+ replaceAll("\r", "\n")
+ case _ => firstVersion
+ }
+ finalVersion
+ }
+ val othersmatch = otherStrings.forall { case st: String =>
val stIsEmpty = st == null || res == ""
val areSame = res == st || (resIsEmpty && stIsEmpty)
- if (areSame) {
- true
- } else {
- if (i.isInstanceOf[JsonInfosetInputter]) {
- // the json infoset inputter maintains CRLF/CR, but XML converts
CRLF/CR to
- // LF. So if this is Json, then compare with the CRLF/CR converted
to LF
Review comment:
Can you add back this comment as to why we're doing this replace all? I
think that's useful to have.
##########
File path: daffodil-lib/src/main/scala/org/apache/daffodil/xml/XMLUtils.scala
##########
@@ -450,6 +464,55 @@ object XMLUtils {
val SAX_NAMESPACES_FEATURE = "http://xml.org/sax/features/namespaces"
val SAX_NAMESPACE_PREFIXES_FEATURE =
"http://xml.org/sax/features/namespace-prefixes"
+ /**
+ * Always enable this feature (which disables doctypes).
+ */
+ val XML_DISALLOW_DOCTYPE_FEATURE =
"http://apache.org/xml/features/disallow-doctype-decl"
+
+ /**
+ * Always disable this. Might not be necessary if doctypes are disallowed.
+ */
+ val XML_EXTERNAL_PARAMETER_ENTITIES_FEATURE =
"http://xml.org/sax/features/external-parameter-entities"
+
+ /**
+ * Always disable this. Might not be necessary if doctypes are disallowed.
+ */
+ val XML_EXTERNAL_GENERAL_ENTITIES_FEATURE =
"http://xml.org/sax/features/external-general-entities"
+
+ /**
+ * Sets properties that disable insecure XML reader behaviors.
+ * @param xmlReader - the reader to change feature settings on.
+ */
+ def setSecureDefaults(xmlReader: XMLReader) : Unit = {
+ // Try all of these. Then see if any of them were rejected.
+ try{ Seq(
+ Try { xmlReader.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true) },
+ Try { xmlReader.setFeature(XMLUtils.XML_DISALLOW_DOCTYPE_FEATURE, true) },
+ Try {
xmlReader.setFeature(XMLUtils.XML_EXTERNAL_PARAMETER_ENTITIES_FEATURE, false) },
+ Try { xmlReader.setFeature(XMLUtils.XML_EXTERNAL_GENERAL_ENTITIES_FEATURE,
false) }
+ // not available on an XMLReader
+ // xmlReader.setProperty(XMLUtils.SAX_NAMESPACE_PREFIXES_FEATURE, true)
+ // not available on an XMLReader
+ // xmlReader.setProperty(XMLUtils.SAX_NAMESPACES_FEATURE, true)
+ // System.err.println("SAX XMLReader supports disallow properties: " +
xmlReader)
+ ).find(_.isFailure).getOrElse(()) // throws the first failure, after
attempting all of them.
+ } catch {
+ case e: SAXNotRecognizedException =>
+ //
+ // Our specific SAX XMLReader does not accept these options.
+ // But this is in daffodil-lib, and the DaffodilParseXMLReader is in
+ // runtime1, so we can't use the class directly here. So we play
+ // these class name tricks to see if the reader is ours, and
+ // only re-throw if it isn't.
+ //
+ val className = Misc.getNameFromClass(xmlReader)
+ if (!className.contains("DaffodilParseXMLReader")) {
+ // $COVERAGE-OFF$
+ Assert.abort(e) // bug.
Review comment:
See comment above, but this feels like it shouldn't necessarily be a bug
if we are going to allow people to to change a classpath to use different
XMLReaders. Seems if we allow that, then we should try our best to make things
secure but setting as many secure features as possible, and just ignore
failures if we happened to get an XMLReader that doesn't support those features.
##########
File path:
daffodil-runtime1/src/main/scala/org/apache/daffodil/processors/Evaluatable.scala
##########
@@ -319,7 +319,7 @@ abstract class Evaluatable[+T <: AnyRef](protected val ci:
DPathCompileInfo, qNa
* Looks like an attribute definition, but the value part
* may contain XML-element-like syntax. If so it is either escaped with
* CDATA bracketing (preferred), or if there are quotes in it, then
- * it is escapified meaning " < etc.
+ * it is escapified meaning &quot; &lt; etc.
Review comment:
Was this an accidentally search/replace?
##########
File path: daffodil-lib/src/main/scala/org/apache/daffodil/xml/XMLUtils.scala
##########
@@ -450,6 +464,55 @@ object XMLUtils {
val SAX_NAMESPACES_FEATURE = "http://xml.org/sax/features/namespaces"
val SAX_NAMESPACE_PREFIXES_FEATURE =
"http://xml.org/sax/features/namespace-prefixes"
+ /**
+ * Always enable this feature (which disables doctypes).
+ */
+ val XML_DISALLOW_DOCTYPE_FEATURE =
"http://apache.org/xml/features/disallow-doctype-decl"
+
+ /**
+ * Always disable this. Might not be necessary if doctypes are disallowed.
+ */
+ val XML_EXTERNAL_PARAMETER_ENTITIES_FEATURE =
"http://xml.org/sax/features/external-parameter-entities"
+
+ /**
+ * Always disable this. Might not be necessary if doctypes are disallowed.
+ */
+ val XML_EXTERNAL_GENERAL_ENTITIES_FEATURE =
"http://xml.org/sax/features/external-general-entities"
+
+ /**
+ * Sets properties that disable insecure XML reader behaviors.
+ * @param xmlReader - the reader to change feature settings on.
+ */
+ def setSecureDefaults(xmlReader: XMLReader) : Unit = {
+ // Try all of these. Then see if any of them were rejected.
+ try{ Seq(
+ Try { xmlReader.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true) },
+ Try { xmlReader.setFeature(XMLUtils.XML_DISALLOW_DOCTYPE_FEATURE, true) },
+ Try {
xmlReader.setFeature(XMLUtils.XML_EXTERNAL_PARAMETER_ENTITIES_FEATURE, false) },
+ Try { xmlReader.setFeature(XMLUtils.XML_EXTERNAL_GENERAL_ENTITIES_FEATURE,
false) }
+ // not available on an XMLReader
+ // xmlReader.setProperty(XMLUtils.SAX_NAMESPACE_PREFIXES_FEATURE, true)
+ // not available on an XMLReader
+ // xmlReader.setProperty(XMLUtils.SAX_NAMESPACES_FEATURE, true)
+ // System.err.println("SAX XMLReader supports disallow properties: " +
xmlReader)
+ ).find(_.isFailure).getOrElse(()) // throws the first failure, after
attempting all of them.
+ } catch {
+ case e: SAXNotRecognizedException =>
+ //
+ // Our specific SAX XMLReader does not accept these options.
+ // But this is in daffodil-lib, and the DaffodilParseXMLReader is in
+ // runtime1, so we can't use the class directly here. So we play
+ // these class name tricks to see if the reader is ours, and
+ // only re-throw if it isn't.
+ //
Review comment:
Some thoughts about this:
1) This could accept any XMLReader, and I think the XMLReader's we get just
come from some Java factory, so potentially someone could change classpaths or
something and we could use a different reader for loading XML? Should we be
explicitly about instantiating a specific XMLReader to ensure that these
features do exist?
2) Should we change the Daffodil XML Reader to accept and just ignore these
features? The DaffodilXMLReader doesn't actually parse XML, so none of these
features actually apply? Or perhaps we shouldn't be setting these secure
features on our DaffodiLXMLReader? It's not parsing XML so the XML secure
features don't really apply. Just because it's an XMLReader doesn't mean we
need to call the makeSecure function on it. We completely control it so know
it's secure.
##########
File path:
daffodil-lib/src/main/scala/org/apache/daffodil/xml/DaffodilXMLLoader.scala
##########
@@ -538,47 +401,232 @@ trait SchemaAwareLoaderMixin {
* do any validation either), however, once Daffodil starts processing the
* DFDL schema nodes, it resolves references using the same one true XML
catalog resolver.
*/
-class DaffodilXMLLoader(val errorHandler: org.xml.sax.ErrorHandler) {
+class DaffodilXMLLoader(val errorHandler: org.xml.sax.ErrorHandler)
+ extends NoBindingFactoryAdapter {
def this() = this(RethrowSchemaErrorHandler)
- def xercesAdapter = new DFDLXercesAdapter(errorHandler)
+ private def resolver = DFDLCatalogResolver.get
- //
- // Controls whether we setup Xerces for validation or not.
- //
- final var doValidation: Boolean = true
+ /**
+ * UPA errors are detected by xerces if the schema-full-checking feature is
+ * turned on, AND if you inform xerces that it is reading an XML Schema
+ * (i.e., xsd).
+ *
+ * Detecting these requires that we do THREE passes
+ * 1) load the DFDL schema as an XML document. This validates it against the
XML Schema
+ * for DFDL schemas.
+ * 2) load the DFDL schema as an XSD - xerces then does lots of more
intensive checking
+ * of the schema
+ * 3) load the schema for our own consumption by Daffodil code. This uses the
+ * constructing parser so as to preserve CDATA regions (xerces just does the
wrong
+ * thing with those,...fatally so). Then our own semantic checking is
performed
+ * as part of compiling the DFDL schema.
+ *
+ * Checks like UPA are in step (2) above. They are coded algorithmically
+ * right into Xerces. This is accomplished by
+ * using the below SchemaFactory and SchemaFactory.newSchema calls. The
+ * newSchema call is what forces schema validation to take place.
+ */
+ private lazy val schemaFactory = {
+ val sf = new org.apache.xerces.jaxp.validation.XMLSchemaFactory()
+ sf.setResourceResolver(resolver)
+ sf.setErrorHandler(errorHandler)
+ sf.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true)
+ sf.setFeature(XMLUtils.XML_DISALLOW_DOCTYPE_FEATURE, true)
+ // These are not recognized by a schemaFactory
+ // sf.setFeature("http://xml.org/sax/features/validation", true)
+ // sf.setFeature("http://apache.org/xml/features/validation/schema", true)
+ //
sf.setFeature("http://apache.org/xml/features/validation/schema-full-checking",
true)
+ // sf.setFeature(XMLUtils.SAX_NAMESPACES_FEATURE, true)
+ // sf.setFeature(XMLUtils.SAX_NAMESPACE_PREFIXES_FEATURE, true)
+ sf
+ }
+
+ /**
+ * This loads the DFDL schema as an XML Schema. This will
+ * check many more things (ex: UPA) about the DFDL schema other than
+ * just whether it validates against the XML Schema for DFDL schemas.
+ *
+ * Unfortunately, we don't have control over how Xerces loads up these
schemas
+ * (other than the resolver anyway), so we can't really redirect the way
+ * it issues error messages so that it properly lays blame at say, the
schema fragments
+ * inside an embedded schema of a TDML file.
+ *
+ * So if we want good file/line/column info from this, we have to give
+ * it a plain old file or resource, and not try to play games to get it to
+ * pick up the file/line/col information from attributes of the elements.
+ *
+ * Due to limitations in the xerces newSchema() method
+ * this method should be called only after loading
+ * the schema as a regular XML file, which itself insists
+ * on the XMLUtils.setSecureDefaults, so we don't need to
+ * further check that here.
+ */
+ def validateAsXMLSchema(source: DaffodilSchemaSource): Unit = {
+ // first we load it, with validation explicitly against the
+ // schema for DFDL Schemas.
+ load(source, Some(XMLUtils.schemaForDFDLSchemas), addPositionAttributes =
true)
+ //
+ // Then we validate explicitly so Xerces can check things
+ // such as for UPA violations
+ //
+ val inputSource = source.newInputSource()
+ val saxSource = new SAXSource(inputSource)
+ //
+ // We would like this saxSource to be created from an XMLReader
+ // so that we can call XMLUtils.setSecureDefaults on it.
+ // but we get strange errors if I do that, where every symbol
+ // in the schema has an unrecognized namespace prefix.
+ //
+ schemaFactory.newSchema(saxSource)
+ inputSource.getByteStream().close()
+ }
- def setValidation(flag: Boolean): Unit = {
- doValidation = flag
+ // $COVERAGE-OFF$
+ override def parser = {
+ Assert.usageError("not to be called.")
}
+ // $COVERAGE-ON$
/**
+ * Obtain and initialize parser which validates the schema is defined.
+ */
+ private def parserFromURI(optSchemaURI: Option[URI]): SAXParser = {
+ if (optSchemaURI.isEmpty) noSchemaParser
+ else {
+ val f = parserFactory()
+ val schema = schemaFromURI(optSchemaURI)
+ f.setSchema(schema)
+ parserFromFactory(f)
+ }
+ }
+
+ private def schemaFromURI(optSchemaURI: Option[URI]): Schema = {
+ val sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
+ sf.setErrorHandler(errorHandler)
+ sf.setResourceResolver(resolver)
+ val schema = sf.newSchema(new StreamSource(optSchemaURI.get.toString))
+ schema
+ }
+
+ private def parserFactory() = {
+ val f = new SAXParserFactoryImpl
+ f.setNamespaceAware(true)
+ f.setFeature(XMLUtils.SAX_NAMESPACE_PREFIXES_FEATURE, true)
+ f.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true)
+ f.setValidating(false)// according to javadoc, just controls DTD validation
+ f.setFeature("http://xml.org/sax/features/validation", true)
+ // not recognized by SAXParserFactory
+ // f.setFeature("http://xml.org/sax/features/validation/dynamic", true)
+ f.setFeature("http://apache.org/xml/features/honour-all-schemaLocations",
true)
+ f.setFeature("http://apache.org/xml/features/validation/schema", true)
+
f.setFeature("http://apache.org/xml/features/validation/schema-full-checking",
true)
+ f
+ }
+
+ private lazy val noSchemaParser: SAXParser = {
+ parserFromFactory(parserFactory())
+ }
+
+ private def parserFromFactory(f: SAXParserFactory) = {
+ val p = f.newSAXParser()
+ //
+ // Not allowed on a SAXParser
+ // p.setProperty(XMLUtils.SAX_NAvMESPACES_FEATURE, true)
+ // Not allowed on a SAXParser
+ // p.setProperty(XMLUtils.SAX_NAMESPACE_PREFIXES_FEATURE, true)
+ val xrdr = p.getXMLReader()
+ XMLUtils.setSecureDefaults(xrdr)
+ xrdr.setErrorHandler(errorHandler)
+ // not recognized by XMLReader
+ // xrdr.setFeature("http://xml.org/sax/features/validation/dynamic", true)
+ xrdr.setContentHandler(this)
+ //
+ // This is required to get the parse to really use our resolver.
+ // The setEntityResolver(resolver) does not work.
+ //
+
xrdr.setProperty("http://apache.org/xml/properties/internal/entity-resolver",
resolver)
+ p
+ }
+
+ /**
+ * This is the common routine called by all the load calls to actually
+ * carry out the loading of the schema.
+ *
+ * There are no calls to this in Daffodil code base as of this writing, but
+ * the base class scala.xml.factory.XMLLoader calls it.
+ *
* Does (optional) validation,
+ *
+ * @param source The URI for the XML document which may be a XML or DFDL
schema, or just XML data.
+ * @param optSchemaURI Optional URI for XML schema for the XML source
document.
+ * @param addPositionAttributes True to add dafint:file dafint:line
attributes to all elements.
+ * Defaults to false.
+ * @param normalizeCRLFtoLF True to normalize CRLF and isolated CR to LF.
This should usually be true,
+ * but some special case situations may require
preservation of CRLF/CR.
+ * @return an scala.xml.Node (Element actually) which is the document
element of the source.
*/
- def load(source: DaffodilSchemaSource): scala.xml.Node = {
- if (doValidation) {
- val inputSource = source.newInputSource()
- val xercesNode = xercesAdapter.load(inputSource) // validates
- inputSource.getByteStream().close()
-
- if (xercesNode == null) return null
- // Note: we don't call xercesAdapter.validateSchema(source)
- // here, because this is an XML loader, not necessarily
- // just a DFDL schema loader. So for example the doValidation flag
- // above could be telling us to validate a TDML file or not.
+ def load(source: DaffodilSchemaSource,
+ optSchemaURI: Option[URI],
+ addPositionAttributes: Boolean = false,
+ normalizeCRLFtoLF: Boolean = true): scala.xml.Node = {
+ //
+ // First we invoke the validator to explicitly validate the XML against
+ // the XML Schema (not necessarily a DFDL schema), via the
+ // javax.xml.validation.Validator's validate method.
+ //
+ optSchemaURI.foreach { schemaURI =>
+
+ val validator = XercesValidator.fromURIs(Seq(schemaURI))
+ val inputStream = source.uriForLoading.toURL.openStream()
+ validator.validateXML(inputStream, errorHandler)
+ inputStream.close()
+ //
+ // Next we have to invoke a regular xerces loader, setup for validation
+ // because that will actually interpret things like xsi:schemaLocation
attributes
+ // of the root element.
+ //
+ // The check of xsi:schemaLocation schemas seems to be the only reason we
+ // have to run this additional test.
+ //
+ // Possibly xsi:noNamespaceSchemaLocation would have the same issue, but
as of
+ // this writing, we have no tests that use that.
+ //
+ // scopeStack.push(TopScope) // not in scala xml v2.0.0
+ val parser = parserFromURI(optSchemaURI)
+ val xrdr = parser.getXMLReader()
+ val saxSource = scala.xml.Source.fromSysId(source.uriForLoading.toString)
+ // parser.parse(source.uriForLoading.toURL.openStream(), this)
Review comment:
Remove parser.parse line.
##########
File path:
daffodil-lib/src/main/scala/org/apache/daffodil/xml/DaffodilXMLLoader.scala
##########
@@ -538,47 +401,232 @@ trait SchemaAwareLoaderMixin {
* do any validation either), however, once Daffodil starts processing the
* DFDL schema nodes, it resolves references using the same one true XML
catalog resolver.
*/
-class DaffodilXMLLoader(val errorHandler: org.xml.sax.ErrorHandler) {
+class DaffodilXMLLoader(val errorHandler: org.xml.sax.ErrorHandler)
+ extends NoBindingFactoryAdapter {
def this() = this(RethrowSchemaErrorHandler)
- def xercesAdapter = new DFDLXercesAdapter(errorHandler)
+ private def resolver = DFDLCatalogResolver.get
- //
- // Controls whether we setup Xerces for validation or not.
- //
- final var doValidation: Boolean = true
+ /**
+ * UPA errors are detected by xerces if the schema-full-checking feature is
+ * turned on, AND if you inform xerces that it is reading an XML Schema
+ * (i.e., xsd).
+ *
+ * Detecting these requires that we do THREE passes
+ * 1) load the DFDL schema as an XML document. This validates it against the
XML Schema
+ * for DFDL schemas.
+ * 2) load the DFDL schema as an XSD - xerces then does lots of more
intensive checking
+ * of the schema
+ * 3) load the schema for our own consumption by Daffodil code. This uses the
+ * constructing parser so as to preserve CDATA regions (xerces just does the
wrong
+ * thing with those,...fatally so). Then our own semantic checking is
performed
+ * as part of compiling the DFDL schema.
+ *
+ * Checks like UPA are in step (2) above. They are coded algorithmically
+ * right into Xerces. This is accomplished by
+ * using the below SchemaFactory and SchemaFactory.newSchema calls. The
+ * newSchema call is what forces schema validation to take place.
+ */
+ private lazy val schemaFactory = {
+ val sf = new org.apache.xerces.jaxp.validation.XMLSchemaFactory()
+ sf.setResourceResolver(resolver)
+ sf.setErrorHandler(errorHandler)
+ sf.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true)
+ sf.setFeature(XMLUtils.XML_DISALLOW_DOCTYPE_FEATURE, true)
+ // These are not recognized by a schemaFactory
+ // sf.setFeature("http://xml.org/sax/features/validation", true)
+ // sf.setFeature("http://apache.org/xml/features/validation/schema", true)
+ //
sf.setFeature("http://apache.org/xml/features/validation/schema-full-checking",
true)
+ // sf.setFeature(XMLUtils.SAX_NAMESPACES_FEATURE, true)
+ // sf.setFeature(XMLUtils.SAX_NAMESPACE_PREFIXES_FEATURE, true)
+ sf
+ }
+
+ /**
+ * This loads the DFDL schema as an XML Schema. This will
+ * check many more things (ex: UPA) about the DFDL schema other than
+ * just whether it validates against the XML Schema for DFDL schemas.
+ *
+ * Unfortunately, we don't have control over how Xerces loads up these
schemas
+ * (other than the resolver anyway), so we can't really redirect the way
+ * it issues error messages so that it properly lays blame at say, the
schema fragments
+ * inside an embedded schema of a TDML file.
+ *
+ * So if we want good file/line/column info from this, we have to give
+ * it a plain old file or resource, and not try to play games to get it to
+ * pick up the file/line/col information from attributes of the elements.
+ *
+ * Due to limitations in the xerces newSchema() method
+ * this method should be called only after loading
+ * the schema as a regular XML file, which itself insists
+ * on the XMLUtils.setSecureDefaults, so we don't need to
+ * further check that here.
+ */
+ def validateAsXMLSchema(source: DaffodilSchemaSource): Unit = {
+ // first we load it, with validation explicitly against the
+ // schema for DFDL Schemas.
+ load(source, Some(XMLUtils.schemaForDFDLSchemas), addPositionAttributes =
true)
+ //
+ // Then we validate explicitly so Xerces can check things
+ // such as for UPA violations
+ //
+ val inputSource = source.newInputSource()
+ val saxSource = new SAXSource(inputSource)
+ //
+ // We would like this saxSource to be created from an XMLReader
+ // so that we can call XMLUtils.setSecureDefaults on it.
+ // but we get strange errors if I do that, where every symbol
+ // in the schema has an unrecognized namespace prefix.
+ //
+ schemaFactory.newSchema(saxSource)
+ inputSource.getByteStream().close()
+ }
- def setValidation(flag: Boolean): Unit = {
- doValidation = flag
+ // $COVERAGE-OFF$
+ override def parser = {
+ Assert.usageError("not to be called.")
}
+ // $COVERAGE-ON$
/**
+ * Obtain and initialize parser which validates the schema is defined.
+ */
+ private def parserFromURI(optSchemaURI: Option[URI]): SAXParser = {
+ if (optSchemaURI.isEmpty) noSchemaParser
+ else {
+ val f = parserFactory()
+ val schema = schemaFromURI(optSchemaURI)
+ f.setSchema(schema)
+ parserFromFactory(f)
+ }
+ }
+
+ private def schemaFromURI(optSchemaURI: Option[URI]): Schema = {
+ val sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
+ sf.setErrorHandler(errorHandler)
+ sf.setResourceResolver(resolver)
+ val schema = sf.newSchema(new StreamSource(optSchemaURI.get.toString))
+ schema
+ }
+
+ private def parserFactory() = {
+ val f = new SAXParserFactoryImpl
+ f.setNamespaceAware(true)
+ f.setFeature(XMLUtils.SAX_NAMESPACE_PREFIXES_FEATURE, true)
+ f.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true)
+ f.setValidating(false)// according to javadoc, just controls DTD validation
+ f.setFeature("http://xml.org/sax/features/validation", true)
+ // not recognized by SAXParserFactory
+ // f.setFeature("http://xml.org/sax/features/validation/dynamic", true)
+ f.setFeature("http://apache.org/xml/features/honour-all-schemaLocations",
true)
+ f.setFeature("http://apache.org/xml/features/validation/schema", true)
+
f.setFeature("http://apache.org/xml/features/validation/schema-full-checking",
true)
+ f
+ }
+
+ private lazy val noSchemaParser: SAXParser = {
+ parserFromFactory(parserFactory())
+ }
+
+ private def parserFromFactory(f: SAXParserFactory) = {
+ val p = f.newSAXParser()
+ //
+ // Not allowed on a SAXParser
+ // p.setProperty(XMLUtils.SAX_NAvMESPACES_FEATURE, true)
+ // Not allowed on a SAXParser
+ // p.setProperty(XMLUtils.SAX_NAMESPACE_PREFIXES_FEATURE, true)
+ val xrdr = p.getXMLReader()
+ XMLUtils.setSecureDefaults(xrdr)
+ xrdr.setErrorHandler(errorHandler)
+ // not recognized by XMLReader
+ // xrdr.setFeature("http://xml.org/sax/features/validation/dynamic", true)
+ xrdr.setContentHandler(this)
+ //
+ // This is required to get the parse to really use our resolver.
+ // The setEntityResolver(resolver) does not work.
+ //
+
xrdr.setProperty("http://apache.org/xml/properties/internal/entity-resolver",
resolver)
+ p
+ }
+
+ /**
+ * This is the common routine called by all the load calls to actually
+ * carry out the loading of the schema.
+ *
+ * There are no calls to this in Daffodil code base as of this writing, but
+ * the base class scala.xml.factory.XMLLoader calls it.
+ *
* Does (optional) validation,
+ *
+ * @param source The URI for the XML document which may be a XML or DFDL
schema, or just XML data.
+ * @param optSchemaURI Optional URI for XML schema for the XML source
document.
+ * @param addPositionAttributes True to add dafint:file dafint:line
attributes to all elements.
+ * Defaults to false.
+ * @param normalizeCRLFtoLF True to normalize CRLF and isolated CR to LF.
This should usually be true,
+ * but some special case situations may require
preservation of CRLF/CR.
+ * @return an scala.xml.Node (Element actually) which is the document
element of the source.
*/
- def load(source: DaffodilSchemaSource): scala.xml.Node = {
- if (doValidation) {
- val inputSource = source.newInputSource()
- val xercesNode = xercesAdapter.load(inputSource) // validates
- inputSource.getByteStream().close()
-
- if (xercesNode == null) return null
- // Note: we don't call xercesAdapter.validateSchema(source)
- // here, because this is an XML loader, not necessarily
- // just a DFDL schema loader. So for example the doValidation flag
- // above could be telling us to validate a TDML file or not.
+ def load(source: DaffodilSchemaSource,
+ optSchemaURI: Option[URI],
+ addPositionAttributes: Boolean = false,
+ normalizeCRLFtoLF: Boolean = true): scala.xml.Node = {
+ //
+ // First we invoke the validator to explicitly validate the XML against
+ // the XML Schema (not necessarily a DFDL schema), via the
+ // javax.xml.validation.Validator's validate method.
+ //
+ optSchemaURI.foreach { schemaURI =>
+
+ val validator = XercesValidator.fromURIs(Seq(schemaURI))
+ val inputStream = source.uriForLoading.toURL.openStream()
+ validator.validateXML(inputStream, errorHandler)
+ inputStream.close()
+ //
+ // Next we have to invoke a regular xerces loader, setup for validation
+ // because that will actually interpret things like xsi:schemaLocation
attributes
+ // of the root element.
+ //
+ // The check of xsi:schemaLocation schemas seems to be the only reason we
+ // have to run this additional test.
+ //
+ // Possibly xsi:noNamespaceSchemaLocation would have the same issue, but
as of
+ // this writing, we have no tests that use that.
+ //
+ // scopeStack.push(TopScope) // not in scala xml v2.0.0
Review comment:
Can we remove the scope stack lines? I assume this means scala-xml 2.0
fixed some issues related to scope?
##########
File path:
daffodil-tdml-processor/src/main/scala/org/apache/daffodil/tdml/processor/DaffodilTDMLDFDLProcessor.scala
##########
@@ -159,14 +163,18 @@ final class TDMLDFDLProcessorFactory private (
override def getProcessor(
schemaSource: DaffodilSchemaSource,
useSerializedProcessor: Boolean,
- optRootName: Option[String] = None,
- optRootNamespace: Option[String] = None): TDML.CompileResult = {
+ optRootName: Option[String],
+ optRootNamespace: Option[String],
+ tunables: Map[String, String]): TDML.CompileResult = {
Review comment:
Why pass in tunbles now? Doesn't the TDML runner call setTunables?
##########
File path:
daffodil-lib/src/test/scala/org/apache/daffodil/util/TestXMLCatalogAndValidate.scala
##########
@@ -449,23 +460,15 @@ class SchemaAwareFactoryAdapter()
}
})
- // val schemaFactory =
SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");
- // //System.err.println("Reading the schema")
- // schemaFactory.setResourceResolver(res)
- // val schema = schemaFactory.newSchema(schemaFile);
- // //System.err.println("Done reading the schema")
- // // What if, instead of this, we just setFeature(...validation...)
above?
- // // Answer: it doesn't throw exceptions on validation errors.
- // val vh = schema.newValidatorHandler()
- // vh.setContentHandler(this)
- // xr.setContentHandler(vh)
+ // validation occurs during the loading process because
+ // we set the feature requiring it above where the parser is constructed.
// parse file
- scopeStack.push(TopScope)
+ // scopeStack.push(TopScope) // not in scala xml v2.0.0
//System.err.println("beginning parse")
xr.parse(source)
//System.err.println("ending parse")
- scopeStack.pop
+ // scopeStack.pop // not in scala xml v2.0.0
Review comment:
Remove commented code
##########
File path:
daffodil-tdml-processor/src/main/scala/org/apache/daffodil/tdml/SchemaCache.scala
##########
@@ -33,6 +33,21 @@ object ENoWarnTDML { EqualitySuppressUnusedImportWarning() }
*/
object SchemaDataProcessorCache extends SchemaCache[(Seq[Diagnostic],
DFDL.DataProcessor), Seq[Diagnostic]]
+object SchemaCache {
+ /**
+ * A key object for caching compiled DFDL schemas. Everything that effects
compilation
+ * must be captured here.
+ */
+ case class Key (
+ uri: URISchemaSource, // The DFDL schema's URI
+ useSerializedProcessor: Boolean,
+ compileAllTopLevels: Boolean,
+ optRootName: Option[String],
+ optRootNamespace: Option[String],
+ tunables: Map[String, String]) // tunables map - required since tunables
effect compilation
+ // (ex: parseUnparsePolicy, unqualifiedPathStepPolicy, perhaps others)
+
+}
Review comment:
Is this related to the doctype fix? Seems unrelated. It feels like
there's attempt to fix multiple bugs in this single PR (eg. CRLF change, lazy
evaluating in TDML runner, TDML default namespace stuff, etc), making it
difficult to review. In the future, multilpe small PR's is much easier to
review.
##########
File path:
daffodil-tdml-processor/src/main/scala/org/apache/daffodil/tdml/TDMLInfosetInputter.scala
##########
@@ -68,22 +67,20 @@ class TDMLInfosetInputter(val scalaInputter:
ScalaXMLInfosetInputter, others: Se
override def getSimpleText(primType: NodeInfo.Kind): String = {
val res = scalaInputter.getSimpleText(primType)
val resIsEmpty = res == null || res == ""
- val othersmatch = others.forall { i =>
- val st = i.getSimpleText(primType)
+ val otherStrings = others.map { i =>
+ val firstVersion = i.getSimpleText(primType)
+ val finalVersion = i match {
+ case _ if (firstVersion eq null) => ""
+ case jsonii: JsonInfosetInputter =>
firstVersion.replaceAll("(\r\n|\r)", "\n").
+ replaceAll("\r", "\n")
+ case _ => firstVersion
+ }
+ finalVersion
+ }
+ val othersmatch = otherStrings.forall { case st: String =>
val stIsEmpty = st == null || res == ""
Review comment:
I think this should be ``st == null || st == ""``? Potentially a bug
that already existed?
##########
File path:
daffodil-tdml-processor/src/test/scala/org/apache/daffodil/tdml/TestTDMLRunnerConfig.scala
##########
@@ -203,7 +203,7 @@ class TestTDMLRunnerConfig {
ts.runOneTest("test1")
}
val msg = e.getMessage()
- println(msg)
+ // println(msg)
Review comment:
Remove comment, same with below tests
##########
File path:
daffodil-test/src/test/resources/org/apache/daffodil/section06/entities/charClassEntities.tdml
##########
@@ -62,11 +62,12 @@
</parserTestCase>
<parserTestCase name="CarriageReturn" root="matrix"
- model="charClassEntities.dfdl.xsd" description="CSV-style tests and use of
%NL; - DFDL-6-045R"
- roundTrip="twoPass">
+ model="charClassEntities.dfdl.xsd" description="CSV-style tests and use of
%NL; - DFDL-6-045R">
<document>
- <documentPart type="text"><![CDATA[0,1,2,3,4
5,6,7,8,9
10,11,12,13,14]]></documentPart>
+ <documentPart type="text"><![CDATA[0,1,2,3,4
+5,6,7,8,9
+10,11,12,13,14]]></documentPart>
Review comment:
git autocrlf might change this file. Is this testing CR in TDML files?
I don't think that really going to work without doing some special stuff to
ensure git doesn't muck with things, or that future formatters don't change to
a standard like LF.
I would say if we need to test loading files with CR/CRLF, we should create
strings of data and load that. We really can't easily rely on files having the
expected line endings.
##########
File path:
daffodil-tdml-processor/src/main/scala/org/apache/daffodil/tdml/processor/DaffodilTDMLDFDLProcessor.scala
##########
@@ -126,16 +125,21 @@ final class TDMLDFDLProcessorFactory private (
val diags = p.getDiagnostics
Left(diags)
} else {
- val dp =
+ val dp = {
+ // Always serialize the parser/unparser objects.
+ // This insures nothing is left uncomputed by lazy evaluation.
Review comment:
Mentioned this earlier, but this sounds like a bug to me. We shouldn't
need to serialize the parser to ensure everything is evaluated.
##########
File path:
daffodil-tdml-processor/src/main/scala/org/apache/daffodil/tdml/TDMLInfosetInputter.scala
##########
@@ -68,22 +67,20 @@ class TDMLInfosetInputter(val scalaInputter:
ScalaXMLInfosetInputter, others: Se
override def getSimpleText(primType: NodeInfo.Kind): String = {
val res = scalaInputter.getSimpleText(primType)
val resIsEmpty = res == null || res == ""
- val othersmatch = others.forall { i =>
- val st = i.getSimpleText(primType)
+ val otherStrings = others.map { i =>
+ val firstVersion = i.getSimpleText(primType)
+ val finalVersion = i match {
+ case _ if (firstVersion eq null) => ""
+ case jsonii: JsonInfosetInputter =>
firstVersion.replaceAll("(\r\n|\r)", "\n").
+ replaceAll("\r", "\n")
Review comment:
Doesn't seem like this second replaceAll is needed, the first one should
change both CRLF and CR.
##########
File path: project/Dependencies.scala
##########
@@ -48,7 +48,8 @@ object Dependencies {
lazy val test = Seq(
"junit" % "junit" % "4.13.2" % "it,test",
"com.novocode" % "junit-interface" % "0.11" % "it,test",
- "org.scalacheck" %% "scalacheck" % "1.15.4" % "it,test"
+ "org.scalacheck" %% "scalacheck" % "1.15.4" % "it,test",
+ "xmlunit" % "xmlunit" % "1.6" % "test"
Review comment:
Did you verify the license of XML unit and any dependecies? I also don't
see it used any where? Can this be removed?
##########
File path:
daffodil-tdml-lib/src/main/scala/org/apache/daffodil/tdml/TDMLRunner.scala
##########
@@ -1751,13 +1796,26 @@ case class DefinedSchema(xml: Node, parent:
DFDLTestSuite) {
* */
final val DEFAULT_ELEMENT_FORM_DEFAULT_VALUE = "qualified"
+ // TODO: We want this default to be false eventually.
Review comment:
Can we create a bug for this?
##########
File path:
daffodil-tdml-lib/src/main/scala/org/apache/daffodil/tdml/TDMLRunner.scala
##########
@@ -264,66 +263,91 @@ class DFDLTestSuite private[tdml] (
tmpDir.mkdirs()
val src = UnitTestSchemaSource(tsNode, "", Some(tmpDir))
- loader.load(src)
+
+ loader.load(src, optTDMLSchema,
+ addPositionAttributes = true) // want line numbers for TDML
//
(tsNode, src.uriForLoading)
}
case tdmlFile: File => {
log(LogLevel.Info, "loading TDML file: %s", tdmlFile)
val uri = tdmlFile.toURI()
- val newNode = loader.load(URISchemaSource(uri))
+ val newNode = loader.load(URISchemaSource(uri), optTDMLSchema,
+ addPositionAttributes = true)
val res = (newNode, uri)
log(LogLevel.Debug, "done loading TDML file: %s", tdmlFile)
res
}
case uri: URI => {
- val newNode = loader.load(URISchemaSource(uri))
+ val newNode = loader.load(URISchemaSource(uri), optTDMLSchema,
+ addPositionAttributes = true)
val res = (newNode, uri)
res
}
case _ => Assert.usageError("not a Node, File, or URL")
} // end match
- lazy val isTDMLFileValid = !this.isLoadingError
+ lazy val ts = {
+ if (tsRaw eq null) {
+ // must have been a loader error
+ reportLoadingErrors()
+ } else {
+ tsRaw
+ }
+ }
+
+ lazy val isTDMLFileValid = {
+ (ts ne null) &&
+ !loadingExceptions.headOption.isDefined
+ }
+
+ def reportLoadingErrors(): Nothing = {
+ log(LogLevel.Error, "TDML file %s is not valid.", tsURI)
+ throw TDMLException(loadingExceptions.toSeq, None)
+ }
var checkAllTopLevel: Boolean = compileAllTopLevel
def setCheckAllTopLevel(flag: Boolean): Unit = {
checkAllTopLevel = flag
}
- val parserTestCases = (ts \ "parserTestCase").map { node =>
ParserTestCase(node, this) }
+ lazy val parserTestCases = (ts \ "parserTestCase").map { node =>
ParserTestCase(node, this) }
//
// Note: IBM started this TDML file format. They call an unparser test a
"serializer" test.
// We call it an UnparserTestCase
//
- val unparserTestCases = (ts \ "unparserTestCase").map { node =>
UnparserTestCase(node, this) }
- val testCases: Seq[TestCase] = parserTestCases ++ unparserTestCases
- val duplicateTestCases = testCases.groupBy {
- _.tcName
- }.filter { case (name, seq) => seq.length > 1 }
- if (duplicateTestCases.nonEmpty) {
- val listOfDups = duplicateTestCases.map{ case(name, _) => name}
- throw new TDMLExceptionImpl(
- "More than one test for names: " + listOfDups.mkString(","), None)
- }
- val testCaseMap = testCases.map { tc => (tc.tcName -> tc) }.toMap
- val suiteName = (ts \ "@suiteName").text
- val suiteID = (ts \ "@ID").text
- val description = (ts \ "@description").text
- val defaultRoundTrip = {
+ lazy val unparserTestCases = (ts \ "unparserTestCase").map { node =>
UnparserTestCase(node, this) }
+ lazy val testCases = {
+ val tcs: Seq[TestCase] = parserTestCases ++ unparserTestCases
+ val dups = tcs.groupBy {
+ _.tcName
+ }.filter { case (name, seq) => seq.length > 1 }
+ if (dups.nonEmpty) {
+ val listOfDups = dups.map{ case(name, _) => name}
+ throw new TDMLExceptionImpl(
+ "More than one test for names: " + listOfDups.mkString(","), None)
Review comment:
Some something catch this and include the tdml file in the erro
rmessage? If not, it might be useful so we can easily jump to tdml file has
duplicates. Also, I feel like there is a DAFFODIL- bug for this? Might be worth
checking so we can close it when this is all merged.
##########
File path:
daffodil-tdml-lib/src/main/scala/org/apache/daffodil/tdml/TDMLRunner.scala
##########
@@ -376,19 +400,19 @@ class DFDLTestSuite private[tdml] (
System.gc()
Thread.sleep(1) // needed to give tools like jvisualvm ability to "grab
on" quickly
}
+ val testCase = testCaseMap.get(testName) // causes loading
if (isTDMLFileValid) {
Review comment:
Should we parse the TDML File when we create the test suite rather than
lazily parsing it? That way it's impossible to create a DFDLTestSuite from an
invalid TDMLFile and we don't have to have these isTDMLFileValid checks
throughout?
Or is there a reason things need to fail when actually trying to run a test?
Is this to get around some issue with JUnit if an exception is thrown outside
of a test? Maybe it's worth adding a comment as to why everything is lazy?
##########
File path:
daffodil-tdml-processor/src/test/scala/org/apache/daffodil/tdml/TestTDMLUnparseCases.scala
##########
@@ -49,17 +47,18 @@ class TestTDMLUnparseCases {
</ts:unparserTestCase>
</ts:testSuite>
lazy val ts = new DFDLTestSuite(testSuite)
- val tc: UnparserTestCase = ts.unparserTestCases.find { utc => utc.tcName
== "test1" }.get
- // println(tc)
- tc.document
- val is = tc.inputInfoset
- // println(is)
- val bar = is.dfdlInfoset.contents
- val scala.xml.Elem(pre, label, _, _, _) = bar
- assertEquals("ex", pre)
- assertEquals("bar", label)
+// val tc: UnparserTestCase = ts.unparserTestCases.find { utc => utc.tcName
== "test1" }.get
+// // println(tc)
+// tc.document
+// val is = tc.inputInfoset
+// // println(is)
+// val bar = is.dfdlInfoset.contents
+// val scala.xml.Elem(pre, label, _, _, _) = bar
+// assertEquals("ex", pre)
+// assertEquals("bar", label)
// ts.trace
ts.runOneTest("test1")
+ println(ts.loadingDiagnosticMessages)
Review comment:
Remove println, maybe add an assert tht checks diagnostics?
##########
File path:
daffodil-lib/src/main/scala/org/apache/daffodil/xml/DaffodilXMLLoader.scala
##########
@@ -538,47 +401,226 @@ trait SchemaAwareLoaderMixin {
* do any validation either), however, once Daffodil starts processing the
* DFDL schema nodes, it resolves references using the same one true XML
catalog resolver.
*/
-class DaffodilXMLLoader(val errorHandler: org.xml.sax.ErrorHandler) {
+class DaffodilXMLLoader(val errorHandler: org.xml.sax.ErrorHandler)
+ extends NoBindingFactoryAdapter {
Review comment:
Did you determine if this is needed or not?
##########
File path:
daffodil-tdml-lib/src/main/scala/org/apache/daffodil/tdml/TDMLRunner.scala
##########
@@ -376,19 +400,19 @@ class DFDLTestSuite private[tdml] (
System.gc()
Thread.sleep(1) // needed to give tools like jvisualvm ability to "grab
on" quickly
}
+ val testCase = testCaseMap.get(testName) // causes loading
if (isTDMLFileValid) {
- loadingExceptions.foreach { le => log(LogLevel.Warning, le.toString) }
- val testCase = testCaseMap.get(testName)
+ // display warnings if any
+ // too verbose. removing
+ // loadingExceptions.foreach { le => log(LogLevel.Warning, le.toString)
}
Review comment:
Do we have a bug for this? It would be really nice if we could error if
there are warnings that weren't expected. I imagine right now lots of our tests
our outputting warnings and need to be tweaked and we just aren't aware because
they are hidden.
##########
File path:
daffodil-tdml-lib/src/main/scala/org/apache/daffodil/tdml/TDMLRunner.scala
##########
@@ -2294,29 +2361,60 @@ sealed abstract class DocumentPart(part: Node, parent:
Document) {
}
case class Infoset(i: NodeSeq, parent: TestCase) {
- val Seq(dfdlInfoset) = (i \ "dfdlInfoset").map { node => DFDLInfoset(node,
this) }
- val contents = dfdlInfoset.contents
+ lazy val Seq(dfdlInfoset) = (i \ "dfdlInfoset").map { node =>
DFDLInfoset(node, this) }
+ lazy val contents = dfdlInfoset.contents
}
case class DFDLInfoset(di: Node, parent: Infoset) {
- val infosetNodeSeq = {
- (di \ "@type").toString match {
- case "infoset" | "" => di.child.filter {
- _.isInstanceOf[scala.xml.Elem]
- }
- case "file" => {
- val path = di.text.trim()
- val maybeURI = parent.parent.parent.findTDMLResource(path)
- val uri = maybeURI.getOrElse(throw new
FileNotFoundException("TDMLRunner: infoset file '" + path + "' was not found"))
- val elem = scala.xml.XML.load(uri.toURL)
- elem
+ private lazy val infosetNodeSeq = {
+ val testCase: TestCase = parent.parent
+ val loader = testCase.parent.loader
+ val optDataSchema: Option[URI] = {
+
testCase.optSchemaFileURI.orElse(testCase.optEmbeddedSchema.map{_.uriForLoading})
+ }
+ val src =
+ (di \ "@type").toString match {
+ case "infoset" | "" => {
+ val rawElem = di.child.filter {
+ _.isInstanceOf[scala.xml.Elem]
+ }.head
+ UnitTestSchemaSource(rawElem, testCase.tcName)
+ }
+ case "file" => {
+ val path = di.text.trim()
+ val maybeURI = parent.parent.parent.findTDMLResource(path)
+ val uri = maybeURI.getOrElse(throw new
FileNotFoundException("TDMLRunner: infoset file '" + path + "' was not found"))
+ URISchemaSource(uri)
+ }
+ case value => Assert.abort("Uknown value for type attribute on
dfdlInfoset: " + value)
}
- case value => Assert.abort("Uknown value for type attribute on
dfdlInfoset: " + value)
+ //
+ // TODO: DAFFODIL-288 validate the infoset also
+ // You can pass the optDataSchema, which appears to be the correct thing
+ // but in many cases it doesn't seem to be able to resolve things.
+ //
+ val testSuite = testCase.parent
+ val before = testSuite.loadingExceptions.clone()
+ // val elem: Node = loader.load(src, optDataSchema)
+ val elem = loader.load(src, None) // no schema
Review comment:
Should we have a unique loader per infoset so we don't have to worry
about TDML loader exceptions mixing in with element loader exceptions? Or do
you think that would be too much overhead? Having to share the same exceptions
list seems a bit fragile, and potentially confusing.
##########
File path:
daffodil-tdml-lib/src/main/scala/org/apache/daffodil/tdml/TDMLRunner.scala
##########
@@ -1751,13 +1796,26 @@ case class DefinedSchema(xml: Node, parent:
DFDLTestSuite) {
* */
final val DEFAULT_ELEMENT_FORM_DEFAULT_VALUE = "qualified"
+ // TODO: We want this default to be false eventually.
+ // The rationale is if people want a default namespace
+ // in their TDML defineSchemas, they should define it. We shouldn't tack
+ // such a sensitive thing on automatically. (81 tests fail in daffodil-test
+ // if you change this however.)
+ //
+ final val DEFAULT_USE_DEFAULT_NAMESPACE_VALUE = true
Review comment:
Looks like nothing is using this according to the codecov?
##########
File path:
daffodil-lib/src/main/scala/org/apache/daffodil/xml/DaffodilConstructingLoader.scala
##########
@@ -179,6 +203,58 @@ class DaffodilConstructingLoader(uri: URI, errorHandler:
org.xml.sax.ErrorHandle
super.elem(pos, pre, local, newAttrs, newScope, empty, nodes)
}
+ /**
+ * To emulate the behavior of Xerces loader (standard scala loader)
+ * we have to normalize CRLF to LF, and solitary CR to LF.
+ *
+ * This is optional controlled by a constructor parameter.
+ */
+ override def text(pos: Int, txt: String): Text = {
+ val newText:String = {
+ if (normalizeCRLFtoLF) {
+ txt.
+ replaceAll("\r\n", "\n").
+ replaceAll("\r", "\n")
+ } else {
+ txt
+ }
+ }
+ //
+ // On MS-Windows it seems somehow that the TDML Runner
+ // loads expected infosets as XML, and they have CRLFs
+ // in them. Not sure how this happens.
+ //
+ if (normalizeCRLFtoLF)
+ Assert.invariant(!newText.contains("\r"))
Review comment:
This assertion seems sort of unnecessary to me. It's really just
asserting that the replaceAll function works. The line ``replaceAll("\r",
"\n")`` is pretty clearly going to remove all CR's.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]