stevedlawrence commented on code in PR #1065:
URL: https://github.com/apache/daffodil/pull/1065#discussion_r1528497651
##########
daffodil-core/src/main/scala/org/apache/daffodil/core/dsom/DFDLSchemaFile.scala:
##########
@@ -45,16 +46,38 @@ class DFDLSchemaFileLoadErrorHandler(schemaFileLocation:
SchemaFileLocation)
private def loaderErrors = loaderErrors_
private def loaderWarnings = loaderWarnings_
- private def loaderSDEs: Seq[Diagnostic] = loaderErrors.map {
- new SchemaDefinitionError(schemaFileLocation, "Error loading schema due to
%s", _)
+ private def loaderSDEs: Seq[Diagnostic] = loaderErrors.map { err =>
+ val errMessage = err.getMessage
+ // we create a new SchemaFileLocation because the Xerces error has line
and column info that
+ // the original schemaFileLocation that's passed in doesn't contain, so we
create this more
+ // complete SchemaFileLocation
+ val xsfl = new XercesSchemaFileLocatable(err, "")
+ val newSchemaLocation = SchemaFileLocation(
+ xsfl,
+ schemaFileLocation,
+ )
Review Comment:
I noticed `XercesSchmaFileLocatable` doesn't implement the
`SchemaFileLocatable` trait. Should it, then you wouldn't need a new
SchemaFileLocation apply method specific to Xerces. This could just bt
```scala
val xsfl = new XercesSchemaFileLocatable(err, "", schemaFileLocation)
val newSchemaLocation = SchemaFileLocation(xsfl)
```
Just thinking of ways to simplfy things. This is nice since now
ScheaFileLocation doesn't have any know anything about some weird Xerces
special case.
##########
daffodil-cli/src/main/scala/org/apache/daffodil/cli/Main.scala:
##########
@@ -259,16 +259,16 @@ class CLIConf(arguments: Array[String], stdout:
PrintStream, stderr: PrintStream
// URIs resolved from the current working directory. We can convert
this to a string and
// pass it to resolveSchemaLocation to find the actual file or resource
val cwd = Paths.get("").toUri
- XMLUtils.resolveSchemaLocation(uri.toString, Some(cwd))
+ XMLUtils.resolveSchemaLocation(uri.toString,
Some(URISchemaSource(uri.getPath, cwd)))
Review Comment:
This feels weird to me:
```scaa
URISchemaSource(uri.getPath, cwd)
```
This creates a uri schema source where the uri is CWD but the diagnostic
path to be used for that is the path of the URI? Shouldn't the diagnostic of
the context also be cwd?
I guess because resolveSchemaLocation uses `resolveSibling()` it kindof
requires the diagnostic part of URISchemaSource to be to an actual file? Which
kindof makes sense, a schemaLocation also comes from a file doing an
include/import, not from a directory. Maybe this wants to be something like:
```
val contextPath = Paths.get("fakeContext.dfdl.xsd").toAbsolutePath
val contextSource = URISchemaSource(contextPath.toString, conextPath.toUri)
XMLUtils.resolveSchemaLocation(uri.toString, Some(contextSource))
```
This way `resolveSchemaLocation` has an actual file to resolve relative
paths against (even though that file doesn't technically exist). The context
will never be output so the fact that it's fake doesn't really matter, but we
need a comment that describes that resolveSchemaLocation expects a context that
is a a file for diagnostic purposes.
##########
daffodil-core/src/main/scala/org/apache/daffodil/core/dsom/SchemaDocIncludesAndImportsMixin.scala:
##########
@@ -144,6 +144,12 @@ trait SchemaDocIncludesAndImportsMixin { self:
XMLSchemaDocument =>
this.uriStringFromAttribute.getOrElse("file:unknown")
}
Review Comment:
Yeah, here's another uriString that probably wants to go away? We may need
another pass after this is merged o remove unused cruft.
##########
daffodil-core/src/main/scala/org/apache/daffodil/core/dsom/Import.scala:
##########
@@ -101,7 +101,13 @@ final class Import(importNode: Node, xsd:
XMLSchemaDocument, seenArg: IIMap)
val uri = resolver.resolveURI(ns.toString)
if (uri == null) None
else {
- val res = URISchemaSource(URI.create(uri))
+ val dfp = if (schemaFile.isDefined) {
+ schemaFile.get.schemaSource.diagnosticFilepath
+ } else {
+ xsd.diagnosticFilepath
+ }
Review Comment:
I'm not sure I understand this. Isn't this using the diagnostic path of the
`schemaFile` doing the importing? Seems like since this is a case where we
can't use the `schemaLocation` to come up with a reasonable diagnostic path,
the only option is to just convert the `res` URI into a reasonable diagnostic.
Fortunately, imports without schemaLocations are pretty rare since those use an
XML catalog, which not a lot of people use.
##########
daffodil-lib/src/main/scala/org/apache/daffodil/lib/exceptions/SchemaFileLocatable.scala:
##########
@@ -35,24 +34,35 @@ trait HasSchemaFileLocation extends LookupLocation {
}
object SchemaFileLocation {
- def apply(context: SchemaFileLocatable, tunables: DaffodilTunables) =
+ def apply(context: SchemaFileLocatable) =
new SchemaFileLocation(
context.lineNumber,
context.columnNumber,
context.uriString,
+ context.diagnosticFilepath,
context.toString,
context.diagnosticDebugName,
- tunables.maxParentDirectoriesForDiagnostics,
)
+
+ def apply(xsfl: XercesSchemaFileLocatable, sfl: SchemaFileLocation):
SchemaFileLocation = {
Review Comment:
Mentioned elsewhere, but it might be nice if `XercesSchemaFileLocatable`
implemented the `SchemaFileLocatable` trait so the above apply method could be
used, this avoids creating an apply function specific to one class.
##########
daffodil-core/src/main/scala/org/apache/daffodil/core/dsom/SchemaDocIncludesAndImportsMixin.scala:
##########
@@ -144,6 +144,12 @@ trait SchemaDocIncludesAndImportsMixin { self:
XMLSchemaDocument =>
this.uriStringFromAttribute.getOrElse("file:unknown")
}
+ override lazy val diagnosticFilepath = if (self.schemaFile.isDefined) {
+ self.schemaFile.get.diagnosticFilepath
+ } else {
+ self.optLexicalParent.get.diagnosticFilepath
+ }
Review Comment:
This needs a comment, it's not clear why schemaFile wouldn't be defined and
why optLexicalParent is the right thing to use in that case.
##########
daffodil-core/src/main/scala/org/apache/daffodil/core/dsom/DFDLSchemaFile.scala:
##########
@@ -45,16 +46,38 @@ class DFDLSchemaFileLoadErrorHandler(schemaFileLocation:
SchemaFileLocation)
private def loaderErrors = loaderErrors_
private def loaderWarnings = loaderWarnings_
- private def loaderSDEs: Seq[Diagnostic] = loaderErrors.map {
- new SchemaDefinitionError(schemaFileLocation, "Error loading schema due to
%s", _)
+ private def loaderSDEs: Seq[Diagnostic] = loaderErrors.map { err =>
+ val errMessage = err.getMessage
+ // we create a new SchemaFileLocation because the Xerces error has line
and column info that
+ // the original schemaFileLocation that's passed in doesn't contain, so we
create this more
+ // complete SchemaFileLocation
+ val xsfl = new XercesSchemaFileLocatable(err, "")
Review Comment:
The second parameter to XercesSchemaFileLocatable is always empty string. Is
this intentional or was this a place holder?
##########
daffodil-lib/src/main/scala/org/apache/daffodil/lib/xml/XMLUtils.scala:
##########
@@ -1452,7 +1453,8 @@ Differences were (path, expected, actual):
// absolute URIs in includes/imports and cannot remove this yet.
try {
uri.toURL.openStream.close
- Some(uri, false)
+ val uss = URISchemaSource(uri.getPath, uri)
Review Comment:
Suggest we call `uriToDiagosticPath` here to get a diganostic path. Absolute
URI's should be pretty rare, and as mentioned in a comment above, we may even
want to remove support for it some day. In the rare case they are used, we can
fall back to our heuristic.
##########
daffodil-japi/src/test/java/org/apache/daffodil/example/TestJavaAPI.java:
##########
@@ -122,7 +122,7 @@ public void testJavaAPI1() throws IOException,
ClassNotFoundException {
DebuggerRunnerForJAPITest debugger = new DebuggerRunnerForJAPITest();
org.apache.daffodil.japi.Compiler c = Daffodil.compiler();
- java.io.File schemaFile = getResource("/test/japi/mySchema1.dfdl.xsd");
+ java.io.File schemaFile = new
File("daffodil-japi/src/test/resources/test/japi/mySchema1.dfdl.xsd");
Review Comment:
This is actually our own `getResource` function which wraps java's and
converts the `URI` to a `File`
I think the idea with these changes is that the `File` that is returned is
an absolute `File` path to a jar (e.g. something like
`jar:file:/path/to/foo.jar!/xsd/foo.dfdl.xsd`), and so any diagnostics will not
be depersonalized, they'll be these long paths.
By using a relative `File`, the diagnostic path is relative and should be
depersonalized.
But both should work though. In fact, I would suggest we undo this change
since we don't really care about diagnostics for unit tests, and the
getResource stuff is a bit cleaner. It would also ensure we have tests that
make sure you can call compileFile with a `jar` File.
If we want, when `compileFile` is called we could detect if the file is a
jar and do the `Misc.stripJarPrefix` thing to get a good diagnostic path for
cases like this where jar files are used.
##########
daffodil-core/src/main/scala/org/apache/daffodil/core/dsom/SchemaComponentIncludesAndImportsMixin.scala:
##########
@@ -35,4 +35,6 @@ trait SchemaComponentIncludesAndImportsMixin extends
CommonContextMixin {
xmlSchemaDocument.uriString
}.toOption.getOrElse(orElseURL)
Review Comment:
The comment above says this is used in diagnostic messages. I wonder if this
trait can just be removed, since we now have this diagnosticFilepath thing? In
fact, does uriString go away too? Seems like we are now more tightly coupling
uris and the diagnostic for that URI, so the concept of a separate uriString
goes away? This is maybe just cleanup so can do done in another PR, but we
shouldn't forget about it.
##########
daffodil-lib/src/main/scala/org/apache/daffodil/lib/api/DaffodilSchemaSource.scala:
##########
@@ -58,24 +58,29 @@ sealed trait DaffodilSchemaSource {
* False otherwise.
*/
def isXSD: Boolean
+
+ /**
+ * Use to get the original file path passed into Daffodil
+ */
+ def diagnosticFilepath: String
}
object URISchemaSource {
- def apply(fileOrResource: URI) = {
- new URISchemaSource(fileOrResource)
+ def apply(fileOrResourceRaw: String, fileOrResourceUri: URI):
URISchemaSource = {
Review Comment:
Instead of raw, I would suggest we call it diagosticFilePath like you have
everywhere else. Makes it more clear that this is used for diagnostics, and
doesn't necessarily need to be the "raw" version of the URI. Also do he same
for URISchemaSource class.
##########
daffodil-lib/src/main/scala/org/apache/daffodil/lib/util/Misc.scala:
##########
@@ -708,4 +708,19 @@ object Misc {
}
Some(res)
}
+
+ /**
+ * Strip the part of the path matching the classpath from the uri
+ * @param uriString
+ * @return
+ */
+ def stripJarPrefix(uriString: String): String = {
+ val withoutJar = if (!isNullOrBlank(uriString)) {
+ uriString.split("\\.jar!").last
+ } else {
+ uriString
+ }
+ withoutJar
+ }
Review Comment:
Toughts on tweaking this to be something like `def uriToDiagnosticPath`, and
this can be the one place where we have our heurstic to convert a URI to a
reasonable diagnosticPath in cases where we don't have another option (e.g.
compileFile, resolvedNamespaceURI). I think most places we call this we already
have a URI and we are calling it to convert that URI to a reasonable diagnostic.
And by using a URI we can get inspect things like the scheme to make better
heuristic, we could eventually make it much more complicated. But for now,
maybe something simple like:
```scala
uri.getScheme match {
"jar" => {
Paths.get(uri.getPath.split("\\.jar!").last)
}
"file" => {
uri.getPath
}
null = {
uri.getPath
}
}
```
##########
daffodil-core/src/main/scala/org/apache/daffodil/core/dsom/IIBase.scala:
##########
@@ -195,15 +196,32 @@ abstract class IIBase(
protected final lazy val resolvedSchemaLocation:
Option[DaffodilSchemaSource] =
LV('resolvedSchemaLocation) {
val res = schemaLocationProperty.flatMap { slText =>
- val enclosingSchemaURI = schemaFile.map { _.schemaSource.uriForLoading
}
- val optURI = XMLUtils.resolveSchemaLocation(slText, enclosingSchemaURI)
- val optSource = optURI.map { case (uri, relToAbs) =>
+ val enclosingSchemaURIAndDFP = schemaFile.map { sf =>
+ (sf.schemaSource.diagnosticFilepath, sf.schemaSource.uriForLoading)
+ }
+ val enclosingSchemaURISchemaSource = enclosingSchemaURIAndDFP
Review Comment:
Can we just do something like
```scala
val enclosingSchemaURISchemaSource = schemaFile.map { sf => sf.schemaSource }
```
Why do we extract the uri and diagnostic out of `sf.schemaSouce` and then
create a new schema source?
##########
daffodil-tdml-lib/src/main/scala/org/apache/daffodil/tdml/TDMLRunner.scala:
##########
@@ -758,7 +760,12 @@ abstract class TestCase(testCaseXML: NodeSeq, val parent:
DFDLTestSuite) {
case (None, Some(uri)) => {
//
// In this case, we have a real TDML file (or resource) to open
- URISchemaSource(uri)
+ val path = if (Misc.isNullOrBlank(uri.getPath)) {
Review Comment:
Is this in case uri is a "jar" URI? Maybe this goes away with the
`uriToDiagnosticPath` idea mentioend above?
This probably raises an issue that we want to be careful about calling
uri.getPath, since that doesn't work for "jar" uris. I imagine we don't do that
except for converting URI's to diagnostics.
##########
daffodil-lib/src/main/scala/org/apache/daffodil/lib/api/DaffodilSchemaSource.scala:
##########
@@ -58,24 +58,29 @@ sealed trait DaffodilSchemaSource {
* False otherwise.
*/
def isXSD: Boolean
+
+ /**
+ * Use to get the original file path passed into Daffodil
+ */
+ def diagnosticFilepath: String
}
object URISchemaSource {
- def apply(fileOrResource: URI) = {
- new URISchemaSource(fileOrResource)
+ def apply(fileOrResourceRaw: String, fileOrResourceUri: URI):
URISchemaSource = {
+ new URISchemaSource(fileOrResourceRaw, fileOrResourceUri)
}
}
-class URISchemaSource protected (val fileOrResource: URI) extends
DaffodilSchemaSource {
+class URISchemaSource protected (val raw: String, val uri: URI) extends
DaffodilSchemaSource {
Review Comment:
I've noticed that we treat this raw/diagnosticFilePath as a
`java.nio.file.Path` in a lot of places (e.g. we call `Path.resolveSibling`). I
wonder if we should just require this be an actual `Path` and change this type
to a `Path`, and users need to figure out what Path best represents the URI. In
most cases, it's probably just URI.getPath, but we should leave that up to the
callers. For example, if the URI is to a a `jar` file, then maybe they want to
use something similar to Misc.stripJarPrefix and convert that to a Path.
We might also want to replace diagnosticFilePath to just `diagnosticPath` to
make it clear this is just some kind of `java.nio.file.Path`.
##########
daffodil-lib/src/main/scala/org/apache/daffodil/lib/api/DaffodilSchemaSource.scala:
##########
@@ -58,24 +58,29 @@ sealed trait DaffodilSchemaSource {
* False otherwise.
*/
def isXSD: Boolean
+
+ /**
+ * Use to get the original file path passed into Daffodil
Review Comment:
Is this comment correct? This is just the diagnostic that represent this
schema source. The SchemaSource could have been found by an import or something
to, so it isn't strictly about things passed into to Daffodil. Maybe the
comment just wants to be `Diagnostic path associated with this schema source`
##########
daffodil-test/src/test/resources/org/apache/daffodil/section00/general/disallowDocTypes.tdml:
##########
@@ -59,8 +59,6 @@
<tdml:document/>
<tdml:errors>
<tdml:error>Schema Definition Error</tdml:error>
- <tdml:error>org.xml.sax.SAXParseException</tdml:error>
- <tdml:error>hasDocType-include.dfdl.xsd</tdml:error>
<tdml:error>DOCTYPE is disallowed</tdml:error>
Review Comment:
hasDocType-include.dfdl.xsd looks like a diagnosticPath. Did we lose that?
##########
daffodil-core/src/main/scala/org/apache/daffodil/core/dsom/IIBase.scala:
##########
@@ -195,15 +196,32 @@ abstract class IIBase(
protected final lazy val resolvedSchemaLocation:
Option[DaffodilSchemaSource] =
LV('resolvedSchemaLocation) {
val res = schemaLocationProperty.flatMap { slText =>
- val enclosingSchemaURI = schemaFile.map { _.schemaSource.uriForLoading
}
- val optURI = XMLUtils.resolveSchemaLocation(slText, enclosingSchemaURI)
- val optSource = optURI.map { case (uri, relToAbs) =>
+ val enclosingSchemaURIAndDFP = schemaFile.map { sf =>
+ (sf.schemaSource.diagnosticFilepath, sf.schemaSource.uriForLoading)
+ }
+ val enclosingSchemaURISchemaSource = enclosingSchemaURIAndDFP
+ .map { case (dfp, uri) =>
+ URISchemaSource(dfp, uri)
+ }
+ val optURISchemaSource =
+ XMLUtils.resolveSchemaLocation(slText,
enclosingSchemaURISchemaSource)
+ val optSource = optURISchemaSource.map { case (uriSchemaSource,
relToAbs) =>
schemaDefinitionWarningWhen(
WarnID.DeprecatedRelativeSchemaLocation,
relToAbs,
s"Resolving relative schemaLocations absolutely is deprecated. Did
you mean /$slText",
)
- URISchemaSource(uri)
+ // if isBootStrapSD is true, we assume we are using the
fakeXMLSchemaDocument, which means
+ // we will be passing in and receiving back an absolute
diagnosticFilepath from resolveSchemaLocation.
+ // In just this case, we want to ignore that absolute filepath and
use the diagnosticFilepath
+ // from main, which is the XMLSchemaDocument diagnosticFilepath
+ val finalUriSchemaSource = if (xsdArg.isBootStrapSD) {
Review Comment:
I believe Schema Document? To bootstrap schema compilation we create a fake
schema that imports the actual main schema. This makes some logic a little
weird since we need special cases for this bootstrap logic, but it makes a lot
of other logic more consistent, since importing the main schema is exaclyt the
same as importing any other schema.
Lola and I breifly looked at removing the fake schema and we came to the
conclusion that too many things rely on it.
##########
daffodil-lib/src/main/scala/org/apache/daffodil/lib/exceptions/SchemaFileLocatable.scala:
##########
@@ -155,5 +155,15 @@ trait SchemaFileLocatable extends LocationInSchemaFile
with HasSchemaFileLocatio
}
- override lazy val schemaFileLocation = SchemaFileLocation(this, tunables)
+ override lazy val schemaFileLocation = SchemaFileLocation(this)
+}
+
+class XercesSchemaFileLocatable(
+ xercesError: SAXParseException,
+ val diagnosticDebugName: String,
+) {
Review Comment:
Mentioned elsewhere, but probably wants to implement the
`SchemaFileLocatable` trait. I thin it simplifies other logic a bit.
##########
daffodil-lib/src/main/scala/org/apache/daffodil/lib/xml/DaffodilXMLLoader.scala:
##########
@@ -244,12 +245,18 @@ class DFDLCatalogResolver private ()
// comes from the schemaLocation attribute) on classpath or as a file.
val optURI =
try {
- val contextURI = Some(new URI(baseURIString))
+ val baseURI = new URI(baseURIString)
+ // we set diagnostic file path to empty string here because
baseURIString has a scheme
+ // and that can cause problems down the road in
resolveSchemaLocation. We don't care
+ // about the diagnosticFilepath here since this information is
from Xerces, so we're fine
+ // with it being blank
Review Comment:
I think the issue isn't necessarily the baseURIString has a scheme. I think
the core reason we do this is something like this:
DaffodilXMLLoader implements a Xerces API, and Xerces doesn't have a concept
of diagnostic path that we need for a URISchemaSource to pass in as the
context for resolveSchemaLocations. We could use some heuristic to come up with
a diagnostic path (e.g. call uriToDiagosticPath). But we don't actually use the
diagnosticPath returned by resolveSchemaLocation here, all we care about is
getting a URI for Xerces. So the diagnostic path in the schema source doesn't
really matter as long as it doesn't break resolveSchemaLocation, and an empty
diagnostic path works fine for that.
##########
daffodil-lib/src/main/scala/org/apache/daffodil/lib/xml/XMLUtils.scala:
##########
@@ -1488,8 +1499,24 @@ Differences were (path, expected, actual):
val contextURI = optContextURI.get
val optResolvedRelative = None
.orElse {
+ val contextURIDFP = contextURI.diagnosticFilepath
+ val diagnosticFilepathPaths = if (contextURIDFP.startsWith("/")) {
+ new File(contextURIDFP).toPath
+ } else {
+ Paths.get(Misc.stripJarPrefix(contextURIDFP))
+ }
+ val relativeDiagnosticFilepath =
+ if (new
File(uri.getPath).compareTo(diagnosticFilepathPaths.toFile) == 0) { // paths
are the same
+ diagnosticFilepathPaths.toString
+ } else {
+ diagnosticFilepathPaths.resolveSibling(uri.getPath).toString
+ }
Review Comment:
We need some comments about what is going on here. I'm not sure I understand
why we need this.
Seem this logic is because sometimes the diagnosticFilePath isn't a Path,
like it's a URI or something? Maybe if we require that it must be a Path,
mentioned in another comment, then this special case goes away? If
diagnosticFilePath is just a Path, we can always just call resolveSibling here
and it should just work?
##########
daffodil-cli/src/test/scala/org/apache/daffodil/cli/cliTest/TestCLISaveParser.scala:
##########
@@ -318,4 +318,20 @@ class TestCLISaveParser {
}
}
+ /**
+ * Attempted to save-parser an invalid schema so we can check the diagnostic
error for leaked information
+ */
+ @Test def test_CLI_Saving_SaveParser_error(): Unit = {
+ val schema = path(
+ "daffodil-sapi/src/test/resources/test/sapi/mySchema6.dfdl.xsd",
+ )
+
+ withTempFile { parser =>
+ runCLI(args"save-parser -s $schema $parser") { cli =>
+ cli.expectErr("[error]")
+ cli.expectErr(s"Schema context: Location line 32 column 74 in
$schema")
+ }(ExitCode.UnableToCreateProcessor)
+ }
+ }
Review Comment:
I manually ran this in a CLI and the diagostic location looks good, but I
got the same error twice:
```
[error] Schema Definition Error: Error loading schema due to src-resolve:
Cannot resolve the name 'tns:nonExistent' to a(n) 'type definition' component.
Schema context: Location line 32 column 74 in
daffodil-sapi/src/test/resources/test/sapi/mySchema6.dfdl.xsd
[error] Schema Definition Error: Error loading schema due to src-resolve:
Cannot resolve the name 'tns:nonExistent' to a(n) 'type definition' component.
Schema context: Location line 32 column 74 in
daffodil-sapi/src/test/resources/test/sapi/mySchema6.dfdl.xsd
```
I ran the same command on the main branch and only get it once (but with a
much less human readable error emssage). I'm not sure what would have changed
to break that. But this error does come from xerces and this PR does slightly
change how xerces errors are handled, maybe something is wrong there?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]