stevedlawrence commented on code in PR #1065:
URL: https://github.com/apache/daffodil/pull/1065#discussion_r1496045236


##########
daffodil-core/src/main/scala/org/apache/daffodil/core/dsom/DFDLSchemaFile.scala:
##########
@@ -45,16 +45,42 @@ class DFDLSchemaFileLoadErrorHandler(schemaFileLocation: 
SchemaFileLocation)
   private def loaderErrors = loaderErrors_
   private def loaderWarnings = loaderWarnings_
 
-  private def loaderSDEs: Seq[Diagnostic] = loaderErrors.map {
-    new SchemaDefinitionError(schemaFileLocation, "Error loading schema due to 
%s", _)
+  private def loaderSDEs: Seq[Diagnostic] = loaderErrors.map { err =>
+    val errMessage = err.getMessage

Review Comment:
   It's worth adding a comment why we are creating a custom SchemaFileLocation 
here instead of using the on passed in. I believe the reason is because these 
errors come from Xerces and it knows the details of line/column/file 
information, but the schemaFileLocation passed into this class only knows the 
file location. Creating a custom location context based on the Xerces error 
gives more information.
   
   In fact, should we just remove the `schemaFileLocation` parameter from the 
constructor? I think that parameter is no longer used and any errors in from 
Xerces probably do want to use this?



##########
daffodil-cli/src/main/scala/org/apache/daffodil/cli/Main.scala:
##########
@@ -1150,10 +1162,15 @@ class Main(
             }
 
             val infosetType = parseOpts.infosetType.toOption.get
+            val parseOptsSchema = if (parseOpts.schema.isDefined) {
+              Option(parseOpts.schema()._2)
+            } else {
+              None
+            }
             val infosetHandler = InfosetType.getInfosetHandler(
-              parseOpts.infosetType.toOption.get,
+              infosetType,
               processor,
-              parseOpts.schema.toOption,
+              parseOptsSchema,

Review Comment:
   I think you can do something like to get the URI:
   ```scala
   parseOpts.schema.map(_._2).toOption
   ```
   
   Note that we can also change `parseOpts.infosetType.toOption.get` to just 
`parseOpts.infosetType()`. In fact, if we do that, I suggest we just remove the 
`infosetType` variable--I don't think it really adds anything over 
`parseOpts.infosetType()` is only used in that one spot.
   
   Another thought, it might be useful to create a case class like this
   
   ```scala
   case class SchemaOption(raw: String, uri: URI)
   ```
   
   And then have the value converter return `SchmaOption(s, uri)` instead of a 
tuple.
   
   Then the we can use `.raw` or `.uri` to access what we want instead of `._1` 
or `._2`, some examples:
   ```scala
   parseOpts.schema.map(_.uri).toOption
   ```
   
   This CLI code is so spread out that it's hard to remember what the values of 
tuples represent, this makes that a little easier.



##########
daffodil-core/src/main/scala/org/apache/daffodil/core/dsom/DFDLSchemaFile.scala:
##########
@@ -45,16 +45,42 @@ class DFDLSchemaFileLoadErrorHandler(schemaFileLocation: 
SchemaFileLocation)
   private def loaderErrors = loaderErrors_
   private def loaderWarnings = loaderWarnings_
 
-  private def loaderSDEs: Seq[Diagnostic] = loaderErrors.map {
-    new SchemaDefinitionError(schemaFileLocation, "Error loading schema due to 
%s", _)
+  private def loaderSDEs: Seq[Diagnostic] = loaderErrors.map { err =>
+    val errMessage = err.getMessage
+    val lineNumber = err.getLineNumber
+    val columnNumber = err.getColumnNumber
+    val newSchemaLocation = SchemaFileLocation(
+      Option(lineNumber.toString),
+      Option(columnNumber.toString),
+      schemaFileLocation.uriString,
+      schemaFileLocation.diagnosticFilepath,
+      schemaFileLocation.toString,
+      "Schema File",

Review Comment:
   I did a test to see what this looks like and it's something like this:
   
   > Schema context: Schema File Location line 32 column 74 in 
daffodil-sapi/src/test/resources/test/sapi/mySchema6.dfdl.xsd
   
   I wonder if `Schema File` doesn't really add anything and can be removed? 
Normally this would be something like `Schema context: ex:someElement Location 
...` where `ex:someElement` is helpful information, but we don't have access to 
that when errors come from Xerces--we just have to rely on Xerces include that 
in the error message.



##########
daffodil-core/src/main/scala/org/apache/daffodil/core/dsom/IIBase.scala:
##########
@@ -203,7 +203,13 @@ abstract class IIBase(
             relToAbs,
             s"Resolving relative schemaLocations absolutely is deprecated. Did 
you mean /$slText",
           )
-          URISchemaSource(uri)
+          val slTextUri = new URI(slText)
+          val schemaPath = if (slTextUri.isAbsolute) {
+            this.xsdArg.diagnosticFilepath

Review Comment:
   Isn't `this.xsdArg` the importing file, not the file being imported?
   
   Also, something to consider is `URI.isAbsolute` basically just tests if the 
URI has a scheme, it's different from the concept of "absolute" and "relative" 
paths. Very few schemaLocationProperties are absolute in the URI sense, and so 
this is just going to use the other branch.



##########
daffodil-core/src/main/scala/org/apache/daffodil/core/dsom/SchemaSet.scala:
##########
@@ -139,6 +139,8 @@ final class SchemaSet private (
    */
   override lazy val uriString: String = schemaSource.uriForLoading.toString
 
+  override lazy val diagnosticFilepath: String = 
schemaSource.diagnosticFilepath

Review Comment:
   I'm not sure I understand this since a schemaSet is a collection of schemas, 
I thought. But there is a uriString above, so I'm not sure why a SchemaSet 
would have a uriStrig either. I guess it's not clear to me what a SchemaSet 
represents.



##########
daffodil-lib/src/main/scala/org/apache/daffodil/lib/api/DaffodilSchemaSource.scala:
##########
@@ -167,7 +178,7 @@ protected sealed abstract class NodeSchemaSourceBase(
 ) extends URISchemaSource({
     val tempSchemaFile = XMLUtils.convertNodeToTempFile(node, tmpDir.orNull, 
nameHint)
     val tempURI = tempSchemaFile.toURI
-    tempURI
+    (Misc.stripJarPrefix(tempURI.getPath), tempURI)

Review Comment:
   Similar, NodeSchemaSources are only used for tests, so this might not 
matter. Using the tempURI.getPath is probably fine for these.



##########
daffodil-core/src/main/scala/org/apache/daffodil/core/compiler/Compiler.scala:
##########
@@ -325,7 +325,7 @@ class Compiler private (
     optRootName: Option[String] = None,
     optRootNamespace: Option[String] = None,
   ): ProcessorFactory = {
-    val source = URISchemaSource(file.toURI)
+    val source = URISchemaSource((file.getPath, file.toURI))

Review Comment:
   Thoughts on using separate parameters instead of a tuple for the 
URISchemaSource constructor?



##########
daffodil-lib/src/main/scala/org/apache/daffodil/lib/exceptions/SchemaFileLocatable.scala:
##########
@@ -35,24 +32,42 @@ trait HasSchemaFileLocation extends LookupLocation {
 }
 
 object SchemaFileLocation {
-  def apply(context: SchemaFileLocatable, tunables: DaffodilTunables) =
+  def apply(context: SchemaFileLocatable) =
     new SchemaFileLocation(
       context.lineNumber,
       context.columnNumber,
       context.uriString,
+      context.diagnosticFilepath,
       context.toString,
       context.diagnosticDebugName,
-      tunables.maxParentDirectoriesForDiagnostics,
     )
+
+  def apply(
+    lineNumber: Option[String],
+    columnNumber: Option[String],
+    uriString: String,
+    diagnosticFilepath: String,
+    contextToString: String,
+    diagnosticDebugName: String,
+  ) = {
+    new SchemaFileLocation(
+      lineNumber,
+      columnNumber,
+      uriString,
+      diagnosticFilepath,
+      contextToString,
+      diagnosticDebugName,
+    )
+  }

Review Comment:
   This essentially just makes the `SchemaFileLocation` constructor public. If 
we wanted to keep it that way we could create a custom 
`XercesSchmeaFileLocatable`. Then the places where we use this new constructor, 
we instead do something like
   ```scala
   val xsfl = new XerexSchemaFileLocatable(xercesError, ....)
   val schemaLocation = SchemaLocation(xsfl)
   new SchemaDefinitionError(schemaLocation, ...)
   ```
   ``` 



##########
daffodil-core/src/main/scala/org/apache/daffodil/core/dsom/Import.scala:
##########
@@ -101,7 +101,8 @@ final class Import(importNode: Node, xsd: 
XMLSchemaDocument, seenArg: IIMap)
         val uri = resolver.resolveURI(ns.toString)
         if (uri == null) None
         else {
-          val res = URISchemaSource(URI.create(uri))
+          val res =
+            URISchemaSource((schemaSet.schemaSource.diagnosticFilepath, 
URI.create(uri)))

Review Comment:
   Is using `schemaSet.schemaSource` correct? I think `schemaSet` contains 
references to all included schemas, and `schemaSet.schemaSource` is the root 
schema. The diagnostic for this imported schema should not be the root schema.
   
   That said, I'm not sure exactly what to use here. In this case, the URI is 
found using the "namespace" which is pretty different from how `schemaLocation` 
is resolved. Might need some more thought put into this part... 



##########
daffodil-lib/src/main/scala/org/apache/daffodil/lib/api/DaffodilSchemaSource.scala:
##########
@@ -158,6 +167,8 @@ class InputStreamSchemaSource(
     inSrc
   }
   override def uriForLoading = tempURI
+
+  override def diagnosticFilepath: String = 
Misc.stripJarPrefix(tempURI.getPath)

Review Comment:
   The tempURI will never be in a jar. I think the diagnostic file path for an 
InputStreamSchemaSource can just be the full URI path. I believe this is only 
used for tests so the diganostic probably doesn't matter that much.
   
   Alternativeyl, we could allow users to pass in a diagnostic path, but I'm 
not sure it matters.



##########
daffodil-core/src/main/scala/org/apache/daffodil/core/dsom/DFDLSchemaFile.scala:
##########
@@ -137,9 +163,16 @@ final class DFDLSchemaFile(
 
   override lazy val optXMLSchemaDocument = iiParent.optXMLSchemaDocument
 
-  override lazy val uriString = schemaSource.uriForLoading.toString
+  override lazy val uriString = {
+    val uriString_ = schemaSource.uriForLoading.toString
+    uriString_
+  }

Review Comment:
   Was this added for debugging?



##########
daffodil-cli/src/test/scala/org/apache/daffodil/cli/cliTest/TestCLISaveParser.scala:
##########
@@ -318,4 +318,21 @@ class TestCLISaveParser {
     }
   }
 
+  /**
+   * Attempted to save-parser an invalid schema so we can check the diagnostic 
error for leaked information
+   */
+  @Test def test_CLI_Saving_SaveParser_error(): Unit = {
+    val schema = path(
+      "daffodil-sapi/src/test/resources/test/sapi/mySchema6.dfdl.xsd",
+    )
+
+    withTempFile { parser =>
+      runCLI(args"save-parser -s $schema $parser") { cli =>
+        cli.expectErr("[error]")
+        cli.expectErr("Schema context")
+        cli.expectErr("daffodil-sapi")

Review Comment:
   Can we expect the full `Schema Context` line since this test is about making 
sure we aren't outputting absolute paths? For example, something like this:
   ```scala
   cli.expectErr("Schema context: ... 
daffodil-sapi/src/test/resources/test/sapi/mySchema6.dfdl.xsd")
   ```
   
   This way if we something ever changes and we output 
`file://some/absolute/path/daffodil-sapi/src/test/....` this test will fail and 
catch it? Right now this test would still pass even if that happened.



##########
daffodil-core/src/main/scala/org/apache/daffodil/core/dsom/SchemaDocIncludesAndImportsMixin.scala:
##########
@@ -144,6 +144,12 @@ trait SchemaDocIncludesAndImportsMixin { self: 
XMLSchemaDocument =>
     this.uriStringFromAttribute.getOrElse("file:unknown")
   }
 
+  override lazy val diagnosticFilepath = if (self.schemaFile.isDefined) {
+    self.schemaFile.get.diagnosticFilepath
+  } else {
+    self.schemaSet.diagnosticFilepath

Review Comment:
   Mentioned elsewhere, but are we sure this is correct? Isn't this just the 
filepath of the root schema? I think getting information from the schemaSet 
might not be correct? I'm not even sure the schemaSet should have a 
diagnosticFilePath since it's a set of schemas?



##########
daffodil-core/src/main/scala/org/apache/daffodil/core/dsom/DFDLSchemaFile.scala:
##########
@@ -137,9 +163,16 @@ final class DFDLSchemaFile(
 
   override lazy val optXMLSchemaDocument = iiParent.optXMLSchemaDocument
 
-  override lazy val uriString = schemaSource.uriForLoading.toString
+  override lazy val uriString = {
+    val uriString_ = schemaSource.uriForLoading.toString
+    uriString_
+  }
+
+  override def diagnosticFilepath = schemaSource.diagnosticFilepath
 
-  override protected lazy val diagnosticDebugNameImpl = 
schemaSource.uriForLoading.toString
+  override protected lazy val diagnosticDebugNameImpl = {
+    sset.schemaSource.diagnosticFilepath

Review Comment:
   Can this just be `lazy val diagnosticDebugNameImpl = diagnosticFilePath`



##########
daffodil-core/src/main/scala/org/apache/daffodil/core/dsom/IIBase.scala:
##########
@@ -203,7 +203,13 @@ abstract class IIBase(
             relToAbs,
             s"Resolving relative schemaLocations absolutely is deprecated. Did 
you mean /$slText",
           )
-          URISchemaSource(uri)
+          val slTextUri = new URI(slText)
+          val schemaPath = if (slTextUri.isAbsolute) {
+            this.xsdArg.diagnosticFilepath
+          } else {
+            Misc.stripJarPrefix(slTextUri.getPath)

Review Comment:
   Looking at stripJarPrefix, this just gets everything after `.jar!` in the 
URI. So if slTextURI isn't found a jar (e.g. schemaLocation is relative path 
and the enclossingSchema is a file, then it's going to be a full path, which 
might not be what we want.
   
   I'm not sure what the right solution is, but this feels like we might be 
missing some edge cases.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to