stevedlawrence commented on code in PR #1065:
URL: https://github.com/apache/daffodil/pull/1065#discussion_r1284593547


##########
daffodil-lib/src/main/scala/org/apache/daffodil/lib/util/Misc.scala:
##########
@@ -660,4 +660,15 @@ object Misc {
     }
     Some(res)
   }
+
+  /**
+   * Replace HOME env var in Path with ~
+   */
+  def depersonalizePath(path: String): String = {
+    path.replace(
+      System.getenv("HOME"),

Review Comment:
   Daffodil never looks at the home directory when resolving schemas unless 
it's on the classpath so if feels like we shouldn't need this. Feels like this 
is a sign that maybe our approach isn't right.
   
   This also doesn't fully depersonalize schema paths. For example, if my 
schemas are in `/home/slawrence/mycompany/confidential/schemas/`, there's sill 
going to be personal information in the path. Same issue if schemas are in a 
different mount point like `/dev/myschemas/`. I'm wondering if there's a 
different approach we can take?
   
   I'm wondering if a different approach could tie into recent discussions 
about absolute vs relative schemaLocation paths?
   
   If an import schemaLocation is absolute (i.e. starts with slash), then that 
is what we display for whatever it resolves to. For example, if we have 
schemaLocation="/foo/bar/baz.dfdl.xsd" and that resolves to 
`jar:file:/home/slawrence/.ivy1/pom/jars/foo.jar!/foo/bar/baz.dfdl.xsd`, then 
the display schemaLocation is just `/foo/bar/baz.dfdl.xsd`, since that was the 
absolute schemaLocation that found it.
   
   If an import schemaLocation is relative (i.e. doesn't start with a slash), 
then the display location becomes the schemaLocation resolved relative to the 
display URI of including schema. So for example, if the above `baz.dfdl.xsd` 
file imported `../qaz.dfdl.xsd`, then display schemaLocation of that file be 
calculated like `URI("/foo/bar/baz.dfdl.xsd).resolve("../qaz.dfdl.xsd")` which 
would become `URI("/foo/qaz.dfdl.xsd")`. Note that this is very basically what 
we do to resolve relative paths, so the logic is very similar. We are just 
resolving reslative to the the dispaly schema location of the importing schema 
it's actual schema location
   
   So we always end up with an absolute URI as the display schemaLocation. I 
imagine this is very similar to the results you get by looking at the 
classpath, but should be less finicky. For example, Misc.classpath wont' always 
return somethign if it's using a different classloader, which we have talk 
about for things like making plugins easier.



##########
daffodil-lib/src/main/scala/org/apache/daffodil/lib/api/Diagnostic.scala:
##########
@@ -263,7 +263,12 @@ trait LocationInSchemaFile {
   ): String = {
     // works for both windows and unix
     val splitter = "/"
-    val tokens = uriString.split(splitter)
+    val classpathTrimmedUriString =
+      uriString.replace(
+        Misc.classPath.find(cp => 
uriString.startsWith(cp.toString)).getOrElse("").toString,

Review Comment:
   Mentioned elsewhere, but Misc.classpath is not very reliable. If a differnet 
classloader is used than a URLClassLoader (which we have discussed) this is 
going to break. Suggested an alternative in other comments about using 
schemaLocation instead.



##########
daffodil-cli/src/test/scala/org/apache/daffodil/cli/cliTest/TestCLISaveParser.scala:
##########
@@ -318,4 +318,18 @@ class TestCLISaveParser {
     }
   }
 
+  @Test def test_CLI_Saving_SaveParser_error(): Unit = {
+    val schema = path(
+      "daffodil-sapi/src/test/resources/test/sapi/mySchema6.dfdl.xsd",
+    )
+
+    withTempFile { parser =>
+      runCLI(args"save-parser -s $schema $parser") { cli =>
+        cli.expectErr("[error]")
+        cli.expectErr("systemId: file:~")
+        cli.expectErr("Schema context: file:~")

Review Comment:
   Instead of outputting `file:~/...`, it feels like this wants to be something 
like:
   ```
   Schema context: 
daffodil-sapi/src/test/resources/test/sapi/mySchema6.dfdl.xsd",
   ```
   since that is what the user provided. Similar to my other comment about how 
we want to use the impor/include schemaLocation as a hint of what to display, I 
think for the root schema we want to display the path that was actually 
provided.
   
   As another example, if we had foo.jar on DAFFODIL_CLASSPATH and ran 
something like:
   ```
   daffodil save-parser -s /some/file/in/foo/jar/dfdl.xsd
   ```
   And that resolves to a path in foo.jar, then I think the schema context we 
output wants to be exactly `/some/file/in/foo/jar/dfdl.xsd`.
   
   So I suggested that based on my other comment, when we resolve a schema, we 
either want to use the path directly (in the case of importing absolute 
schemaLocations or locations without an importing schema like this one), or 
resolve the schemaLocation relative to the importing display location in the 
case of relative imports with a context.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to