stevedlawrence commented on PR #1605:
URL: https://github.com/apache/daffodil/pull/1605#issuecomment-3700161979

   It looks like there are two tests related to spaces in names that are 
failing. I don't think they are unrelated failures, but are due to additional 
changes needed to support spaces. One test is the new spaces_in_names_01 test, 
which fails with this stack trace:
   
   ```
   Test org.apache.daffodil.section00.general.TestSpace2.spaces_in_names_01 
failed: java.lang.IllegalArgumentException: schemaLocation is not a valid URI: 
test space/test 2/multi_A_05_nons.dfdl.xsd, took 0.225 sec
   [error]     at 
org.apache.daffodil.lib.xml.XMLUtils$.resolveSchemaLocation(XMLUtils.scala:1473)
   [error]     at 
org.apache.daffodil.core.dsom.IIBase.$anonfun$4(IIBase.scala:203)
   [error]     at scala.Option.flatMap(Option.scala:283)
   [error]     at 
org.apache.daffodil.core.dsom.IIBase.resolvedSchemaLocation$lzyINIT1$$anonfun$1(IIBase.scala:198)
   [error]     at 
org.apache.daffodil.lib.oolag.OOLAG$OOLAGValueBase.body$lzyINIT1(OOLAG.scala:523)
   [error]     at 
org.apache.daffodil.lib.oolag.OOLAG$OOLAGValueBase.body(OOLAG.scala:523)
   [error]     at 
org.apache.daffodil.lib.oolag.OOLAG$OOLAGValueBase.valueAsAny(OOLAG.scala:708)
   ...
   Caused by: java.net.URISyntaxException: Illegal character in path at index 
4: test space/test 2/multi_A_05_nons.dfdl.xsd
   [error]     at java.net.URI$Parser.fail(URI.java:2995)
   [error]     at java.net.URI$Parser.checkChars(URI.java:3166)
   [error]     at java.net.URI$Parser.parseHierarchical(URI.java:3248)
   [error]     at java.net.URI$Parser.parse(URI.java:3207)
   [error]     at java.net.URI.<init>(URI.java:645)
   [error]     at 
org.apache.daffodil.lib.xml.XMLUtils$.resolveSchemaLocation(XMLUtils.scala:1468)
   ```
   
   The "Caused by" part of the exception indicates we probably we have 
something like `schemaLocation="test space/test 2/multi_A_05_nons.dfdl.xsd"`, 
which we can find in the test schema files in the "test spaces/" directory.
   
   So it seems like we don't currently support spaces in the 
resolveSchemaLocation function. We may need further investigations to figure 
out how to correctly fix things and what other functions might need changes.
   
   I think the core issue is that some functions expect string paths to be 
valid URIs where spaces are escaped with %20 (e.g. resolveSchemaLocation, 
optRelativeFileURI, optRelativeResourceURI), but some expect them to be normal 
paths without escapes (e.g. getResourceOption, getRequiredResource) and some 
I'm not sure it's very well defined (e.g. getResourceAbsoluteOrRelativeOption). 
I'm sure there are other functions not listed that have some built-in 
assumptions.
   
   It feels like the right approach to convert our strings to `URI`s as early 
as possible and pass the `URI` around instead of strings. That way we know 
exactly what we are dealing with. And if we ever need to call a function that 
expects an unescaped string (e.g. `getResourceOption`), then we can call 
`uri.getPath()` which will do that unescaping and provide us a string that can 
be used for something like `Class.getResource()`.
   
   Though, this could potentially have a pretty big cascading effect. Changing 
one variable to a `URI` might require other functions to need `URI`s and it 
might turn into a significant change. Another potential issue is some 
functions, while not technically part of the public API, are used externally. 
For example, `getRequiredResource`, `getResourceOption`, etc. are used in a lot 
of tests, and we don't really want to have to change those all to use 
URIs--they still really do want to use strings.
   
   Or alternatively, instead of changing things to URIs we just kindof have a 
convention that we expect these string paths to be valid URIs with escaped 
spaces (and sometimes we'll make escape things for the user), and the functions 
that don't want escapes can unescape the strings before use.
   
   So all this to say, I'm not really sure what the perfect approach is. It 
might require making some changes and see what gets affect, what tests start 
failing, etc. and use that to guide us to the best solution.
   
   You're welcome to dig into this and we're happy to help, but this might also 
be veering into an issue that isn't quite as beginner friendly since it might 
start touching a lot of the codebase and dealing with undocumented assumptions.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to