[ 
https://issues.apache.org/jira/browse/DAFFODIL-2918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dave Thompson closed DAFFODIL-2918.
-----------------------------------

Verified the specified commit (commit 9fb8337863f1277b7aa8edc0a8407fe8536ad6f5) 
is included in the latest pull from the daffodil repository.

Verified, via review, changes identified in the commit comment were 
implemented. 

Verified the affected daffodil subproject sbt test suites executed successfully.

Verified saved parser.bin files created from the same schema in different 
directories produce the same output from the cat command given by the 
developer in the ticket.
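
For anyone repeating this check without GNU tail, here is a minimal Python sketch of the same pipeline. It is an illustration only, not part of Daffodil: the function names are hypothetical, and it assumes (based on the `tail -z -n +2` step in the ticket's command) that the saved parser header ends at the first NUL byte and that the rest of the file is a single gzip stream.

```python
import gzip
import hashlib
import re


def schema_paths(saved_parser: bytes, needle: bytes = b"dfdl.xsd") -> list:
    """Rough equivalent of: tail -z -n +2 | gunzip -c | strings | grep dfdl.xsd"""
    # Assumption: the header is everything up to and including the first NUL.
    _header, _sep, body = saved_parser.partition(b"\0")
    raw = gzip.decompress(body)
    # Roughly what strings(1) does: runs of 4 or more printable ASCII bytes.
    runs = re.findall(rb"[\x20-\x7e]{4,}", raw)
    return sorted({run.decode("ascii") for run in runs if needle in run})


def same_parser(a: bytes, b: bytes) -> bool:
    """True when two saved parsers have the same SHA-256 digest."""
    return hashlib.sha256(a).digest() == hashlib.sha256(b).digest()
```

Reading two parser.bin files built from different directories and passing their bytes to schema_paths should give equal lists once the fix is in place, and same_parser should return True if the saved parsers are fully reproducible.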

Verified the nightly test suite schemas compile and save successfully.

Verified the nightly test suite executes successfully with no unexpected 
failures.

> SchemaFileLocation uriString leads to non-reproducible saved parsers
> --------------------------------------------------------------------
>
>                 Key: DAFFODIL-2918
>                 URL: https://issues.apache.org/jira/browse/DAFFODIL-2918
>             Project: Daffodil
>          Issue Type: Bug
>          Components: Front End
>    Affects Versions: 3.8.0
>            Reporter: Steve Lawrence
>            Priority: Minor
>             Fix For: 3.9.0
>
>
> The SchemaFileLocation class contains diagnostic information, including 
> line/column number, a file path used for diagnostics, a URI, etc.
> DAFFODIL-2195 made changes so that the file path used for diagnostics is 
> depersonalized and should be reproducible. However, the uriString member 
> in the SchemaFileLocation is an absolute URI that is not depersonalized. 
> Although this URI is only used for resolving imports, it is still serialized 
> in saved parsers and so can make saved parsers non-reproducible if they are 
> built from different directories.
> To reproduce, create a saved parser for a schema, then move that schema to a 
> different directory and create the saved parser again. The saved parsers will 
> have different hashes. Here's a command that can find the paths in a saved 
> parser:
> cat saved-parser.bin | tail -z -n +2 | gunzip -c | strings | grep dfdl.xsd
> The tail removes the header in the saved parser file, then we uncompress the 
> remaining serialized parser, pull out all the strings, and display any that 
> contain a DFDL schema extension. There should be a bunch of absolute URIs 
> that contain non-depersonalized paths that can cause reproducibility issues. 
> To fix this, we should try to remove uriString from SchemaFileLocation by 
> just passing it around to various functions, maybe storing it somewhere that 
> isn't serialized. If that isn't possible, an alternative could be to make 
> uriString transient. It should only ever be used to resolve imports, so it 
> should never be needed again after a saved parser is created and reloaded.
> Note that it is possible that some tools (e.g. VS Daffodil Extension) might 
> need the absolute URI from diagnostics. By removing uriString, they no longer 
> have access to that, so we may need to add a toggle that allows 
> diagnosticFile to keep the full absolute path, essentially making 
> depersonalization an optional feature. Keep in mind that diagnosticFile is a 
> File which can't represent jar URIs, so it may need to be changed to a 
> String.
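
The "transient" alternative described in the ticket can be illustrated with a small sketch. On the JVM, a transient field is skipped when an object is serialized; the Python below is only an analogy of that idea, not Daffodil code. The class is a simplified, hypothetical stand-in whose field names loosely follow the ticket, and the save/reload helpers and the "transient" metadata key are invented for this example.

```python
import json
from dataclasses import dataclass, field, fields
from typing import Optional


@dataclass
class SchemaFileLocation:
    # Hypothetical, simplified stand-in for the class named in the ticket.
    line: int
    column: int
    diagnostic_file: str  # depersonalized path, safe to serialize
    # Absolute, directory-dependent URI: marked so it is never serialized.
    uri_string: Optional[str] = field(default=None, metadata={"transient": True})


def save(loc: SchemaFileLocation) -> bytes:
    """Serialize every field except those marked transient."""
    state = {
        f.name: getattr(loc, f.name)
        for f in fields(loc)
        if not f.metadata.get("transient")
    }
    return json.dumps(state, sort_keys=True).encode("utf-8")


def reload(data: bytes) -> SchemaFileLocation:
    """Rebuild a location; uri_string is absent after a reload."""
    return SchemaFileLocation(**json.loads(data))
```

Two locations that differ only in uri_string serialize to identical bytes, which is the reproducibility property the ticket asks for. The cost is that the URI is gone after a reload, which matches the ticket's expectation that it is only needed for resolving imports at compile time.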



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
