[ https://issues.apache.org/jira/browse/DAFFODIL-2918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Steve Lawrence resolved DAFFODIL-2918.
--------------------------------------
Resolution: Fixed
Fixed in commit 9fb8337863f1277b7aa8edc0a8407fe8536ad6f5
Note that this greatly improves reproducibility but does not fix it in all cases.
See DAFFODIL-2925 for the issue about reproducibility breaking due to JVM
optimizations.
> SchemaFileLocation uriString leads to non-reproducible saved parsers
> --------------------------------------------------------------------
>
> Key: DAFFODIL-2918
> URL: https://issues.apache.org/jira/browse/DAFFODIL-2918
> Project: Daffodil
> Issue Type: Bug
> Components: Front End
> Affects Versions: 3.8.0
> Reporter: Steve Lawrence
> Priority: Minor
> Fix For: 3.9.0
>
>
> The SchemaFileLocation class contains diagnostic information, including
> line/column number, a file path used for diagnostics, a URI, etc.
> DAFFODIL-2195 made changes so that the file path used for diagnostics is
> depersonalized and should be reproducible. However, the uriString member
> in the SchemaFileLocation is an absolute URI that is not depersonalized.
> Although this URI is only used for resolving imports, it is still serialized
> in saved parsers and so can make saved parsers non-reproducible if they are
> built from different directories.
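A minimal Java sketch of why a serialized absolute URI breaks reproducibility, assuming the saved parser is written with plain Java serialization. `LocationSketch` is a hypothetical stand-in, not Daffodil's actual SchemaFileLocation (which is Scala). Two instances that differ only in the absolute URI serialize to different bytes, and therefore different hashes, even though the depersonalized diagnostic path is identical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class NonReproducibleDemo {

  // Hypothetical stand-in for SchemaFileLocation (not Daffodil's actual class).
  static final class LocationSketch implements Serializable {
    private static final long serialVersionUID = 1L;
    final String diagnosticFile; // depersonalized path, same on every machine
    final String uriString;      // absolute URI, differs per build directory

    LocationSketch(String diagnosticFile, String uriString) {
      this.diagnosticFile = diagnosticFile;
      this.uriString = uriString;
    }
  }

  // Plain Java serialization (assumed to approximate how parsers are saved).
  static byte[] serialize(Serializable obj) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(obj);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  // Hex-encoded SHA-256, standing in for comparing saved-parser file hashes.
  static String sha256(byte[] data) {
    try {
      StringBuilder hex = new StringBuilder();
      for (byte b : MessageDigest.getInstance("SHA-256").digest(data)) {
        hex.append(String.format("%02x", b));
      }
      return hex.toString();
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    // Same schema and depersonalized path, compiled from two different directories.
    LocationSketch a = new LocationSketch("schema.dfdl.xsd", "file:/home/alice/proj/schema.dfdl.xsd");
    LocationSketch b = new LocationSketch("schema.dfdl.xsd", "file:/tmp/build/schema.dfdl.xsd");
    System.out.println(sha256(serialize(a)));
    System.out.println(sha256(serialize(b)));
    System.out.println("same hash: " + sha256(serialize(a)).equals(sha256(serialize(b)))); // false
  }
}
```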
> To reproduce, create a saved parser for a schema, then move that schema to a
> different directory and create the saved parser again. The saved parsers will
> have different hashes. Here's a command that can find the paths in a saved
> parser:
> cat saved-parser.bin | tail -z -n +2 | gunzip -c | strings | grep dfdl.xsd
> The tail removes the header in the saved parser file, then we uncompress the
> remaining serialized parser, pull out all the strings, and display any that
> contain a DFDL schema extension. There should be a bunch of absolute URIs
> that contain non-depersonalized paths that can cause reproducibility issues.
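The same pipeline can be mirrored in Java for a closer look at what it does. This is a sketch that assumes, as the `tail -z -n +2` step implies, that the header ends at the first NUL byte and everything after it is a gzip stream. `SavedParserStrings` and its methods are hypothetical names, and the printable-run rule (ASCII runs of at least 4 bytes) only approximates the default behavior of `strings`.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.GZIPInputStream;

public class SavedParserStrings {

  // Like `tail -z -n +2`: drop everything up to and including the first NUL byte.
  static byte[] stripHeader(byte[] file) {
    for (int i = 0; i < file.length; i++) {
      if (file[i] == 0) {
        return Arrays.copyOfRange(file, i + 1, file.length);
      }
    }
    return new byte[0]; // no NUL: the whole file is the "first line"
  }

  // Like `gunzip -c`.
  static byte[] gunzip(byte[] compressed) {
    try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
      return in.readAllBytes();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Like `strings | grep needle`: printable ASCII runs of >= 4 bytes containing needle.
  static List<String> printableRunsContaining(byte[] data, String needle) {
    List<String> matches = new ArrayList<>();
    StringBuilder run = new StringBuilder();
    for (int i = 0; i <= data.length; i++) {
      int c = i < data.length ? data[i] & 0xff : -1; // -1 flushes the final run
      if ((c >= 0x20 && c < 0x7f) || c == '\t') {
        run.append((char) c);
      } else {
        if (run.length() >= 4 && run.indexOf(needle) >= 0) {
          matches.add(run.toString());
        }
        run.setLength(0);
      }
    }
    return matches;
  }

  public static void main(String[] args) throws IOException {
    byte[] file = Files.readAllBytes(Path.of(args[0]));
    for (String s : printableRunsContaining(gunzip(stripHeader(file)), "dfdl.xsd")) {
      System.out.println(s);
    }
  }
}
```

With Java 11 or later this can be run directly as `java SavedParserStrings.java saved-parser.bin`.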
> To fix this, we should try to remove uriString from SchemaFileLocation by
> just passing it around to various functions, maybe storing it somewhere that
> isn't serialized. If that isn't possible, an alternative could be to make
> uriString transient--it should only ever be used to resolve imports, so it
> should never be needed again once a saved parser is reloaded.
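A sketch of the transient alternative, again assuming plain Java serialization and using a hypothetical `LocationSketch` class rather than Daffodil's real one (in Scala the equivalent marker is the `@transient` annotation). Because `uriString` is skipped during serialization, two builds from different directories serialize to identical bytes, and after reloading the field is simply `null`. That is only safe if, as the issue expects, nothing after compilation needs it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Arrays;

public class TransientUriDemo {

  // Hypothetical sketch: uriString is transient, so it is never written out.
  static final class LocationSketch implements Serializable {
    private static final long serialVersionUID = 1L;
    final String diagnosticFile; // depersonalized, safe to serialize
    transient String uriString;  // only needed while resolving imports

    LocationSketch(String diagnosticFile, String uriString) {
      this.diagnosticFile = diagnosticFile;
      this.uriString = uriString;
    }
  }

  static byte[] serialize(Serializable obj) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(obj);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  static Object deserialize(byte[] data) {
    try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
      return in.readObject();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } catch (ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    LocationSketch a = new LocationSketch("schema.dfdl.xsd", "file:/home/alice/proj/schema.dfdl.xsd");
    LocationSketch b = new LocationSketch("schema.dfdl.xsd", "file:/tmp/build/schema.dfdl.xsd");
    byte[] bytesA = serialize(a);
    byte[] bytesB = serialize(b);
    System.out.println("identical bytes: " + Arrays.equals(bytesA, bytesB)); // true
    LocationSketch reloaded = (LocationSketch) deserialize(bytesA);
    System.out.println("uriString after reload: " + reloaded.uriString); // null
  }
}
```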
> Note that it is possible that some tools (e.g. VS Daffodil Extension) might
> need the absolute URI from diagnostics. By removing uriString, they no longer
> have access to that, so we may need to add a toggle that allows
> diagnosticFile to keep the full absolute path, essentially making
> depersonalization an optional feature. Keep in mind that diagnosticFile is a
> File which can't represent jar URIs, so it may need to be changed to a
> String.
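The `File` limitation can be checked directly: the `java.io.File(URI)` constructor only accepts hierarchical `file:` URIs and throws `IllegalArgumentException` for an opaque `jar:` URI. The sketch below keeps the diagnostic location as a `String` instead and adds a depersonalization toggle. `diagnosticString` and its trimming rules are hypothetical and purely illustrative; they are not how DAFFODIL-2195 actually depersonalizes paths.

```java
import java.io.File;
import java.net.URI;

public class DiagnosticPathDemo {

  // Hypothetical helper: the string shown in diagnostics for a schema URI.
  // depersonalize == false keeps the full absolute URI (e.g. for IDE tooling).
  static String diagnosticString(URI schemaUri, boolean depersonalize) {
    String full = schemaUri.toString();
    if (!depersonalize) {
      return full;
    }
    // Illustrative only: for a jar URI keep the entry path after "!/",
    // otherwise keep just the file name.
    int bang = full.lastIndexOf("!/");
    if (bang >= 0) {
      return full.substring(bang + 2);
    }
    return full.substring(full.lastIndexOf('/') + 1);
  }

  // java.io.File(URI) rejects opaque URIs such as jar:file:...!/...
  static boolean representableAsFile(URI uri) {
    try {
      new File(uri);
      return true;
    } catch (IllegalArgumentException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    URI onDisk = URI.create("file:/home/alice/proj/schema.dfdl.xsd");
    URI inJar = URI.create("jar:file:/home/alice/lib/schemas.jar!/com/example/schema.dfdl.xsd");
    System.out.println(representableAsFile(onDisk));     // true
    System.out.println(representableAsFile(inJar));      // false
    System.out.println(diagnosticString(inJar, true));   // com/example/schema.dfdl.xsd
    System.out.println(diagnosticString(inJar, false));  // full jar: URI
  }
}
```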
--
This message was sent by Atlassian Jira
(v8.20.10#820010)