[ 
https://issues.apache.org/jira/browse/DAFFODIL-2918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Lawrence resolved DAFFODIL-2918.
--------------------------------------
    Resolution: Fixed

Fixed in commit 9fb8337863f1277b7aa8edc0a8407fe8536ad6f5

Note that this greatly improves reproducibility but does not fix it in all cases. 
See DAFFODIL-2925 for the issue about reproducibility breaking due to JVM 
optimizations.

> SchemaFileLocation uriString leads to non-reproducible saved parsers
> --------------------------------------------------------------------
>
>                 Key: DAFFODIL-2918
>                 URL: https://issues.apache.org/jira/browse/DAFFODIL-2918
>             Project: Daffodil
>          Issue Type: Bug
>          Components: Front End
>    Affects Versions: 3.8.0
>            Reporter: Steve Lawrence
>            Priority: Minor
>             Fix For: 3.9.0
>
>
> The SchemaFileLocation class contains diagnostic information, including 
> line/column number, a file path used for diagnostics, a URI, etc.
> DAFFODIL-2195 made changes so that the file path used for diagnostics is 
> depersonalized and should be reproducible. However, the uriString member 
> in the SchemaFileLocation is an absolute URI that is not depersonalized. 
> Although this URI is only used for resolving imports, it is still serialized 
> in saved parsers and so can make saved parsers non-reproducible if they are built 
> from different directories.
> To reproduce, create a saved parser for a schema, then move that schema to a 
> different directory and create the saved parser again. The saved parsers will 
> have different hashes. Here's a command that can find the paths in a saved 
> parser:
> cat saved-parser.bin | tail -z -n +2 | gunzip -c | strings | grep dfdl.xsd
> The tail removes the header in the saved parser file, then we uncompress the 
> remaining serialized parser, pull out all the strings, and display any that 
> contain a DFDL schema extension. There should be a bunch of absolute URIs 
> that contain non-depersonalized paths that can cause reproducibility issues. 
> To fix this, we should try to remove uriString from SchemaFileLocation by 
> just passing it around to various functions, maybe storing it somewhere that 
> isn't serialized. If that isn't possible, an alternative could be to make 
> uriString transient--it should only ever be used to resolve imports, so it 
> should never be needed again after a saved parser is created and reloaded.
> Note that it is possible that some tools (e.g. VS Daffodil Extension) might 
> need the absolute URI from diagnostics. By removing uriString, they no longer 
> have access to that, so we may need to add a toggle that allows 
> diagnosticFile to keep the full absolute path, essentially making 
> depersonalization an optional feature. Keep in mind that diagnosticFile is a 
> File which can't represent jar URIs, so it may need to be changed to a 
> String.
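
The inspection command quoted in the description can be wrapped in a small
shell helper to make the before/after comparison repeatable. This is only a
sketch: the function names list_schema_paths and compare_saved_parsers are
hypothetical, and it assumes GNU tail (for -z), gunzip, and strings are
available, and that the saved parser has the layout described in the issue (a
NUL-terminated header followed by gzip-compressed data).

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of Daffodil.

# Print every string in a saved parser that mentions a DFDL schema file.
# Uses the same pipeline as the issue description: skip the NUL-terminated
# header, decompress the rest, extract strings, keep dfdl.xsd matches.
list_schema_paths() {
  tail -z -n +2 "$1" | gunzip -c | strings | grep 'dfdl\.xsd'
}

# Report whether two saved parsers are byte-for-byte identical. If they
# differ, list the schema paths embedded in each so absolute,
# non-depersonalized paths are easy to spot.
compare_saved_parsers() {
  if cmp -s "$1" "$2"; then
    echo "identical"
  else
    echo "different"
    echo "paths in $1:"
    list_schema_paths "$1" | sort -u
    echo "paths in $2:"
    list_schema_paths "$2" | sort -u
  fi
}
```

To use it, build a saved parser from the same schema in two different
directories and pass both files to compare_saved_parsers. With the fix
applied, the path listings should no longer contain directory-specific
absolute URIs, although DAFFODIL-2925 may still cause the files to differ.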



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
