stevedlawrence commented on pull request #674:
URL: https://github.com/apache/daffodil/pull/674#issuecomment-967322823
I dug into this issue a bit and think I've track down the cause, still not
sure of the best fix. Here's a pretty minimal schema that reproduces the issue:
```xml
<xs:schema
xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:include
schemaLocation="org/apache/daffodil/xsd/DFDLGeneralFormat.dfdl.xsd" />
<xs:annotation>
<xs:appinfo source="http://www.ogf.org/dfdl/">
<dfdl:defineVariable defaultValue="x" name="var1" type="xs:string"/>
<dfdl:format ref="GeneralFormat" />
</xs:appinfo>
</xs:annotation>
<xs:element name="root">
<xs:complexType>
<xs:sequence>
<xs:element name="opt" minOccurs="0">
<xs:complexType>
<xs:sequence>
<xs:sequence>
<xs:annotation>
<xs:appinfo source="http://www.ogf.org/dfdl/">
<dfdl:assert test="{ $var1 eq 'x' }" />
</xs:appinfo>
</xs:annotation>
</xs:sequence>
<xs:sequence>
<xs:annotation>
<xs:appinfo source="http://www.ogf.org/dfdl/">
<dfdl:assert test="{ $var1 eq 'y' }" />
</xs:appinfo>
</xs:annotation>
</xs:sequence>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
```
So we define a variable with a default value, in this case `x`. We have an
optional element which creates a point of uncertainty. Within this PoU we read
the variable twice, once in the first assert (which passes) and again in the
second assert (which fails). The second assert fails and so we need to back
track to the PoU and undo the changes to the variable state.
How we undo this state is broken. For our variables, the variables
themselves keep track of the current state and the prior state. When we need to
reset back to the PoU, sometimes we use this prior state, but sometimes we just
make assumptions about what the prior state should have been. These assumptions
aren't always correct. This test finds a case where this isn't correct.
What happens here is we read a variable twice, so we mark it as needing to
be reset to the prior states twice to get back to the correct state. But the
second read didn't actually change the state. So when we reset once we reset to
VariableDefined. And when we reset again we hit this case. Note that in some
cases we do actually need to reset twice, so it's not just a matter of ignoring
duplicate resets.
I think the core issue here is that variables are trying to keep track of
their own state, and only really keep track of their prior state, and then we
make assumptions from there. I think the *right* fix is probably to handle
variables like we do other state and PoU's and just take a snapshot. When we
reset back to the PoU we just restore that snapshot. Unfortunately, variables
are a bit more complicated and are mutable, so taking a snapshot may add some
complication and overhead.
Maybe another option is instead of PState snapshotting the entire variable
map, we instead just snapshot the state of each individual value prior to the
change. So it's a sort of copy-on-write behavior. Then when we reset back to
the PoU, we just need to reset back to the state of things that were changed.
That might be a better approach with less overhead.
Note that both EDIFACT and NACHA hit this double-read issue but in different
ways. NACHA reads an `encoding` variable multiple times in a PoU, and EDIFACT
reads variables related to escape schemas multiple times in a PoU.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]