mbeckerle commented on code in PR #1337:
URL: https://github.com/apache/daffodil/pull/1337#discussion_r1799532062
##########
daffodil-test/src/test/resources/org/apache/daffodil/extensions/enum/enums.tdml:
##########
@@ -134,7 +134,7 @@
</element>
<simpleType name="myBit" dfdl:lengthKind="explicit" dfdl:length="1">
- <restriction base="xs:byte"/>
+ <restriction base="xs:unsignedByte"/>
Review Comment:
Note that there are obscure corner cases involving charset encodings like
"bits" where the number of bits could turn out to be 0 or 1 using lengthKind
'pattern'. Detecting these at runtime only is fine with me. Ex:
```
<element name="b" type="xs:byte" dfdl:representation="text"
dfdl:encoding="bits" dfdl:lengthKind="pattern"
dfdl:lengthPattern=".{0,1}(?=11111111)" />
```
That captures any zeros and ones followed by a run of 8 consecutive "1"s.
This is just a corner case, and so long as we don't abort, and if the length
turns out to be 0 or 1, we issue an error, then I'm good with it.
##########
daffodil-test/src/test/resources/org/apache/daffodil/extensions/enum/enums.tdml:
##########
@@ -134,7 +134,7 @@
</element>
<simpleType name="myBit" dfdl:lengthKind="explicit" dfdl:length="1">
- <restriction base="xs:byte"/>
+ <restriction base="xs:unsignedByte"/>
Review Comment:
Pretty sure we're going to need the compatibility tunable.
Particularly in Cyberian applications, it's quite likely that users have
schemas with signed integers of length 1 bit, since we've been tolerating that,
and their application really doesn't care if the values were 0 and -1 or 0 and
1.
But their test cases might have expected infoset XML that assumes that 0 and
-1 are the values for a signed integer of length 1.
Our regression test set doesn't include all schemas, and I think we have to
assume many others are using DFDL now, and have large sets of schemas. (I
recently learned of a group which has 300+ DFDL schemas that we have no access
to.)
So we will need a compatibility tunable like "allowSignedIntegerLength1Bit"
or something like that. False means we strictly enforce min length 2 for signed
integers. "true" means we issue a warning (which can be suppressed by usual
means) but implement the current behavior (which I think creates values 0 and
-1.
Note that issuing a warning, then assuming the user intended to use an
unsigned integer instead would be a change in behavior (producing values 0 and
1, not 0 and -1), so we definitely don't want to do that.
The warning message should suggest converting to use an unsigned integer,
and the SDE message if disallowing should also suggest that an unsigned integer
is the fix for length 1 bit.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]