stevedlawrence commented on code in PR #976:
URL: https://github.com/apache/daffodil/pull/976#discussion_r1123639796
##########
daffodil-test/src/test/resources/org/apache/daffodil/section06/entities/Entities.tdml:
##########
@@ -495,7 +495,7 @@ is multiple bytes in UTF-8 encoding that is used -
DFDL-6-042R"
is multiple bytes in UTF-8 encoding that is used"
model="Entities_01-Embedded.dfdl.xsd" root="seq_10" roundTrip="false">
<tdml:document>
- <tdml:documentPart type="byte">30 ab 31 32 32 7f</tdml:documentPart>
+ <tdml:documentPart type="byte">30 c2 ab 31 32 32 7f</tdml:documentPart>
Review Comment:
Yeah, I think the former is what would need to happen, which is very different
from what happens now. Right now Daffodil decodes bytes as it sees them in the
input stream and then looks for an encoded terminator. Delimiter scanning has
no concept of raw bytes, so if a raw byte entity doesn't map to the same byte
as a non-raw-byte entity, things will get funky.
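To make the mismatch concrete, here is a minimal sketch in Python. It is not Daffodil code: Python's UTF-8 codec stands in for Daffodil's decoder, `find_terminator` is a hypothetical helper, and the terminator is assumed to be U+00AB. It only shows why a scanner that decodes first never sees a lone `ab` byte as that character, while the `c2 ab` sequence from the updated test data does match.

```python
# Illustration only: Python's UTF-8 codec stands in for Daffodil's decoder.
RAW_INPUT = bytes.fromhex("30 ab 31 32 32 7f")      # test data before this change
UTF8_INPUT = bytes.fromhex("30 c2 ab 31 32 32 7f")  # test data after this change

# Assumption: the terminator the scanner looks for is the character U+00AB.
TERMINATOR = "\u00ab"


def find_terminator(data: bytes) -> int:
    """Decode first, then search the decoded characters (hypothetical helper)."""
    # A lone 0xAB byte is not valid UTF-8, so it becomes U+FFFD here.
    text = data.decode("utf-8", errors="replace")
    return text.find(TERMINATOR)


print(TERMINATOR.encode("utf-8").hex())  # c2ab
print(find_terminator(RAW_INPUT))        # -1: the raw byte is never seen as U+00AB
print(find_terminator(UTF8_INPUT))       # 1: the two-byte sequence decodes to U+00AB
```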
And my guess is we would need a significant redesign of how delimiter
scanning works so it doesn't decode on the fly, or so scanning does some sort
of lookahead prior to decoding or something. Probably not trivial.
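As a rough illustration of what "lookahead prior to decoding" could mean, the Python sketch below searches the undecoded bytes for a raw terminator and only then decodes the field before it. This is an assumption about one possible shape, not a proposed Daffodil design: `split_on_raw_terminator` is a hypothetical helper, and a naive byte search like this could also match a byte in the middle of a multi-byte character, which is part of why the real change is not trivial.

```python
from typing import Optional, Tuple


def split_on_raw_terminator(
    data: bytes, raw_terminator: bytes
) -> Optional[Tuple[str, bytes]]:
    """Find a raw-byte terminator before decoding (hypothetical helper).

    Returns the decoded field and the remaining undecoded bytes,
    or None if the terminator bytes are not present.
    """
    idx = data.find(raw_terminator)
    if idx < 0:
        return None
    # Only the bytes before the terminator are decoded as text.
    field = data[:idx].decode("utf-8")
    rest = data[idx + len(raw_terminator):]
    return field, rest


# The original test bytes, with the lone 0xAB treated as a raw-byte terminator.
result = split_on_raw_terminator(bytes.fromhex("30 ab 31 32 32 7f"), b"\xab")
print(result)  # ('0', b'122\x7f')
```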
Yes, I think DAFFODIL-2102 should be removed from 3.5.0. I don't think we
realized it was related to raw byte entities when we assigned it to this
release. We should either keep DAFFODIL-2102 open, or close it as a
duplicate of DAFFODIL-258 and add a comment mentioning the test to make sure it
is fixed when DAFFODIL-258 is fixed.