[
https://issues.apache.org/jira/browse/DAFFODIL-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16939692#comment-16939692
]
Mike Beckerle commented on DAFFODIL-2208:
-----------------------------------------
This change has not yet been agreed to by the DFDL Workgroup. Daffodil
behavior may be properly compliant with the spec.
Section 9.2.5 may simply be incorrect in stating that zero-length (ZL) strings
with no framing can be NormalRep.
Discussion on the mailing list suggests two modes: one where ZL strings are
NormalRep and one where ZL strings are EmptyRep, with NormalRep being a NEW
mode of behavior (selected by a new enum value for
dfdlx:emptyElementParsePolicy).
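To make the two modes concrete, here is a minimal Python sketch (not Daffodil
code) of parsing one "/"-separated line of optional string fields that have no
initiator/terminator. The function names and the mode labels "normalRep" and
"emptyRep" are hypothetical, chosen only for illustration; they are not
proposed enum values for dfdlx:emptyElementParsePolicy.

```python
from typing import List


def parse_line(line: str, mode: str, sep: str = "/") -> List[str]:
    """Split one line into field values under a hypothetical mode.

    mode == "normalRep": a zero-length field is a normal value, so an
                         empty-string element is added to the infoset.
    mode == "emptyRep":  a zero-length optional field is treated as absent,
                         so nothing is added to the infoset for it.
    """
    pieces = line.split(sep)
    if mode == "normalRep":
        return pieces
    if mode == "emptyRep":
        return [p for p in pieces if p != ""]
    raise ValueError(f"unknown mode: {mode!r}")


def unparse_line(fields: List[str], sep: str = "/") -> str:
    """Write the field values back out, separated by sep."""
    return sep.join(fields)


def to_infoset(fields: List[str]) -> str:
    """Render the fields as a flat run of <field> elements."""
    return "".join(f"<field>{f}</field>" for f in fields)


if __name__ == "__main__":
    for mode in ("normalRep", "emptyRep"):
        fields = parse_line("data//data", mode)
        print(mode, to_infoset(fields), unparse_line(fields))
```

In this sketch, "normalRep" keeps the empty middle field, so data//data
unparses to the same text. Under "emptyRep" the middle field is dropped, the
unparse gives data/data, and the round trip fails, which is the problem the
issue describes.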
> Empty strings never allowed as optional repeats - not compliant with DFDL
> spec.
> -------------------------------------------------------------------------------
>
> Key: DAFFODIL-2208
> URL: https://issues.apache.org/jira/browse/DAFFODIL-2208
> Project: Daffodil
> Issue Type: Bug
> Components: Back End
> Affects Versions: 2.4.0
> Reporter: Mike Beckerle
> Assignee: Mike Beckerle
> Priority: Major
> Fix For: 2.5.0
>
>
> Excerpts here from emails on the [[email protected]|mailto:[email protected]]
> mailing list.
> {noformat}
> Problem: simple format that is impossible to model
> Mike Beckerle <[email protected]> 1:47 PM (35 minutes ago)
> to DFDL-WG
> I have a dead-simple little format:
> data/data/data/data
> data/data/data/data
> it is lines of "/" separated strings. All elements are optional.
> I simply want this:
> data//data
> to round trip. For that to happen I need it to parse into
> <field>data</field><field></field><field>data</field>
> That is, I require that empty field element in the middle to be created and
> put into the infoset.
> I can find no way to do this.
> The strings have no initiator/terminator, so dfdl:emptyValueDelimiterPolicy
> is not relevant. All the elements are optional, so default values aren't
> relevant.
> The spec states:
> 9.4.2.2 Simple element (xs:string or xs:hexBinary)
> Required occurrence: If the element has a default value then an item is
> added to the infoset using the default value, otherwise an item is added
> to the Infoset using empty string (type xs:string) or empty hexBinary
> (type xs:hexBinary) as the value.
> Optional occurrence: If dfdl:emptyValueDelimiterPolicy is not 'none'[12] then
> an item is added to the Infoset using empty string (type xs:string) or empty
> hexBinary (type xs:hexBinary) as the value, otherwise nothing is added to the
> Infoset.
> There are errata/actions to clarify wording here around
> dfdl:emptyValueDelimiterPolicy being in effect or not (because there is no
> initiator/terminator for it to use, as opposed to the property in isolation
> just being 'none').
> But that doesn't change anything about this issue.
> If this very simple format is not possible, then we need a property or new
> property enum value that makes it possible.
> Thoughts?{noformat}
> Subsequent to that, I figured out what I believe is the spec flaw.
>
> {noformat}
> To start discussion on my own issue.....
> The problem here may be that for a string (or hexBinary), if there is no
> initiator/terminator, there is no way to distinguish EmptyRep from NormalRep.
> I.e., an empty string is a "normal" value for a string.
> Sections 9.2.3 and 9.2.4 seem to define EmptyRep and NormalRep such that an
> empty string will be an EmptyRep, not a NormalRep.
> However section 9.2.5 on zero-length says:
> "The normal representation can be a zero-length representation if the type
> is xs:string or xs:hexBinary and there is no framing."
> That suggests that when there is no framing, a zero-length string is
> NormalRep, not EmptyRep, which is the opposite conclusion from what is in
> sections 9.2.3 and 9.2.4.
> If this latter clarification is correct, then my format *should* work as I
> expect, because the empty string elements will be considered NormalRep and
> infoset values will be created for them.
> It simply doesn't work because of a bug in Daffodil, which has not
> interpreted this correctly.{noformat}
> That's the bug to fix: zero-length strings and hexBinary with no framing are
> NormalRep, not EmptyRep.
>
> Note that some tests in our test suite will have to be revised to take this
> into account.
> Behavior for public schemas should not change, as the above behavior is all
> subject to the new property (still a proposal) dfdlx:emptyElementParsePolicy
> being "treatAsEmpty" (the enum names are subject to change).
> The IBM-created schemas for EDIFACT and others depend on a behavior in IBM
> DFDL that we call dfdlx:emptyElementParsePolicy='treatAsMissing' (again enums
> subject to change). That behavior doesn't allow empty strings to be
> distinguished from absent strings. Under that policy the behavior of Daffodil
> shouldn't change, so those schemas should still interoperate.
> This bug fix is needed in order to implement a generic schema for a format
> called USMTF, which is, unfortunately, not public. But the simplified
> examples above illustrate the issue.
>
>
>