[
https://issues.apache.org/jira/browse/DAFFODIL-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17324006#comment-17324006
]
Mike Beckerle commented on DAFFODIL-2208:
-----------------------------------------
Current Daffodil behavior agrees with the DFDL v1.0 specification.
Section 9.3.2.1 specifies that the parser tests for the nil representation
first, then the empty representation, then the normal representation. Hence, a
zero-length string with no framing will always be considered the empty
representation and will NOT be added to the infoset if the element is optional.
If the element is required and has no default value, the behavior depends on
dfdl:emptyElementParsePolicy. With 'treatAsEmpty', an empty string is created
as the element value. With 'treatAsAbsent', a parse error occurs. Both outcomes
apply only to the required case.
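As a rough illustration of the decision order described above, here is a
minimal Python sketch. It is not Daffodil code: the names classify,
parse_field, and ParseError are hypothetical, the nilValue "%ES;" is modeled
as the Python empty string, and the default-value branch follows the spec text
quoted in the issue description.

```python
class ParseError(Exception):
    """Stands in for a DFDL parse error (hypothetical name)."""


def classify(raw, nil_value=None):
    """Test representations in the order nil, then empty, then normal."""
    if nil_value is not None and raw == nil_value:
        return "nil"
    if raw == "":
        return "empty"
    return "normal"


def parse_field(raw, *, required, policy="treatAsEmpty",
                default=None, nil_value=None):
    """Return (kind, value), where kind is 'nil', 'absent', or 'value'."""
    rep = classify(raw, nil_value)
    if rep == "nil":
        return ("nil", None)
    if rep == "normal":
        return ("value", raw)
    # Empty representation from here on.
    if not required:
        return ("absent", None)   # nothing is added to the infoset
    if default is not None:
        return ("value", default)
    if policy == "treatAsEmpty":
        return ("value", "")      # empty string becomes the element value
    if policy == "treatAsAbsent":
        raise ParseError("required element has the empty representation")
    raise ValueError(f"unknown policy: {policy!r}")
```

Under these assumptions, parse_field("", required=False) yields
("absent", None), which is why the empty field in data//data is lost unless a
nilValue of "%ES;" (nil_value="" in the sketch) claims the zero-length case
first.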
The only way to make the use case described here work, where
{code:java}
data//data{code}
needs to round trip preserving the empty string between the "/" characters, is
to use nillable elements with nilValue="%ES;".
Unfortunately, there are other things we were hoping to use the nilValue for.
E.g.,
{code:java}
data/-/data{code}
But these are not alternative representations for the same concept. That is,
one cannot use both %ES; and "-" as alternative nilValues.
So the only way to model this in DFDL, preserving all information and such that
a parse/unparse preserves the data exactly, is to model each data item as a
complex type element. A nest of two nillable elements can provide the necessary
alternatives.
{code:java}
<element name="data" nillable="true" dfdl:nilValue="%ES;">
  <complexType>
    <sequence>
      <element name="value" nillable="true" dfdl:nilValue="-" type="xs:string"/>
    </sequence>
  </complexType>
</element>{code}
This allows this data string:
{code:java}
a//b/-/c{code}
to become
{code:java}
<data><value>a</value></data>
<data xsi:nil="true"/>
<data><value>b</value></data>
<data><value xsi:nil="true"/></data>
<data><value>c</value></data>
{code}
This will round trip preserving all characters of the original string.
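The round trip can be simulated outside of DFDL to check the reasoning. The
Python sketch below is only a model of the nested-nillable idea, not Daffodil
behavior: Nil, parse, unparse, and to_xml are hypothetical names, the outer
nilValue "%ES;" is modeled as the empty string, and the "/" separator is
assumed from the example data.

```python
from enum import Enum
from xml.sax.saxutils import escape


class Nil(Enum):
    DATA = "data"    # outer element is nil: empty string between separators
    VALUE = "value"  # inner element is nil: a literal "-"


def parse(line, sep="/"):
    """Map each separated field to Nil.DATA, Nil.VALUE, or its string value."""
    items = []
    for raw in line.split(sep):
        if raw == "":        # outer nilValue %ES; is tested first
            items.append(Nil.DATA)
        elif raw == "-":     # inner nilValue "-"
            items.append(Nil.VALUE)
        else:
            items.append(raw)
    return items


def unparse(items, sep="/"):
    """Invert parse(): each nil kind maps back to its own representation."""
    out = []
    for item in items:
        if item is Nil.DATA:
            out.append("")
        elif item is Nil.VALUE:
            out.append("-")
        else:
            out.append(item)
    return sep.join(out)


def to_xml(items):
    """Render the items in the same shape as the infoset shown above."""
    lines = []
    for item in items:
        if item is Nil.DATA:
            lines.append('<data xsi:nil="true"/>')
        elif item is Nil.VALUE:
            lines.append('<data><value xsi:nil="true"/></data>')
        else:
            lines.append(f"<data><value>{escape(item)}</value></data>")
    return lines
```

Because the two nil kinds map to distinct items, unparse(parse("a//b/-/c"))
gives back "a//b/-/c" unchanged in this model.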
> Empty strings never allowed as optional repeats - not compliant with DFDL
> spec.
> -------------------------------------------------------------------------------
>
> Key: DAFFODIL-2208
> URL: https://issues.apache.org/jira/browse/DAFFODIL-2208
> Project: Daffodil
> Issue Type: Bug
> Components: Back End
> Affects Versions: 2.4.0
> Reporter: Mike Beckerle
> Assignee: Mike Beckerle
> Priority: Major
>
> Excerpts here from emails on the [[email protected]|mailto:[email protected]]
> mailing list.
> {noformat}
> Problem: simple format that is impossible to model
> Mike Beckerle <[email protected]> 1:47 PM (35 minutes ago)
> to DFDL-WG
> I have a dead-simple little format:
> data/data/data/data
> data/data/data/data
> it is lines of "/" separated strings. All elements are optional.
> I simply want this:
> data//data
> to round trip. For that to happen I need it to parse into
> <field>data</field><field></field><field>data</field>
> That is, I require that empty field element in the middle to be created and
> put into the infoset.
> I can find no way to do this.
> The strings have no initiator/terminator, so dfdl:emptyValueDelimiterPolicy
> is not relevant. All the elements are optional, so default values aren't
> relevant.
> The spec states:
> 9.4.2.2 Simple element (xs:string or xs:hexBinary)
> Required occurrence: If the element has a default value then an item is
> added to the infoset using the default value, otherwise an item is added
> to the Infoset using empty string (type xs:string) or empty hexBinary
> (type xs:hexBinary) as the value.
> Optional occurrence: If dfdl:emptyValueDelimiterPolicy is not 'none'[12] then
> an item is added to the Infoset using empty string (type xs:string) or empty
> hexBinary (type xs:hexBinary) as the value, otherwise nothing is added to the
> Infoset.
> There are errata/actions to clarify wording here around
> dfdl:emptyValueDelimiterPolicy being in effect or not (because there is no
> initiator/terminator for it to use, as opposed to the property in isolation
> just being 'none').
> But that doesn't change anything about this issue.
> If this very simple format is not possible, then we need a property or new
> property enum value that makes it possible.
> Thoughts?{noformat}
> Subsequent to that, I figured out what I believe is the spec flaw.
>
> {noformat}
> To start discussion on my own issue.....
> The problem here may be that for a string (or hexBinary), if there is no
> initiator/terminator, there is no way to distinguish EmptyRep from NormalRep.
> I.e., an empty string is a "normal" value for a string.
> Sections 9.2.3 and 9.2.4 seem to define EmptyRep and NormalRep such that an
> empty string will be an EmptyRep, not a NormalRep.
> However section 9.2.5 on zero-length says:
> "The normal representation can be a zero-length representation if the type
> is xs:string or xs:hexBinary and there is no framing."
> That suggests that when there is no framing, a zero-length string is
> NormalRep, not EmptyRep, which is the opposite conclusion from what is in
> sections 9.2.3 and 9.2.4.
> If this latter clarification is correct, then my format *should* work as I
> expect, because the empty string elements will be considered NormalRep and
> infoset values will be created for them.
> It simply doesn't work because of a bug in daffodil which has not interpreted
> this correctly.{noformat}
> That's the bug to fix: Strings and HexBinary with no framing are NormalRep,
> not EmptyRep.
>
> Note that some tests in our test suite will have to be revised to take this
> into account.
> Behavior for public schemas should not change, as the above behavior is all
> subject to the new property (still a proposal) dfdlx:emptyElementParsePolicy
> being "treatAsEmpty" (the enum names are subject to change).
> The IBM-created schemas for EDIFACT and others depend on a behavior in IBM
> DFDL that we call dfdlx:emptyElementParsePolicy='treatAsMissing' (again enums
> subject to change). That behavior doesn't allow empty strings to be
> distinguished from absent strings. Under that policy the behavior of daffodil
> shouldn't change, so those schemas should still interoperate.
> The need for this bug fix is so as to be able to implement a generic schema
> for a format called USMTF, which is, unfortunately, not public. But the
> simplified examples above illustrate the issue.
>
>
>