So the relevant grammar productions are in LocalElementGrammarMixin.scala, and
there are lots of comments in there that indicate a known limitation, e.g.,
in lazy val separatedContentAtMostN:
    // FIXME: we don't know whether we can absorb trailing separators or not.
    // We don't know if this repeating thing is in trailing position or in the
    // middle of a sequence.
In the lazy val arrayContentsWithSeparators we find this line:
    case (Trailing___, Implicit__, max, ___) => separatedContentAtMostN // FIXME: have to have all of them - not trailing position
The comment there indicates that this is insufficient.
I think we're going to need a bunch of tests that do not use binary data, just
text, in order to test all the combinations here, so that we can see what is
going wrong.
I don't really understand a delimited by separator situation where minOccurs is
zero, maxOccurs 1.
If we start from the grammar production for recurrance, and I inline the
productions whose guards match for these optional elements (i.e., "a" and
"b"), we will get
    OptionalCombinator(
      RepExactlyN(self, 0, separatedRecurringDefaultable) ~
      RepAtMostTotalN(this, 1, separatedRecurringNonDefault))
And RepExactlyN(self, 0, ...) should get an assertion failure because in the
constructor for a base class it insists that N > 0.
At least that's how it looks to me. Is that what you are getting?
Seems to me RepExactlyN when N is zero should simply optimize out - the guard
should be false if N is zero. That would fix the assertion failure.
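To make the "optimize out" idea concrete, here is a minimal Scala sketch. It is a toy model only: Gram, EmptyGram, Seq2, guarded, seq, and optionalContent are hypothetical names invented for illustration, and the two Rep classes are simplified stand-ins, not Daffodil's real classes or signatures. It assumes a guarded production collapses to an empty grammar term when its guard is false, so RepExactlyN is never constructed with N = 0.

```scala
// Toy model only: these are NOT Daffodil's real grammar classes.
object GuardSketch {
  sealed trait Gram
  case object EmptyGram extends Gram

  final case class RepExactlyN(n: Long, body: String) extends Gram {
    // Mirrors the base-class insistence that N > 0.
    require(n > 0, "RepExactlyN requires N > 0")
  }
  final case class RepAtMostTotalN(n: Long, body: String) extends Gram
  final case class Seq2(left: Gram, right: Gram) extends Gram

  // Build the production only when the guard holds; the by-name
  // argument is never evaluated when the guard is false.
  def guarded(guard: Boolean)(g: => Gram): Gram =
    if (guard) g else EmptyGram

  // Sequencing that drops empty operands.
  def seq(l: Gram, r: Gram): Gram = (l, r) match {
    case (EmptyGram, x) => x
    case (x, EmptyGram) => x
    case _              => Seq2(l, r)
  }

  // Hypothetical stand-in for the inlined recurrance content.
  def optionalContent(minOccurs: Long, maxOccurs: Long): Gram =
    seq(
      guarded(minOccurs > 0)(
        RepExactlyN(minOccurs, "separatedRecurringDefaultable")),
      RepAtMostTotalN(maxOccurs, "separatedRecurringNonDefault"))
}
```

With minOccurs = 0 and maxOccurs = 1, optionalContent yields just the RepAtMostTotalN term, and the N > 0 requirement is never checked.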
________________________________
From: Joshua Adams
Sent: Tuesday, November 28, 2017 12:37:46 PM
To: Mike Beckerle; Steve Lawrence; [email protected]
Subject: Re: Issue with separatorSuppressionPolicy and empty elements in
IBM4690-TLOG scheams
That is correct. I tried using trailingEmptyStrict after previously using
trailingEmpty just to see if that would make a difference.
In the sample data file, ace_00_01.dat, the separators for the missing
SpecialTime element are present, as the data looks like this: ...:<TenderTime
data>::<InactiveTime data>:...
Josh
________________________________
From: Mike Beckerle
Sent: Tuesday, November 28, 2017 12:34:09 PM
To: Steve Lawrence; Joshua Adams; [email protected]
Subject: Re: Issue with separatorSuppressionPolicy and empty elements in
IBM4690-TLOG scheams
When I look at the IBM TLog schema (TLogAceFormat.xsd) I see separator policy
is "suppressedAtEndLax" which is the old property value for
dfdl:separatorSuppressionPolicy "trailingEmpty". I.e., not strict per
discussion below.
I want to make sure we're looking at the same schema. This matters in that for
the optional items, separators can be present or absent.
________________________________
From: Steve Lawrence <[email protected]>
Sent: Tuesday, November 28, 2017 8:08:25 AM
To: Mike Beckerle; Joshua Adams; [email protected]
Subject: Re: Issue with separatorSuppressionPolicy and empty elements in
IBM4690-TLOG scheams
I think it might help to show some schema snippets and the resulting
parsers to get an idea of what is going on. A stripped down snippet of
the tlog schema looks something like this:
<dfdl:format occursCountKind="implicit"
             lengthKind="delimited"
             separatorSuppressionPolicy="trailingEmptyStrict"
             separatorPosition="prefix" />
<xs:element name="root">
  <xs:complexType>
    <xs:sequence dfdl:separator=":">
      <xs:element name="a" type="xs:long" />
      <xs:element name="b" type="xs:long" minOccurs="0" />
      <xs:element name="c" type="xs:long" minOccurs="0" />
    </xs:sequence>
  </xs:complexType>
</xs:element>
According to the tlog data and expected infoset, the colon separator
should always exist, even if the elements do not exist. So each of the
following are valid data:
5:6:7
5:6:
5::7
5::
So there's a mandatory element "a", followed by some optional elements
"b" and "c", and the separators always exist. The generated parser for
this looks like this:
<seq>
  <Element name="a">
    ...
  </Element>
  <Optional>
    <RepAtMostTotalN name="b" n="1">
      <seq>
        <Separator/>
        <Element name="b">
          ...
        </Element>
      </seq>
    </RepAtMostTotalN>
  </Optional>
  <Optional>
    <RepAtMostTotalN name="c" n="1">
      <seq>
        <Separator/>
        <Element name="c">
          ...
        </Element>
      </seq>
    </RepAtMostTotalN>
  </Optional>
</seq>
The ... in the above are the parsers for finding delimiters and
converting the delimited text to a string, which isn't too important here.
So it first parses element "a". Then it optionally parses 0 to 1
element "b"'s, where each b that is parsed must be preceded by a
separator. Note however, that if element b fails to parse, we backtrack
so that the separator was not consumed. And element "b" will fail to
parse on a zero-length delimited value, since only xs:hexBinary and
xs:string allow zero-length representations. Same thing goes for
element "c". Which means if elements "b" or "c" do not exist in the
data, the preceding separator will not be consumed, which is not what
we want.
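The backtracking behavior described above can be sketched with a tiny hand-written parser in Scala. This is a simplified model under stated assumptions, not Daffodil's generated parser: the names Step, separator, long, optSepThenLong, and parseRoot are hypothetical, the separator is hard-coded to ":", and a zero-length xs:long is treated as a parse failure.

```scala
// Toy model of the generated parser: a ~ Optional(Sep ~ b) ~ Optional(Sep ~ c).
object CurrentBehaviorSketch {
  // A successful step yields a value and the next input position.
  type Step[A] = Option[(A, Int)]

  def separator(in: String, pos: Int): Step[Unit] =
    if (pos < in.length && in.charAt(pos) == ':') Some(((), pos + 1)) else None

  // A delimited long: digits up to the next separator or end of data.
  // Zero-length content fails, as described for non-string types.
  def long(in: String, pos: Int): Step[Long] = {
    val end = in.indexOf(':', pos) match { case -1 => in.length; case i => i }
    val text = in.substring(pos, end)
    if (text.nonEmpty && text.forall(_.isDigit)) Some((text.toLong, end)) else None
  }

  // Optional(Separator ~ Element): on failure we backtrack to `pos`,
  // so the separator is NOT consumed.
  def optSepThenLong(in: String, pos: Int): (Option[Long], Int) =
    separator(in, pos).flatMap { case (_, p) => long(in, p) } match {
      case Some((v, p)) => (Some(v), p)
      case None         => (None, pos)
    }

  // Returns (a, b, c, unconsumed remainder), or None if "a" fails.
  def parseRoot(in: String): Option[(Long, Option[Long], Option[Long], String)] =
    long(in, 0).map { case (a, p0) =>
      val (b, p1) = optSepThenLong(in, p0)
      val (c, p2) = optSepThenLong(in, p1)
      (a, b, c, in.substring(p2))
    }
}
```

In this model, parseRoot("5:6:7") consumes everything, while parseRoot("5::7") yields a = 5 with b and c absent and leaves "::7" unconsumed, which is the symptom in question.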
I think perhaps we want something like the below instead?
<seq>
  <Element name="a">
    ...
  </Element>
  <Separator/>
  <Optional>
    <RepAtMostTotalN name="b" n="1">
      <seq>
        <OptionalInfixSep><Separator/></OptionalInfixSep>
        <Element name="b">
          ...
        </Element>
      </seq>
    </RepAtMostTotalN>
  </Optional>
  <Separator/>
  <Optional>
    <RepAtMostTotalN name="c" n="1">
      <seq>
        <OptionalInfixSep><Separator/></OptionalInfixSep>
        <Element name="c">
          ...
        </Element>
      </seq>
    </RepAtMostTotalN>
  </Optional>
</seq>
So in between each <Optional> or <Element> are mandatory <Separator>'s,
and each RepAtMostTotalN contains an OptionalInfixSep which will only
consume a Separator when more than one element exists. It's not
immediately obvious to me where this change in the grammar should occur,
or if this is even correct, but this might help provide some
insight/background.
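As a rough check on the proposed shape, here is a minimal Scala sketch of a parser where the separators sit outside the optional parts and are therefore mandatory. It is a toy model, not Daffodil code: Step, separator, long, optLong, and parseRoot are hypothetical names, the separator is hard-coded to ":", and since n="1" the OptionalInfixSep never consumes anything and is left out.

```scala
// Toy model of the proposed parser: a ~ Sep ~ Optional(b) ~ Sep ~ Optional(c).
object ProposedBehaviorSketch {
  // A successful step yields a value and the next input position.
  type Step[A] = Option[(A, Int)]

  def separator(in: String, pos: Int): Step[Unit] =
    if (pos < in.length && in.charAt(pos) == ':') Some(((), pos + 1)) else None

  // A delimited long: digits up to the next separator or end of data.
  // Zero-length content fails to parse.
  def long(in: String, pos: Int): Step[Long] = {
    val end = in.indexOf(':', pos) match { case -1 => in.length; case i => i }
    val text = in.substring(pos, end)
    if (text.nonEmpty && text.forall(_.isDigit)) Some((text.toLong, end)) else None
  }

  // Optional(Element): on failure nothing is consumed.
  def optLong(in: String, pos: Int): (Option[Long], Int) =
    long(in, pos) match {
      case Some((v, p)) => (Some(v), p)
      case None         => (None, pos)
    }

  // Returns (a, b, c, unconsumed remainder); None if "a" or a
  // mandatory separator is missing.
  def parseRoot(in: String): Option[(Long, Option[Long], Option[Long], String)] =
    long(in, 0).flatMap { case (a, p0) =>
      separator(in, p0).flatMap { case (_, p1) =>
        val (b, p2) = optLong(in, p1)
        separator(in, p2).map { case (_, p3) =>
          val (c, p4) = optLong(in, p3)
          (a, b, c, in.substring(p4))
        }
      }
    }
}
```

Under these assumptions all four sample inputs (5:6:7, 5:6:, 5::7, 5::) are consumed completely, with b and c present or absent as expected. Whether this is the right behavior for every separatorSuppressionPolicy value is a separate question that the sketch does not address.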
- Steve
On 11/27/2017 05:00 PM, Mike Beckerle wrote:
> Well the separator suppression code has not had a lot of scrutiny. I wrote
> this a *long time* ago, and honestly have not revisited it since. I assume
> you figured out that separatorSuppressionPolicy replaced the separatorPolicy
> property. This happened after IBM released its first DFDL product; as a
> result, handling both the old and new property names was required.
>
>
> For any of these packed numbers, if you are using delimited lengthKind, then
> zero-length is possible, and it means "absent", meaning that if optional, the
> element is not present. If required, it's an error unless zero-length
> triggers a nil value. If an element is both optional, and empty is a
> legitimate value, then I think empty->optional not present is the winner,
> but I have to look it up.
>
>
> I wasn't sure what you meant below by "....for IBM4690 and other packed binary
> formats the associated separators aren't processed,...".
>
>
> Probably best for us to talk this through on phone tomorrow (Tuesday). Look
> for me on the instant messenger.
>
> --------------------------------------------------------------------------------
> *From:* Joshua Adams
> *Sent:* Monday, November 27, 2017 3:59:14 PM
> *To:* Mike Beckerle; [email protected]
> *Cc:* Stephen Lawrence
> *Subject:* Issue with separatorSuppressionPolicy and empty elements in
> IBM4690-TLOG scheams
>
> Hey Mike,
>
> Wanted to get your opinion on the issue I've been running into with
> IBM4690-TLOG schemas. I talked with Steve for a while trying to figure out
> what was going on and we came to the opinion that there is either an issue
> with the TLOG schemas, or (perhaps more likely) there is an error in the
> separatorSuppressionPolicy code when dealing with infix separators in
> Daffodil.
>
> In the TlogAce.xsd file
> (https://github.com/DFDLSchemas/IBM4690-TLOG/blob/master/ACE/TlogAce.xsd#L155)
> it seems that the way the schema and data files were written assumed that the
> IBM4690 packed format could have a valid zero-length representation, i.e., an
> optional element that doesn't occur would just be an empty string surrounded
> by separators. While this works just fine for strings or hex binary that have
> valid zero-length representations, for IBM4690 and other packed binary formats
> the associated separators aren't processed, and in the TlogAce.xsd file, when
> the element SpecialTime is missing all subsequent parsed data in the sequence
> become CustomUserField's, as that is the only element that matches the
> separators (I think).
>
> So, just wanted to get your opinion on whether or not this is an issue with
> the current Daffodil separator suppression policy code or if this is a case
> of an incorrectly formed schema. Steve may jump in to clarify anything I
> didn't explain correctly, as he is a bit more familiar with the
> separatorSuppression code in Daffodil.
>
> Thanks,
>
> Josh
>