Re: Issue with separatorSuppressionPolicy and empty elements in IBM4690-TLOG schemas

Steve Lawrence Tue, 28 Nov 2017 10:06:21 -0800

The snippet below results in the same parser with
dfdl:separatorSuppressionPolicy="trailingEmpty".



On 11/28/2017 12:34 PM, Mike Beckerle wrote:
> When I look at the IBM TLog schema (TLogAceFormat.xsd), I see the
> separator policy is "suppressedAtEndLax", which is the old property
> value for dfdl:separatorSuppressionPolicy "trailingEmpty", i.e., not
> strict per the discussion below.
> 
> 
> I want to make sure we're looking at the same schema. This matters
> because, for the optional items, separators can be present or absent.
> 
> --------------------------------------------------------------------------------
> *From:* Steve Lawrence <[email protected]>
> *Sent:* Tuesday, November 28, 2017 8:08:25 AM
> *To:* Mike Beckerle; Joshua Adams; [email protected]
> *Subject:* Re: Issue with separatorSuppressionPolicy and empty elements in 
> IBM4690-TLOG schemas
> I think it might help to show some schema snippets and the resulting
> parsers to get an idea of what is going on. A stripped-down snippet of
> the tlog schema looks something like this:
> 
>    <dfdl:format occursCountKind="implicit"
>                 lengthKind="delimited"
>                 separatorSuppressionPolicy="trailingEmptyStrict"
>                 separatorPosition="prefix" />
> 
>    <xs:element name="root">
>      <xs:complexType>
>        <xs:sequence dfdl:separator=":">
>          <xs:element name="a" type="xs:long" />
>          <xs:element name="b" type="xs:long" minOccurs="0" />
>          <xs:element name="c" type="xs:long" minOccurs="0" />
>        </xs:sequence>
>      </xs:complexType>
>    </xs:element>
> 
> According to the tlog data and expected infoset, the colon separator
> should always exist, even if the elements do not exist. So each of the
> following is valid data:
> 
>    5:6:7
>    5:6:
>    5::7
>    5::
> 
> So there's a mandatory element "a", followed by some optional elements
> "b" and "c", and the separators always exist. The generated parser for
> this looks like this:
> 
>    <seq>
>      <Element name="a">
>        ...
>      </Element>
>      <Optional>
>        <RepAtMostTotalN name="b" n="1">
>          <seq>
>            <Separator/>
>            <Element name="b">
>            ...
>            </Element>
>          </seq>
>        </RepAtMostTotalN>
>      </Optional>
>      <Optional>
>        <RepAtMostTotalN name="c" n="1">
>          <seq>
>            <Separator/>
>            <Element name="c">
>            ...
>            </Element>
>          </seq>
>        </RepAtMostTotalN>
>      </Optional>
>    </seq>
> 
> The ... in the above are the parsers for finding delimiters and
> converting the delimited text to a string, which aren't too important here.
> 
> So it first parses element "a". Then it optionally parses 0 to 1
> occurrences of element "b", where each "b" that is parsed must be
> preceded by a separator. Note, however, that if element "b" fails to
> parse, we backtrack so that the separator is not consumed. And element
> "b" will fail to parse on a zero-length delimited value, since only
> xs:hexBinary and xs:string allow zero-length representations. The same
> goes for element "c". This means that if elements "b" or "c" do not
> exist in the data, the preceding separator will not be consumed, which
> is not what we want.
> 
> I think perhaps we want something like the below instead?
> 
>    <seq>
>      <Element name="a">
>        ...
>      </Element>
>      <Separator/>
>      <Optional>
>        <RepAtMostTotalN name="b" n="1">
>          <seq>
>            <OptionalInfixSep><Separator/></OptionalInfixSep>
>            <Element name="b">
>            ...
>            </Element>
>          </seq>
>        </RepAtMostTotalN>
>      </Optional>
>      <Separator/>
>      <Optional>
>        <RepAtMostTotalN name="c" n="1">
>          <seq>
>            <OptionalInfixSep><Separator/></OptionalInfixSep>
>            <Element name="c">
>            ...
>            </Element>
>          </seq>
>        </RepAtMostTotalN>
>      </Optional>
>    </seq>
> 
> So in between each <Optional> or <Element> is a mandatory <Separator>,
> and each RepAtMostTotalN contains an OptionalInfixSep, which will only
> consume a Separator when more than one element exists. It's not
> immediately obvious to me where this change in the grammar should occur,
> or if this is even correct, but this might help provide some
> insight/background.
> 
> - Steve
> 
> On 11/27/2017 05:00 PM, Mike Beckerle wrote:
>> Well, the separator suppression code has not had a lot of scrutiny. I
>> wrote this a *long time* ago, and honestly have not revisited it since.
>> I assume you figured out that separatorSuppressionPolicy replaced the
>> separatorPolicy property. This happened after IBM released its first
>> DFDL product; as a result, handling both the old and new property names
>> was required.
>> 
>> 
>> For any of these packed numbers, if you are using delimited lengthKind,
>> then zero-length is possible, and it means "absent", meaning that if
>> optional, the element is not present. If required, it's an error unless
>> zero-length triggers a nil value. If an element is both optional, and
>> empty is a legitimate value, then I think empty->optional not present
>> is the winner, but I have to look it up.
>> 
>> 
>> I wasn't sure what you meant below by "....for IBM4690 and other packed
>> binary formats the associated separators aren't processed,...".
>> 
>> 
>> Probably best for us to talk this through on the phone tomorrow
>> (Tuesday). Look for me on the instant messenger.
>> 
>> --------------------------------------------------------------------------------
>> *From:* Joshua Adams
>> *Sent:* Monday, November 27, 2017 3:59:14 PM
>> *To:* Mike Beckerle; [email protected]
>> *Cc:* Stephen Lawrence
>> *Subject:* Issue with separatorSuppressionPolicy and empty elements in 
>> IBM4690-TLOG schemas
>> 
>> Hey Mike,
>> 
>> Wanted to get your opinion on the issue I've been running into with
>> IBM4690-TLOG schemas.  I talked with Steve for a while trying to figure
>> out what was going on, and we came to the opinion that there is either
>> an issue with the TLOG schemas, or (perhaps more likely) there is an
>> error in the separatorSuppressionPolicy code when dealing with infix
>> separators in Daffodil.
>> 
>> In the TlogAce.xsd file
>> (https://github.com/DFDLSchemas/IBM4690-TLOG/blob/master/ACE/TlogAce.xsd#L155)
>> it seems that the schema and data files were written on the assumption
>> that the IBM4690 packed format could have a valid zero-length
>> representation, i.e., an optional element that doesn't occur would just
>> be an empty string surrounded by separators.  While this works just
>> fine for strings or hex binary, which have valid zero-length
>> representations, for IBM4690 and other packed binary formats the
>> associated separators aren't processed. In the TlogAce.xsd file, when
>> the element SpecialTime is missing, all subsequent parsed data in the
>> sequence become CustomUserField elements, as that is the only element
>> that matches the separators (I think).
>> 
>> So, just wanted to get your opinion on whether or not this is an issue
>> with the current Daffodil separator suppression policy code or if this
>> is a case of an incorrectly formed schema.  Steve may jump in to
>> clarify anything I didn't explain correctly, as he is a bit more
>> familiar with the separatorSuppression code in Daffodil.
>> 
>> Thanks,
>> 
>> Josh
>> 
>
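
To make the two parser shapes quoted above easier to compare, here is a rough Python sketch of their behaviour on the four sample inputs. This is not Daffodil code: the function names (parse_long, parse_separator, parse_current, parse_proposed) are made up for illustration, the separator is treated as infix as in the sample data, only non-negative integers are handled, and OptionalInfixSep is left out on the assumption that with n="1" it never consumes a separator.

```python
SEP = ":"

def parse_long(data, pos):
    """Parse delimited text as a long; return (value, new_pos) or None on failure."""
    end = data.find(SEP, pos)
    if end == -1:
        end = len(data)
    text = data[pos:end]
    if not text.isdecimal():      # zero-length text is not a valid long
        return None
    return int(text), end

def parse_separator(data, pos):
    """Return the position after the separator, or None if it is not present."""
    return pos + len(SEP) if data.startswith(SEP, pos) else None

def parse_current(data):
    """Separator lives inside each optional element and is backtracked on failure."""
    first = parse_long(data, 0)
    if first is None:
        return None
    infoset = {"a": first[0]}
    pos = first[1]
    for name in ("b", "c"):
        after_sep = parse_separator(data, pos)
        if after_sep is None:
            continue
        result = parse_long(data, after_sep)
        if result is None:
            continue              # backtrack: pos stays before the separator
        infoset[name], pos = result
    return infoset, data[pos:]    # second item is the unconsumed data

def parse_proposed(data):
    """Separator is mandatory before each optional element, present or not."""
    first = parse_long(data, 0)
    if first is None:
        return None
    infoset = {"a": first[0]}
    pos = first[1]
    for name in ("b", "c"):
        pos = parse_separator(data, pos)
        if pos is None:
            return None           # a missing separator is an error here
        result = parse_long(data, pos)
        if result is not None:
            infoset[name], pos = result
    return infoset, data[pos:]

if __name__ == "__main__":
    for sample in ("5:6:7", "5:6:", "5::7", "5::"):
        print(sample, parse_current(sample), parse_proposed(sample))
```

In this sketch, the current shape leaves "::7" unconsumed for the input 5::7, while the proposed shape consumes all four inputs completely. Whether the real grammar should change in this way is still the open question from the thread.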
