When I look at the IBM TLog schema (TLogAceFormat.xsd), I see that the separator policy is "suppressedAtEndLax", which is the old property value for dfdl:separatorSuppressionPolicy "trailingEmpty". I.e., it is not strict, unlike in the discussion below.
I want to make sure we're looking at the same schema. This matters because, for the optional items, separators can then be present or absent.

________________________________
From: Steve Lawrence <[email protected]>
Sent: Tuesday, November 28, 2017 8:08:25 AM
To: Mike Beckerle; Joshua Adams; [email protected]
Subject: Re: Issue with separatorSuppressionPolicy and empty elements in IBM4690-TLOG scheams

I think it might help to show some schema snippets and the resulting parsers to get an idea of what is going on. A stripped-down snippet of the tlog schema looks something like this:

  <dfdl:format occursCountKind="implicit" lengthKind="delimited"
    separatorSuppressionPolicy="trailingEmptyStrict"
    separatorPosition="prefix" />

  <xs:element name="root">
    <xs:complexType>
      <xs:sequence dfdl:separator=":">
        <xs:element name="a" type="xs:long" />
        <xs:element name="b" type="xs:long" minOccurs="0" />
        <xs:element name="c" type="xs:long" minOccurs="0" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>

According to the tlog data and expected infoset, the colon separator should always exist, even if the elements do not exist. So each of the following is valid data:

  5:6:7
  5:6:
  5::7
  5::

So there's a mandatory element "a", followed by some optional elements "b" and "c", and the separators always exist. The generated parser for this looks like this:

  <seq>
    <Element name="a"> ... </Element>
    <Optional>
      <RepAtMostTotalN name="b" n="1">
        <seq>
          <Separator/>
          <Element name="b"> ... </Element>
        </seq>
      </RepAtMostTotalN>
    </Optional>
    <Optional>
      <RepAtMostTotalN name="c" n="1">
        <seq>
          <Separator/>
          <Element name="c"> ... </Element>
        </seq>
      </RepAtMostTotalN>
    </Optional>
  </seq>

The ... in the above are the parsers for finding delimiters and converting the delimited text to a string, which isn't too important here. So it first parses element "a". Then it optionally parses 0 to 1 element "b"s, where each "b" that is parsed must be preceded by a separator.
Note, however, that if element "b" fails to parse, we backtrack so that the separator is not consumed. And element "b" will fail to parse on a zero-length delimited value (since only xs:hexBinary and xs:string allow zero-length representations). The same goes for element "c". This means that if elements "b" or "c" do not exist in the data, the preceding separator will not be consumed, which is not what we want. I think perhaps we want something like the below instead?

  <seq>
    <Element name="a"> ... </Element>
    <Separator/>
    <Optional>
      <RepAtMostTotalN name="b" n="1">
        <seq>
          <OptionalInfixSep><Separator/></OptionalInfixSep>
          <Element name="b"> ... </Element>
        </seq>
      </RepAtMostTotalN>
    </Optional>
    <Separator/>
    <Optional>
      <RepAtMostTotalN name="c" n="1">
        <seq>
          <OptionalInfixSep><Separator/></OptionalInfixSep>
          <Element name="c"> ... </Element>
        </seq>
      </RepAtMostTotalN>
    </Optional>
  </seq>

So in between each <Optional> or <Element> is a mandatory <Separator>, and each RepAtMostTotalN contains an OptionalInfixSep, which will only consume a Separator when more than one element exists. It's not immediately obvious to me where this change in the grammar should occur, or if this is even correct, but this might help provide some insight/background.

- Steve

On 11/27/2017 05:00 PM, Mike Beckerle wrote:
> Well the separator suppression code has not had a lot of scrutiny. I wrote
> this a *long time* ago, and honestly have not revisited it since. I assume
> you figured out that separatorSuppressionPolicy replaced the separatorPolicy
> property. This happened after IBM released its first DFDL product; as a
> result, handling both the old and new property names was required.
>
> For any of these packed numbers, if you are using delimited lengthKind, then
> zero-length is possible, and it means "absent", meaning that if optional, the
> element is not present. If required, it's an error unless zero-length
> triggers a nil value.
> If an element is both optional, and empty is a legitimate value, then
> I think empty->optional not present is the winner, but I have to look it up.
>
> I wasn't sure what you meant below by "....for IBM4690 and other packed binary
> formats the associated separators aren't processed,...".
>
> Probably best for us to talk this through on phone tomorrow (Tuesday). Look
> for me on the instant messenger.
>
> --------------------------------------------------------------------------------
> *From:* Joshua Adams
> *Sent:* Monday, November 27, 2017 3:59:14 PM
> *To:* Mike Beckerle; [email protected]
> *Cc:* Stephen Lawrence
> *Subject:* Issue with separatorSuppressionPolicy and empty elements in
> IBM4690-TLOG scheams
>
> Hey Mike,
>
> Wanted to get your opinion on the issue I've been running into with
> IBM4690-TLOG schemas. I talked with Steve for a while trying to figure out
> what was going on, and we came to the opinion that there is either an issue
> with the TLOG schemas, or (perhaps more likely) there is an error in the
> separatorSuppressionPolicy code when dealing with infix separators in
> Daffodil.
>
> In the TlogAce.xsd file
> (https://github.com/DFDLSchemas/IBM4690-TLOG/blob/master/ACE/TlogAce.xsd#L155)
> it seems that the schema and data files were written on the assumption that
> the IBM4690 packed format could have a valid zero-length representation,
> i.e., an optional element that doesn't occur would just be an empty string
> surrounded by separators. While this works just fine for strings or hex
> binary that have valid zero-length representations, for IBM4690 and other
> packed binary formats the associated separators aren't processed, and in the
> TlogAce.xsd file, when the element SpecialTime is missing, all subsequent
> parsed data in the sequence becomes CustomUserFields, as that is the only
> element that matches the separators (I think).
>
> So, just wanted to get your opinion on whether or not this is an issue with
> the current Daffodil separator suppression policy code or if this is a case
> of an incorrectly formed schema. Steve may jump in to clarify anything I
> didn't explain correctly, as he is a bit more familiar with the
> separatorSuppression code in Daffodil.
>
> Thanks,
>
> Josh
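
To make the difference between the two parser structures Steve describes concrete, here is a minimal Python sketch that runs both against the four sample inputs from his message. It is only a toy model under stated assumptions, not Daffodil code: the names parse_long, parse_sep, parse_generated and parse_proposed are hypothetical, the separator is a single ":", an xs:long is modelled as a non-empty run of ASCII digits with an optional sign, and any unconsumed input is returned rather than treated as an error.

```python
import re

SEP = ":"  # assumed single-character separator, as in the stripped-down schema


def parse_long(data, pos):
    """Toy delimited xs:long: scan to the next separator or end of data.

    Returns (value, new_pos), or None when the text is not an integer.
    A zero-length value fails, mirroring the behaviour described above.
    """
    end = data.find(SEP, pos)
    if end == -1:
        end = len(data)
    text = data[pos:end]
    if not re.fullmatch(r"[+-]?[0-9]+", text):
        return None
    return int(text), end


def parse_sep(data, pos):
    """Return the position after a separator at pos, or None if absent."""
    return pos + 1 if data.startswith(SEP, pos) else None


def parse_generated(data):
    """Model of the generated parser: separator and element fail together."""
    parsed = parse_long(data, 0)
    if parsed is None:
        raise ValueError("mandatory element 'a' failed to parse")
    result = {}
    result["a"], pos = parsed
    for name in ("b", "c"):
        start = pos  # backtrack point: covers the separator as well
        after_sep = parse_sep(data, pos)
        parsed = parse_long(data, after_sep) if after_sep is not None else None
        if parsed is None:
            pos = start  # element absent, so the separator is un-consumed
        else:
            result[name], pos = parsed
    return result, data[pos:]


def parse_proposed(data):
    """Model of the proposed parser: separators are mandatory, elements optional."""
    parsed = parse_long(data, 0)
    if parsed is None:
        raise ValueError("mandatory element 'a' failed to parse")
    result = {}
    result["a"], pos = parsed
    for name in ("b", "c"):
        after_sep = parse_sep(data, pos)
        if after_sep is None:
            raise ValueError("missing separator before '%s'" % name)
        pos = after_sep  # the separator stays consumed either way
        parsed = parse_long(data, pos)
        if parsed is not None:
            result[name], pos = parsed
    return result, data[pos:]


if __name__ == "__main__":
    for sample in ("5:6:7", "5:6:", "5::7", "5::"):
        print(sample, parse_generated(sample), parse_proposed(sample))
```

In this model the first structure leaves separators unconsumed whenever "b" or "c" is empty, while the second consumes all four inputs completely. Whether the real grammar change should look like this is exactly the open question in the thread.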
