The snippet below results in the same parser with dfdl:separatorSuppressionPolicy="trailingEmpty".
On 11/28/2017 12:34 PM, Mike Beckerle wrote:
> When I look at the IBM TLog schema (TLogAceFormat.xsd) I see separator
> policy is "suppressedAtEndLax" which is the old property value for
> dfdl:separatorSuppressionPolicy "trailingEmpty". I.e., not strict per
> discussion below.
>
> I want to make sure we're looking at the same schema. This matters in
> that for the optional items, separators can be present or absent.
>
> --------------------------------------------------------------------------------
> *From:* Steve Lawrence <[email protected]>
> *Sent:* Tuesday, November 28, 2017 8:08:25 AM
> *To:* Mike Beckerle; Joshua Adams; [email protected]
> *Subject:* Re: Issue with separatorSuppressionPolicy and empty elements in
> IBM4690-TLOG scheams
>
> I think it might help to show some schema snippets and the resulting
> parsers to get an idea of what is going on. A stripped down snippet of
> the tlog schema looks something like this:
>
>   <dfdl:format occursCountKind="implicit"
>                lengthKind="delimited"
>                separatorSuppressionPolicy="trailingEmptyStrict"
>                separatorPosition="prefix" />
>
>   <xs:element name="root">
>     <xs:complexType>
>       <xs:sequence dfdl:separator=":">
>         <xs:element name="a" type="xs:long" />
>         <xs:element name="b" type="xs:long" minOccurs="0" />
>         <xs:element name="c" type="xs:long" minOccurs="0" />
>       </xs:sequence>
>     </xs:complexType>
>   </xs:element>
>
> According to the tlog data and expected infoset, the colon separator
> should always exist, even if the elements do not exist. So each of the
> following are valid data:
>
>   5:6:7
>   5:6:
>   5::7
>   5::
>
> So there's a mandatory element "a", followed by some optional elements
> "b" and "c", and the separators always exist. The generated parser for
> this looks like this:
>
>   <seq>
>     <Element name="a">
>       ...
>     </Element>
>     <Optional>
>       <RepAtMostTotalN name="b" n="1">
>         <seq>
>           <Separator/>
>           <Element name="b">
>             ...
>           </Element>
>         </seq>
>       </RepAtMostTotalN>
>     </Optional>
>     <Optional>
>       <RepAtMostTotalN name="c" n="1">
>         <seq>
>           <Separator/>
>           <Element name="c">
>             ...
>           </Element>
>         </seq>
>       </RepAtMostTotalN>
>     </Optional>
>   </seq>
>
> The ... in the above are the parsers for finding delimiters and
> converting the delimited text to a string, which isn't too important here.
>
> So it first parses element "a". Then it optionally parses 0 to 1
> element "b"'s, where each b that is parsed must be preceded by a
> separator. Note however, that if element b fails to parse, we backtrack
> so that the separator was not consumed. And element "b" will fail to
> parse on a zero-length delimited value (since only xs:hexBinary and
> xs:string allow zero-length representations). Same thing goes for
> element "c". Which means if elements "b" or "c" do not exist in the
> data, the preceding separator will not be consumed, which is not what
> we want.
>
> I think perhaps we want something like the below instead?
>
>   <seq>
>     <Element name="a">
>       ...
>     </Element>
>     <Separator/>
>     <Optional>
>       <RepAtMostTotalN name="b" n="1">
>         <seq>
>           <OptionalInfixSep><Separator/></OptionalInfixSep>
>           <Element name="b">
>             ...
>           </Element>
>         </seq>
>       </RepAtMostTotalN>
>     </Optional>
>     <Separator/>
>     <Optional>
>       <RepAtMostTotalN name="c" n="1">
>         <seq>
>           <OptionalInfixSep><Separator/></OptionalInfixSep>
>           <Element name="c">
>             ...
>           </Element>
>         </seq>
>       </RepAtMostTotalN>
>     </Optional>
>   </seq>
>
> So in between each <Optional> or <Element> are mandatory <Separator>'s,
> and each RepAtMostTotalN contains an OptionalInfixSep which will only
> consume a Separator when more than one element exists. It's not
> immediately obvious to me where this change in the grammar should occur,
> or if this is even correct, but this might help provide some
> insight/background.
>
> - Steve
>
> On 11/27/2017 05:00 PM, Mike Beckerle wrote:
>> Well the separator suppression code has not had a lot of scrutiny. I
>> wrote this a *long time* ago, and honestly have not revisited it since.
>> I assume you figured out that separatorSuppressionPolicy replaced the
>> separatorPolicy property. This happened after IBM released its first
>> DFDL product, as a result handling both the old and new property names
>> was required.
>>
>> For any of these packed numbers, if you are using delimited lengthKind,
>> then zero-length is possible, and it means "absent", meaning that if
>> optional, the element is not present. If required, it's an error unless
>> zero-length triggers a nil value. If an element is both optional, and
>> empty is a legitimate value, then I think empty->optional not present
>> is the winner, but I have to look it up.
>>
>> I wasn't sure what you meant below by "....for IBM4690 and other packed
>> binary formats the associated separators aren't processed,...".
>>
>> Probably best for us to talk this through on phone tomorrow (Tuesday).
>> Look for me on the instant messenger.
>>
>> --------------------------------------------------------------------------------
>> *From:* Joshua Adams
>> *Sent:* Monday, November 27, 2017 3:59:14 PM
>> *To:* Mike Beckerle; [email protected]
>> *Cc:* Stephen Lawrence
>> *Subject:* Issue with separatorSuppressionPolicy and empty elements in
>> IBM4690-TLOG scheams
>>
>> Hey Mike,
>>
>> Wanted to get your opinion on the issue I've been running into with
>> IBM4690-TLOG schemas. I talked with Steve for a while trying to figure
>> out what was going on and we came to the opinion that there is either
>> an issue with the TLOG schemas, or (perhaps more likely) there is an
>> error in the separatorSuppressionPolicy code when dealing with infix
>> separators in Daffodil.
>>
>> In the TlogAce.xsd file
>> (https://github.com/DFDLSchemas/IBM4690-TLOG/blob/master/ACE/TlogAce.xsd#L155)
>> it seems that the way the schema and data files were written assumed
>> that the IBM4690 packed format could have a valid zero length
>> representation, i.e., an optional element that doesn't occur would just
>> be an empty string surrounded by separators. While this works just fine
>> for strings or hex binary that have valid zero length representations,
>> for IBM4690 and other packed binary formats the associated separators
>> aren't processed, and in the TlogAce.xsd file, when the element
>> SpecialTime is missing all subsequent parsed data in the sequence
>> become CustomUserField's as that is the only element that matches the
>> separators (I think).
>>
>> So, just wanted to get your opinion on whether or not this is an issue
>> with the current Daffodil separator suppression policy code or if this
>> is a case of an incorrectly formed schema. Steve may jump in to clarify
>> anything I didn't explain correctly, as he is a bit more familiar with
>> the separatorSuppression code in Daffodil.
>>
>> Thanks,
>>
>> Josh
>>
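To make the difference between the two parser shapes quoted above concrete, here is a minimal Python sketch. It is not Daffodil code: the function names (parse_long, parse_sep, parse_current, parse_proposed) are hypothetical, the separator is treated as infix as in the sample data, and xs:long is simplified to unsigned decimal digits. The "proposed" variant follows Steve's tentative grammar, which he notes may not be correct.

```python
SEP = ":"


def parse_long(data, pos):
    """Parse delimited text starting at pos as an integer.

    Returns (value, new_pos), or None when the text is zero-length or
    non-numeric (a simplified stand-in for xs:long failing to parse).
    """
    end = data.find(SEP, pos)
    if end == -1:
        end = len(data)
    text = data[pos:end]
    if not text.isdigit():
        return None
    return int(text), end


def parse_sep(data, pos):
    """Consume one separator at pos; return the new position or None."""
    return pos + len(SEP) if data.startswith(SEP, pos) else None


def parse_current(data):
    """Separator sits inside each optional element, so a failed element
    backtracks over its separator as well."""
    result = {}
    parsed = parse_long(data, 0)
    if parsed is None:
        raise ValueError("mandatory element 'a' failed to parse")
    result["a"], pos = parsed
    for name in ("b", "c"):
        after_sep = parse_sep(data, pos)
        if after_sep is None:
            continue
        parsed = parse_long(data, after_sep)
        if parsed is None:
            continue  # backtrack: pos is unchanged, separator not consumed
        result[name], pos = parsed
    return result, data[pos:]


def parse_proposed(data):
    """Separator is mandatory between slots; only the element is optional."""
    result = {}
    parsed = parse_long(data, 0)
    if parsed is None:
        raise ValueError("mandatory element 'a' failed to parse")
    result["a"], pos = parsed
    for name in ("b", "c"):
        pos = parse_sep(data, pos)
        if pos is None:
            raise ValueError("missing mandatory separator before " + name)
        parsed = parse_long(data, pos)
        if parsed is not None:
            result[name], pos = parsed
    return result, data[pos:]


if __name__ == "__main__":
    for sample in ("5:6:7", "5:6:", "5::7", "5::"):
        print(sample, parse_current(sample), parse_proposed(sample))
```

Each function returns the parsed elements plus any unconsumed data. In this sketch the first variant leaves the separators of absent elements unconsumed (for "5::7" it stops after "a"), while the second consumes every separator and skips only the missing values.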
