The offset proposal would be sufficient, but seems like overkill for my case. Even if we already had it implemented, I think it would still be worthwhile to have a lookAhead function, just because its usage is simpler.
I'll go ahead and create the wiki page, then add the feature to Daffodil.

________________________________
From: Beckerle, Mike <[email protected]>
Sent: Thursday, May 30, 2019 1:25:00 PM
To: [email protected]
Subject: Re: Proposal: Add support for lookahead in DFDL expressions

I really like your idea of doing this with a lookAhead function instead of some specialty variant of a choice.

There is an existing solution to this in the nato-stanag-5516 DFDL schema, which I know you (Brandon) have access to. I like your proposal better, but the existing solution is a viable workaround.

If you are motivated, I would suggest you go ahead and add your lookAhead function, and let's try it out. If we like it, we can propose it for inclusion in DFDL v2.0.

I would just request that you create a Daffodil Wiki page describing this new function and what it means when parsing and when unparsing (SDE?).

One thing to be sure to comment on is the limit on how far you can look ahead. Presumably there is an implementation-dependent limit, but not less than N bits, or something like that.

Lastly, there is the position-by-offset proposal. That seems like it would solve this same issue, as well as others. You can offset forward, parse the tag, then offset backward, and parse based on the tag. It also seems pretty simple to implement, though perhaps not quite as simple as your lookAhead, which is isolated to one function.

https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=74687382

________________________________
From: Sloane, Brandon <[email protected]>
Sent: Thursday, May 30, 2019 2:27:31 PM
To: [email protected]
Subject: Re: Proposal: Add support for lookahead in DFDL expressions

My initial survey of Link16 found 8 instances of this, which I don't think changes the calculus substantially.
I suspect that adding lookahead to Daffodil would be less work than getting the Link16 schema working with late discriminators, and we would then have support for lookahead in the future should we encounter other formats that require it. If this were a more complicated or invasive feature, I would share your concern about adding it to DFDL, but it does not strike me as something that would impose significant maintenance or future development constraints.

________________________________
From: Beckerle, Mike <[email protected]>
Sent: Wednesday, May 29, 2019 6:39:27 PM
To: [email protected]
Subject: Re: Proposal: Add support for lookahead in DFDL expressions

Yup. This is a real situation. I think this happens in fixed-length data for very mundane reasons explained in the attached slides.

Practically speaking, in Link16 this comes up two or three times, so this isn't worth enhancing DFDL/Daffodil for unless this phenomenon is observed in more places. I have only seen it in Link16, though per the attached slides the problem could occur a lot in fixed-length legacy data formats.

The workaround is just a choice, where you have a "late discriminator" on the tag, when you finally get to it.

________________________________
From: Sloane, Brandon <[email protected]>
Sent: Wednesday, May 29, 2019 1:00:04 PM
To: [email protected]
Subject: Proposal: Add support for lookahead in DFDL expressions

In developing a schema for Link16, I have encountered a situation that I do not believe Daffodil currently has a good solution for. It is a tagged union, where the tag comes after the union. In theory, the schema would look like:

    <xs:choice dfdl:choiceDispatchKey="{ tag }">
      <xs:element name="a" type="xs:int" dfdl:length="16" dfdl:choiceBranchKey="1"/>
      <xs:element name="b" type="xs:int" dfdl:length="16" dfdl:choiceBranchKey="2"/>
    </xs:choice>
    <xs:element name="tag" type="xs:int" dfdl:length="8" />

Obviously, this schema does not work because it requires look-ahead.
In general it is not possible to make this sort of schema work, because in order to determine where in the bitstream <tag> is, one would first need to know the length of the choice, which cannot (generally speaking) be determined before parsing completes. However, in this case, the lookahead is possible in principle, because the choice happens to be fixed length (as it should be in any sane format where the tag follows the union).

I believe that we can support this use case with a much less invasive mechanism than infoset lookahead. In particular, we can support this with bytestream lookahead in the DFDL expressions, as below:

    <xs:choice dfdl:choiceDispatchKey="{ dfdl:lookAhead(16,8) }">
      <xs:element name="a" type="xs:int" dfdl:length="16" dfdl:choiceBranchKey="1"/>
      <xs:element name="b" type="xs:int" dfdl:length="16" dfdl:choiceBranchKey="2"/>
    </xs:choice>
    <xs:element name="tag" type="xs:int" dfdl:length="8" />

The dfdl:lookAhead function takes as input a relative offset, o, and a length, n, and returns the n bits located o bits past the current location, interpreted as an unsigned integer.

From an implementation standpoint, there should be no difficulty in adding this, as the parser need only peek into the buffer it already has.

Brandon T. Sloane
Associate, Services
[email protected] | tresys.com
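The lookAhead semantics described in the original proposal (return the n bits located o bits past the current position, as an unsigned integer, without consuming anything) can be sketched in a few lines of Python. This is only an illustration, not Daffodil's implementation: the function name look_ahead, its parameters, and the sample bytes are hypothetical, and the sketch assumes big-endian byte order with most-significant-bit-first bit order, which the proposal does not specify.

```python
def look_ahead(data: bytes, bit_pos: int, offset: int, length: int) -> int:
    """Peek `length` bits starting `offset` bits past `bit_pos`.

    Returns the bits as an unsigned integer and does not advance any
    position. Assumes big-endian, most-significant-bit-first data
    (an assumption made for this sketch only).
    """
    if offset < 0 or length < 0:
        raise ValueError("offset and length must be non-negative")
    total_bits = len(data) * 8
    end = bit_pos + offset + length
    if end > total_bits:
        raise ValueError("look-ahead runs past the end of the available data")
    # View the whole buffer as one big integer, then shift and mask
    # so that only the requested bit range remains.
    as_int = int.from_bytes(data, "big")
    return (as_int >> (total_bits - end)) & ((1 << length) - 1)


# Hypothetical sample: a 16-bit union value (42) followed by an 8-bit tag (2).
data = bytes([0x00, 0x2A, 0x02])

# Mirrors dfdl:lookAhead(16,8) in the schema: skip the 16-bit choice, peek 8 bits.
tag = look_ahead(data, 0, 16, 8)

# Dispatch on the tag, as the choiceBranchKey values "1" and "2" would.
branch = {1: "a", 2: "b"}[tag]

# The parse position is still 0, so the chosen branch reads its 16 bits from there.
value = look_ahead(data, 0, 0, 16)
print(branch, value)
```

Because the peek never moves the parse position, the choice branch and the trailing tag element are still parsed in their normal order afterwards. The bounds check is where an implementation-dependent limit on look-ahead distance, as Mike suggests documenting, would naturally be enforced.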
