Re: Proposal: Add support for lookahead in DFDL expressions

Sloane, Brandon Fri, 31 May 2019 15:56:11 -0700

Due to the namespace change, the wiki page has moved: 
https://cwiki.apache.org/confluence/display/DAFFODIL/Proposal%3A+DAF+lookAhead



Re: 4 - We should be able to increase the limit if we ever need to (I doubt we 
will have people writing schema that depends on getting an error from too-far 
lookahead). If this actually becomes an issue, we can add a mechanism for a 
specific schema to increase it (tunable?). For the moment, I think 512 bits 
should be enough for anyone, while still being small enough that no reasonable 
implementation would have an issue supporting it.

________________________________
From: Beckerle, Mike <[email protected]>
Sent: Friday, May 31, 2019 6:41:56 PM
To: [email protected]
Subject: Re: Proposal: Add support for lookahead in DFDL expressions

I read the page:


Please use namespace daf:lookAhead, not dfdl.


The type of choiceDispatchKey is string. Should the function be defined to 
return a string, or an unsigned integer where the user would then have to call 
xs:string(daf:lookAhead(...)) ?


I support your positions on numbered items 1-4.


Re 2 - if your lookAhead happens to look into the next message contents of a 
message stream, so be it. There's almost nothing we can do about that. There I 
think "implementation-dependent" behavior is acceptable.


Re: 4 - I definitely agree that implementation-defined is better than 
implementation-dependent. We should specify a limit we're willing to live with.


...mikeb


________________________________
From: Sloane, Brandon <[email protected]>
Sent: Friday, May 31, 2019 4:44 PM
To: [email protected]
Subject: Re: Proposal: Add support for lookahead in DFDL expressions

Wiki page created: 
https://cwiki.apache.org/confluence/display/DAFFODIL/Proposal%3A+DFDL+lookAhead


In drafting it, I noticed a few edge cases to consider:


1) Is negative lookAhead allowed? I said no, as this would complicate freeing 
memory after we consume part of the input buffer.

2) What happens if we try to lookAhead past the end of a document? In the basic 
case, this is easy and we just error. However, if we are in streaming mode, we 
may still find data there; and may not yet have enough information to know that 
it is actually part of the following document. I propose that we leave this 
case undefined.

3) What is the maximum lookAhead? I define the maximum as distance + numBits to 
give the distance to the last bit we consume. I leave this limit as 
implementation defined and no less than 512 bits.

3.5) "Implementation defined" means to me that any implementation has an exact 
limit, and will fail any schema that attempts to exceed it, even if the 
implementation can provide the requested data. This avoids potential issues 
where success would be non-deterministic, depending on how an implementation 
loads the data.
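
The checks implied by items 1 through 3.5 can be sketched roughly as follows. This is an illustration in Python, not Daffodil code: the names check_look_ahead, LookAheadError, and MAX_LOOKAHEAD_BITS are hypothetical, and the sketch covers only the non-streaming case (the streaming case in item 2 is left undefined above). The limit is tested before data availability so that the outcome does not depend on how much data happens to be buffered.

```python
# Hypothetical limit: 512 bits is the minimum floated in this thread;
# an implementation could choose a larger exact value.
MAX_LOOKAHEAD_BITS = 512


class LookAheadError(Exception):
    """Hypothetical stand-in for whatever error the processor would raise."""


def check_look_ahead(distance: int, num_bits: int, bits_remaining: int) -> None:
    """Validate a lookAhead request against the rules discussed above."""
    # Item 1: negative lookAhead is not allowed.
    if distance < 0 or num_bits < 0:
        raise LookAheadError("negative lookAhead is not allowed")
    # Items 3 and 3.5: the limit applies to the last bit consumed, and is
    # enforced even when the requested data would be available.
    if distance + num_bits > MAX_LOOKAHEAD_BITS:
        raise LookAheadError("lookAhead exceeds the implementation-defined limit")
    # Item 2, basic (non-streaming) case: looking past the end is an error.
    if distance + num_bits > bits_remaining:
        raise LookAheadError("lookAhead past the end of the data")
```

For example, check_look_ahead(504, 8, 10000) passes because the last bit consumed is bit 512, while check_look_ahead(505, 8, 10000) fails even though plenty of data remains.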


Thoughts?


In the meantime, I will start putting together the actual implementation.

________________________________
From: Sloane, Brandon
Sent: Friday, May 31, 2019 4:15:52 PM
To: [email protected]
Subject: Re: Proposal: Add support for lookahead in DFDL expressions


The offset proposal would be sufficient, but seems like overkill for my case. 
Even if we already had it implemented, I think it would still be worthwhile to 
have a lookAhead function, just because its usage is simpler.


I'll go ahead and create the wiki page, then add the feature to Daffodil.

________________________________
From: Beckerle, Mike <[email protected]>
Sent: Thursday, May 30, 2019 1:25:00 PM
To: [email protected]
Subject: Re: Proposal: Add support for lookahead in DFDL expressions

I really like your idea of doing this with a lookAhead function instead of some 
specialty variant of a choice.

There is an existing solution to this in the nato-stanag-5516 DFDL schema which 
I know you (Brandon) have access to. I like your proposal better, but it is a 
viable workaround.

If you are motivated, I would suggest you go ahead and add your lookAhead 
function and let's try it out. If we like it, we can propose it for inclusion 
in DFDLv2.0.

I would just request you create a Daffodil Wiki page describing this new 
function, and what it means when parsing, unparsing (SDE?). One thing to be 
sure to comment on is what are the limits on how far you can look ahead. 
Presumably there is an implementation dependent limit, but not less than N 
bits... or something like that.

Lastly, there is this position by offset proposal. That seems like it would 
solve this same issue, as well as others. You can offset forward, parse the 
tag, then offset backward, and parse based on the tag. It also seems pretty 
simple to implement, though perhaps not quite as simple as your lookAhead which 
is isolated to one function.

https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=74687382
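
The forward/backward idea described above (offset forward, parse the tag, offset backward, parse based on the tag) can be modeled roughly as follows. This is only a Python sketch of the control flow; it does not reflect the syntax or semantics of the linked proposal, and BitCursor and parse_record are hypothetical names. It assumes the 16-bit union followed by an 8-bit tag from the original example, big-endian bit order, and unsigned values.

```python
class BitCursor:
    """Hypothetical bit-level cursor over a byte string (big-endian bit order)."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current position, in bits

    def read_uint(self, n_bits: int) -> int:
        """Read n_bits at the current position as an unsigned integer."""
        total = len(self.data) * 8
        end = self.pos + n_bits
        if end > total:
            raise ValueError("read past end of data")
        value = (int.from_bytes(self.data, "big") >> (total - end)) & ((1 << n_bits) - 1)
        self.pos = end
        return value


def parse_record(data: bytes) -> dict:
    cur = BitCursor(data)
    cur.pos += 16                 # offset forward over the fixed-length union
    tag = cur.read_uint(8)        # parse the tag
    cur.pos -= 24                 # offset backward to the start of the union
    name = {1: "a", 2: "b"}[tag]  # choose the branch based on the tag
    value = cur.read_uint(16)     # parse the chosen branch
    cur.pos += 8                  # skip the tag, which was already parsed
    return {name: value, "tag": tag}
```

With the input bytes 00 2A 02, parse_record selects branch "b" with value 42.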


________________________________
From: Sloane, Brandon <[email protected]>
Sent: Thursday, May 30, 2019 2:27:31 PM
To: [email protected]
Subject: Re: Proposal: Add support for lookahead in DFDL expressions

My initial survey of Link16 found 8 instances of this, which I don't think 
changes the calculus substantially.


I suspect that adding lookahead to Daffodil would be less work than getting the 
Link16 schema working with late-discriminators; and we would then have support 
for lookahead in the future should we encounter other formats that require it.


If this were a more complicated/invasive feature, I would share your concern 
about adding it to DFDL, but it does not strike me as something that would 
impose significant maintenance or future development constraints.

________________________________
From: Beckerle, Mike <[email protected]>
Sent: Wednesday, May 29, 2019 6:39:27 PM
To: [email protected]
Subject: Re: Proposal: Add support for lookahead in DFDL expressions


Yup. This is a real situation. I think this happens in fixed length data for 
very mundane reasons explained in the attached slides.


Practically speaking, in Link16 this comes up only two or three times, so this 
isn't worth enhancing DFDL/Daffodil for unless this phenomenon is observed in 
more places. I have only seen it in Link16, though per the attached slides the 
problem could occur a lot in fixed length legacy data formats.


The workaround is just a choice, where you have a "late discriminator" on the 
tag, when you finally get to it.
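
The late-discriminator workaround can be modeled roughly as follows: each choice branch is parsed speculatively, the tag is read once it is finally reached, and a branch is kept only if the tag matches. This is a Python sketch of the backtracking behavior only, not the actual schema; parse_with_late_discriminator is a hypothetical name, and the layout (16-bit union, then 8-bit tag, branch keys 1 and 2) is borrowed from the original example in this thread.

```python
def parse_with_late_discriminator(data: bytes) -> dict:
    """Try each branch in order; keep the first whose trailing tag matches."""
    if len(data) < 3:
        raise ValueError("need at least 24 bits of data")
    branches = [("a", 1), ("b", 2)]  # (branch name, expected tag value)
    for name, expected_tag in branches:
        # Speculatively parse the branch body (16 bits, unsigned here).
        value = int.from_bytes(data[0:2], "big")
        # Only now do we reach the tag and can evaluate the discriminator.
        tag = data[2]
        if tag == expected_tag:
            return {name: value, "tag": tag}
        # Discriminator failed: discard this attempt and try the next branch.
    raise ValueError("no choice branch matched the tag")
```

Every branch before the matching one is parsed in full and then thrown away, which is the cost that a lookahead would avoid.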

________________________________
From: Sloane, Brandon <[email protected]>
Sent: Wednesday, May 29, 2019 1:00:04 PM
To: [email protected]
Subject: Proposal: Add support for lookahead in DFDL expressions

In developing a schema for Link16, I have encountered a situation that I do not 
believe Daffodil currently has a good solution for. It is a tagged union, 
where the tag comes after the union.


In theory, the schema would look like:


<xs:choice dfdl:choiceDispatchKey="{ tag }">

  <xs:element name="a" type="xs:int" dfdl:length="16" dfdl:choiceBranchKey="1"/>

  <xs:element name="b" type="xs:int" dfdl:length="16" dfdl:choiceBranchKey="2"/>

</xs:choice>

<xs:element name="tag" type="xs:int" dfdl:length="8" />


Obviously, this schema does not work because it requires look-ahead. In general 
it is not possible to make this sort of schema work, because in order to 
determine where in the bitstream <tag> is, one would first need to know the 
length of the choice, which cannot (generally speaking) be determined before 
parsing completes.


However, in this case, the lookahead is possible in principle, because the 
choice happens to be fixed length (as it should be in any sane format where the 
tag follows the union).


I believe that we can support this use case with a much less invasive mechanism 
than infoset lookahead. In particular, we can support this with bytestream 
lookahead in the DFDL expressions, as below:


<xs:choice dfdl:choiceDispatchKey="{ dfdl:lookAhead(16,8) }">

  <xs:element name="a" type="xs:int" dfdl:length="16" dfdl:choiceBranchKey="1"/>

  <xs:element name="b" type="xs:int" dfdl:length="16" dfdl:choiceBranchKey="2"/>

</xs:choice>

<xs:element name="tag" type="xs:int" dfdl:length="8" />


The dfdl:lookAhead function takes as input a relative offset, o, and length, n, 
and returns the n bits located o bits past the current location, interpreted 
as an unsigned integer.


From an implementation standpoint, there should be no difficulty in adding 
this, as the parser need only peek into the buffer it already has.
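
The lookAhead semantics described above can be sketched roughly as follows. This is a Python illustration of the bit arithmetic, not the Daffodil implementation; look_ahead and its parameter names are hypothetical, and most-significant-bit-first ordering is an assumption that the description above does not fix.

```python
def look_ahead(data: bytes, bit_pos: int, offset: int, n_bits: int) -> int:
    """Return the n_bits located offset bits past bit_pos, as an unsigned int.

    The current position (bit_pos) is not advanced: this only peeks.
    Assumes big-endian, most-significant-bit-first ordering.
    """
    if offset < 0 or n_bits < 0:
        raise ValueError("offset and length must be non-negative")
    total_bits = len(data) * 8
    end = bit_pos + offset + n_bits
    if end > total_bits:
        raise ValueError("lookahead past end of data")
    whole = int.from_bytes(data, "big")
    # Shift the wanted bits down to the bottom, then mask off everything above.
    return (whole >> (total_bits - end)) & ((1 << n_bits) - 1)


# A 16-bit union followed by an 8-bit tag, as in the schema above.
record = bytes([0x00, 0x2A, 0x02])
tag = look_ahead(record, bit_pos=0, offset=16, n_bits=8)
branch = {1: "a", 2: "b"}[tag]  # dispatch on the tag before parsing the union
```

Here the tag byte is 0x02, so the dispatch selects branch "b" before any of the union has been parsed.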


Brandon T. Sloane

Associate, Services

[email protected] | tresys.com
