Re: Using DFDL to generate model, parser and generator?

Christofer Dutz Fri, 11 Jan 2019 02:38:03 -0800

Hi Steve,

thanks for that ... that helped.
I did quite a lot of refactoring and now it seems that, in general, Daffodil is
more or less happy on a syntactical level.
Now I'm getting errors when using the command line debugger to parse some data.


One problem I am having is that I can have multiple "parameters", but I was
unable to specify that the choice allows 0 to unbounded elements.
That's why I left out minOccurs and maxOccurs (but I need to add them again):
            <xs:sequence><!-- minOccurs="0" maxOccurs="unbounded"-->
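
For reference, a sketch of how this is usually worked around: the DFDL subset
of XML Schema only allows occurrence bounds on elements, not on xs:sequence or
xs:choice, so the repeating choice is wrapped in an array element. The names
"parameter", "typeA", "typeB" and their types are hypothetical, and the other
representation properties are assumed to come from a default dfdl:format.

```xml
<xs:sequence>
  <!-- Occurrence bounds sit on the wrapping element, not on the model group.
       "parameter" is a hypothetical name. -->
  <xs:element name="parameter" minOccurs="0" maxOccurs="unbounded"
              dfdl:occursCountKind="implicit">
    <xs:complexType>
      <xs:choice>
        <!-- Hypothetical branches; the real ones would reference the
             concrete parameter types of the protocol. -->
        <xs:element name="typeA" type="s7:ParameterTypeA"/>
        <xs:element name="typeB" type="s7:ParameterTypeB"/>
      </xs:choice>
    </xs:complexType>
  </xs:element>
</xs:sequence>
```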

Another problem when parsing the demo input is that Daffodil doesn't seem to
see that as soon as the second byte is "0x03", the message is a response and
therefore references an S7ResponseParameterSetupCommunication.

But I'm getting this in the console:

[warning] Schema Definition Warning: Multiple choice branches are associated 
with the end of element s7:S7Message.
Note that elements with dfdl:outputValueCalc cannot be used to distinguish 
choice branches.
The offending choice branches are:
group[3] at Location in 
file:/Users/christofer.dutz/Projects/Apache/incubator-daffodil/s7protocol.dfdl.xsd
group[4] at Location in 
file:/Users/christofer.dutz/Projects/Apache/incubator-daffodil/s7protocol.dfdl.xsd
The first branch will be used during unparsing when an infoset ambiguity exists.
Schema context: choice Location in 
file:/Users/christofer.dutz/Projects/Apache/incubator-daffodil/s7protocol.dfdl.xsd
[warning] Using --debug on a non-interactive console may result in display 
issues
(debug)
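
A direction that might be worth trying here (a sketch only, not verified
against the actual schema): put a dfdl:discriminator on the response branch
that tests the already-parsed messageType. The branch name "response" is a
placeholder, and the relative path assumes that messageType is a sibling of
the branch elements of the choice.

```xml
<xs:choice>
  <!-- Placeholder branch: kept only when messageType is 0x03 (3). -->
  <xs:element name="response" type="s7:S7ResponseMessage">
    <xs:annotation>
      <xs:appinfo source="http://www.ogf.org/dfdl/">
        <!-- ".." is the parent of this branch element, where messageType
             is assumed to live. -->
        <dfdl:discriminator test="{ ../messageType eq 3 }"/>
      </xs:appinfo>
    </xs:annotation>
  </xs:element>
  <!-- Other branches would carry their own tests. -->
</xs:choice>
```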

Here's my DFDL schema and one test input [1].

Would be cool if you could help me with this ...

Then there is another problem, which I haven't addressed yet; I don't quite
know whether I can handle it in Daffodil or will have to do that in my
application.

A request/response has a set of Parameters and a set of Payloads. For some
Parameters there is a matching Payload, and the parser is only able to know
what type a payload is by knowing the order of the parameters.
So, for example, a Read-Variable Request-Parameter doesn't have a Payload, but
a Write-Variable Request-Parameter does ... So the parser needs to know the
parsed parameters in order to know how to parse the payloads.

This is actually a quite essential requirement ...

Chris

[1] 
https://drive.google.com/drive/folders/1ioUNnWeA2aI7_upkHWMgF7fb2soqdo03?usp=sharing



On 10.01.19, 18:54, "Steve Lawrence" <[email protected]> wrote:

    Hi Chris,
    
    As you've found out, DFDL only allows for a limited subset of XML
    schema, and inheritance is not one of those features it allows. Usually
    you can accomplish the same thing via custom types and groups. For
    example, you could change the S7Message from a complexType to a group,
    and then reference that group in the Request/Response elements, e.g.:
    
      <xs:group name="S7Message">
        <xs:sequence>
          <!-- common elements -->
        </xs:sequence>
      </xs:group>
    
      <xs:complexType name="S7RequestMessage">
        <xs:sequence>
          <xs:group ref="S7Message" />
          <!-- unique to the request -->
        </xs:sequence>
      </xs:complexType>
    
      <xs:complexType name="S7ResponseMessage">
        <xs:sequence>
          <xs:group ref="S7Message" />
          <!-- unique to the response -->
        </xs:sequence>
      </xs:complexType>
    
    
    Regarding your last question, DFDL handles lengths, occurrences, etc.
    that are determined at parse time using DFDL expressions, which are a
    subset of XPath. For example, if you have a dynamic length, you might
    have something like this:
    
       <xs:element name="length" type="xs:int"
         dfdl:lengthKind="explicit" dfdl:length="4" />
       <xs:element name="payload" type="xs:hexBinary"
         dfdl:lengthKind="explicit" dfdl:length="{ ../length }" />
    
    So first a 4-byte length is parsed, and then a hexBinary blob of data is
    parsed where the length is determined by the expression that gets the
    value of the parsed length element. For variable occurrences, it might
    look something like this:
    
       <xs:element name="occurs" type="xs:int"
         dfdl:lengthKind="explicit" dfdl:length="4" />
       <xs:element name="payloads" type="xs:int"
         dfdl:lengthKind="explicit" dfdl:length="4"
         dfdl:occursCountKind="expression" dfdl:occursCount="{ ../occurs }"
         maxOccurs="unbounded" />
    
    So in this case, we parse a 4-byte int for the number of occurrences. At
    runtime, we determine the value of the parsed occurs element and have
    that many repeats of the 4-byte payloads element.
    
    The XPath expression language is complex enough that it should allow
    you to perform whatever math might be necessary to calculate sums of
    sizes and the like.
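
    A sketch of what such arithmetic could look like, assuming a
    hypothetical totalLength field that counts its own 4 bytes, so that the
    payload is whatever remains (the names and the header size are
    illustrative, not taken from the S7 schema):

    ```xml
    <xs:element name="totalLength" type="xs:int"
      dfdl:lengthKind="explicit" dfdl:length="4" />
    <!-- Hypothetical: payload length = total length minus the 4-byte
         length field itself. -->
    <xs:element name="payload" type="xs:hexBinary"
      dfdl:lengthKind="explicit" dfdl:length="{ ../totalLength - 4 }" />
    ```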
    
    Section 23 of the DFDL spec [1] describes the expression language in
    more detail. Section 23.4 defines a grammar for the subset of XPath that
    is supported.
    
    - Steve
    
    [1] https://daffodil.apache.org/docs/dfdl/#_Toc398030820
    
    
    On 1/10/19 10:49 AM, Christofer Dutz wrote:
    > (This time the full message)
    > 
    > Hi Mike,
    > 
    > so I converted one of my protocols into an XML Schema with some
    > utilization of the DFDL namespace (trying to get started).
    > Unfortunately I'm having a little problem with how to define type
    > inheritance ... so I have, for example, parameter elements which all
    > start with a one-byte type code followed by a one-byte length
    > parameter.
    > The rest is completely different, based on the type of parameter.
    > 
    > Seems something like this isn't DFDL:
    > 
    >     <xs:complexType name="S7Message">
    >         <xs:sequence>
    >             <!-- S7 Magic Byte always 0x32 -->
    >             <xs:element name="magicByte" type="xs:unsignedByte" fixed="50"/>
    >             <xs:element name="messageType" type="xs:unsignedByte"/>
    >             <!-- Reserved value always 0x0000 -->
    >             <xs:element name="reserved" type="xs:unsignedShort" fixed="0"/>
    >             <xs:element name="tpduReference" type="xs:unsignedShort"/>
    >             <xs:element name="parametersLength" type="xs:unsignedShort"/>
    >             <xs:element name="payloadsLength" type="xs:unsignedShort"/>
    >         </xs:sequence>
    >     </xs:complexType>
    > 
    >     <xs:complexType name="S7RequestMessage">
    >         <xs:complexContent>
    >             <xs:extension base="s7:S7Message">
    >                 <xs:sequence>
    >                     <xs:element name="parameters" type="s7:S7RequestParameter"
    >                                 minOccurs="0" maxOccurs="unbounded"/>
    >                     <xs:element name="payloads" type="s7:S7RequestPayload"
    >                                 minOccurs="0" maxOccurs="unbounded"/>
    >                 </xs:sequence>
    >             </xs:extension>
    >         </xs:complexContent>
    >     </xs:complexType>
    > 
    >     <xs:complexType name="S7ResponseMessage">
    >         <xs:complexContent>
    >             <xs:extension base="s7:S7Message">
    >                 <xs:sequence>
    >                     <xs:element name="errorClass" type="xs:unsignedByte"/>
    >                     <xs:element name="errorCode" type="xs:unsignedByte"/>
    >                     <xs:element name="parameters" type="s7:S7ResponseParameter"
    >                                 minOccurs="0" maxOccurs="unbounded"/>
    >                     <xs:element name="payloads" type="s7:S7ResponsePayload"
    >                                 minOccurs="0" maxOccurs="unbounded"/>
    >                 </xs:sequence>
    >             </xs:extension>
    >         </xs:complexContent>
    >     </xs:complexType>
    > 
    > In the end it seems that DFDL doesn't extend XML Schema, but uses a
    > subset of it to do its job, is that correct?
    > 
    > I thought at first that if it's an extension I could start with a
    > schema, have a look at what it does, and then iteratively narrow it
    > down, but it seems that approach isn't valid.
    > 
    > I think I first need to learn how to do what I want in DFDL. But I
    > did encounter some things that might be problematic (perhaps).
    > 
    > So sometimes I read a byte that contains a number of elements or the
    > length of an element, and then have to read exactly this number of
    > bytes, or exactly this number of parameters whose summed-up size
    > matches a total parameter size ...
    > Hope it is possible to model stuff like this with DFDL.
    > 
    > Chris
    > 
    > 
    > 
    > 
    > [1]
    > https://github.com/OpenDFDL/examples/blob/master/helloWorld/src/main/java/HelloWorld.java
    > 
    > 
    > On 10.01.19, 14:47, "Beckerle, Mike" <[email protected]> wrote:
    > 
    >     This makes sense to me architecturally, as the infrastructure means
    >     by which people use this.
    >     
    >     
    >     Compiling a DFDL schema into any sort of compiled form, whether
    >     that is generated code or just a saved runtime data structure
    >     (like we have now), is exactly what people want as a maven/sbt
    >     build step, so creating a plugin that does this is very sensible.
    >     
    >     
    >     Right now compiling is slow (unnecessarily; I hope we speed it up
    >     soon and reduce its memory footprint), so a build step that is
    >     only re-run if the schema actually changed is very useful to save
    >     time waiting around for the Daffodil compiler.
    >     
    >     
    >     I suggest that the generation of code from the Daffodil
    >     parser/unparser data structures will push the boundaries of what
    >     anyone would call a "template". This is going to be quite a
    >     sophisticated recursive descent walk, accumulating a variety of
    >     things and eventually emitting the code. I think it is totally
    >     worth it to try this though.
    >     
    >     ________________________________
    >     From: Christofer Dutz <[email protected]>
    >     Sent: Thursday, January 10, 2019 4:57:22 AM
    >     To: [email protected]
    >     Subject: Re: Using DFDL to generate model, parser and generator?
    >     
    >     Hi Mike,
    >     
    >     Well, I am currently experimenting with creating a DFDL schema for
    >     one of the many protocol layers we have.
    >     
    >     I would propose the following (please correct me if I'm wrong):
    >     - We create DFDL schemas
    >     - We use Daffodil to process these (assuming that in order to
    >       process DFDL schemas, there has to be some sort of model
    >       representation)
    >     - We add a Maven plugin that uses the parsed schema representation
    >       model and allows generating code via some templating language
    >       (Freemarker and Velocity are both Apache ... so it should be one
    >       of these)
    >     - In a project you define templates for the current use case (a
    >       general-purpose runtime would be sub-optimal for our case ... we
    >       would probably use Netty utils for parsing/serializing)
    >     
    >     Perhaps based on these PLC4X templates it would make sense to build
    >     other sets of templates as part of the Daffodil project.
    >     Daffodil could have multiple sets of templates for different
    >     languages and frameworks. Eventually a template module could have a
    >     runtime module to be used in the generated code.
    >     
    >     So you would use the Maven plugin without providing a template
    >     artifact and it would look for local templates. If, however, you
    >     provide a template artifact, then the plugin would use that.
    >     
    >     In the end I would probably build the Maven plugin in a way that
    >     makes it easy to run it on the command line or to build plugins for
    >     SBT, Gradle, Ant, whatever ...
    >     
    >     What do you think?
    >     
    >     Chris
    >     
    >     
    >     
    >     On 09.01.19, 20:10, "Beckerle, Mike" <[email protected]> wrote:
    >     
    >         Christofer,
    >     
    >     
    >         Yes, what you suggest is possible, is what many people want, and
    >         has been talked about here and there, but I don't know of anyone
    >         else doing exactly this right now.
    >     
    >     
    >         Effectively what you are describing is a code-generator backend
    >         for Daffodil. I think this is a great idea. I personally want to
    >         have one that generates VHDL or Verilog or another hardware
    >         synthesis language so you can go direct to an FPGA for data
    >         parsing at hardware speed.
    >     
    >     
    >         Anyway, such a generator would likely add to the existing
    >         parser/unparser primitives so that, in addition to having
    >         parse() and unparse() methods, they would have generateCode()
    >         methods that emit the equivalent code and recursively invoke
    >         generateCode() on their sub-objects, incorporating the result.
    >     
    >     
    >         I would suggest that the existing Daffodil backend, which may
    >         well not be fast enough for your needs, would nevertheless be a
    >         very valuable part of your testing strategy, as your schemas
    >         should work on Daffodil, and you can then verify that the parser
    >         behavior from your generated code is consistent. It also may be
    >         helpful for diagnostic purposes, i.e., if data is parsed and
    >         determined invalid, perhaps your "kit" to help your users
    >         involves parsing such data with regular old Daffodil into XML
    >         for tangibility/inspection.
    >     
    >     
    >         There is a fair amount of runtime library to be created to go
    >         with the generated code, of course. Daffodil has daffodil-lib,
    >         daffodil-io, daffodil-runtime1, and daffodil-runtime1-unparser,
    >         each of which contains a large volume of runtime code that would
    >         need to be replaced with a C/C++ equivalent in a new runtime. I
    >         would suggest much of the work is actually here, not in the
    >         compilation.
    >     
    >     
    >         I really hope you undertake this effort. I think it will be a
    >         big value-add to Daffodil if it has a code-gen style backend. The
    >         current backend really hasn't had raw speed as its goal. It has
    >         largely been about correctness, and getting the DFDL standard
    >         fully/mostly implemented quickly. Let us know how we can help
    >         you get started.
    >     
    >     
    >         The other thing worth mentioning is that Daffodil does have, on
    >         its roadmap, plans to create a streaming parser/unparser. This
    >         would not build a DOM-tree-like structure, but would instead
    >         emit events along the lines of a SAX-style parse of data. Now,
    >         some formats are simply not streamable, and there is no option
    >         to avoid building up a tree in memory. But many formats are
    >         streamable, and people really do want the ability to parse files
    >         much larger than memory, in finite RAM, so long as the format is
    >         streamable.
    >     
    >     
    >         -mike beckerle
    >     
    >         Tresys Technology
    >     
    >         ________________________________
    >         From: Christofer Dutz <[email protected]>
    >         Sent: Wednesday, January 9, 2019 8:56:28 AM
    >         To: [email protected]
    >         Subject: Using DFDL to generate model, parser and generator?
    >     
    >         Hi all,
    >     
    >         I am currently looking for a solution to the following question:
    >     
    >         In the Apache PLC4X (incubating) project we are implementing a
    >         lot of different industry protocols.
    >         Each protocol sends packets following a particular format. For
    >         each of these we currently implement an internal model,
    >         serializers and parsers.
    >         Until now this has been pure Java, but we are now starting to
    >         work on C++ and would like to add even more languages.
    >     
    >         As we don’t want to manually keep all of these implementations
    >         in sync, my idea was to describe the data format in some form
    >         and have the parsers, serializers and the model generated from
    >         that.
    >         So the implementation only has to take care of the plumbing and
    >         the state machine of the protocol.
    >     
    >         In Montreal I attended a great talk on DFDL and Daffodil, so I
    >         think DFDL in general would be a great fit.
    >         Unfortunately, for performance reasons, we don’t want to parse
    >         any data format into an XML or DOM representation.
    >     
    >         My ideal workflow would look like this:
    >     
    >           1.  For every protocol I define the DFDL documents describing
    >               the different types of messages for a given protocol
    >           2.  I define multiple protocol implementation modules (one
    >               for each language)
    >           3.  I use a Maven plugin in each of these to generate the code
    >               for that particular language from those central DFDL
    >               definitions
    >     
    >         Is this possible?
    >         Is it planned to support this in the future?
    >         What other options do you see for this sort of problem?
    >     
    >         I am absolutely willing to get my hands dirty and help implement
    >         this, if you say: “Yes we want that too but haven’t managed to
    >         do that yet”.
    >     
    >         Chris
    >     
    >     
    >     
    > 
    >
