Re: Using DFDL to generate model, parser and generator?

Steve Lawrence Fri, 11 Jan 2019 05:55:52 -0800

There are a few issues mentioned; I'll try to hit them all:

Problem 1: Zero or more choice elements.


DFDL only allows min/maxOccurs on xs:elements. To support zero or more
choices, you need to wrap your choice in a complex element. So in your
provided schema, you might have something like this:

  <xs:element name="requestParameter" dfdl:occursCountKind="implicit"
    minOccurs="0" maxOccurs="unbounded">
    <xs:complexType>
      <xs:choice>
        <xs:group ref="s7:S7RequestParameterSetupCommunication"/>
        <xs:group ref="s7:S7RequestParameterCPUService"/>
        <xs:group ref="s7:S7RequestParameterReadVar"/>
        <xs:group ref="s7:S7RequestParameterWriteVar"/>
      </xs:choice>
    </xs:complexType>
  </xs:element>

This results in an "array" of requestParameter elements, where each
element of the array contains one of those parameter types.
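
As a rough sketch of the result (the child element names below are
hypothetical, since they depend on what each of your groups defines), a
request carrying one read and one write parameter might produce an
infoset along these lines:

  <requestParameter>
    <!-- hypothetical element defined by S7RequestParameterReadVar -->
    <readVar>...</readVar>
  </requestParameter>
  <requestParameter>
    <!-- hypothetical element defined by S7RequestParameterWriteVar -->
    <writeVar>...</writeVar>
  </requestParameter>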

Problem 2: Second byte 0x03 means ResponseParameterSetupCommunication

This is related to how Daffodil resolves points of uncertainty. For
example, when you have a choice, Daffodil needs some way to determine
which choice branch to take. By default it takes the first branch, and if
something goes wrong it backtracks and takes the second branch.
Discriminators can be used to tell Daffodil that something went wrong and
to backtrack, or to let Daffodil know that it did take the correct branch.
Another way is via choice dispatch, which works very much like a
switch/case statement. I'll use your S7MessageType as an example of how to
use choice dispatch, slightly rearranging things.

  <xs:complexType name="S7MessageType">
    <xs:sequence>
      <!-- S7 Magic Byte always 0x32 -->
      <xs:element name="magicByte" type="xs:unsignedByte" fixed="50"/>
      <xs:element name="messageType" type="xs:unsignedByte" />
      <xs:choice dfdl:choiceDispatchKey="{ xs:string(./messageType) }">
        <xs:group ref="s7:S7RequestMessage" dfdl:choiceBranchKey="1"/>
        <xs:group ref="s7:S7ResponseMessage" dfdl:choiceBranchKey="3"/>
      </xs:choice>
    </xs:sequence>
  </xs:complexType>

So in this case, we've moved the messageType element out of the
Request/Response message types and into the S7MessageType type. So
first Daffodil will parse a magic number and a message type. Then it
will convert that messageType to a string in the dfdl:choiceDispatchKey
property and compare the result with the dfdl:choiceBranchKey
properties. If the value is "1" it takes the RequestMessage branch; if
"3" it takes the ResponseMessage branch. If it is neither, Daffodil will
report a parse error. Considering this is binary data, I suspect most of
your choices will be able to take advantage of choice dispatch to
determine which choice branch to take, though it may require some slight
reorganization like that shown above.
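
For comparison, here is a minimal sketch of the same choice written with
discriminators instead of choice dispatch. It is an illustration only: it
assumes messageType has already been parsed just before the choice, and
the empty xs:sequence carrying the annotation is just one way to attach a
discriminator to a branch.

  <xs:choice>
    <xs:sequence>
      <!-- Request branch: commit to it only if messageType is 1 -->
      <xs:sequence>
        <xs:annotation>
          <xs:appinfo source="http://www.ogf.org/dfdl/">
            <dfdl:discriminator test="{ ./messageType eq 1 }" />
          </xs:appinfo>
        </xs:annotation>
      </xs:sequence>
      <xs:group ref="s7:S7RequestMessage" />
    </xs:sequence>
    <xs:sequence>
      <!-- Response branch: commit to it only if messageType is 3 -->
      <xs:sequence>
        <xs:annotation>
          <xs:appinfo source="http://www.ogf.org/dfdl/">
            <dfdl:discriminator test="{ ./messageType eq 3 }" />
          </xs:appinfo>
        </xs:annotation>
      </xs:sequence>
      <xs:group ref="s7:S7ResponseMessage" />
    </xs:sequence>
  </xs:choice>

When a discriminator evaluates to false, Daffodil backtracks and tries the
next branch; when it evaluates to true, the branch is committed to and
later failures are no longer retried on the other branches. Choice
dispatch skips that speculation, which is why it tends to be the better
fit when a key field is available.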

Problem 3: Multiple choice branches warning

I think you can ignore that warning for now. It is telling you
that there are multiple branches of a choice that could be empty, which
results in ambiguities when unparsing (serializing). In your specific case,
I think it's just because you have some empty groups that haven't
been implemented yet. I suspect the warning will go away once those are
implemented.

Problem 4: Matching Parameters with Payloads

Things get a little tricky here. This might require a little more
detailed information about exactly what a list of parameters and payloads
looks like and how they match up, and might even be worthy of its own
thread. But in most cases this can be handled pretty easily. As mentioned
in Problem 1, you'll have an array of complex elements, called
"parameters" here. Then the choice in the payload would use
dfdl:choiceDispatchKey to reach back into the parameter array to determine
how to parse each payload. For example, you might have something like this:

  <xs:element name="parameters" dfdl:occursCount="{ ../occurrences }" ...>
   ...
  </xs:element>
  <xs:element name="payloads" dfdl:occursCount="{ ../occurrences }" ...>
    <xs:complexType>
      <!-- dispatch on the payload type of the parameter at the same index -->
      <xs:choice dfdl:choiceDispatchKey="{
          ../parameters[dfdl:occursIndex()]/payloadType }">
        <xs:group ref="type1" dfdl:choiceBranchKey="type1" />
        <xs:group ref="type2" dfdl:choiceBranchKey="type2" />
        <xs:group ref="type3" dfdl:choiceBranchKey="type3" />
      </xs:choice>
    </xs:complexType>
  </xs:element>

This can be made to handle the case where a payload is empty for a
parameter. But as you can see, this is kind of complex, which is why a
whole new thread might be worth it to go into more detail.
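
One possible way to cover the empty-payload case, sketched under the
assumption that payloadType carries some distinguishable value when no
payload follows (the key "none" below is hypothetical), is an extra
branch that consumes no data:

  <xs:choice dfdl:choiceDispatchKey="{
      ../parameters[dfdl:occursIndex()]/payloadType }">
    <xs:group ref="type1" dfdl:choiceBranchKey="type1" />
    <!-- hypothetical key: this parameter has no payload, so parse nothing -->
    <xs:sequence dfdl:choiceBranchKey="none" />
  </xs:choice>

With this shape the infoset would still contain an (empty) payloads
element at that index, which keeps the parameter and payload arrays lined
up. Whether that is what you want depends on how the real data is laid
out.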


General Schema Notes:

Just a couple things I noticed in the schema.

- In the dfdl:format element, you specify representation="text". Since
this is a binary format, that should be representation="binary".
Otherwise Daffodil will treat your data as if it were UTF-8 text instead
of two's complement binary.

- In the dfdl:format element, you specify lengthKind="delimited". This
is generally used for text-based formats where there are things like
commas that tell you where a field ends. For binary formats, where most
fields are fixed length, it usually makes sense to set
lengthKind="implicit" in the dfdl:format tag. This way your simple types
will have the appropriate lengths (e.g. xs:unsignedByte is 1 byte,
xs:unsignedShort is 2 bytes, etc.). You can also explicitly set lengths
using dfdl:lengthKind="explicit" and dfdl:length/dfdl:lengthUnits where
types don't have standard lengths.

- Daffodil does not support the "fixed" attribute, so just be sure to
not rely on that for validation of the data. I would guess that as you
reorganize things to use choice dispatch that those "fixed" values will
be moved to choiceBranchKeys.
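
Putting the first two notes together, the format annotation might look
roughly like this. It is only a sketch: the byteOrder value and the
threeByteField element are assumptions for illustration, not taken from
your schema, and the other properties a complete dfdl:format needs are
left out.

  <xs:annotation>
    <xs:appinfo source="http://www.ogf.org/dfdl/">
      <!-- remaining required format properties omitted for brevity -->
      <dfdl:format representation="binary" lengthKind="implicit"
        lengthUnits="bytes" byteOrder="bigEndian" />
    </xs:appinfo>
  </xs:annotation>

  <!-- hypothetical field with a non-standard length: 3 bytes rather than
       the implicit 4 bytes of xs:unsignedInt -->
  <xs:element name="threeByteField" type="xs:unsignedInt"
    dfdl:lengthKind="explicit" dfdl:length="3" dfdl:lengthUnits="bytes" />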



On 1/11/19 5:37 AM, Christofer Dutz wrote:
> Hi Steve,
> 
> thanks for that ... that helped.
> I did quite a lot of refactoring and now it seems in general Daffodil is sort 
> of happy on a syntactical level.
> Now I'm getting errors when using the command line debugger to parse some 
> data.
> 
> One problem I am having is that I can have multiple "parameters", but I 
> was unable to specify that the choices allow 0-unbounded elements. 
> That's why I left out the min and max occurs (but need to add them again):
>             <xs:sequence><!-- minOccurs="0" maxOccurs="unbounded"-->
> 
> Another problem when parsing the demo input is that Daffodil doesn't seem to 
> see that as soon as the second byte is "0x03", this is a response that 
> references an S7ResponseParameterSetupCommunication.
> 
> But I'm getting this in the console:
> 
> [warning] Schema Definition Warning: Multiple choice branches are associated 
> with the end of element s7:S7Message.
> Note that elements with dfdl:outputValueCalc cannot be used to distinguish 
> choice branches.
> The offending choice branches are:
> group[3] at Location in 
> file:/Users/christofer.dutz/Projects/Apache/incubator-daffodil/s7protocol.dfdl.xsd
> group[4] at Location in 
> file:/Users/christofer.dutz/Projects/Apache/incubator-daffodil/s7protocol.dfdl.xsd
> The first branch will be used during unparsing when an infoset ambiguity 
> exists.
> Schema context: choice Location in 
> file:/Users/christofer.dutz/Projects/Apache/incubator-daffodil/s7protocol.dfdl.xsd
> [warning] Using --debug on a non-interactive console may result in display 
> issues
> (debug)
> 
> Here's my dfdl schema and one test-input [1]
> 
> Would be cool if you could help me with this ...
> 
> And then I will be having another problem, which I haven't addressed yet and 
> don't quite know if I can do that in Daffodil, or will have to do that in my 
> application.
> 
> A request/response has a set of Parameters and a set of Payloads. For some 
> Parameters there is a matching Payload, and the parser is only able to know 
> what type a payload is by knowing the order of the parameters.
> So for example a Read-Variable Request-Parameter doesn't have a Payload, but 
> a Write-Variable Request-Parameter does ... So the parser needs to know the 
> parsed parameters in order to know how to parse the payloads.
> 
> This is actually a quite essential requirement ...
> 
> Chris
> 
> [1] 
> https://drive.google.com/drive/folders/1ioUNnWeA2aI7_upkHWMgF7fb2soqdo03?usp=sharing
> 
> 
> 
> On 10.01.19, 18:54, "Steve Lawrence" wrote:
> 
>     Hi Chris,
>     
>     As you've found out, DFDL only allows for a limited subset of XML
>     schema, and inheritance is not one of those features it allows. Usually
>     you can accomplish the same thing via custom types and groups. For
>     example, you could change the S7Message from a complexType to a group,
>     and then reference that group in the Request/Response elements, e.g.:
>     
>       <xs:group name="S7Message">
>         <xs:sequence>
>           <!-- common elements -->
>         </xs:sequence>
>       </xs:group>
>     
>       <xs:complexType name="S7RequestMessage">
>         <xs:sequence>
>           <xs:group ref="S7Message" />
>           <!-- unique to the request -->
>         </xs:sequence>
>       </xs:complexType>
>     
>       <xs:complexType name="S7ResponseMessage">
>         <xs:sequence>
>           <xs:group ref="S7Message" />
>           <!-- unique to the response -->
>         </xs:sequence>
>       </xs:complexType>
>     
>     
>     Regarding your last question, DFDL handles lengths, occurrences, etc.
>     that are determined at parse time using DFDL expressions, which are a
>     subset of XPath. For example, if you have a dynamic length, you might
>     have something like this:
>     
>        <xs:element name="length" type="xs:int"
>          dfdl:lengthKind="explicit" dfdl:length="4" />
>        <xs:element name="payload" type="xs:hexBinary"
>          dfdl:lengthKind="explicit" dfdl:length="{ ../length }" />
>     
>     So first a 4 byte length is parsed, and then a hexBinary blob of data is
>     parsed where the length is determined by the expression that gets the
>     value of the parsed length value. For variable occurrences, it might look
>     something like this:
>     
>        <xs:element name="occurs" type="xs:int"
>          dfdl:lengthKind="explicit" dfdl:length="4" />
>        <xs:element name="payloads" type="xs:int"
>          dfdl:lengthKind="explicit" dfdl:length="4"
>          dfdl:occursCountKind="expression" dfdl:occursCount="{ ../occurs }"
>          maxOccurs="unbounded" />
>     
>     So in this case, we parse a 4-byte int for the number of occurrences. At
>     runtime, we determine the value of the parsed occurs element and have
>     that many repeats of the 4-byte payloads element.
>     
>     The XPath expression language is expressive enough that it should allow
>     you to perform whatever math might be necessary to calculate sums of
>     sizes and the like.
>     
>     Section 23 of the DFDL spec [1] describes the expression language in
>     more detail. Section 23.4 defines a grammar for the subset of XPath that
>     is supported.
>     
>     - Steve
>     
>     [1] https://daffodil.apache.org/docs/dfdl/#_Toc398030820
>     
>     
>     On 1/10/19 10:49 AM, Christofer Dutz wrote:
>     > (This time the full message)
>     > 
>     > Hi Mike,
>     > 
>     > so I converted one of my protocols into an XML Schema with some
>     > utilization of the DFDL namespace (trying to get started).
>     > Unfortunately I'm having a little problem with how to define type
>     > inheritance ... so I have for example parameter elements which all
>     > start with a one-byte type code followed by a one-byte length parameter.
>     > The rest is completely different, based on the type of parameter.
>     > 
>     > Seems something like this isn't DFDL:
>     > 
>     >     <xs:complexType name="S7Message">
>     >         <xs:sequence>
>     >             <!-- S7 Magic Byte always 0x32 -->
>     >             <xs:element name="magicByte" type="xs:unsignedByte"
>     >                 fixed="50"/>
>     >             <xs:element name="messageType" type="xs:unsignedByte"/>
>     >             <!-- Reserved value always 0x0000 -->
>     >             <xs:element name="reserved" type="xs:unsignedShort"
>     >                 fixed="0"/>
>     >             <xs:element name="tpduReference" type="xs:unsignedShort"/>
>     >             <xs:element name="parametersLength"
>     >                 type="xs:unsignedShort"/>
>     >             <xs:element name="payloadsLength" type="xs:unsignedShort"/>
>     >         </xs:sequence>
>     >     </xs:complexType>
>     > 
>     >     <xs:complexType name="S7RequestMessage">
>     >         <xs:complexContent>
>     >             <xs:extension base="s7:S7Message">
>     >                 <xs:sequence>
>     >                     <xs:element name="parameters"
>     >                         type="s7:S7RequestParameter"
>     >                         minOccurs="0" maxOccurs="unbounded"/>
>     >                     <xs:element name="payloads"
>     >                         type="s7:S7RequestPayload"
>     >                         minOccurs="0" maxOccurs="unbounded"/>
>     >                 </xs:sequence>
>     >             </xs:extension>
>     >         </xs:complexContent>
>     >     </xs:complexType>
>     > 
>     >     <xs:complexType name="S7ResponseMessage">
>     >         <xs:complexContent>
>     >             <xs:extension base="s7:S7Message">
>     >                 <xs:sequence>
>     >                     <xs:element name="errorClass"
>     >                         type="xs:unsignedByte"/>
>     >                     <xs:element name="errorCode"
>     >                         type="xs:unsignedByte"/>
>     >                     <xs:element name="parameters"
>     >                         type="s7:S7ResponseParameter"
>     >                         minOccurs="0" maxOccurs="unbounded"/>
>     >                     <xs:element name="payloads"
>     >                         type="s7:S7ResponsePayload"
>     >                         minOccurs="0" maxOccurs="unbounded"/>
>     >                 </xs:sequence>
>     >             </xs:extension>
>     >         </xs:complexContent>
>     >     </xs:complexType>
>     > 
>     > In the end it seems that DFDL doesn't extend XML Schema, but uses a
>     > subset of it to do its job, is that correct?
>     > 
>     > I thought at first that if it's an extension I could start with a
>     > schema, have a look at what it does, and then iteratively narrow it
>     > down, but it seems that approach isn't valid.
>     > 
>     > I think I first need to learn how to do what I want in DFDL. But I did
>     > encounter some things that might be problematic (perhaps).
>     > 
>     > So sometimes I read a byte that contains a number of elements or the
>     > length of an element, and then have to read exactly this number of
>     > bytes, or exactly this number of parameters whose summed-up size
>     > matches a total parameter size ...
>     > Hope it is possible to model stuff like this with DFDL.
>     > 
>     > Chris
>     > 
>     > 
>     > 
>     > 
>     > [1] https://github.com/OpenDFDL/examples/blob/master/helloWorld/src/main/java/HelloWorld.java
>     > 
>     > 
>     > On 10.01.19, 14:47, "Beckerle, Mike" wrote:
>     > 
>     >     This makes sense to me architecturally as the infrastructure by
>     >     which people use this.
>     >     
>     >     
>     >     Compiling a DFDL schema into any sort of compiled form, whether
>     >     that is generated code or just a saved runtime data structure (like
>     >     we have now), is exactly what people want as a maven/sbt build step,
>     >     so creating a plugin that does this is very sensible.
>     >     
>     >     
>     >     Right now compiling is slow (unnecessarily; I hope we speed it up
>     >     soon, and reduce its memory footprint), so a build step that is only
>     >     re-run if the schema actually changed is very useful to save time
>     >     waiting around for the Daffodil compiler.
>     >     
>     >     
>     >     I suggest that the generation of code from the Daffodil
>     >     parser/unparser data structures will push the boundaries of what
>     >     anyone would call a "template". This is going to be a quite
>     >     sophisticated recursive descent walk, accumulating a variety of
>     >     things and eventually emitting the code. I think it is totally worth
>     >     it to try this though.
>     >     
>     >     ________________________________
>     >     From: Christofer Dutz <[email protected]>
>     >     Sent: Thursday, January 10, 2019 4:57:22 AM
>     >     To: [email protected]
>     >     Subject: Re: Using DFDL to generate model, parser and generator?
>     >     
>     >     Hi Mike,
>     >     
>     >     Well, I am currently experimenting with creating a DFDL schema for
>     >     one of the many protocol layers we have.
>     >     
>     >     I would propose the following (please correct me if I'm wrong):
>     >     - We create DFDL schemas
>     >     - We use Daffodil to process these (assuming that in order to
>     >       process DFDL schemas, there has to be some sort of model
>     >       representation)
>     >     - We add a Maven plugin that uses the parsed schema representation
>     >       model and allows generating code via some templating language
>     >       (Freemarker and Velocity are both Apache ... so should be one of
>     >       these)
>     >     - In a project you define templates for the current use case (a
>     >       general-purpose runtime would be sub-optimal for our case ... we
>     >       would probably use Netty utils for parsing/serializing)
>     >     
>     >     Perhaps based on these PLC4X templates it would make sense to build
>     >     other sets of templates as part of the Daffodil project.
>     >     Daffodil could have multiple sets of templates for different
>     >     languages and frameworks. Eventually a template module could have a
>     >     runtime module to be used in the generated code.
>     >     
>     >     So you would use the Maven plugin without providing a
>     >     template artifact and it would look for local templates. If,
>     >     however, you provide a template artifact, then the plugin would use
>     >     those.
>     >     
>     >     In the end I would probably build the Maven plugin in a way that
>     >     makes it easy to run on the command line or to build plugins for
>     >     SBT, Gradle, Ant, whatever ...
>     >     
>     >     What do you think?
>     >     
>     >     Chris
>     >     
>     >     
>     >     
>     >     On 09.01.19, 20:10, "Beckerle, Mike" wrote:
>     >     
>     >         Christofer,
>     >     
>     >     
>     >         Yes, what you suggest is possible, is what many people want, and
>     >         has been talked about here and there, but I don't know of anyone
>     >         else doing exactly this right now.
>     >     
>     >     
>     >         Effectively what you are describing is a code-generator backend
>     >         for Daffodil. I think this is a great idea. I personally want to
>     >         have one that generates VHDL or Verilog or another hardware
>     >         synthesis language so you can go direct to an FPGA for data
>     >         parsing at hardware speed.
>     >     
>     >     
>     >         Anyway, such a generator would likely add to the existing
>     >         parser/unparser primitives so that in addition to having parse()
>     >         and unparse() methods, they would have generateCode() methods
>     >         that emit the equivalent code and recursively invoke
>     >         generateCode() on the sub-objects, incorporating the results.
>     >     
>     >     
>     >         I would suggest that the existing Daffodil backend, which may
>     >         well not be fast enough for your needs, would nevertheless be a
>     >         very valuable part of your testing strategy, as your schemas
>     >         should work on Daffodil, and you can then verify that the parser
>     >         behavior from your generated code is consistent. It also may be
>     >         helpful for diagnostic purposes, i.e., if data is parsed and
>     >         determined invalid, perhaps your "kit" to help your users
>     >         involves parsing such data with regular old Daffodil into XML
>     >         for tangibility/inspection.
>     >     
>     >     
>     >         There is a fair amount of runtime library to be created to go
>     >         with the generated code, of course. Daffodil has daffodil-lib,
>     >         daffodil-io, daffodil-runtime1, and daffodil-runtime1-unparser,
>     >         each of which contains a large volume of runtime code that would
>     >         need to be replaced with a C/C++ equivalent in a new runtime. I
>     >         would suggest much of the work is actually here, not in the
>     >         compilation.
>     >     
>     >     
>     >         I really hope you undertake this effort. I think it will be a
>     >         big value-add to Daffodil if it has a code-gen style backend.
>     >         The current backend really hasn't had raw speed as its goal. It
>     >         has largely been about correctness, and getting the DFDL
>     >         standard fully/mostly implemented quickly. Let us know how we
>     >         can help you get started.
>     >     
>     >     
>     >         The other thing worth mentioning is that Daffodil does have, on
>     >         its roadmap, plans to create a streaming parser/unparser. This
>     >         would not build a DOM-tree-like structure, but would instead
>     >         emit events along the lines of a SAX-style parse of data. Now,
>     >         some formats are simply not streamable, and there is no option
>     >         to avoid building up a tree in memory. But many formats are
>     >         streamable, and people really do want the ability to parse files
>     >         much larger than memory, in finite RAM, so long as the format is
>     >         streamable.
>     >     
>     >     
>     >         -mike beckerle
>     >     
>     >         Tresys Technology
>     >     
>     >         ________________________________
>     >         From: Christofer Dutz <[email protected]>
>     >         Sent: Wednesday, January 9, 2019 8:56:28 AM
>     >         To: [email protected]
>     >         Subject: Using DFDL to generate model, parser and generator?
>     >     
>     >         Hi all,
>     >     
>     >         I am currently looking for a solution to the following question:
>     >     
>     >         In the Apache PLC4X (incubating) project we are implementing a
>     >         lot of different industry protocols.
>     >         Each protocol sends packets following a particular format. For
>     >         each of these we currently implement an internal model,
>     >         serializers and parsers.
>     >         Until now this has been pure Java, but we are now starting to
>     >         work on C++ and would like to add even more languages.
>     >     
>     >         As we don’t want to manually keep all of these implementations
>     >         in sync, my idea was to describe the data format in some form
>     >         and have the parsers, serializers and the model generated from
>     >         that.
>     >         So the implementation only has to take care of the plumbing and
>     >         the state machine of the protocol.
>     >     
>     >         In Montreal I attended a great talk on DFDL and Daffodil, so I
>     >         think DFDL in general would be a great fit.
>     >         Unfortunately we don’t want to parse any data format into an
>     >         XML or DOM representation, for performance reasons.
>     >     
>     >         My ideal workflow would look like this:
>     >     
>     >           1.  For every protocol I define the DFDL documents describing
>     >               the different types of messages for a given protocol
>     >           2.  I define multiple protocol implementation modules (one
>     >               for each language)
>     >           3.  I use a Maven plugin in each of these to generate the
>     >               code for that particular language from those central DFDL
>     >               definitions
>     >     
>     >         Is this possible?
>     >         Is it planned to support this in the future?
>     >         What other options do you see for this sort of problem?
>     >     
>     >         I am absolutely willing to get my hands dirty and help
>     >         implement this, if you say: “Yes we want that too but haven’t
>     >         managed to do that yet”.
>     >     
>     >         Chris
>     >     
>     >     
>     >     
>     > 
>     > 
>     
>     
>
