Re: Using DFDL to generate model, parser and generator?

Beckerle, Mike Wed, 09 Jan 2019 11:10:49 -0800

Christofer,


Yes, what you suggest is possible and is what many people want. It has been
talked about here and there, but I don't know of anyone else doing exactly this
right now.


Effectively what you are describing is a code-generator backend for Daffodil. I 
think this is a great idea. I personally want to have one that generates VHDL 
or Verilog or other Hardware synthesis language so you can go direct to an FPGA 
for data parsing at hardware speed.


Anyway, such a generator would likely extend the existing parser/unparser
primitives so that, in addition to their parse() and unparse() methods, they
would have generateCode() methods that emit the equivalent code, recursively
invoking generateCode() on their sub-objects and incorporating the results.
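
To make the generateCode() idea concrete, here is a minimal Java sketch. It is
not Daffodil's actual class structure: the Primitive interface, the UInt8 and
Sequence classes, and the shape of the emitted C are all hypothetical names
invented for illustration. It only shows how each primitive could both
interpret data and emit equivalent code, with composites delegating to their
children.

```java
import java.util.Arrays;
import java.util.List;

public final class CodeGenSketch {

    /** Hypothetical primitive: can interpret data directly or emit equivalent C. */
    interface Primitive {
        /** Interprets data at pos, appends a trace to out, returns the new position. */
        int parse(byte[] data, int pos, StringBuilder out);

        /** Appends C statements equivalent to parse() to code. */
        void generateCode(StringBuilder code);
    }

    /** Leaf primitive: one unsigned 8-bit field. */
    static final class UInt8 implements Primitive {
        private final String name;

        UInt8(String name) { this.name = name; }

        @Override
        public int parse(byte[] data, int pos, StringBuilder out) {
            out.append(name).append('=').append(data[pos] & 0xFF).append(';');
            return pos + 1;
        }

        @Override
        public void generateCode(StringBuilder code) {
            code.append("    out->").append(name).append(" = buf[pos++];\n");
        }
    }

    /** Composite primitive: handles its children in order. */
    static final class Sequence implements Primitive {
        private final List<Primitive> children;

        Sequence(Primitive... children) { this.children = Arrays.asList(children); }

        @Override
        public int parse(byte[] data, int pos, StringBuilder out) {
            for (Primitive child : children) {
                pos = child.parse(data, pos, out);
            }
            return pos;
        }

        @Override
        public void generateCode(StringBuilder code) {
            // Recursion: each child contributes its own statements.
            for (Primitive child : children) {
                child.generateCode(code);
            }
        }
    }

    /** Runs the interpreting path and returns the field trace. */
    static String interpret(Primitive root, byte[] data) {
        StringBuilder out = new StringBuilder();
        root.parse(data, 0, out);
        return out.toString();
    }

    /** Wraps the recursively generated statements in a C function. */
    static String generate(String typeName, Primitive root) {
        StringBuilder code = new StringBuilder();
        code.append("size_t parse_").append(typeName)
            .append("(const uint8_t *buf, ").append(typeName).append("_t *out) {\n");
        code.append("    size_t pos = 0;\n");
        root.generateCode(code);
        code.append("    return pos;\n}\n");
        return code.toString();
    }

    /** Made-up three-field header, nested to exercise the recursion. */
    static Primitive exampleHeader() {
        return new Sequence(new UInt8("length"),
                            new Sequence(new UInt8("type"), new UInt8("flags")));
    }

    public static void main(String[] args) {
        System.out.print(generate("header", exampleHeader()));
        System.out.println(interpret(exampleHeader(), new byte[] {3, 7, (byte) 0x80}));
    }
}
```

The emitted C refers to a header_t struct and omits bounds checks; both would
have to come from the runtime library that accompanies the generated code.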


I would suggest that the existing Daffodil backend, which may well not be fast
enough for your needs, would nevertheless be a very valuable part of your
testing strategy: your schemas should work on Daffodil, so you can verify that
the parser behavior of your generated code is consistent with it. It may also
be helpful for diagnostic purposes, i.e., if data is parsed and determined
invalid, perhaps your "kit" to help your users involves parsing such data with
regular old Daffodil into XML for tangibility/inspection.
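
As a rough illustration of that testing strategy, the Java sketch below runs
the same inputs through a reference parser and a candidate parser and reports
which inputs disagree. It does not call any Daffodil API: both parsers are
modelled as plain functions from bytes to a canonical string, and
referenceParse and buggyParse are made-up stand-ins for "Daffodil" and
"generated code" respectively.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public final class DifferentialCheckSketch {

    /** Returns the indexes of inputs on which the two parsers disagree. */
    static List<Integer> mismatches(List<byte[]> inputs,
                                    Function<byte[], String> reference,
                                    Function<byte[], String> candidate) {
        List<Integer> bad = new ArrayList<>();
        for (int i = 0; i < inputs.size(); i++) {
            String expected = reference.apply(inputs.get(i));
            String actual = candidate.apply(inputs.get(i));
            if (!expected.equals(actual)) {
                bad.add(i);
            }
        }
        return bad;
    }

    /** Stand-in for the reference parser: treats bytes as unsigned. */
    static String referenceParse(byte[] data) {
        int sum = 0;
        for (byte b : data) {
            sum += b & 0xFF;
        }
        return "sum=" + sum;
    }

    /** Stand-in for generated code with a deliberate signedness bug. */
    static String buggyParse(byte[] data) {
        int sum = 0;
        for (byte b : data) {
            sum += b; // bug: sign-extends bytes >= 0x80
        }
        return "sum=" + sum;
    }

    public static void main(String[] args) {
        List<byte[]> inputs = Arrays.asList(new byte[] {1, 2}, new byte[] {(byte) 0xFF});
        System.out.println(mismatches(inputs,
                DifferentialCheckSketch::referenceParse,
                DifferentialCheckSketch::buggyParse));
    }
}
```

In practice the canonical string would be something like a normalized infoset
dump, so that both sides can be compared without caring how each one
represents the parsed data internally.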


There is a fair amount of runtime library to be created to go with the
generated code, of course. Daffodil has daffodil-lib, daffodil-io,
daffodil-runtime1, and daffodil-runtime1-unparser, each of which contains a
large volume of runtime code that would need to be replaced with a C/C++
equivalent in a new runtime. I would suggest much of the work is actually here,
not in the compilation.


I really hope you undertake this effort. I think it will be a big value-add to
Daffodil if it has a code-gen style backend. The current backend really hasn't
had raw speed as its goal. It has largely been about correctness, and getting
the DFDL standard fully/mostly implemented quickly. Let us know how we can help
you get started.


The other thing worth mentioning is that Daffodil does have plans on its
roadmap to create a streaming parser/unparser. This would not build a DOM-like
tree structure, but would instead emit events along the lines of a SAX-style
parse of data. Now some formats are simply not streamable, and there is no
option to avoid building up a tree in memory. But many formats are streamable,
and people really do want the ability to parse files much larger than memory,
in finite RAM, so long as the format is streamable.
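
To illustrate the difference from tree building, here is a small Java sketch
of an event-style parse. It is not the planned Daffodil API: the EventHandler
interface and the toy format (a stream of records, each one length byte
followed by that many ASCII payload bytes) are assumptions made up for the
example. Only one record is held in memory at a time, so the input can be
arbitrarily long.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public final class StreamingSketch {

    /** Hypothetical SAX-like callback interface. */
    interface EventHandler {
        void startElement(String name);
        void text(String value);
        void endElement(String name);
    }

    /** Parses length-prefixed records, emitting events instead of building a tree. */
    static void parse(InputStream in, EventHandler handler) throws IOException {
        int len;
        while ((len = in.read()) != -1) {          // length byte, 0..255
            byte[] payload = new byte[len];
            int off = 0;
            while (off < len) {                    // read() may return fewer bytes
                int n = in.read(payload, off, len - off);
                if (n < 0) {
                    throw new EOFException("truncated record");
                }
                off += n;
            }
            handler.startElement("record");
            handler.text(new String(payload, StandardCharsets.US_ASCII));
            handler.endElement("record");
            // payload becomes garbage here; nothing accumulates across records
        }
    }

    /** Demo helper: records the events as strings so they can be inspected. */
    static List<String> collectEvents(byte[] data) {
        final List<String> events = new ArrayList<>();
        EventHandler handler = new EventHandler() {
            @Override public void startElement(String name) { events.add("start:" + name); }
            @Override public void text(String value) { events.add("text:" + value); }
            @Override public void endElement(String name) { events.add("end:" + name); }
        };
        try {
            parse(new ByteArrayInputStream(data), handler);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return events;
    }

    public static void main(String[] args) {
        System.out.println(collectEvents(new byte[] {2, 'h', 'i', 1, '!'}));
    }
}
```

A format where an early field depends on something that appears later in the
data would not fit this pattern, since the parser could not emit the early
events until it had read ahead.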


-mike beckerle

Tresys Technology

________________________________
From: Christofer Dutz <[email protected]>
Sent: Wednesday, January 9, 2019 8:56:28 AM
To: [email protected]
Subject: Using DFDL to generate model, parser and generator?

Hi all,

I am currently looking for a solution to the following question:

In the Apache PLC4X (incubating) project we are implementing a lot of different 
industry protocols.
Each protocol sends packets following a particular format. For each of these we 
currently implement an internal model, serializers and parsers.
Till now this has been pure Java, but we are now starting to work on C++ and 
would like to add even more languages.

As we don’t want to manually keep in sync all of these implementations, my idea 
was to describe the data format in some form and have the parsers, serializers 
and the model generated from that.
So the implementation only has to take care of the plumbing and the 
state-machine of the protocol.

In Montreal I attended a great talk on DFDL and Daffodil, so I think DFDL in 
general would be a great fit.
Unfortunately we don’t want to parse any data format into an XML or DOM 
representation for performance reasons.

My ideal workflow would look like this:

  1.  For every protocol I define the DFDL documents describing the different 
types of messages for a given protocol
  2.  I define multiple protocol implementation modules (one for each language)
  3.  I use a maven plugin in each of these to generate the code for that 
particular language from those central DFDL definitions

Is this possible?
Is it planned to support this in the future?
What other options do you see for this sort of problem?

I am absolutely willing to get my hands dirty and help implement this, if you 
say: “Yes we want that too but haven’t managed to do that yet”.

Chris
