Christofer,
Yes, what you suggest is possible and is something many people want. It has been talked about here and there, but I don't know of anyone else doing exactly this right now.

Effectively, what you are describing is a code-generator backend for Daffodil. I think this is a great idea. I personally want one that generates VHDL, Verilog, or another hardware synthesis language, so you can go directly to an FPGA for data parsing at hardware speed.

Anyway, such a generator would likely add to the existing parser/unparser primitives so that, in addition to having parse() and unparse() methods, they would have generateCode() methods that emit the equivalent code and recursively invoke generateCode() on their sub-objects, incorporating the results.

I would suggest that the existing Daffodil backend, which may well not be fast enough for your needs, would nevertheless be a very valuable part of your testing strategy: your schemas should work on Daffodil, and you can then verify that the parser behavior of your generated code is consistent with it. It may also be helpful for diagnostic purposes, i.e., if data is parsed and determined invalid, perhaps your "kit" to help your users involves parsing such data with regular old Daffodil into XML for tangibility/inspection.

There is a fair amount of runtime library to be created to go with the generated code, of course. Daffodil has daffodil-lib, daffodil-io, daffodil-runtime1, and daffodil-runtime1-unparser, each of which contains a large volume of runtime code that would need to be replaced with a C/C++ equivalent in a new runtime. I would suggest much of the work is actually here, not in the compilation.

I really hope you undertake this effort. I think it will be a big value-add to Daffodil if it has a code-gen style backend. The current back-end really hasn't had raw speed as its goal. It has largely been about correctness, and getting the DFDL standard fully/mostly implemented quickly.

Let us know how we can help you get started.
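To make the generateCode() idea described above a bit more concrete, here is a minimal sketch in Python. It is not Daffodil's actual API: the class names (Primitive, UInt8, UInt16BE, Sequence), the generate_code() method, and the generate_parser_function() helper are all hypothetical, and the emitted C is illustrative only (no bounds checking, no runtime library).

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


class Primitive:
    """Hypothetical parser primitive; NOT Daffodil's real class hierarchy."""

    def parse(self, data: bytes, pos: int) -> Tuple[Dict[str, int], int]:
        """Interpret the data directly; return (fields, new position)."""
        raise NotImplementedError

    def generate_code(self, indent: int = 0) -> List[str]:
        """Emit lines of C that do the same work as parse()."""
        raise NotImplementedError


@dataclass
class UInt8(Primitive):
    name: str

    def parse(self, data, pos):
        return {self.name: data[pos]}, pos + 1

    def generate_code(self, indent=0):
        pad = " " * indent
        return [f"{pad}out->{self.name} = buf[pos]; pos += 1;"]


@dataclass
class UInt16BE(Primitive):
    name: str

    def parse(self, data, pos):
        return {self.name: int.from_bytes(data[pos:pos + 2], "big")}, pos + 2

    def generate_code(self, indent=0):
        pad = " " * indent
        return [
            f"{pad}out->{self.name} = (uint16_t)((buf[pos] << 8) | buf[pos + 1]);",
            f"{pad}pos += 2;",
        ]


@dataclass
class Sequence(Primitive):
    children: List[Primitive]

    def parse(self, data, pos):
        fields: Dict[str, int] = {}
        for child in self.children:
            child_fields, pos = child.parse(data, pos)
            fields.update(child_fields)
        return fields, pos

    def generate_code(self, indent=0):
        # Recursively ask each sub-object for its code and splice it in.
        lines: List[str] = []
        for child in self.children:
            lines.extend(child.generate_code(indent))
        return lines


def generate_parser_function(name: str, root: Primitive) -> List[str]:
    """Wrap the generated body in a C function (hypothetical signature)."""
    lines = [
        f"size_t parse_{name}(const uint8_t *buf, struct {name} *out) {{",
        "    size_t pos = 0;",
    ]
    lines.extend(root.generate_code(indent=4))
    lines.append("    return pos;")
    lines.append("}")
    return lines


if __name__ == "__main__":
    header = Sequence([UInt8("version"), UInt16BE("length")])
    print("\n".join(generate_parser_function("header", header)))
```

In a sketch like this, the parse() path can serve as the reference when testing the generated code, in the same spirit as using the existing Daffodil backend to check the generated parser's behavior.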
The other thing worth mentioning is that Daffodil does have on its roadmap plans to create a streaming parser/unparser. This would not build a DOM-tree-like structure, but would instead emit events along the lines of a SAX-style parse of data. Now some formats are simply not streamable, and there is no option to avoid building up a tree in memory. But many formats are streamable, and people really do want the ability to parse files much larger than memory, in finite RAM, so long as the format is streamable.

-mike beckerle
Tresys Technology

________________________________
From: Christofer Dutz <[email protected]>
Sent: Wednesday, January 9, 2019 8:56:28 AM
To: [email protected]
Subject: Using DFDL to generate model, parser and generator?

Hi all,

I am currently looking for a solution to the following question:

In the Apache PLC4X (incubating) project we are implementing a lot of different industry protocols. Each protocol sends packets following a particular format. For each of these we currently implement an internal model, serializers and parsers. Till now this has been pure Java, but we are now starting to work on C++ and would like to add even more languages.

As we don’t want to manually keep in sync all of these implementations, my idea was to describe the data format in some form and have the parsers, serializers and the model generated from that. So the implementation only has to take care of the plumbing and the state-machine of the protocol.

In Montreal I attended a great talk on DFDL and Daffodil, so I think DFDL in general would be a great fit. Unfortunately we don’t want to parse any data format into an XML or DOM representation for performance reasons.

My ideal workflow would look like this:

1. For every protocol I define the DFDL documents describing the different types of messages for a given protocol
2. I define multiple protocol implementation modules (one for each language)
3. I use a maven plugin in each of these to generate the code for that particular language from those central DFDL definitions

Is this possible? Is it planned to support this in the future? What other options do you see for this sort of problem?

I am absolutely willing to get my hands dirty and help implement this, if you say: “Yes we want that too but haven’t managed to do that yet”.

Chris
