Hey Steve,

That looks really cool! I cloned the git repo for the processors and everything looks pretty standard. I was impressed with how concise your API is; those concrete impls for parse and unparse took very little code :).
I pulled down the Daffodil source and built it (my first experience with sbt) and am planning on getting my feet wet when I have some spare time. I think this could be really useful for handling data formats we don't already have native processors for, and I agree that transforming arbitrary data formats to/from XML makes for a really compelling validation/transformation use case.

Thanks,

Bryan

On Thu, Feb 2, 2017 at 12:51 PM, Steve Lawrence <[email protected]> wrote:

> We have developed two new NiFi processors, called DaffodilParse and
> DaffodilUnparse, which add support for the Daffodil open source project
> [1] to NiFi. We were interested in any feedback the NiFi development
> community might have. The code for the processors is available at the
> following link:
>
> https://opensource.ncsa.illinois.edu/bitbucket/projects/DFDL/repos/daffodil-nifi/browse
>
> Note that this currently depends on a snapshot of the latest version of
> Daffodil, so this likely is not the final form, but it is functional and
> gives a good idea of how we think a Daffodil processor might work.
>
> A little about Daffodil: for approximately the past 5 years, a group of
> us have been working on the Daffodil project, an open source
> implementation of the Data Format Description Language (DFDL) [2]. At a
> very high level, DFDL defines a language that describes a wide variety
> of data formats [3], including both text and binary, using XML schema
> and annotations. It also defines how a DFDL implementation can use this
> description to "parse" data into an XML infoset, and how this infoset
> can be "unparsed" or serialized back into the original file format. By
> using an XML infoset, DFDL provides a simple mechanism that allows one
> to take advantage of the many XML technologies (e.g. XProc, XPath, XSLT,
> Schematron) to validate, manipulate, create, and ingest complex data
> formats.
>
> The Daffodil project is nearing the 2.0 release, which will include
> support for both parsing and unparsing many complex data formats. With
> this maturity, we think one potential use case for Daffodil is a NiFi
> processor that can ingest data and parse it to XML. This XML can then be
> validated/queried/transformed with the various existing NiFi XML
> processors (e.g. EvaluateXQuery, SplitXml, ValidateXml, TransformXml)
> and flow into other processors. A second Daffodil NiFi processor could
> read the resulting XML and unparse it back to the original file format.
> The two processors mentioned above do exactly that.
>
> If you would like to try out the processors, the usual 'mvn install'
> will create a nar file containing the two processors. Both processors
> require a single parameter: the path to a DFDL schema file (ending in
> .dfdl.xsd by convention). The test directory in the repository contains
> a DFDL schema describing CSV and a test file. However, the PCAP schema,
> found here
>
> https://github.com/DFDLSchemas/PCAP
>
> is a bit more interesting, describing multiple layers of the network
> stack of a packet capture file, showing things like IPv6, IPv4, MAC/IP
> addresses, ports, protocols, etc. The PCAP DFDL schema is in the
> src/main/resources/xsd directory, with some example PCAP files in
> src/tests/resources/tests. These have all been tested to work with NiFi
> 1.1.1.
>
> Thanks and we look forward to any feedback,
> - Steve
>
>
> [1] https://opensource.ncsa.illinois.edu/confluence/display/DFDL/Daffodil%3A+Open+Source+DFDL
> [2] https://www.ogf.org/ogf/doku.php/standards/dfdl/dfdl
> [3] https://github.com/DFDLSchemas
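
For anyone following the thread who wants a feel for the parse/unparse round trip described in the quoted message, here is a toy Python sketch. It does not use Daffodil or a DFDL schema. The record layout (a 2-byte big-endian id, a 1-byte name length, then an ASCII name), the element names, and the `parse`/`unparse` functions are all made up for illustration. It only shows the general idea of turning binary data into an XML infoset and serializing that infoset back to the same bytes.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical record layout (an assumption, not from Daffodil or any DFDL schema):
#   2 bytes  big-endian unsigned id
#   1 byte   length of the name
#   N bytes  ASCII name


def parse(data: bytes) -> str:
    """Turn one binary record into a small XML document (the "infoset")."""
    rec_id, name_len = struct.unpack_from(">HB", data, 0)
    name = data[3:3 + name_len].decode("ascii")
    root = ET.Element("record")
    ET.SubElement(root, "id").text = str(rec_id)
    ET.SubElement(root, "name").text = name
    return ET.tostring(root, encoding="unicode")


def unparse(xml_text: str) -> bytes:
    """Serialize the XML infoset back into the original binary layout."""
    root = ET.fromstring(xml_text)
    rec_id = int(root.findtext("id"))
    name = root.findtext("name").encode("ascii")
    return struct.pack(">HB", rec_id, len(name)) + name


original = struct.pack(">HB", 42, 5) + b"hello"
infoset = parse(original)
# The round trip should reproduce the input bytes exactly.
assert unparse(infoset) == original
```

In a real flow the layout would come from the DFDL schema rather than hand-written `struct` calls, and the XML in the middle is where the existing XML processors would do validation or transformation.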
