Hey Steve,

That looks really cool! I cloned the git repo for the processors and
everything looks pretty standard. I was impressed with how concise your API
is; those concrete impls for parse and unparse took very little code :).

I pulled down the Daffodil source and built it (my first experience with
sbt), and I'm planning on getting my feet wet when I have some spare time.

I think this could be really useful for handling data formats we don't
already have native processors for, and I agree that transforming arbitrary
data formats to/from XML makes for a really compelling
validation/transformation use case.

Thanks,
Bryan

On Thu, Feb 2, 2017 at 12:51 PM, Steve Lawrence <[email protected]>
wrote:

> We have developed two new NiFi processors, called DaffodilParse and
> DaffodilUnparse, which add support for the Daffodil open source project
> [1] to NiFi. We were interested in any feedback the NiFi development
> community might have. The code for the processors is available at the
> following link:
>
>
> https://opensource.ncsa.illinois.edu/bitbucket/projects/DFDL/repos/daffodil-nifi/browse
>
> Note that this currently depends on a snapshot of the latest version of
> Daffodil, so this likely is not the final form, but it is functional and
> gives a good idea of how we think a Daffodil processor might work.
>
> A little about Daffodil: for approximately the past 5 years, a group of
> us have been working on the Daffodil project, an open source
> implementation of the Data Format Description Language (DFDL) [2]. At a
> very high level, DFDL defines a language that describes a wide variety
> of data formats [3], including both text and binary, using XML schema
> and annotations. It also defines how a DFDL implementation can use this
> description to "parse" data into an XML infoset, and how this infoset
> can be "unparsed" or serialized back into the original file format. By
> using an XML infoset, DFDL provides a simple mechanism that allows one
> to take advantage of the many XML technologies (e.g. XProc, XPath, XSLT,
> Schematron) to validate, manipulate, create, and ingest complex data
> formats.
>
> The Daffodil project is nearing the 2.0 release, which will include
> support for both parsing and unparsing many complex data formats. With
> this maturity, we think one potential use case for Daffodil is a NiFi
> processor that can ingest data and parse it to XML. This XML can then be
> validated/queried/transformed with the various existing NiFi XML
> processors (e.g. EvaluateXQuery, SplitXml, ValidateXml, TransformXml)
> and flow into other processors. A second Daffodil NiFi processor could
> read the resulting XML and unparse it back to the original file format.
> The two processors mentioned above do exactly that.
>
> If you would like to try out the processors, the usual 'mvn install'
> will create a nar file containing the two processors. Both processors
> require a single parameter: the path to a DFDL schema file (ending in
> .dfdl.xsd by convention). The test directory in the repository contains
> a DFDL schema describing CSV and a test file. However, the PCAP schema,
> found here
>
>   https://github.com/DFDLSchemas/PCAP
>
> is a bit more interesting, describing multiple layers of the network
> stack of a packet capture file, showing things like IPv6, IPv4, MAC/IP
> addresses, ports, protocols, etc. The PCAP DFDL schema is in the
> src/main/resources/xsd directory, with some example PCAP files in
> src/tests/resources/tests. These have all been tested to work with NiFi
> 1.1.1.
>
> Thanks and we look forward to any feedback,
> - Steve
>
>
> [1] https://opensource.ncsa.illinois.edu/confluence/display/DFDL/Daffodil%3A+Open+Source+DFDL
> [2] https://www.ogf.org/ogf/doku.php/standards/dfdl/dfdl
> [3] https://github.com/DFDLSchemas
>
