Hello Steve,

A DFDL NiFi processor sounds like a great application of the open-source
work, and I agree with Bryan's comments on the API as well as on the
potential to handle data formats we don't already support.

Is there anything specific you're looking for feedback on? A leading
question: are you looking for advice on what to add in order to contribute
the processor to Apache, or will it continue to live in the NCSA repo?

A couple of notes off the bat. I see that other projects in that repo are
licensed BSD 3-clause, but the NiFi processors are Apache v2; I just want
to make sure that's on purpose. Also, you'll need to properly reference
your dependencies in a NOTICE file, since the nar file packages them in.
We have a licensing guide [1] if you need something to reference. You also
have an info-level logging statement in onTrigger that executes on every
call; I'd suggest changing that to debug. Lastly, if this were to be
contributed to Apache, I'd ask for more general documentation. A general
NiFi user with no knowledge of DFDL, or of the use case for handling these
data formats, should be able to read your documentation and understand what
the processor does (linking to something with more details is appropriate).

[1] https://nifi.apache.org/licensing-guide.html
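To make the logging suggestion concrete, here is a rough, self-contained
Java sketch. It is not code from your repo: Log, RecordingLog, and
ParseStep are hypothetical stand-ins so it runs without NiFi on the
classpath. In a real processor you would call getLogger(), whose
ComponentLog offers isDebugEnabled() and debug(...) in the same spirit.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the few logger methods used here; in a real
// processor, getLogger() returns NiFi's ComponentLog instead.
interface Log {
    boolean isDebugEnabled();
    void debug(String msg);
    void info(String msg);
}

// Hypothetical recording logger so the sketch runs without NiFi.
class RecordingLog implements Log {
    final boolean debugEnabled;
    final List<String> debugMessages = new ArrayList<>();
    final List<String> infoMessages = new ArrayList<>();

    RecordingLog(boolean debugEnabled) {
        this.debugEnabled = debugEnabled;
    }

    public boolean isDebugEnabled() { return debugEnabled; }

    // Debug messages are only kept when debug logging is enabled.
    public void debug(String msg) {
        if (debugEnabled) {
            debugMessages.add(msg);
        }
    }

    public void info(String msg) { infoMessages.add(msg); }
}

// Hypothetical simplification of a processor's onTrigger: the per-FlowFile
// message is logged at debug level instead of info.
class ParseStep {
    private final Log logger;

    ParseStep(Log logger) {
        this.logger = logger;
    }

    void onTrigger(String flowFileId) {
        // Before: logger.info("Parsing " + flowFileId); ran for every FlowFile.
        if (logger.isDebugEnabled()) {
            // The guard skips building the message string when debug is off.
            logger.debug("Parsing " + flowFileId);
        }
        // ... parse the FlowFile content here ...
    }
}

public class Main {
    public static void main(String[] args) {
        RecordingLog quiet = new RecordingLog(false);
        new ParseStep(quiet).onTrigger("flowfile-1");
        System.out.println(quiet.infoMessages.size() + " " + quiet.debugMessages.size());

        RecordingLog verbose = new RecordingLog(true);
        new ParseStep(verbose).onTrigger("flowfile-2");
        System.out.println(verbose.debugMessages);
    }
}
```

With debug disabled (the usual production setting), nothing is logged per
FlowFile, and the isDebugEnabled() guard also avoids building the message.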

Thanks for your work, and I look forward to seeing a Daffodil 2.0 NiFi
processor soon!

Joe


On Fri, Feb 3, 2017 at 10:22 AM, Bryan Rosander <brosan...@apache.org>
wrote:

> Hey Steve,
>
> That looks really cool! I cloned the git repo for the processors, and
> everything looks pretty standard. I was impressed with how concise your API
> is; those concrete impls for parse and unparse took very little code :).
>
> I pulled down the Daffodil source and built it (my first experience with
> sbt), and I'm planning on getting my feet wet when I have some spare time.
>
> I think this could be really useful for handling data formats we don't
> already have native processors for, and I agree that transforming arbitrary
> data formats to/from XML makes for a really compelling
> validation/transformation use case.
>
> Thanks,
> Bryan
>
> On Thu, Feb 2, 2017 at 12:51 PM, Steve Lawrence <slawre...@tresys.com>
> wrote:
>
> > We have developed two new NiFi processors, called DaffodilParse and
> > DaffodilUnparse, which add support for the Daffodil open source project
> > [1] to NiFi. We are interested in any feedback the NiFi development
> > community might have. The code for the processors is available at the
> > following link:
> >
> > https://opensource.ncsa.illinois.edu/bitbucket/projects/DFDL/repos/daffodil-nifi/browse
> >
> > Note that this currently depends on a snapshot of the latest version of
> > Daffodil, so it is likely not the final form, but it is functional and
> > gives a good idea of how we think a Daffodil processor might work.
> >
> > A little about Daffodil: for approximately the past five years, a group
> > of us have been working on the Daffodil project, an open source
> > implementation of the Data Format Description Language (DFDL) [2]. At a
> > very high level, DFDL defines a language that describes a wide variety
> > of data formats [3], including both text and binary, using XML schema
> > and annotations. It also defines how a DFDL implementation can use this
> > description to "parse" data into an XML infoset, and how this infoset
> > can be "unparsed" or serialized back into the original file format. By
> > using an XML infoset, DFDL provides a simple mechanism that allows one
> > to take advantage of the many XML technologies (e.g. XProc, XPath, XSLT,
> > Schematron) to validate, manipulate, create, and ingest complex data
> > formats.
> >
> > The Daffodil project is nearing the 2.0 release, which will include
> > support for both parsing and unparsing many complex data formats. With
> > this maturity, we think one potential use case for Daffodil is a NiFi
> > processor that can ingest data and parse it to XML. This XML can then be
> > validated/queried/transformed with the various existing NiFi XML
> > processors (e.g. EvaluateXQuery, SplitXml, ValidateXml, TransformXml)
> > and flow into other processors. A second Daffodil NiFi processor could
> > read the resulting XML and unparse it back to the original file format.
> > The two processors mentioned above do exactly that.
> >
> > If you would like to try out the processors, the usual 'mvn install'
> > will create a nar file containing the two processors. Both processors
> > require a single parameter: the path to a DFDL schema file (ending in
> > .dfdl.xsd by convention). The test directory in the repository contains
> > a DFDL schema describing CSV and a test file. However, the PCAP schema,
> > found here
> >
> >   https://github.com/DFDLSchemas/PCAP
> >
> > is a bit more interesting, describing multiple layers of the network
> > stack of a packet capture file, showing things like IPv6, IPv4, MAC/IP
> > addresses, ports, protocols, etc. The PCAP DFDL schema is in the
> > src/main/resources/xsd directory, with some example PCAP files in
> > src/tests/resources/tests. These have all been tested to work with NiFi
> > 1.1.1.
> >
> > Thanks and we look forward to any feedback,
> > - Steve
> >
> >
> > [1] https://opensource.ncsa.illinois.edu/confluence/display/DFDL/Daffodil%3A+Open+Source+DFDL
> > [2] https://www.ogf.org/ogf/doku.php/standards/dfdl/dfdl
> > [3] https://github.com/DFDLSchemas
> >
>



-- 
*Joe Percivall*
linkedin.com/in/Percivall
e: jperciv...@apache.com
