Bryan & Joe,

Thanks for the responses!

As far as contribution to Apache NiFi goes, we're certainly in favor of this eventually being merged in and will make any recommended changes towards that effort.

As far as licensing goes, some of the BSD dependencies come from Daffodil's dependence on Scala, so those are intentional and not something we can change. The other dependencies may be flexible if there are issues. Making the Daffodil NiFi processors Apache v2 was done in part to make merging into NiFi easier if that was eventually decided, but also because the code is heavily based on the TransformXml NiFi processor, so it just made sense. I'll go through the provided link and make the necessary changes to make sure we are in line with the Apache license and the dependencies.

Somewhat related: considering Daffodil's applicability to many Apache projects related to data (e.g. NiFi, Spark, Hadoop, Tika), we hope to eventually have Daffodil join Apache via the incubator (anyone interested in being our champion, feel free to contact me). We are therefore already in discussion about switching the Daffodil license from NCSA to Apache and determining everything involved with that.

Lastly, improved documentation is definitely on our radar, and it's something we're constantly working on. Daffodil/DFDL is a new technology for pretty much everyone, and it can have a pretty steep learning curve. We'll do everything we can to make it easier, and we will be sure to include documentation changes.

Thanks for the feedback!
- Steve

On 02/03/2017 10:40 AM, Joe Percivall wrote:
> Hello Steve,
>
> A DFDL NiFi processor sounds like a great application of the open-source
> work and I agree with Bryan's comments on the API as well as the potential
> to handle data formats we don't already support.
>
> Is there anything specific you're looking for feedback on? A leading
> question, are you looking for advice on things to add in order to
> contribute the processor to Apache or will it continue to live in the NCSA
> repo?
>
> Couple notes off the bat, I see other projects in that repo are licensed
> BSD 3-line but the NiFi processors are Apache v2. Just want to make sure
> that's on purpose. Also you'll need to properly reference your dependencies
> with a notice file (since the nar file packages them in). We have a
> licensing guide here[1] if you need something to reference. Also you have
> an info logging statement in the onTrigger that always executes. I'd
> suggest changing that to debug. Lastly, if this was to be contributed to
> Apache, I'd ask for more general documentation. A general NiFi user with no
> knowledge of DFDL and the use-case of handling these data formats should be
> able to read your documentation and understand what the processor does
> (linking to something for more details is appropriate).
>
> [1] https://nifi.apache.org/licensing-guide.html
>
> Thanks for your work and I look forward to seeing a DFDL v2 NiFi processor
> soon!
>
> Joe
>
>
> On Fri, Feb 3, 2017 at 10:22 AM, Bryan Rosander <[email protected]>
> wrote:
>
>> Hey Steve,
>>
>> That looks really cool! I cloned the git repo for the processors and
>> everything looks pretty standard. I was impressed with how concise your api
>> is, those concrete impls for parse and unparse took very little code :).
>>
>> Pulled down daffodil source and built it (my first experience with sbt) and
>> am planning on getting my feet wet when I have some spare time.
>>
>> I think this could be really useful for handling data formats we don't
>> already have native processors for and I agree that transforming arbitrary
>> data formats to/from xml makes for a really compelling
>> validation/transformation usecase.
>>
>> Thanks,
>> Bryan
>>
>> On Thu, Feb 2, 2017 at 12:51 PM, Steve Lawrence <[email protected]>
>> wrote:
>>
>>> We have developed two new NiFi processors, called DaffodilParse and
>>> DaffodilUnparse, which add support for the Daffodil open source project
>>> [1] to NiFi.
>>> We were interested in any feedback the NiFi development
>>> community might have. The code for the processors is available at the
>>> following link:
>>>
>>> https://opensource.ncsa.illinois.edu/bitbucket/projects/DFDL/repos/daffodil-nifi/browse
>>>
>>> Note that this currently depends on a snapshot of the latest version of
>>> Daffodil, so this likely is not the final form, but it is functional and
>>> gives a good idea of how we think a Daffodil processor might work.
>>>
>>> A little about Daffodil, for approximately the past 5 years, a group of
>>> us have been working on the Daffodil project, an open source
>>> implementation of the Data Format Description Language (DFDL) [2]. At a
>>> very high level, DFDL defines a language that describes a wide variety
>>> of data formats [3], including both text and binary, using XML schema
>>> and annotations. It also defines how a DFDL implementation can use this
>>> description to "parse" data into an XML infoset, and how this infoset
>>> can be "unparsed" or serialized back into the original file format. By
>>> using an XML infoset, DFDL provides a simple mechanism that allows one
>>> to take advantage of the many XML technologies (e.g. XProc, XPath, XSLT,
>>> Schematron) to validate, manipulate, create, and ingest complex data
>>> formats.
>>>
>>> The Daffodil project is nearing the 2.0 release, which will include
>>> support for both parsing and unparsing many complex data formats. With
>>> this maturity, we think one potential use case for Daffodil is a NiFi
>>> processor that can ingest data and parse it to XML. This XML can then be
>>> validated/queried/transformed with the various existing NiFi XML
>>> processors (e.g. EvaluateXQuery, SplitXml, ValidateXml, TransformXml)
>>> and flow into other processors. A second Daffodil NiFi processor could
>>> read the resulting XML and unparse it back to the original file format.
>>> The two processors mentioned above do exactly that.
>>>
>>> If you would like to try out the processors, the usual 'mvn install'
>>> will create a nar file containing the two processors. Both processors
>>> require a single parameter to the path of a DFDL schema file (ending in
>>> .dfdl.xsd by convention). The test directory in the repository contains
>>> a DFDL schema describing CSV and a test file. However, the PCAP schema,
>>> found here
>>>
>>> https://github.com/DFDLSchemas/PCAP
>>>
>>> is a bit more interesting, describing multiple layers of the network
>>> stack of a packet capture file, showing things like IPv6, IPv4, MAC/IP
>>> addresses, ports, protocols, etc. The PCAP DFDL schema is in the
>>> src/main/resources/xsd directory, with some example PCAP files in
>>> src/tests/resources/tests. These have all been tested to work with NiFi
>>> 1.1.1.
>>>
>>> Thanks and we look forward to any feedback,
>>> - Steve
>>>
>>>
>>> [1] https://opensource.ncsa.illinois.edu/confluence/display/DFDL/Daffodil%3A+Open+Source+DFDL
>>> [2] https://www.ogf.org/ogf/doku.php/standards/dfdl/dfdl
>>> [3] https://github.com/DFDLSchemas
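
For readers new to the parse/unparse idea described in the original announcement, the round trip can be illustrated with a toy sketch. This is only an assumption-laden illustration: it does not use Daffodil, DFDL schemas, or NiFi, it hard-codes CSV as the format, and the element names (`file`, `record`, `item`) are made up for the example. It only shows the shape of "parse data into an XML infoset, then unparse the infoset back to the original format" using the Python standard library.

```python
import csv
import io
import xml.etree.ElementTree as ET


def parse_csv_to_infoset(text):
    """Toy 'parse': turn CSV text into an XML tree.

    The element names (file/record/item) are hypothetical, chosen only
    for this illustration; a real DFDL schema would define them.
    """
    root = ET.Element("file")
    for row in csv.reader(io.StringIO(text)):
        record = ET.SubElement(root, "record")
        for value in row:
            ET.SubElement(record, "item").text = value
    return root


def unparse_infoset_to_csv(root):
    """Toy 'unparse': serialize the XML tree back into CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for record in root.findall("record"):
        # An empty element parses back with text None, so map it to "".
        writer.writerow([item.text or "" for item in record.findall("item")])
    return out.getvalue()


original = "name,port\nhttp,80\ndns,53\n"

# Parse to an infoset, serialize it as XML text (the form that XML tools
# such as XPath or XSLT would operate on), then unparse it again.
infoset = parse_csv_to_infoset(original)
xml_text = ET.tostring(infoset, encoding="unicode")
round_tripped = unparse_infoset_to_csv(ET.fromstring(xml_text))
```

In a real flow the XML step in the middle is where validation, querying, or transformation would happen before unparsing; here it is left untouched so the output matches the input.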
