Hi all,

There's interest in being able to run Metron parsers in NiFi rather than inside Storm. I dug into this a bit and have some thoughts on how we could go about it. I'd love feedback on this, along with anything we'd consider a must-have as well as future enhancements.
1. Separate metron-parsers into metron-parsers-common and metron-storm, and create metron-parsers-nifi. For this code to be reusable across platforms (NiFi, Storm, and anything else in the future), we'll need to decouple our parsers from Storm.
  - There are also some nice fringe benefits around refactoring our code to be substantially more clear and understandable, something that came up while allowing for parser aggregation.

2. Create a MetronProcessor that can run our parsers.
  - I took a look at how RecordReader could be leveraged (e.g. CSVRecordReader), but it is pretty tightly tied to schemas and is meant to be used by ControllerServices, which are then used by Processors. There's friction there in terms of schemas, but also in terms of access to ZK configs and things like parser chaining. We might be able to leverage it, but it seems like it'd be fairly shoehorned in without getting the schema and other benefits.
  - This Processor would work similarly to Storm: byte[] in -> JSON out.
  - There is a Processor <https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/JoltTransformJSON.java> that handles loading other JARs, which we can model a MetronParserProcessor on to handle the classpath/classloader issues (it basically sets up a classloader specific to what's being loaded and swaps out the Thread's context classloader whenever it calls into outside resources).

3. Create a MetronZkControllerService to supply our configs to our processors.
  - This is a well-established NiFi pattern for giving a Processor access to other services it needs (e.g. databases or large configuration files).
  - The same controller service can be used by all Processors to manage configs in a consistent manner.

(Rough sketches of what the controller service and processor could look like are at the bottom of this mail.)

At that point, we can just NAR our controller service and parser processor up as needed, deploy them to NiFi, and let the user provide a config pointing at where their custom parsers live (i.e. their parser jar). This would be three NARs (processor, controller-service, and controller-service-api, the last one to bind the other two together).

Once deployed, our ability to use parsers should fit well into the standard NiFi workflow:
1. Create a MetronZkControllerService.
2. Configure the service to point at ZooKeeper.
3. Create a MetronParser.
4. Configure it to use the controller service + parser jar location + any other needed configs.
5. Use the outputs as needed downstream (either writing out to Kafka or feeding into more MetronParsers, etc.).

Chaining parsers should ideally become a matter of chaining MetronParsers (and making sure the enveloping configs carry through properly). For parser aggregation, I'd just avoid it entirely until we know it's needed in NiFi.
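To make the controller-service piece a bit more concrete, here's a minimal sketch of what the MetronZkControllerService API could look like. Only the interface name comes from the proposal above; the method name, signature, and return type are placeholders I made up, not existing code:

import java.util.Map;

import org.apache.nifi.controller.ControllerService;

// Hypothetical API: a thin service that hands ZooKeeper-backed parser configs
// to any Processor that references it.
public interface MetronZkControllerService extends ControllerService {
    // Placeholder method; this might instead end up returning Metron's parser
    // config objects (or raw JSON) pulled from the ZK config tree.
    Map<String, Object> getParserConfiguration(String sensorType);
}

And a similarly rough sketch of the processor side, mainly to show the byte[] in -> JSON out contract and the JoltTransformJSON-style classloader swap around calls into the user-supplied parser jar. The package, property names, and lifecycle details are all assumptions on my part; getSupportedPropertyDescriptors()/getRelationships() and the @OnScheduled setup (building the classloader from the jar and the parser from the sensor config) are omitted for brevity:

package org.apache.metron.nifi; // hypothetical package name

import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

import org.apache.metron.parsers.interfaces.MessageParser;
import org.apache.nifi.annotation.documentation.CapabilityDescription;
import org.apache.nifi.annotation.documentation.Tags;
import org.apache.nifi.components.PropertyDescriptor;
import org.apache.nifi.flowfile.FlowFile;
import org.apache.nifi.processor.AbstractProcessor;
import org.apache.nifi.processor.ProcessContext;
import org.apache.nifi.processor.ProcessSession;
import org.apache.nifi.processor.Relationship;
import org.apache.nifi.processor.exception.ProcessException;
import org.apache.nifi.processor.util.StandardValidators;
import org.apache.nifi.stream.io.StreamUtils;
import org.json.simple.JSONObject;

@Tags({"metron", "parser"})
@CapabilityDescription("Runs a Metron parser over incoming raw messages: byte[] in, JSON out.")
public class MetronParserProcessor extends AbstractProcessor {

  // References the hypothetical service interface sketched above.
  static final PropertyDescriptor ZK_SERVICE = new PropertyDescriptor.Builder()
      .name("metron-zk-controller-service")
      .identifiesControllerService(MetronZkControllerService.class)
      .required(true)
      .build();

  static final PropertyDescriptor PARSER_JAR = new PropertyDescriptor.Builder()
      .name("parser-jar-location")
      .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
      .required(true)
      .build();

  static final Relationship SUCCESS = new Relationship.Builder().name("success").build();
  static final Relationship FAILURE = new Relationship.Builder().name("failure").build();

  // Both of these would be built in an @OnScheduled method: the classloader from
  // the user-supplied parser jar, the parser instance from the sensor config in ZK.
  private volatile ClassLoader parserClassLoader;
  private volatile MessageParser<JSONObject> parser;

  @Override
  public void onTrigger(ProcessContext context, ProcessSession session) throws ProcessException {
    FlowFile flowFile = session.get();
    if (flowFile == null) {
      return;
    }

    // byte[] in: read the raw message off the FlowFile.
    final byte[] raw = new byte[(int) flowFile.getSize()];
    session.read(flowFile, in -> StreamUtils.fillBuffer(in, raw));

    // Swap the context classloader while calling into the parser jar (the same
    // trick JoltTransformJSON uses for its custom modules), then restore it.
    final ClassLoader original = Thread.currentThread().getContextClassLoader();
    final List<JSONObject> messages = new ArrayList<>();
    try {
      Thread.currentThread().setContextClassLoader(parserClassLoader);
      messages.addAll(parser.parse(raw));
    } catch (Exception e) {
      getLogger().error("Failed to parse message", e);
      session.transfer(flowFile, FAILURE);
      return;
    } finally {
      Thread.currentThread().setContextClassLoader(original);
    }

    // JSON out: one FlowFile per parsed message, ready for Kafka or another MetronParser.
    for (JSONObject message : messages) {
      FlowFile out = session.create(flowFile);
      out = session.write(out, outStream ->
          outStream.write(message.toJSONString().getBytes(StandardCharsets.UTF_8)));
      session.transfer(out, SUCCESS);
    }
    session.remove(flowFile);
  }
}

Justin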