Hi all,

There's interest in being able to run Metron parsers in NiFi, rather than
inside Storm. I dug into this a bit and have some thoughts on how we could
go about it.  I'd love feedback on the approach, along with anything we'd
consider must-haves as well as future enhancements.

   1. Separate metron-parsers into metron-parsers-common and metron-storm,
   and create metron-parsers-nifi. For this code to be reusable across
   platforms (NiFi, Storm, and anything else in the future), we'll need to
   decouple our parsers from Storm.
      - There are also some nice fringe benefits here: the refactoring
      should make our code substantially clearer and more understandable,
      something that came up while allowing for parser aggregation.
   2. Create a MetronProcessor that can run our parsers.
      - I took a look at how RecordReader could be leveraged (e.g.
      CSVRecordReader), but it's pretty tightly tied to schemas and is
      meant to be used by ControllerServices, which are in turn used by
      Processors.  There's friction there in terms of schemas, but also in
      terms of access to ZK configs and things like parser chaining.  We
      might be able to leverage it, but it seems like it'd be fairly
      shoehorned in without getting the schema and other benefits.
      - This Processor would work similarly to Storm: byte[] in -> JSON
      out.
      - There is an existing Processor, JoltTransformJSON
      <https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/JoltTransformJSON.java>,
      that handles loading other JARs and that we can model a
      MetronParserProcessor off of for the classpath/classloader issues
      (it basically sets up a classloader specific to what's being loaded
      and swaps it in as the Thread's context loader when it calls out to
      those resources).  There's a rough sketch of this after the list
      below.
   3. Create a MetronZkControllerService to supply our configs to our
   Processors.
      - This is a pretty established NiFi pattern for providing access to
      other services needed by a Processor (e.g. databases or large
      configuration files).
      - The same controller service can be used by all Processors to manage
      configs in a consistent manner.
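
To make (2) a bit more concrete, here's a rough sketch of what the
Processor's onTrigger could look like, assuming we keep the existing
MessageParser interface more or less as-is.  The class and relationship
names are just placeholders, and the property declarations plus the
@OnScheduled setup that builds the parser classloader from the user's jar
are omitted:

import java.nio.charset.StandardCharsets;

import org.apache.commons.io.IOUtils;
import org.apache.metron.parsers.interfaces.MessageParser;
import org.apache.nifi.flowfile.FlowFile;
import org.apache.nifi.processor.AbstractProcessor;
import org.apache.nifi.processor.ProcessContext;
import org.apache.nifi.processor.ProcessSession;
import org.apache.nifi.processor.Relationship;
import org.apache.nifi.processor.exception.ProcessException;
import org.json.simple.JSONObject;

public class MetronParserProcessor extends AbstractProcessor {

  static final Relationship SUCCESS = new Relationship.Builder()
      .name("success")
      .description("Parsed messages, one JSON object per line")
      .build();

  // Built in an @OnScheduled method (not shown) from the user-supplied
  // parser jar location and the configs pulled via the controller service.
  private volatile ClassLoader parserClassLoader;
  private volatile MessageParser<JSONObject> parser;

  @Override
  public void onTrigger(ProcessContext context, ProcessSession session)
      throws ProcessException {
    FlowFile flowFile = session.get();
    if (flowFile == null) {
      return;
    }

    // Swap in the parser's classloader for the duration of the call,
    // mirroring what JoltTransformJSON does for its extra jars.
    final ClassLoader original = Thread.currentThread().getContextClassLoader();
    try {
      Thread.currentThread().setContextClassLoader(parserClassLoader);
      flowFile = session.write(flowFile, (in, out) -> {
        byte[] raw = IOUtils.toByteArray(in);           // bytes in
        for (JSONObject message : parser.parse(raw)) {  // JSON out
          out.write(message.toJSONString().getBytes(StandardCharsets.UTF_8));
          out.write('\n');
        }
      });
    } finally {
      Thread.currentThread().setContextClassLoader(original);
    }

    session.transfer(flowFile, SUCCESS);
  }
}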

At that point, we can just NAR our controller service and parser processor
up as needed, deploy them to NiFi, and let the user provide a config
pointing at where their custom parsers live (i.e. their parser jar).  This
would be 3 NARs (processor, controller-service, and controller-service-api
in order to bind the other two together).
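
For the controller-service-api NAR, the shared piece would really just be
a small interface that both the processor NAR and the controller-service
NAR depend on.  Something along these lines, where the name and methods
are purely illustrative:

import java.util.Map;

import org.apache.metron.common.configuration.SensorParserConfig;
import org.apache.nifi.controller.ControllerService;

public interface MetronZkConfigService extends ControllerService {

  // Parser config for a given sensor, as stored in ZooKeeper.
  SensorParserConfig getParserConfig(String sensorType);

  // The global config map from ZooKeeper.
  Map<String, Object> getGlobalConfig();
}

The concrete MetronZkControllerService would implement this on top of a
ZooKeeper client, and the processor would look it up via
context.getProperty(...).asControllerService(...).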

Once deployed, our ability to use parsers should fit well into the standard
NiFi workflow:

   1. Create a MetronZkControllerService.
   2. Configure the service to point at ZooKeeper.
   3. Create a MetronParser.
   4. Configure it to use the controller service + parser jar location +
   any other needed configs (a rough sketch of these properties follows
   this list).
   5. Use the outputs as needed downstream (either writing out to Kafka or
   feeding into more MetronParsers, etc.)
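
For step 4, the configuration surface would just be ordinary NiFi property
descriptors; these are the declarations omitted from the earlier
MetronParserProcessor sketch, and the names and exact set of properties
are very much up for discussion:

import java.util.Arrays;
import java.util.Collections;
import java.util.List;

import org.apache.nifi.components.PropertyDescriptor;
import org.apache.nifi.processor.util.StandardValidators;

// These fields would live in MetronParserProcessor; pulled out here only
// to keep the sketch short.
public class MetronParserProperties {

  public static final PropertyDescriptor ZK_CONTROLLER_SERVICE =
      new PropertyDescriptor.Builder()
          .name("metron-zk-controller-service")
          .displayName("Metron ZooKeeper Controller Service")
          .description("Controller service supplying Metron configs from ZooKeeper")
          .identifiesControllerService(MetronZkConfigService.class)
          .required(true)
          .build();

  public static final PropertyDescriptor PARSER_JAR_LOCATION =
      new PropertyDescriptor.Builder()
          .name("parser-jar-location")
          .displayName("Parser Jar Location")
          .description("Path to the jar containing the user's parser class")
          .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
          .required(false)
          .build();

  public static final PropertyDescriptor SENSOR_TYPE =
      new PropertyDescriptor.Builder()
          .name("sensor-type")
          .displayName("Sensor Type")
          .description("Sensor whose parser config should be pulled from ZooKeeper")
          .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
          .required(true)
          .build();

  public static final List<PropertyDescriptor> PROPERTIES =
      Collections.unmodifiableList(
          Arrays.asList(ZK_CONTROLLER_SERVICE, PARSER_JAR_LOCATION, SENSOR_TYPE));
}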

Chaining parsers should ideally become a matter of chaining MetronParsers
(and making sure the enveloping configs carry through properly). For parser
aggregation, I'd just avoid it entirely until we know it's needed in NiFi.

Justin
