Hi all,

There's interest in being able to run Metron parsers in NiFi, rather than
inside Storm. I dug into this a bit and have some thoughts on how we could
go about it.  I'd love feedback on the approach, along with anything we'd
consider must-haves as well as future enhancements.

   1. Separate metron-parsers into metron-parsers-common and metron-storm,
   and create metron-parsers-nifi. For this code to be reusable across
   platforms (NiFi, Storm, and anything else in the future), we'll need to
   decouple our parsers from Storm.
      - There are also some nice fringe benefits here: the refactoring
      should make our code substantially clearer and more understandable,
      something that came up while allowing for parser aggregation.
   2. Create a MetronProcessor that can run our parsers.
      - I took a look at how RecordReader could be leveraged (e.g.
      CSVRecordReader), but it's pretty tightly tied to schemas and is
      meant to be used by ControllerServices, which are in turn used by
      Processors.  There's friction there in terms of schemas, but also in
      terms of access to ZK configs and things like parser chaining.  We
      might be able to leverage it, but it seems like it'd be fairly
      shoehorned in without getting the schema and other benefits.
      - This Processor would work similarly to Storm: byte[] in -> JSON
      out.
      - There is an existing Processor, JoltTransformJSON
      <https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/JoltTransformJSON.java>,
      that handles loading other JARs and that we can model a
      MetronParserProcessor off of for the classpath/classloader issues
      (it basically sets up a classloader specific to what's being loaded
      and swaps it in as the Thread's context loader when it calls out to
      those resources).  There's a rough sketch of this after the list
      below.
   3. Create a MetronZkControllerService to supply our configs to our
   Processors.
      - This is a pretty established NiFi pattern for providing access to
      other services needed by a Processor (e.g. databases or large
      configuration files).
      - The same controller service can be used by all Processors to manage
      configs in a consistent manner.
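
To make (2) a bit more concrete, here's a rough sketch of what the
Processor's onTrigger could look like, assuming we keep the existing
MessageParser interface more or less as-is.  The class and relationship
names are just placeholders, and the property declarations plus the
@OnScheduled setup that builds the parser classloader from the user's jar
are omitted:

import java.nio.charset.StandardCharsets;

import org.apache.commons.io.IOUtils;
import org.apache.metron.parsers.interfaces.MessageParser;
import org.apache.nifi.flowfile.FlowFile;
import org.apache.nifi.processor.AbstractProcessor;
import org.apache.nifi.processor.ProcessContext;
import org.apache.nifi.processor.ProcessSession;
import org.apache.nifi.processor.Relationship;
import org.apache.nifi.processor.exception.ProcessException;
import org.json.simple.JSONObject;

public class MetronParserProcessor extends AbstractProcessor {

  static final Relationship SUCCESS = new Relationship.Builder()
      .name("success")
      .description("Parsed messages, one JSON object per line")
      .build();

  // Built in an @OnScheduled method (not shown) from the user-supplied
  // parser jar location and the configs pulled via the controller service.
  private volatile ClassLoader parserClassLoader;
  private volatile MessageParser<JSONObject> parser;

  @Override
  public void onTrigger(ProcessContext context, ProcessSession session)
      throws ProcessException {
    FlowFile flowFile = session.get();
    if (flowFile == null) {
      return;
    }

    // Swap in the parser's classloader for the duration of the call,
    // mirroring what JoltTransformJSON does for its extra jars.
    final ClassLoader original = Thread.currentThread().getContextClassLoader();
    try {
      Thread.currentThread().setContextClassLoader(parserClassLoader);
      flowFile = session.write(flowFile, (in, out) -> {
        byte[] raw = IOUtils.toByteArray(in);           // bytes in
        for (JSONObject message : parser.parse(raw)) {  // JSON out
          out.write(message.toJSONString().getBytes(StandardCharsets.UTF_8));
          out.write('\n');
        }
      });
    } finally {
      Thread.currentThread().setContextClassLoader(original);
    }

    session.transfer(flowFile, SUCCESS);
  }
}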

At that point, we can just NAR our controller service and parser processor
up as needed, deploy them to NiFi, and let the user provide a config
pointing at where their custom parsers live (i.e. their parser jar).  This
would be 3 NARs (processor, controller-service, and controller-service-api
in order to bind the other two together).
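
For the controller-service-api NAR, the shared piece would really just be
a small interface that both the processor NAR and the controller-service
NAR depend on.  Something along these lines, where the name and methods
are purely illustrative:

import java.util.Map;

import org.apache.metron.common.configuration.SensorParserConfig;
import org.apache.nifi.controller.ControllerService;

public interface MetronZkConfigService extends ControllerService {

  // Parser config for a given sensor, as stored in ZooKeeper.
  SensorParserConfig getParserConfig(String sensorType);

  // The global config map from ZooKeeper.
  Map<String, Object> getGlobalConfig();
}

The concrete MetronZkControllerService would implement this on top of a
ZooKeeper client, and the processor would look it up via
context.getProperty(...).asControllerService(...).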

Once deployed, our ability to use parsers should fit well into the standard
NiFi workflow:

   1. Create a MetronZkControllerService.
   2. Configure the service to point at ZooKeeper.
   3. Create a MetronParser.
   4. Configure it to use the controller service + parser jar location +
   any other needed configs (a rough sketch of these properties follows
   this list).
   5. Use the outputs as needed downstream (either writing out to Kafka or
   feeding into more MetronParsers, etc.)
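
For step 4, the configuration surface would just be ordinary NiFi property
descriptors; these are the declarations omitted from the earlier
MetronParserProcessor sketch, and the names and exact set of properties
are very much up for discussion:

import java.util.Arrays;
import java.util.Collections;
import java.util.List;

import org.apache.nifi.components.PropertyDescriptor;
import org.apache.nifi.processor.util.StandardValidators;

// These fields would live in MetronParserProcessor; pulled out here only
// to keep the sketch short.
public class MetronParserProperties {

  public static final PropertyDescriptor ZK_CONTROLLER_SERVICE =
      new PropertyDescriptor.Builder()
          .name("metron-zk-controller-service")
          .displayName("Metron ZooKeeper Controller Service")
          .description("Controller service supplying Metron configs from ZooKeeper")
          .identifiesControllerService(MetronZkConfigService.class)
          .required(true)
          .build();

  public static final PropertyDescriptor PARSER_JAR_LOCATION =
      new PropertyDescriptor.Builder()
          .name("parser-jar-location")
          .displayName("Parser Jar Location")
          .description("Path to the jar containing the user's parser class")
          .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
          .required(false)
          .build();

  public static final PropertyDescriptor SENSOR_TYPE =
      new PropertyDescriptor.Builder()
          .name("sensor-type")
          .displayName("Sensor Type")
          .description("Sensor whose parser config should be pulled from ZooKeeper")
          .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
          .required(true)
          .build();

  public static final List<PropertyDescriptor> PROPERTIES =
      Collections.unmodifiableList(
          Arrays.asList(ZK_CONTROLLER_SERVICE, PARSER_JAR_LOCATION, SENSOR_TYPE));
}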

Chaining parsers should ideally become a matter of chaining MetronParsers
(and making sure the enveloping configs carry through properly). For parser
aggregation, I'd just avoid it entirely until we know it's needed in NiFi.

Justin
