I think the benefits are clear. What is unclear is whether the goal is to
expose, share, or re-use Metron capabilities (Stellar, parsing) in NiFi in a
way that is native to NiFi (configured and managed in NiFi), where you may
not even need Metron (say you just want to parse ASA), or whether the goal
is a hybrid approach coupling the processors/readers to the Metron
installation.


On August 9, 2018 at 09:14:58, Justin Leet (justinjl...@gmail.com) wrote:

I'll add onto Mike's discussion with the original set of requirements I had
in mind (and apply feedback on these as necessary!). This largely overlaps
with what Mike said, but I want to make sure it's clear where my proposal
was coming from, so we can improve on it as needed. James and Mike are also
right that I skipped over the benefits of NiFi in general a bit, so thanks
for chiming in there.

- Deploy our bundled parsers without needing custom wrapping on all of
them.
- Don't prevent ourselves from building custom wrapping as needed.
- Custom Java parsers with an easy way to hook in, similar to what we
already do in Storm (see the sketch at the end of this message).
- One stop (or at least one format) configuration, for the case when we're
doing some things in NiFi (parsers) and some elsewhere (enrichment and
indexing). I don't think it'll always be "start in NiFi, end in Storm",
especially as we build out Stellar capability, but I also don't want users
learning a different set of configs and config tools for every platform we
run on.
- Ability to build out parsers on other platforms fairly easily, e.g. Spark.
- Support our current use cases (in particular parser chaining as a more
advanced use case).

It really boils down to providing a simple, general path for users to
migrate to NiFi as needed or desired, while not preventing parser-by-parser
enhancements.
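
To make the custom-parser bullet above concrete, here's a rough sketch of the
kind of hook-in I have in mind. It assumes today's parser contract
(configure/init/parse/validate on MessageParser) carries over unchanged into
the shared library; the class and fields are illustrative placeholders, not
real code:

import java.util.Collections;
import java.util.List;
import java.util.Map;

import org.json.simple.JSONObject;

// Assumption: the existing Storm-side MessageParser contract moves into the
// shared parser library as-is, so a custom parser is plain Java with no
// platform dependencies.
import org.apache.metron.parsers.interfaces.MessageParser;

public class MyCustomParser implements MessageParser<JSONObject> {

  private Map<String, Object> config;

  @Override
  public void configure(Map<String, Object> config) {
    // Sensor config handed in by whichever platform hosts the parser (Storm or NiFi).
    this.config = config;
  }

  @Override
  public void init() {
    // One-time setup (compile patterns, load lookups, etc.).
  }

  @Override
  @SuppressWarnings("unchecked")
  public List<JSONObject> parse(byte[] rawMessage) {
    // Toy parse: a real parser does the heavy lifting here.
    JSONObject message = new JSONObject();
    message.put("original_string", new String(rawMessage));
    message.put("timestamp", System.currentTimeMillis());
    return Collections.singletonList(message);
  }

  @Override
  public boolean validate(JSONObject message) {
    return message.containsKey("timestamp") && message.containsKey("original_string");
  }
}

The point is that this class is all a user should have to write; packaging it
in a jar the Processor can load should be the whole integration.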

On Wed, Aug 8, 2018 at 7:14 PM Michael Miklavcic <michael.miklav...@gmail.com> wrote:

> I think it also provides customers greater control over their architecture
> by giving them the flexibility to choose where/how to host their parsers.
>
> To Justin's point about the API, my biggest concern about the RecordReader
> approach is that it is not stable. We already have a similar problem with
> the TransportClient in ElasticSearch - they are prone to changing it in
> minor versions with the advent of their newer REST API, which is
> problematic for ensuring a stable installation.
>
> From my own perspective, our goal with NiFi, at least in part, should be
> the ability to deploy our core parsing infrastructure, i.e.
>
> - pre-built parsers
> - custom java parsers
> - Stellar transforms
> - custom stellar transforms
>
> And have the ability to configure it similarly to how we configure parsers
> within Storm. Consistent with our recent parser chaining and aggregation
> feature, users should be able to construct and deploy similar constructs in
> NiFi. The core architectural shift would be that parser code should be
> platform agnostic. We provide the plumbing in Storm, NiFi, and <Spark
> Streaming?, other> and platform architects and devops teams can choose how
> and where to deploy.
>
> Best,
> Mike
>
>
> On Wed, Aug 8, 2018 at 9:57 AM James Sirota <jsir...@apache.org> wrote:
>
> > Integration with NiFi would be useful for parsing low-volume telemetries
> > at the edge. This is a much more resource-friendly way to do it than
> > setting up dedicated Storm topologies. The integration would be that the
> > NiFi processor parses the data and pushes it straight into the enrichment
> > topic, saving us the resources of having multiple parsers in Storm.
> >
> > Thanks,
> > James
> >
> > 07.08.2018, 11:29, "Otto Fowler" <ottobackwa...@gmail.com>:
> > > Why do we start over. We are going back and forth on implementation,
> > > and I don't think we have the same goals or concerns.
> > >
> > > What would be the requirements or goals of Metron integration with
> > > NiFi?
> > > How many levels or options for integration do we have?
> > > What are the approaches to choose from?
> > > Who are the target users?
> > >
> > > On August 7, 2018 at 12:24:56, Justin Leet (justinjl...@gmail.com) wrote:
> > >
> > > So how does the MetronRecordReader roll into everything? It seems like
> > > it'd be more useful on the reader-per-format approach, but otherwise it
> > > doesn't really seem like we gain much, and it requires getting everything
> > > linked up properly to be used. Assuming we looked at doing it that way,
> > > is the idea that we'd set up a ControllerService with the
> > > MetronRecordReader and a MetronRecordWriter and then have the
> > > StellarTransformRecord processor configured with those
> > > ControllerServices? How do we manage the configurations of everything
> > > that way? How does the ControllerService get configured with whatever
> > > parser(s) are needed in the flow? Basically, what's your vision for how
> > > everything would tie together?
> > >
> > > I also forgot to mention this in the original writeup, but there's
> > > another reason to avoid the RecordReader: it's not considered stable. See
> > > https://github.com/apache/nifi/blob/master/nifi-commons/nifi-record/src/main/java/org/apache/nifi/serialization/RecordReader.java#L34
> > > That alone makes me super hesitant to use it, if it can shift out from
> > > under us even in an incremental version.
> > >
> > > I'm also unclear on why the StellarTransformRecord processor matters for
> > > either approach. With the Processor approach you could simply follow it
> > > up with the Stellar processor, the same way you would in the RecordReader
> > > approach. The Stellar processor should be a parallel improvement, not a
> > > conflicting one.
> > >
> > > On Tue, Aug 7, 2018 at 11:50 AM Otto Fowler <ottobackwa...@gmail.com> wrote:
> > >
> > >> A Metron Processor itself isn’t really necessary. A MetronRecordReader
> > >> ( either the megalithic or a reader per format ) would be a good
> > >> approach. Then have a StellarTransformRecord processor that can do
> > >> Stellar on _any_ record, regardless of source.
> > >>
> > >> On August 7, 2018 at 11:06:22, Justin Leet (justinjl...@gmail.com) wrote:
> > >>
> > >> Thanks for the comments, Otto, this is definitely great feedback. I'd
> > >> love to respond inline, but the email's already starting to lose its
> > >> formatting, so I'll go with the classic "wall of text". Let me know if I
> > >> didn't address everything.
> > >>
> > >> Loading modules (or jars or whatever) outside of our Processor gives us
> > >> the benefit of making it incredibly easy for users to create their own
> > >> parsers. I would definitely expect our own bundled parsers to be included
> > >> in our base NAR, but loading modules enables users to only have to learn
> > >> how Metron wants our stuff lined up and just plug it in. Having said that,
> > >> I could see having a wrapper for our bundled parsers that makes it really
> > >> easy to just say you want a MetronAsaParser or MetronBroParser, etc. That
> > >> would give us the best of both worlds, where it's easy to get set up with
> > >> our bundled parsers and also trivial to pull in non-bundled parsers. What
> > >> doing this gives us is an easy way to support (hopefully) every parser
> > >> that gets made, right out of the box, without us needing to build a
> > >> specialized version of everything until we decide to and without users
> > >> having to jump through hoops.
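
To make the jar-loading part concrete, here is a rough sketch of the
classloader approach being described, modeled on what JoltTransformJSON does;
the class and method names are placeholders, not existing code:

import java.io.File;
import java.net.URL;
import java.net.URLClassLoader;

public class ParserJarLoader {

  /**
   * Loads a parser class from a user-supplied jar in an isolated classloader,
   * so custom parsers don't have to be baked into the Metron NAR.
   */
  public static Object instantiateParser(String jarPath, String parserClassName)
      throws Exception {
    URL[] urls = new URL[] { new File(jarPath).toURI().toURL() };
    // Parent is our NAR's classloader, so the custom jar can see the shared parser interface.
    ClassLoader parserLoader =
        URLClassLoader.newInstance(urls, ParserJarLoader.class.getClassLoader());

    ClassLoader original = Thread.currentThread().getContextClassLoader();
    try {
      // Swap the context classloader while touching external code, mirroring
      // the pattern JoltTransformJSON uses around its custom-transform calls.
      Thread.currentThread().setContextClassLoader(parserLoader);
      return Class.forName(parserClassName, true, parserLoader)
                  .getDeclaredConstructor()
                  .newInstance();
    } finally {
      Thread.currentThread().setContextClassLoader(original);
    }
  }
}

The returned object would then be cast to the shared parser interface and
configured from the sensor config.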
> > >>
> > >> None of this prevents anyone from creating specialized parsers (for perf
> > >> reasons, or to use the schema registries, or anything else). It's probably
> > >> worthwhile to package up some of the built-in parsers and customize them
> > >> to use more specialized features appropriately as we see things get used
> > >> in the wild. Like you said, we could likely provide Avro schemas for some
> > >> of this and give users a more robust experience on what we choose to
> > >> support and provide guidance for other things. I'm also worried that
> > >> building specialized schemas becomes problematic for things like parser
> > >> chaining (where our routers wrap the underlying messages and add on their
> > >> own info). Going down that road potentially requires anything wrapped to
> > >> have a specialized schema for the wrapped version in addition to a vanilla
> > >> version (although please correct me if I'm missing something there, I'll
> > >> openly admit to some shakiness on how that would be handled).
> > >>
> > >> I also disagree that this is un-NiFi-like, although I'm admittedly not as
> > >> skilled there. The basis for doing this is directly inspired by the
> > >> JoltTransformer, which is extremely similar to the proposed setup for our
> > >> parsers: simply take a spec (in this case the configs, including the
> > >> fieldTransformations), and delegate a mapping from bytes[] to JSON. The
> > >> Jolt library even has an Expression Language (check out
> > >> https://community.hortonworks.com/articles/105965/expression-language-with-jolt-in-apache-nifi.html
> > >> ), so it's not a foreign concept. I believe Simon Ball has already done
> > >> some experimenting around with getting Stellar running in NiFi, and I'd
> > >> love to see Stellar more readily available in NiFi in general.
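
A rough sketch of the bytes[]-in, JSON-out Processor being described, for
concreteness. The names are placeholders, and it assumes the parser instance
is built at schedule time (from the controller service / ZK config and, if
needed, the jar loading sketched above) rather than shown here:

import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.List;
import java.util.Set;

import org.apache.nifi.flowfile.FlowFile;
import org.apache.nifi.processor.AbstractProcessor;
import org.apache.nifi.processor.ProcessContext;
import org.apache.nifi.processor.ProcessSession;
import org.apache.nifi.processor.Relationship;
import org.apache.nifi.processor.exception.ProcessException;
import org.apache.nifi.processor.io.OutputStreamCallback;
import org.json.simple.JSONObject;

// Assumption: MessageParser is the existing Metron parser contract, shared
// with Storm via the proposed common library.
import org.apache.metron.parsers.interfaces.MessageParser;

public class MetronParserProcessor extends AbstractProcessor {

  public static final Relationship REL_SUCCESS = new Relationship.Builder()
      .name("success")
      .description("Parsed messages, one JSON object per line")
      .build();

  // Built elsewhere (e.g. at schedule time) from the sensor config; omitted for brevity.
  private volatile MessageParser<JSONObject> parser;

  @Override
  public Set<Relationship> getRelationships() {
    return Collections.singleton(REL_SUCCESS);
  }

  @Override
  public void onTrigger(ProcessContext context, ProcessSession session) throws ProcessException {
    FlowFile flowFile = session.get();
    if (flowFile == null) {
      return;
    }

    // bytes[] in: pull the flow file content into memory.
    final ByteArrayOutputStream raw = new ByteArrayOutputStream();
    session.read(flowFile, in -> {
      byte[] buf = new byte[8192];
      int n;
      while ((n = in.read(buf)) != -1) {
        raw.write(buf, 0, n);
      }
    });

    // JSON out: delegate to the parser library, exactly like the Storm bolt does.
    List<JSONObject> messages = parser.parse(raw.toByteArray());
    final StringBuilder out = new StringBuilder();
    for (JSONObject message : messages) {
      out.append(message.toJSONString()).append('\n');
    }

    flowFile = session.write(flowFile, (OutputStreamCallback) os ->
        os.write(out.toString().getBytes(StandardCharsets.UTF_8)));
    session.transfer(flowFile, REL_SUCCESS);
  }
}

Error handling, a failure relationship, and splitting per-message flow files
are all omitted; the point is just how thin the NiFi-specific layer is.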
> > >>
> > >> Re: the ControllerService, I see this as a way to maintain Metron's use
> > >> of ZK as the source of config truth. Users could definitely be using NiFi
> > >> and Storm in tandem (parse in NiFi + enrich and index from Storm, for
> > >> example). Using the ControllerService gives us a ZK instance as the single
> > >> source of truth. That way we aren't forcing users to go to two different
> > >> places to manage configs. This also lets us leverage our existing scripts
> > >> and our existing infrastructure around configs and their management and
> > >> validation very easily. It also gives users a way to port from NiFi to
> > >> Storm or vice-versa without having to migrate configs as well. We could
> > >> also provide the option to configure the Processor itself with the data
> > >> (just don't set up a controller service, and instead provide the JSON or
> > >> whatever as one of our properties).
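
As a sketch of what that controller service API might look like (method names
are illustrative, not settled):

import java.util.Map;

import org.apache.nifi.controller.ControllerService;

/**
 * Hypothetical controller-service API: the single place a processor goes for
 * sensor configs, backed by the same ZooKeeper tree Metron already uses.
 */
public interface MetronZkControllerService extends ControllerService {

  /**
   * The parser config for a sensor, e.g. the JSON stored under
   * /metron/topology/parsers/{sensorType} in ZooKeeper.
   */
  Map<String, Object> getSensorParserConfig(String sensorType);

  /** The global config shared across sensors (field validations, etc.). */
  Map<String, Object> getGlobalConfig();
}

The implementation (in its own NAR, per the usual controller-service-api
split) would own the actual ZK client and caching, so NiFi and Storm read
configs from the same source of truth.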
> > >>
> > >> On Tue, Aug 7, 2018 at 10:12 AM Otto Fowler <ottobackwa...@gmail.com> wrote:
> > >>
> > >>> I think this is a good idea. As I mentioned in the other thread I’ve
> > >>> been doing a lot of work on NiFi recently.
> > >>> I think the important thing is that what is done should be done the NiFi
> > >>> way, not bolting the Metron composition onto NiFi. Think of it like the
> > >>> Tao of Unix: the parsers and components should be single purpose and
> > >>> simple, allowing exceptional flexibility in composition.
> > >>>
> > >>> Comments inline.
> > >>>
> > >>> On August 7, 2018 at 09:27:01, Justin Leet (justinjl...@gmail.com) wrote:
> > >>>
> > >>> Hi all,
> > >>>
> > >>> There's interest in being able to run Metron parsers in NiFi, rather
> > >>> than inside Storm. I dug into this a bit, and have some thoughts on how
> > >>> we could go about this. I'd love feedback on this, along with anything
> > >>> we'd consider must-haves as well as future enhancements.
> > >>>
> > >>> 1. Separate metron-parsers into metron-parsers-common and metron-storm
> > >>> and create metron-parsers-nifi. For this code to be reusable across
> > >>> platforms (NiFi, Storm, and anything else in the future), we'll need to
> > >>> decouple our parsers from Storm.
> > >>>
> > >>> +1. The “parsing code” should be a library that implements an interface
> > >>> ( another library ).
> > >>>
> > >>> The Processors and the Storm things can share them.
> > >>>
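
A tiny sketch of what that sharing could look like, for concreteness: nothing
in the shared library references Storm or NiFi types, so both the bolt and the
processor become thin wrappers around the same call. ParserRunner is a made-up
name, and MessageParser is assumed to be the existing contract moved into the
common module:

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import org.json.simple.JSONObject;

import org.apache.metron.parsers.interfaces.MessageParser;

/**
 * Hypothetical piece of metron-parsers-common: plain Java, no Storm or NiFi
 * imports, so the Storm bolt and the NiFi processor can share it untouched.
 */
public class ParserRunner {

  private final MessageParser<JSONObject> parser;

  public ParserRunner(MessageParser<JSONObject> parser, Map<String, Object> sensorConfig) {
    this.parser = parser;
    parser.configure(sensorConfig);
    parser.init();
  }

  /** Bytes in, validated JSON messages out; no platform types in the signature. */
  public List<JSONObject> run(byte[] rawMessage) {
    List<JSONObject> valid = new ArrayList<>();
    for (JSONObject message : parser.parse(rawMessage)) {
      if (parser.validate(message)) {
        valid.add(message);
      }
    }
    return valid;
  }
}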
> > >>> - There are also some nice fringe benefits around refactoring our code
> > >>> to be substantially more clear and understandable; something which came
> > >>> up while allowing for parser aggregation.
> > >>> 2. Create a MetronProcessor that can run our parsers.
> > >>> - I took a look at how RecordReader could be leveraged (e.g.
> > >>> CSVRecordReader), but this is pretty tightly tied into schemas and is
> > >>> meant to be used by ControllerServices, which are then used by
> > >>> Processors. There's friction involved there in terms of schemas, but
> > >>> also in terms of access to ZK configs and things like parser chaining.
> > >>> We might be able to leverage it, but it seems like it'd be fairly
> > >>> shoehorned in without getting the schema and other benefits.
> > >>>
> > >>> We won’t have to provide our ‘no schema processors’ ( grok, csv, json ).
> > >>>
> > >>> All the remaining processors DO have schemas that we know about. We can
> > >>> just provide the Avro schemas the same way we provide the ES schemas.
> > >>>
> > >>> The “parsing” should not be conflated with the transform/Stellar in
> > >>> NiFi. We should make that separate. Running Stellar over Records would
> > >>> be the best thing.
> > >>>
> > >>> - This Processor would work similarly to Storm: bytes[] in -> JSON out.
> > >>> - There is a Processor
> > >>> <https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/JoltTransformJSON.java>
> > >>> that handles loading other JARs, which we can model a
> > >>> MetronParserProcessor off of to handle classpath/classloader issues
> > >>> (basically just set up a classloader specific to what's being loaded and
> > >>> swap out the Thread's loader when it calls out to outside resources).
> > >>>
> > >>> There should be no reason to load modules outside the NAR. Why do you
> > >>> expect to? If each Metron Processor equivalent of a Metron Storm Parser
> > >>> is just parsing to JSON it shouldn’t need much. And we could package them
> > >>> in the NAR. I would suggest we have a Processor per Parser to allow for
> > >>> specialization. It should all be in the NAR.
> > >>>
> > >>> The Stellar Processor, if you want to support the works, would possibly
> > >>> need this.
> > >>>
> > >>> 3. Create a MetronZkControllerService to supply our configs to our
> > >>> processors.
> > >>> - This is a pretty established NiFi pattern for being able to provide
> > >>> access to other services needed by a Processor (e.g. databases or large
> > >>> configuration files).
> > >>> - The same controller service can be used by all Processors to manage
> > >>> configs in a consistent manner.
> > >>>
> > >>> I think controller services would make sense where needed, I’m just not
> > >>> sure what you imagine them being needed for?
> > >>>
> > >>> If the user has NiFi, and a Registry etc., are you saying you imagine
> > >>> them using Metron + ZK to manage configurations? Or to be using BOTH
> > >>> Storm processors and NiFi Processors?
> > >>>
> > >>> At that point, we can just NAR our controller service and parser
> > >>> processor up as needed, deploy them to NiFi, and let the user provide a
> > >>> config for where their custom parsers can be provided (i.e. their parser
> > >>> jar). This would be 3 NARs (processor, controller-service, and
> > >>> controller-service-api in order to bind the other two together).
> > >>>
> > >>> Once deployed, our ability to use parsers should fit well into the
> > >>> standard NiFi workflow:
> > >>>
> > >>> 1. Create a MetronZkControllerService.
> > >>> 2. Configure the service to point at ZooKeeper.
> > >>> 3. Create a MetronParser.
> > >>> 4. Configure it to use the controller service + parser jar location +
> > >>> any other needed configs.
> > >>> 5. Use the outputs as needed downstream (either writing out to Kafka or
> > >>> feeding into more MetronParsers, etc.)
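
For steps 3 and 4 above, the processor-side configuration could boil down to
a handful of properties along these lines. The names are placeholders, and
MetronZkControllerService is the hypothetical API sketched earlier in the
thread:

import org.apache.nifi.components.PropertyDescriptor;
import org.apache.nifi.processor.util.StandardValidators;

// Import of the MetronZkControllerService interface sketched earlier is assumed.
// Sketch of the knobs implied by steps 3-4; none of these exist yet.
public final class MetronParserProperties {

  /** Steps 1-2: which MetronZkControllerService (and therefore which ZK quorum) to read configs from. */
  public static final PropertyDescriptor ZK_CONTROLLER_SERVICE = new PropertyDescriptor.Builder()
      .name("metron-zk-controller-service")
      .displayName("Metron ZooKeeper Controller Service")
      .identifiesControllerService(MetronZkControllerService.class)
      .required(true)
      .build();

  /** Step 4: the sensor type, i.e. which parser config in ZK applies to this processor instance. */
  public static final PropertyDescriptor SENSOR_TYPE = new PropertyDescriptor.Builder()
      .name("sensor-type")
      .displayName("Sensor Type")
      .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
      .required(true)
      .build();

  /** Step 4: optional jar with a custom parser; empty means "use a bundled parser from the NAR". */
  public static final PropertyDescriptor PARSER_JAR = new PropertyDescriptor.Builder()
      .name("parser-jar-location")
      .displayName("Custom Parser Jar")
      .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
      .required(false)
      .build();

  private MetronParserProperties() { }
}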
> > >>>
> > >>> Chaining parsers should ideally become a matter of chaining
> > >>> MetronParsers (and making sure the enveloping configs carry through
> > >>> properly). For parser aggregation, I'd just avoid it entirely until we
> > >>> know it's needed in NiFi.
> > >>>
> > >>> Justin
> >
> > -------------------
> > Thank you,
> >
> > James Sirota
> > PMC- Apache Metron
> > jsirota AT apache DOT org
> >
> >
>
