Re: [DISCUSS] Metron Parsers in Nifi

James Sirota Wed, 08 Aug 2018 08:57:29 -0700

Integration with NiFi would be useful for parsing low-volume telemetries at the 
edge.  It is much more resource-friendly than setting up dedicated Storm 
topologies.  The NiFi processor would parse the data and push it straight into 
the enrichment topic, saving us the resources of running multiple parsers in 
Storm.
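
A rough sketch of that parse-and-forward step follows. It is written in Python 
for brevity and is not a real NiFi processor. The topic name "enrichments", the 
field names, and the `send` callback are assumptions for illustration only, 
not Metron or NiFi APIs.

```python
import json
from typing import Callable, Dict, List, Optional

# Assumed name of the enrichment topic; adjust to the actual deployment.
ENRICHMENT_TOPIC = "enrichments"


def parse_csv_line(raw: bytes, sensor_type: str,
                   columns: List[str]) -> Optional[Dict[str, str]]:
    """Hypothetical edge parser: raw bytes in, JSON-ready dict out.

    Returns None when the field count does not match the column list.
    """
    text = raw.decode("utf-8", errors="replace").strip()
    fields = text.split(",")
    if len(fields) != len(columns):
        return None
    message = dict(zip(columns, fields))
    # Illustrative metadata fields; the names are assumptions.
    message["original_string"] = text
    message["source.type"] = sensor_type
    return message


def parse_and_forward(raw: bytes, sensor_type: str, columns: List[str],
                      send: Callable[[str, bytes], None]) -> bool:
    """Parse one record and hand it to `send(topic, payload)`.

    `send` stands in for whatever producer the flow uses, so no separate
    parser topology sits between the edge and enrichment.
    """
    message = parse_csv_line(raw, sensor_type, columns)
    if message is None:
        return False
    send(ENRICHMENT_TOPIC, json.dumps(message).encode("utf-8"))
    return True
```

In a real flow the `send` callback would be replaced by whatever publishes to 
Kafka, and malformed records would be routed to an error path rather than 
silently dropped.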


Thanks,
James 

07.08.2018, 11:29, "Otto Fowler" <[email protected]>:
> Why don't we start over? We are going back and forth on implementation, and I
> don’t think we have the same goals or concerns.
>
> What would be the requirements or goals of Metron integration with NiFi?
> How many levels or options for integration do we have?
> What are the approaches to choose from?
> Who are the target users?
>
> On August 7, 2018 at 12:24:56, Justin Leet ([email protected]) wrote:
>
> So how does the MetronRecordReader roll into everything? It seems like it'd
> be more useful on the reader per format approach, but otherwise it doesn't
> really seem like we gain much, and it requires getting everything linked up
> properly to be used. Assuming we looked at doing it that way, is the idea
> that we'd set up a ControllerService with the MetronRecordReader and a
> MetronRecordWriter and then have the StellarTransformRecord processor
> configured with those ControllerServices? How do we manage the
> configurations of everything that way? How does the ControllerService
> get configured with whatever parser(s) are needed in the flow? Basically,
> what's your vision for how everything would tie together?
>
> I also forgot to mention this in the original writeup, but there's another
> reason to avoid the RecordReader: It's not considered stable. See
> https://github.com/apache/nifi/blob/master/nifi-commons/nifi-record/src/main/java/org/apache/nifi/serialization/RecordReader.java#L34.
> That alone makes me super hesitant to use it, if it can shift out from
> under us even in an incremental version.
>
> I'm also unclear on why StellarTransformRecord processor matters for either
> approach. With the Processor approach you could simply follow it up with
> the Stellar processor, the same way you would in the RecordReader
> approach. The Stellar processor should be a parallel improvement, not a
> conflicting one.
>
> On Tue, Aug 7, 2018 at 11:50 AM Otto Fowler <[email protected]> wrote:
>
>>  A Metron Processor itself isn’t really necessary. A MetronRecordReader
>>  (either the megalithic one or a reader per format) would be a good approach.
>>  Then have a StellarTransformRecord processor that can do Stellar on _any_
>>  record, regardless of source.
>>
>>  On August 7, 2018 at 11:06:22, Justin Leet ([email protected]) wrote:
>>
>>  Thanks for the comments, Otto, this is definitely great feedback. I'd
>>  love to respond inline, but the email's already starting to lose its
>>  formatting, so I'll go with the classic "wall of text". Let me know if I
>>  didn't address everything.
>>
>>  Loading modules (or jars or whatever) outside of our Processor gives us
>>  the benefit of making it incredibly easy for users to create their own
>>  parsers. I would definitely expect our own bundled parsers to be included
>>  in our base NAR, but loading modules enables users to only have to learn
>>  how Metron wants our stuff lined up and just plug it in. Having said that,
>>  I could see having a wrapper for our bundled parsers that makes it really
>>  easy to just say you want a MetronAsaParser or MetronBroParser, etc. That
>>  would give us the best of both worlds, where it's easy to set up our
>>  bundled parsers and also trivial to pull in non-bundled parsers. What
>>  doing this gives us is an easy way to support (hopefully) every parser that
>>  gets made, right out of the box, without us needing to build a specialized
>>  version of everything until we decide to and without users having to jump
>>  through hoops.
>>
>>  None of this prevents anyone from creating specialized parsers (for perf
>>  reasons, or to use the schema registries, or anything else). It's probably
>>  worthwhile to package up some of the built-in parsers and customize them to
>>  use more specialized features appropriately as we see things get used in the
>>  wild. Like you said, we could likely provide Avro schemas for some of this
>>  and give users a more robust experience on what we choose to support and
>>  provide guidance for other things. I'm also worried that building
>>  specialized schemas becomes problematic for things like parser chaining
>>  (where our routers wrap the underlying messages and add on their own info).
>>  Going down that road potentially requires anything wrapped to have a
>>  specialized schema for the wrapped version in addition to a vanilla version
>>  (although please correct me if I'm missing something there, I'll openly
>>  admit to some shakiness on how that would be handled).
>>
>>  I also disagree that this is un-NiFi-like, although I'm admittedly not as
>>  skilled there. The basis for doing this is directly inspired by the
>>  JoltTransformer, which is extremely similar to the proposed setup for our
>>  parsers: Simply take a spec (in this case the configs, including the
>>  fieldTransformations), and delegate a mapping from bytes[] to JSON. The
>>  Jolt library even has an Expression Language (check out
>>  https://community.hortonworks.com/articles/105965/expression-language-with-jolt-in-apache-nifi.html),
>>  so it's not a foreign concept. I believe Simon Ball has already done some
>>  experimenting around with getting Stellar running in NiFi, and I'd love to
>>  see Stellar more readily available in NiFi in general.
>>
>>  Re: the ControllerService, I see this as a way to maintain Metron's use of
>>  ZK as the source of config truth. Users could definitely be using NiFi and
>>  Storm in tandem (parse in NiFi + enrich and index from Storm, for
>>  example). Using the ControllerService gives us a ZK instance as the single
>>  source of truth. That way we aren't forcing users to go to two different
>>  places to manage configs. This also lets us leverage our existing scripts
>>  and our existing infrastructure around configs and their management and
>>  validation very easily. It also gives users a way to port from NiFi to
>>  Storm or vice-versa without having to migrate configs as well. We could
>>  also provide the option to configure the Processor itself with the data
>>  (just don't set up a controller service and provide the json or whatever as
>>  one of our properties).
>>
>>  On Tue, Aug 7, 2018 at 10:12 AM Otto Fowler <[email protected]>
>>  wrote:
>>
>>>  I think this is a good idea. As I mentioned in the other thread, I’ve
>>>  been doing a lot of work on NiFi recently.
>>>  I think the important thing is that what is done should be done the NiFi
>>>  way, not by bolting the Metron composition onto NiFi. Think of it like the
>>>  Tao of Unix: the parsers and components should be single purpose and
>>>  simple, allowing exceptional flexibility in composition.
>>>
>>>  Comments inline.
>>>
>>>  On August 7, 2018 at 09:27:01, Justin Leet ([email protected]) wrote:
>>>
>>>  Hi all,
>>>
>>>  There's interest in being able to run Metron parsers in NiFi, rather than
>>>  inside Storm. I dug into this a bit, and have some thoughts on how we
>>>  could go about this. I'd love feedback on this, along with anything we'd
>>>  consider must haves as well as future enhancements.
>>>
>>>  1. Separate metron-parsers into metron-parsers-common and metron-storm
>>>  and create metron-parsers-nifi. For this code to be reusable across
>>>  platforms (NiFi, Storm, and anything else in the future), we'll need to
>>>  decouple our parsers and Storm.
>>>
>>>  +1. The “parsing code” should be a library that implements an interface
>>>  (another library).
>>>
>>>  The Processors and the Storm things can share them.
>>>
>>>  - There are also some nice fringe benefits around refactoring our code
>>>  to be substantially more clear and understandable, something which came
>>>  up while allowing for parser aggregation.
>>>  2. Create a MetronProcessor that can run our parsers.
>>>  - I took a look at how RecordReader could be leveraged (e.g.
>>>  CSVRecordReader), but this is pretty tightly tied into schemas and is
>>>  meant to be used by ControllerServices, which are then used by Processors.
>>>  There's friction involved there in terms of schemas, but also in terms of
>>>  access to ZK configs and things like parser chaining. We might be able to
>>>  leverage it, but it seems like it'd be fairly shoehorned in without
>>>  getting the schema and other benefits.
>>>
>>>  We won’t have to provide our ‘no schema processors’ (grok, csv, json).
>>>
>>>  All the remaining processors DO have schemas that we know about. We can
>>>  just provide the avro schemas the same way we provide the ES schemas.
>>>
>>>  The “parsing” should not be conflated with the transform/stellar in
>>>  NiFi. We should make that separate. Running Stellar over Records would be
>>>  the best thing.
>>>
>>>  - This Processor would work similarly to Storm: bytes[] in -> JSON out.
>>>  - There is a Processor
>>>  <https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/JoltTransformJSON.java>
>>>  that handles loading other JARs, which we can model a
>>>  MetronParserProcessor on. It handles classpath/classloader issues
>>>  (basically just sets up a classloader specific to what's being loaded and
>>>  swaps out the Thread's loader when it calls to outside resources).
>>>
>>>  There should be no reason to load modules outside the NAR. Why do you
>>>  expect to? If each Metron Processor equivalent of a Metron Storm Parser is
>>>  just parsing to JSON, it shouldn’t need much. And we could package them in
>>>  the NAR. I would suggest we have a Processor per Parser to allow for
>>>  specialization. It should all be in the NAR.
>>>
>>>  The Stellar Processor, if you want to support the works, would possibly
>>>  need this.
>>>
>>>  3. Create a MetronZkControllerService to supply our configs to our
>>>  processors.
>>>  - This is a pretty established NiFi pattern for being able to provide
>>>  access to other services needed by a Processor (e.g. databases or large
>>>  configuration files).
>>>  - The same controller service can be used by all Processors to manage
>>>  configs in a consistent manner.
>>>
>>>  I think controller services would make sense where needed. I’m just not
>>>  sure what you imagine them being needed for.
>>>
>>>  If the user has NiFi, and a Registry etc., are you saying you imagine them
>>>  using Metron + ZK to manage configurations? Or to be using BOTH Storm
>>>  processors and NiFi Processors?
>>>
>>>  At that point, we can just NAR our controller service and parser processor
>>>  up as needed, deploy them to NiFi, and let the user provide a config for
>>>  where their custom parsers can be provided (i.e. their parser jar). This
>>>  would be 3 NARs (processor, controller-service, and controller-service-api
>>>  in order to bind the other two together).
>>>
>>>  Once deployed, our ability to use parsers should fit well into the
>>>  standard NiFi workflow:
>>>
>>>  1. Create a MetronZkControllerService.
>>>  2. Configure the service to point at zookeeper.
>>>  3. Create a MetronParser.
>>>  4. Configure it to use the controller service + parser jar location +
>>>  any other needed configs.
>>>  5. Use the outputs as needed downstream (either writing out to Kafka or
>>>  feeding into more MetronParsers, etc.)
>>>
>>>  Chaining parsers should ideally become a matter of chaining MetronParsers
>>>  (and making sure the enveloping configs carry through properly). For
>>>  parser aggregation, I'd just avoid it entirely until we know it's needed
>>>  in NiFi.
>>>
>>>  Justin

------------------- 
Thank you,

James Sirota
PMC- Apache Metron
jsirota AT apache DOT org
