I'll add onto Mike's discussion with the original set of requirements I had in mind (and apply feedback on these as necessary!). This largely overlaps with what Mike said, but I want to make sure it's clear where my proposal was coming from, so we can improve on it as needed. James and Mike are also right that I skipped over the benefits of NiFi in general a bit, so thanks for chiming in there.
- Deploy our bundled parsers without needing custom wrapping on all of them.
- Don't prevent ourselves from building custom wrapping as needed.
- Custom Java parsers with an easy way to hook in, similar to what we already do in Storm.
- One-stop (or at least one-format) configuration, for the case when we're doing some things in NiFi (parsers) and some elsewhere (enrichment and indexing). I don't think it'll always be "start in NiFi, end in Storm", especially as we build out Stellar capability, but I also don't want users learning a different set of configs and config tools for every platform we run on.
- Ability to build out parsers on other systems fairly easily, e.g. Spark.
- Support our current use cases (in particular parser chaining as a more advanced use case).

It really boils down to providing a relatively simple path for users to migrate to NiFi as needed or desired, in a very general way, while not preventing parser-by-parser enhancements.

On Wed, Aug 8, 2018 at 7:14 PM Michael Miklavcic <michael.miklav...@gmail.com> wrote:

> I think it also provides customers greater control over their architecture
> by giving them the flexibility to choose where/how to host their parsers.
>
> To Justin's point about the API, my biggest concern about the RecordReader
> approach is that it is not stable. We already have a similar problem with
> the TransportClient in Elasticsearch - they are prone to changing it in
> minor versions with the advent of their newer REST API, which is
> problematic for ensuring a stable installation.
>
> From my own perspective, our goal with NiFi, at least in part, should be
> the ability to deploy our core parsing infrastructure, i.e.
>
> - pre-built parsers
> - custom Java parsers
> - Stellar transforms
> - custom Stellar transforms
>
> and have the ability to configure it similarly to how we configure parsers
> within Storm.
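
(To ground the one-format point: the intent is that the same sensor config we keep in ZooKeeper today would drive a parser identically whether Storm or NiFi runs it. Roughly, reusing our existing parser config shape — values here are illustrative:

```json
{
  "parserClassName": "org.apache.metron.parsers.bro.BasicBroParser",
  "sensorTopic": "bro",
  "parserConfig": {},
  "fieldTransformations": []
}
```

The platform would change; the config, and the tooling that pushes it, wouldn't.)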
> Consistent with our recent parser chaining and aggregation
> feature, users should be able to construct and deploy similar constructs in
> NiFi. The core architectural shift would be that parser code should be
> platform agnostic. We provide the plumbing in Storm, NiFi, and <Spark
> Streaming?, other>, and platform architects and devops teams can choose how
> and where to deploy.
>
> Best,
> Mike
>
> On Wed, Aug 8, 2018 at 9:57 AM James Sirota <jsir...@apache.org> wrote:
>
> > Integration with NiFi would be useful for parsing low-volume telemetries
> > at the edge. This is a much more resource-friendly way to do it than
> > setting up dedicated Storm topologies. The integration would be that the
> > NiFi processor parses the data and pushes it straight into the enrichment
> > topic, saving us the resources of having multiple parsers in Storm.
> >
> > Thanks,
> > James
> >
> > 07.08.2018, 11:29, "Otto Fowler" <ottobackwa...@gmail.com>:
> > > Why do we start over? We are going back and forth on implementation,
> > > and I don't think we have the same goals or concerns.
> > >
> > > What would be the requirements or goals of Metron integration with NiFi?
> > > How many levels or options for integration do we have?
> > > What are the approaches to choose from?
> > > Who are the target users?
> > >
> > > On August 7, 2018 at 12:24:56, Justin Leet (justinjl...@gmail.com) wrote:
> > >
> > > So how does the MetronRecordReader roll into everything? It seems like
> > > it'd be more useful in the reader-per-format approach, but otherwise it
> > > doesn't really seem like we gain much, and it requires getting
> > > everything linked up properly to be used. Assuming we looked at doing
> > > it that way, is the idea that we'd set up a ControllerService with the
> > > MetronRecordReader and a MetronRecordWriter and then have the
> > > StellarTransformRecord processor configured with those ControllerServices?
> > > How do we manage the configurations of everything that way? How does
> > > the ControllerService get configured with whatever parser(s) are needed
> > > in the flow? Basically, what's your vision for how everything would tie
> > > together?
> > >
> > > I also forgot to mention this in the original writeup, but there's
> > > another reason to avoid the RecordReader: it's not considered stable. See
> > > https://github.com/apache/nifi/blob/master/nifi-commons/nifi-record/src/main/java/org/apache/nifi/serialization/RecordReader.java#L34.
> > > That alone makes me super hesitant to use it, if it can shift out from
> > > under us in even an incremental version.
> > >
> > > I'm also unclear on why the StellarTransformRecord processor matters
> > > for either approach. With the Processor approach you could simply
> > > follow it up with the Stellar processor, the same way you would in the
> > > RecordReader approach. The Stellar processor should be a parallel
> > > improvement, not a conflicting one.
> > >
> > > On Tue, Aug 7, 2018 at 11:50 AM Otto Fowler <ottobackwa...@gmail.com> wrote:
> > >
> > >> A Metron Processor itself isn't really necessary. A MetronRecordReader
> > >> (either the megalithic one or a reader per format) would be a good
> > >> approach. Then have a StellarTransformRecord processor that can do
> > >> Stellar on _any_ record, regardless of source.
> > >>
> > >> On August 7, 2018 at 11:06:22, Justin Leet (justinjl...@gmail.com) wrote:
> > >>
> > >> Thanks for the comments, Otto, this is definitely great feedback. I'd
> > >> love to respond inline, but the email's already starting to lose its
> > >> formatting, so I'll go with the classic "wall of text". Let me know if
> > >> I didn't address everything.
> > >>
> > >> Loading modules (or jars or whatever) outside of our Processor gives
> > >> us the benefit of making it incredibly easy for users to create their
> > >> own parsers.
> >> I would definitely expect our own bundled parsers to be included in our
> >> base NAR, but loading modules means users only have to learn how Metron
> >> wants things lined up and can just plug their parser in. Having said
> >> that, I could see having a wrapper for our bundled parsers that makes it
> >> really easy to just say you want a MetronAsaParser or MetronBroParser,
> >> etc. That would give us the best of both worlds, where it's easy to get
> >> set up with our bundled parsers and also trivial to pull in non-bundled
> >> parsers. What doing this gives us is an easy way to support (hopefully)
> >> every parser that gets made, right out of the box, without us needing to
> >> build a specialized version of everything until we decide to and without
> >> users having to jump through hoops.
> >>
> >> None of this prevents anyone from creating specialized parsers (for perf
> >> reasons, or to use the schema registries, or anything else). It's
> >> probably worthwhile to package up some of our built-in parsers and
> >> customize them to use more specialized features appropriately as we see
> >> things get used in the wild. Like you said, we could likely provide Avro
> >> schemas for some of this and give users a more robust experience on what
> >> we choose to support and provide guidance for other things. I'm also
> >> worried that building specialized schemas becomes problematic for things
> >> like parser chaining (where our routers wrap the underlying messages and
> >> add on their own info). Going down that road potentially requires
> >> anything wrapped to have a specialized schema for the wrapped version in
> >> addition to a vanilla version (although please correct me if I'm missing
> >> something there; I'll openly admit to some shakiness on how that would
> >> be handled).
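
(A quick illustration of the chaining worry: a router that wraps an underlying message produces a different shape than the message on its own. Field names below are purely hypothetical, not our actual envelope format:

```json
{
  "source.type": "syslog_router",
  "routed.source.type": "cisco_asa",
  "original_string": "<the raw wrapped message>"
}
```

A fixed Avro schema would then have to cover both this wrapped form and the plain cisco_asa form.)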
> >>
> >> I also disagree that this is un-NiFi-like, although I'm admittedly not
> >> as skilled there. The basis for doing this is directly inspired by the
> >> JoltTransformer, which is extremely similar to the proposed setup for
> >> our parsers: simply take a spec (in this case the configs, including the
> >> fieldTransformations), and delegate a mapping from byte[] to JSON. The
> >> Jolt library even has an Expression Language (check out
> >> https://community.hortonworks.com/articles/105965/expression-language-with-jolt-in-apache-nifi.html),
> >> so it's not a foreign concept. I believe Simon Ball has already done
> >> some experimenting with getting Stellar running in NiFi, and I'd love to
> >> see Stellar more readily available in NiFi in general.
> >>
> >> Re: the ControllerService, I see this as a way to maintain Metron's use
> >> of ZK as the source of config truth. Users could definitely be using
> >> NiFi and Storm in tandem (parse in NiFi + enrich and index from Storm,
> >> for example). Using the ControllerService gives us a ZK instance as the
> >> single source of truth. That way we aren't forcing users to go to two
> >> different places to manage configs. This also lets us leverage our
> >> existing scripts and our existing infrastructure around configs and
> >> their management and validation very easily. It also gives users a way
> >> to port from NiFi to Storm or vice versa without having to migrate
> >> configs as well. We could also provide the option to configure the
> >> Processor itself with the data (just don't set up a controller service,
> >> and provide the JSON or whatever as one of our properties).
> >>
> >> On Tue, Aug 7, 2018 at 10:12 AM Otto Fowler <ottobackwa...@gmail.com> wrote:
> >>
> >>> I think this is a good idea.
> >>> As I mentioned in the other thread, I've been doing a lot of work on
> >>> NiFi recently. I think the important thing is that what is done should
> >>> be done the NiFi way, not bolting the Metron composition onto NiFi.
> >>> Think of it like the Tao of Unix: the parsers and components should be
> >>> single-purpose and simple, allowing exceptional flexibility in
> >>> composition.
> >>>
> >>> Comments inline.
> >>>
> >>> On August 7, 2018 at 09:27:01, Justin Leet (justinjl...@gmail.com) wrote:
> >>>
> >>> Hi all,
> >>>
> >>> There's interest in being able to run Metron parsers in NiFi, rather
> >>> than inside Storm. I dug into this a bit, and have some thoughts on how
> >>> we could go about this. I'd love feedback on this, along with anything
> >>> we'd consider must-haves as well as future enhancements.
> >>>
> >>> 1. Separate metron-parsers into metron-parsers-common and metron-storm,
> >>> and create metron-parsers-nifi. For this code to be reusable across
> >>> platforms (NiFi, Storm, and anything else in the future), we'll need to
> >>> decouple our parsers and Storm.
> >>>
> >>> +1. The "parsing code" should be a library that implements an interface
> >>> (another library).
> >>>
> >>> The Processors and the Storm things can share them.
> >>>
> >>> - There are also some nice fringe benefits around refactoring our code
> >>> to be substantially more clear and understandable; something which came
> >>> up while allowing for parser aggregation.
> >>> 2. Create a MetronProcessor that can run our parsers.
> >>> - I took a look at how RecordReader could be leveraged (e.g.
> >>> CSVRecordReader), but this is pretty tightly tied into schemas and is
> >>> meant to be used by ControllerServices, which are then used by Processors.
> >>> There's friction involved there in terms of schemas, but also in terms
> >>> of access to ZK configs and things like parser chaining. We might be
> >>> able to leverage it, but it seems like it'd be fairly shoehorned in
> >>> without getting the schema and other benefits.
> >>>
> >>> We won't have to provide schemas for our 'no schema' processors (grok,
> >>> csv, json). All the remaining processors DO have schemas that we know
> >>> about. We can just provide the Avro schemas the same way we provide the
> >>> ES schemas.
> >>>
> >>> The "parsing" should not be conflated with the transform/Stellar in
> >>> NiFi. We should make that separate. Running Stellar over Records would
> >>> be the best thing.
> >>>
> >>> - This Processor would work similarly to Storm: byte[] in -> JSON out.
> >>> - There is a Processor
> >>> <https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/JoltTransformJSON.java>
> >>> that handles loading other JARs, which we can model a
> >>> MetronParserProcessor off of; it handles classpath/classloader issues
> >>> (basically it just sets up a classloader specific to what's being
> >>> loaded and swaps out the Thread's loader when it calls out to outside
> >>> resources).
> >>>
> >>> There should be no reason to load modules outside the NAR. Why do you
> >>> expect to? If each Metron Processor equivalent of a Metron Storm Parser
> >>> is just parsing to JSON, it shouldn't need much. And we could package
> >>> them in the NAR. I would suggest we have a Processor per Parser to
> >>> allow for specialization. It should all be in the NAR.
> >>>
> >>> The Stellar Processor, if you were to support the whole works, would
> >>> possibly need this.
> >>>
> >>> 3.
> >>> Create a MetronZkControllerService to supply our configs to our
> >>> processors.
> >>> - This is a pretty established NiFi pattern for providing access to
> >>> other services needed by a Processor (e.g. databases or large
> >>> configuration files).
> >>> - The same controller service can be used by all Processors to manage
> >>> configs in a consistent manner.
> >>>
> >>> I think controller services would make sense where needed; I'm just not
> >>> sure what you imagine them being needed for?
> >>>
> >>> If the user has NiFi, and a Registry etc., are you saying you imagine
> >>> them using Metron + ZK to manage configurations? Or to be using BOTH
> >>> Storm processors and NiFi Processors?
> >>>
> >>> At that point, we can just NAR our controller service and parser
> >>> processor up as needed, deploy them to NiFi, and let the user provide a
> >>> config for where their custom parsers can be found (i.e. their parser
> >>> jar). This would be 3 NARs (processor, controller-service, and
> >>> controller-service-api, in order to bind the other two together).
> >>>
> >>> Once deployed, our ability to use parsers should fit well into the
> >>> standard NiFi workflow:
> >>>
> >>> 1. Create a MetronZkControllerService.
> >>> 2. Configure the service to point at ZooKeeper.
> >>> 3. Create a MetronParser.
> >>> 4. Configure it to use the controller service + parser jar location +
> >>> any other needed configs.
> >>> 5. Use the outputs as needed downstream (either writing out to Kafka or
> >>> feeding into more MetronParsers, etc.)
> >>>
> >>> Chaining parsers should ideally become a matter of chaining
> >>> MetronParsers (and making sure the enveloping configs carry through
> >>> properly). For parser aggregation, I'd just avoid it entirely until we
> >>> know it's needed in NiFi.
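
(To make the intended split concrete, here's a rough sketch of the kind of platform-agnostic parser contract metron-parsers-common could expose, with a toy implementation. All names here — MessageParser, SimpleCsvParser, the "columns" config key — are hypothetical stand-ins I made up for illustration, not actual Metron or NiFi API:

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical platform-agnostic contract: raw bytes in -> JSON-like maps out.
// A Storm bolt and a NiFi Processor would both just delegate to this.
interface MessageParser {
    void configure(Map<String, Object> config); // e.g. deserialized from the ZK sensor config
    List<Map<String, Object>> parse(byte[] rawMessage);
}

// Toy implementation standing in for a real parser: CSV with configurable columns.
class SimpleCsvParser implements MessageParser {
    private String[] fieldNames = {"timestamp", "message"};

    @Override
    public void configure(Map<String, Object> config) {
        Object cols = config.get("columns");
        if (cols instanceof List) {
            fieldNames = ((List<?>) cols).stream()
                    .map(Object::toString)
                    .toArray(String[]::new);
        }
    }

    @Override
    public List<Map<String, Object>> parse(byte[] rawMessage) {
        String line = new String(rawMessage, StandardCharsets.UTF_8).trim();
        String[] parts = line.split(",", fieldNames.length);
        // Map each configured column name to the corresponding field value.
        Map<String, Object> json = new LinkedHashMap<>();
        for (int i = 0; i < parts.length; i++) {
            json.put(fieldNames[i], parts[i].trim());
        }
        return Collections.singletonList(json);
    }
}
```

A MetronParser processor's onTrigger would then amount to: read the flowfile content, call parse, and route each resulting JSON message to a success relationship — the same delegation the Storm parser bolt does today when it emits to Kafka.)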
> >>>
> >>> Justin
>
> > -------------------
> > Thank you,
> >
> > James Sirota
> > PMC - Apache Metron
> > jsirota AT apache DOT org