I like the streaming enrichment solution, but it depends on how you are getting the data in. If the data arrives as a CSV file, just call the flatfile loader from a script processor; no special NiFi flow is required.
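For reference, the flatfile loader pairs the CSV with an extractor config; a minimal sketch for DNS-style records (the column names and the `dns` enrichment type here are illustrative, so adjust to your data):

```json
{
  "config": {
    "columns": {
      "name": 0,
      "type": 1,
      "data": 2
    },
    "indicator_column": "name",
    "type": "dns",
    "separator": ","
  },
  "extractor": "CSV"
}
```

It is then invoked along the lines of `$METRON_HOME/bin/flatfile_loader.sh -i dns.csv -t enrichment -c t -e extractor_config.json`, where the table and column family depend on your deployment.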
If the enrichments don't arrive in bulk, the streaming solution is better.

Thanks,
Carolyn Duby
Solutions Engineer, Northeast
cd...@hortonworks.com
+1.508.965.0584

Join my team!
Enterprise Account Manager, Boston - http://grnh.se/wepchv1
Solutions Engineer, Boston - http://grnh.se/8gbxy41
Need Answers? Try https://community.hortonworks.com

On 6/12/18, 1:08 PM, "Simon Elliston Ball" <si...@simonellistonball.com> wrote:

> Good solution. The streaming enrichment writer makes a lot of sense for this, especially if you're not using huge enrichment sources that need the batch-based loaders.
>
> As it happens, I have written most of a NiFi processor to handle this use case directly, both non-record and record-based, especially for Otto :). The one thing we need to figure out now is where to host it, and how to handle releases of a nifi-metron-bundle. I'll probably get around to putting the code in my GitHub at least in the next few days, while we figure out a more permanent home.
>
> Charlie, out of curiosity, what didn't you like about the flatfile loader script?
>
> Simon
>
> On 12 June 2018 at 18:00, Charles Joynt <charles.jo...@gresearch.co.uk> wrote:
>
>> Thanks for the responses. I appreciate the willingness to look at creating a NiFi processor. That would be great!
>>
>> Just to follow up on this (after a week looking after the "ops" side of dev-ops): I really don't want to have to use the flatfile loader script, and I'm not going to be able to write a Metron-style HBase key generator any time soon, but I have had some success with a different approach:
>>
>> 1. Generate data in CSV format, e.g. "server.domain.local","A","192.168.0.198"
>> 2. Send this to an HTTP listener in NiFi
>> 3. Write it to a Kafka topic
>>
>> I then followed your instructions in this blog:
>> https://cwiki.apache.org/confluence/display/METRON/2016/06/16/Metron+Tutorial+-+Fundamentals+Part+6%3A+Streaming+Enrichment
>>
>> 4. Create a new "dns" sensor in Metron
>> 5. Use the CSVParser and SimpleHbaseEnrichmentWriter, with parserConfig settings to push this into HBase:
>>
>> {
>>   "parserClassName": "org.apache.metron.parsers.csv.CSVParser",
>>   "writerClassName": "org.apache.metron.enrichment.writer.SimpleHbaseEnrichmentWriter",
>>   "sensorTopic": "dns",
>>   "parserConfig": {
>>     "shew.table": "dns",
>>     "shew.cf": "dns",
>>     "shew.keyColumns": "name",
>>     "shew.enrichmentType": "dns",
>>     "columns": {
>>       "name": 0,
>>       "type": 1,
>>       "data": 2
>>     }
>>   }
>> }
>>
>> And... it seems to be working. At least, I have data in HBase which looks more like the output of the flatfile loader.
>>
>> Charlie
>>
>> -----Original Message-----
>> From: Casey Stella [mailto:ceste...@gmail.com]
>> Sent: 05 June 2018 14:56
>> To: dev@metron.apache.org
>> Subject: Re: Writing enrichment data directly from NiFi with PutHBaseJSON
>>
>> The problem, as you correctly diagnosed, is the key in HBase. We construct the key very specifically in Metron, so it's unlikely to work out of the box with the NiFi processor, unfortunately. The key that we use is formed here in the codebase:
>> https://github.com/cestella/incubator-metron/blob/master/metron-platform/metron-enrichment/src/main/java/org/apache/metron/enrichment/converter/EnrichmentKey.java#L51
>>
>> To put that in English, consider the following:
>>
>> - type - the enrichment type
>> - indicator - the indicator to use
>> - hash(*) - a Murmur3 128-bit hash function
>>
>> The key is hash(indicator) + type + indicator.
>>
>> This hash prefixing is a standard practice in HBase key design that allows the keys to be uniformly distributed among the regions and prevents hotspotting.
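To make that key construction concrete, here is a minimal sketch of the layout. It is illustrative only: Murmur3 128-bit is not in the Python standard library, so a stand-in hash is used (a faithful version would substitute something like the third-party `mmh3.hash_bytes`), and the real serialization in EnrichmentKey.java also appears to length-prefix the strings, as the `\x00\x05` before "whois" in the scan output later in the thread suggests.

```python
# Sketch of Metron's enrichment row key, per the description above:
#     row key = hash(indicator) + type + indicator
# Metron uses a 128-bit Murmur3 hash; Murmur3 is not in the Python
# stdlib, so the hash function is taken as a parameter here.
import hashlib

def enrichment_row_key(indicator: str, enrichment_type: str, hash_fn) -> bytes:
    """Hash-prefix the key so rows spread uniformly across HBase regions."""
    prefix = hash_fn(indicator.encode("utf-8"))  # 16-byte hash prefix
    return prefix + enrichment_type.encode("utf-8") + indicator.encode("utf-8")

# Stand-in 128-bit hash for illustration only -- NOT Murmur3, so these
# bytes will not match what Metron writes. Check EnrichmentKey.java for
# the exact serialization (length prefixes etc.) before relying on this.
stand_in_hash = lambda data: hashlib.md5(data).digest()

key = enrichment_row_key("192.168.0.198", "whois", stand_in_hash)
print(len(key))  # 16 + len("whois") + len("192.168.0.198") = 34
```

The point of the sketch is the concatenation order and the fixed-width hash prefix, not the exact bytes.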
>> Depending on how the PutHBaseJSON processor works, if you can construct the key and pass it in, then you might be able to either construct the key in NiFi or write a processor to construct it. Ultimately, though, what Carolyn said is true: the easiest approach is probably using the flatfile loader.
>>
>> If you do get this working in NiFi, however, please do let us know and/or consider contributing it back to the project as a PR :)
>>
>> On Fri, Jun 1, 2018 at 6:26 AM Charles Joynt <charles.jo...@gresearch.co.uk> wrote:
>>
>> > Hello,
>> >
>> > I work as a Dev/Ops Data Engineer within the security team at a company in London where we are in the process of implementing Metron. I have been tasked with implementing feeds of network environment data into HBase so that this data can be used as enrichment sources for our security events. First off, I wanted to pull in DNS data for an internal domain.
>> >
>> > I am assuming that I need to write data into HBase in such a way that it exactly matches what I would get from the flatfile_loader.sh script. A colleague of mine has already loaded some DNS data using that script, so I am using that as a reference.
>> >
>> > I have implemented a flow in NiFi which takes JSON data from an HTTP listener and routes it to a PutHBaseJSON processor. The flow is working, in the sense that data is successfully written to HBase, but despite (naively) specifying "Row Identifier Encoding Strategy = Binary", the results in HBase don't look correct. Comparing the output from HBase scan commands, I see:
>> >
>> > flatfile_loader.sh produced:
>> >
>> > ROW: \xFF\xFE\xCB\xB8\xEF\x92\xA3\xD9#xC\xF9\xAC\x0Ap\x1E\x00\x05whois\x00\x0E192.168.0.198
>> > CELL: column=data:v, timestamp=1516896203840, value={"clientname":"server.domain.local","clientip":"192.168.0.198"}
>> >
>> > PutHBaseJSON produced:
>> >
>> > ROW: server.domain.local
>> > CELL: column=dns:v, timestamp=1527778603783, value={"name":"server.domain.local","type":"A","data":"192.168.0.198"}
>> >
>> > From the source JSON:
>> >
>> > {"k":"server.domain.local","v":{"name":"server.domain.local","type":"A","data":"192.168.0.198"}}
>> >
>> > I know that there are some differences in column family / field names, but my worry is the ROW id. Presumably I need to encode my row key ("k" in the JSON data) in a way that matches how the flatfile_loader.sh script did it.
>> >
>> > Can anyone explain how I might convert my id to the correct format?
>> > -or-
>> > Does this matter? Can Metron use the human-readable ROW ids?
>> >
>> > Charlie Joynt
>> >
>> > --------------
>> > G-RESEARCH believes the information provided herein is reliable. While every care has been taken to ensure accuracy, the information is furnished to the recipients with no warranty as to the completeness and accuracy of its contents, and on condition that any errors or omissions shall not be made the basis of any claim, demand or cause of action. The information in this email is intended only for the named recipient. If you are not the intended recipient, please notify us immediately and do not copy, distribute or take action based on this e-mail. All messages sent to and from this e-mail address will be logged by G-RESEARCH and are subject to archival storage, monitoring, review and disclosure.
>> > G-RESEARCH is the trading name of Trenchant Limited, 5th Floor, Whittington House, 19-30 Alfred Place, London WC1E 7EA. Trenchant Limited is a company registered in England with company number 08127121.
>> > --------------
>> >
>>
>
>--
>simon elliston ball
>@sireb