Agreed. Streaming enrichment is the right solution for DNS data. Do we have a web service for writing enrichments?
Carolyn Duby
Solutions Engineer, Northeast
cd...@hortonworks.com

On 6/13/18, 6:25 AM, "Charles Joynt" <charles.jo...@gresearch.co.uk> wrote:

>Regarding why I didn't choose to load data with the flatfile loader script...
>
>I want to be able to SEND enrichment data to Metron rather than have to set up
>cron jobs to PULL data. At the moment I'm trying to prove that the process
>works with a simple data source. In the future we will want enrichment data in
>Metron that comes from systems (e.g. HR databases) that I won't have access
>to, and hence will need someone to be able to send us the data.
>
>> Carolyn: just call the flat file loader from a script processor...
>
>I didn't believe that would work in my environment. I'm pretty sure the script
>has dependencies on various Metron JARs, not least for the row-id hashing
>algorithm. I suppose this would require at least a partial install of Metron
>alongside NiFi, and would introduce additional work on the NiFi cluster for
>any Metron upgrade. In some (enterprise) environments there might be
>separation of ownership between NiFi and Metron.
>
>I also prefer not to have a Java app calling a bash script which launches a new
>Java process, with logs or error output that might just get swallowed up
>invisibly. Somewhere down the line this could hold up effective
>troubleshooting.
>
>> Simon: I have actually written a stellar processor, which applies stellar to
>> all FlowFile attributes...
>
>Gulp.
>
>> Simon: what didn't you like about the flatfile loader script?
>
>The flatfile loader script has worked fine for me when prepping enrichment
>data in test systems; however, it was a bit of a chore to get the JSON
>configuration files set up, especially for "wide" data sources that may have
>15-20 fields, e.g. Active Directory.
>
>More broadly speaking, I want to embrace the streaming data paradigm and have
>tried to avoid batch jobs. With the DNS example, you might imagine a future
>where the enrichment data is streamed based on DHCP registrations, DNS update
>events, etc. In principle this could reduce the window of time in which we
>might enrich a data source with out-of-date data.
>
>Charlie
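For reference, the JSON configuration Charlie mentions is the extractor config passed to flatfile_loader.sh with -e. A minimal sketch for a three-column DNS CSV like the one used later in this thread might look as follows (the column names and file paths are illustrative, not from this thread; the loader's documentation lists the full option set):

  {
    "config": {
      "columns": {
        "name": 0,
        "type": 1,
        "data": 2
      },
      "indicator_column": "name",
      "type": "dns",
      "separator": ","
    },
    "extractor": "CSV"
  }

invoked along the lines of:

  $METRON_HOME/bin/flatfile_loader.sh -i ./dns.csv -t dns -c dns -e ./dns_extractor.json

where -t names the HBase table and -c the column family.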
>-----Original Message-----
>From: Carolyn Duby [mailto:cd...@hortonworks.com]
>Sent: 12 June 2018 20:33
>To: dev@metron.apache.org
>Subject: Re: Writing enrichment data directly from NiFi with PutHBaseJSON
>
>I like the streaming enrichment solution, but it depends on how you are
>getting the data in. If the data arrives in a CSV file, just call the flat
>file loader from a script processor. No special NiFi required.
>
>If the enrichments don't arrive in bulk, the streaming solution is better.
>
>Thanks
>Carolyn Duby
>Solutions Engineer, Northeast
>
>On 6/12/18, 1:08 PM, "Simon Elliston Ball" <si...@simonellistonball.com> wrote:
>
>>Good solution. The streaming enrichment writer makes a lot of sense for
>>this, especially if you're not using huge enrichment sources that need
>>the batch-based loaders.
>>
>>As it happens, I have written most of a NiFi processor to handle this
>>use case directly - both non-record and record-based, especially for Otto :).
>>The one thing we need to figure out now is where to host that, and how
>>to handle releases of a nifi-metron-bundle. I'll probably get round to
>>putting the code on my GitHub at least in the next few days, while we
>>figure out a more permanent home.
>>
>>Charlie, out of curiosity, what didn't you like about the flatfile
>>loader script?
>>
>>Simon
>>
>>On 12 June 2018 at 18:00, Charles Joynt <charles.jo...@gresearch.co.uk>
>>wrote:
>>
>>> Thanks for the responses. I appreciate the willingness to look at
>>> creating a NiFi processor. That would be great!
>>>
>>> Just to follow up on this (after a week looking after the "ops" side of
>>> dev-ops): I really don't want to have to use the flatfile loader
>>> script, and I'm not going to be able to write a Metron-style HBase
>>> key generator any time soon, but I have had some success with a
>>> different approach.
>>>
>>> 1. Generate data in CSV format, e.g. "server.domain.local","A","192.168.0.198"
>>> 2. Send this to an HTTP listener in NiFi
>>> 3. Write to a Kafka topic
>>>
>>> I then followed your instructions in this blog:
>>> https://cwiki.apache.org/confluence/display/METRON/2016/06/16/Metron+Tutorial+-+Fundamentals+Part+6%3A+Streaming+Enrichment
>>>
>>> 4. Create a new "dns" sensor in Metron
>>> 5. Use the CSVParser and SimpleHbaseEnrichmentWriter, with parserConfig
>>> settings to push this into HBase:
>>>
>>> {
>>>   "parserClassName": "org.apache.metron.parsers.csv.CSVParser",
>>>   "writerClassName": "org.apache.metron.enrichment.writer.SimpleHbaseEnrichmentWriter",
>>>   "sensorTopic": "dns",
>>>   "parserConfig": {
>>>     "shew.table": "dns",
>>>     "shew.cf": "dns",
>>>     "shew.keyColumns": "name",
>>>     "shew.enrichmentType": "dns",
>>>     "columns": {
>>>       "name": 0,
>>>       "type": 1,
>>>       "data": 2
>>>     }
>>>   }
>>> }
>>>
>>> And... it seems to be working. At least, I have data in HBase which
>>> looks much more like the output of the flatfile loader.
>>>
>>> Charlie
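To actually consume rows written this way, a sensor's enrichment configuration has to reference the enrichment type. A rough illustration (the message field name "domain" is hypothetical, and the exact schema depends on the Metron version) of mapping a field to the "dns" enrichment type via the HBase enrichment adapter:

  {
    "enrichment": {
      "fieldMap": {
        "hbaseEnrichment": ["domain"]
      },
      "fieldToTypeMap": {
        "domain": ["dns"]
      }
    }
  }

Alternatively, Stellar's ENRICHMENT_GET('dns', domain, 'dns', 'dns') (arguments: enrichment type, indicator, HBase table, column family) can pull the same row on demand.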
>>> -----Original Message-----
>>> From: Casey Stella [mailto:ceste...@gmail.com]
>>> Sent: 05 June 2018 14:56
>>> To: dev@metron.apache.org
>>> Subject: Re: Writing enrichment data directly from NiFi with PutHBaseJSON
>>>
>>> The problem, as you correctly diagnosed, is the key in HBase. We
>>> construct the key very specifically in Metron, so it's unlikely to
>>> work out of the box with the NiFi processor, unfortunately. The key
>>> that we use is formed here in the codebase:
>>> https://github.com/cestella/incubator-metron/blob/master/metron-platform/metron-enrichment/src/main/java/org/apache/metron/enrichment/converter/EnrichmentKey.java#L51
>>>
>>> To put that in English, consider the following:
>>>
>>> - type - the enrichment type
>>> - indicator - the indicator to use
>>> - hash(*) - a Murmur3 128-bit hash function
>>>
>>> The key is hash(indicator) + type + indicator.
>>>
>>> This hash prefixing is a standard practice in HBase key design that
>>> allows the keys to be uniformly distributed among the regions and
>>> prevents hotspotting. Depending on how the PutHBaseJSON processor
>>> works, if you can construct the key and pass it in, then you might be
>>> able to either construct the key in NiFi or write a processor to
>>> construct the key. Ultimately, though, what Carolyn said is true: the
>>> easiest approach is probably using the flatfile loader.
>>> If you do get this working in NiFi, however, do please let us know
>>> and/or consider contributing it back to the project as a PR :)
>>>
>>> On Fri, Jun 1, 2018 at 6:26 AM Charles Joynt <charles.jo...@gresearch.co.uk>
>>> wrote:
>>>
>>> > Hello,
>>> >
>>> > I work as a Dev/Ops data engineer within the security team at a
>>> > company in London, where we are in the process of implementing Metron.
>>> > I have been tasked with implementing feeds of network environment
>>> > data into HBase so that this data can be used as enrichment sources
>>> > for our security events. First off, I wanted to pull in DNS data
>>> > for an internal domain.
>>> >
>>> > I am assuming that I need to write data into HBase in such a way
>>> > that it exactly matches what I would get from the flatfile_loader.sh
>>> > script. A colleague of mine has already loaded some DNS data using
>>> > that script, so I am using that as a reference.
>>> >
>>> > I have implemented a flow in NiFi which takes JSON data from an HTTP
>>> > listener and routes it to a PutHBaseJSON processor. The flow is
>>> > working, in the sense that data is successfully written to HBase,
>>> > but despite (naively) specifying "Row Identifier Encoding Strategy =
>>> > Binary", the results in HBase don't look correct. Comparing the
>>> > output of HBase scan commands, I see:
>>> >
>>> > flatfile_loader.sh produced:
>>> >
>>> > ROW: \xFF\xFE\xCB\xB8\xEF\x92\xA3\xD9#xC\xF9\xAC\x0Ap\x1E\x00\x05whois\x00\x0E192.168.0.198
>>> > CELL: column=data:v, timestamp=1516896203840,
>>> > value={"clientname":"server.domain.local","clientip":"192.168.0.198"}
>>> >
>>> > PutHBaseJSON produced:
>>> >
>>> > ROW: server.domain.local
>>> > CELL: column=dns:v, timestamp=1527778603783,
>>> > value={"name":"server.domain.local","type":"A","data":"192.168.0.198"}
>>> >
>>> > From the source JSON:
>>> >
>>> > {"k":"server.domain.local","v":{"name":"server.domain.local","type":"A","data":"192.168.0.198"}}
>>> >
>>> > I know that there are some differences in column family / field
>>> > names, but my worry is the ROW id. Presumably I need to encode my
>>> > row key, "k" in the JSON data, in a way that matches how the
>>> > flatfile_loader.sh script did it.
>>> >
>>> > Can anyone explain how I might convert my id to the correct format?
>>> > -or-
>>> > Does this matter? Can Metron use the human-readable ROW ids?
>>> >
>>> > Charlie Joynt
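Casey's description of the key, combined with the flatfile_loader.sh scan output above, translates into roughly the Java sketch below. It is an illustration only, assuming Guava on the classpath: the \x00\x05whois and \x00\x0E... bytes in the scan output suggest the type and indicator are each written with a two-byte length prefix after the 16-byte Murmur3 hash, but the authoritative layout is the linked EnrichmentKey.java, and real data should go through Metron's own converter classes rather than this.

  import com.google.common.hash.Hashing;
  import java.io.ByteArrayOutputStream;
  import java.io.DataOutputStream;
  import java.io.IOException;
  import java.nio.charset.StandardCharsets;

  public class EnrichmentRowKeySketch {

    // Sketch of key = hash(indicator) + type + indicator, as Casey describes.
    public static byte[] rowKey(String type, String indicator) throws IOException {
      ByteArrayOutputStream bos = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(bos);

      // 16-byte Murmur3 128-bit hash of the indicator. The hash prefix spreads
      // keys uniformly across HBase regions and prevents hotspotting.
      out.write(Hashing.murmur3_128()
                       .hashString(indicator, StandardCharsets.UTF_8)
                       .asBytes());

      // Length-prefixed type and indicator (assumed framing, mirroring the
      // \x00\x05 and \x00\x0E bytes visible in the scan output above).
      byte[] typeBytes = type.getBytes(StandardCharsets.UTF_8);
      out.writeShort(typeBytes.length);
      out.write(typeBytes);

      byte[] indicatorBytes = indicator.getBytes(StandardCharsets.UTF_8);
      out.writeShort(indicatorBytes.length);
      out.write(indicatorBytes);

      return bos.toByteArray();
    }
  }

A NiFi ExecuteScript step or a custom processor could compute a key like this and hand PutHBaseJSON a pre-built binary row id, which is essentially the "construct the key in NiFi" option Casey mentions.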
>>
>>--
>>simon elliston ball
>>@sireb