Good solution. The streaming enrichment writer makes a lot of sense for this, especially if you're not using huge enrichment sources that need the batch-based loaders.
As it happens, I have written most of a NiFi processor to handle this use case directly - both non-record and record-based, especially for Otto :). The one thing we need to figure out now is where to host it, and how to handle releases of a nifi-metron-bundle. I'll probably get round to putting the code in my GitHub at least in the next few days, while we figure out a more permanent home.

Charlie, out of curiosity, what didn't you like about the flatfile loader script?

Simon

On 12 June 2018 at 18:00, Charles Joynt <charles.jo...@gresearch.co.uk> wrote:
> Thanks for the responses. I appreciate the willingness to look at creating
> a NiFi processor. That would be great!
>
> Just to follow up on this (after a week looking after the "ops" side of
> dev-ops): I really don't want to have to use the flatfile loader script,
> and I'm not going to be able to write a Metron-style HBase key generator
> any time soon, but I have had some success with a different approach:
>
> 1. Generate data in CSV format, e.g. "server.domain.local","A","192.168.0.198"
> 2. Send this to an HTTP listener in NiFi
> 3. Write it to a Kafka topic
>
> I then followed the instructions in this blog:
> https://cwiki.apache.org/confluence/display/METRON/2016/06/16/Metron+Tutorial+-+Fundamentals+Part+6%3A+Streaming+Enrichment
>
> 4. Create a new "dns" sensor in Metron
> 5. Use the CSVParser and SimpleHbaseEnrichmentWriter, with parserConfig
> settings to push this into HBase:
>
> {
>   "parserClassName": "org.apache.metron.parsers.csv.CSVParser",
>   "writerClassName": "org.apache.metron.enrichment.writer.SimpleHbaseEnrichmentWriter",
>   "sensorTopic": "dns",
>   "parserConfig": {
>     "shew.table": "dns",
>     "shew.cf": "dns",
>     "shew.keyColumns": "name",
>     "shew.enrichmentType": "dns",
>     "columns": {
>       "name": 0,
>       "type": 1,
>       "data": 2
>     }
>   }
> }
>
> And... it seems to be working. At least, I have data in HBase which looks
> more like the output of the flatfile loader.
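The `columns` map in the parserConfig above ties CSV positions to field names (name=0, type=1, data=2), so the rows fed into step 1 must be emitted in exactly that order. A minimal Python sketch of that serialization step - the record values are illustrative, not from a real zone:

```python
import csv
import io

def dns_records_to_csv(records):
    """Serialize DNS records as quoted CSV rows in the column order
    expected by the parserConfig columns map: name, type, data."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL)
    for rec in records:
        writer.writerow([rec["name"], rec["type"], rec["data"]])
    return buf.getvalue()

rows = dns_records_to_csv([
    {"name": "server.domain.local", "type": "A", "data": "192.168.0.198"},
])
print(rows.strip())
# → "server.domain.local","A","192.168.0.198"
```

Each such row can then be POSTed to the NiFi HTTP listener (steps 2-3) unchanged; the CSVParser splits it back out by position.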
>
> Charlie
>
> -----Original Message-----
> From: Casey Stella [mailto:ceste...@gmail.com]
> Sent: 05 June 2018 14:56
> To: dev@metron.apache.org
> Subject: Re: Writing enrichment data directly from NiFi with PutHBaseJSON
>
> The problem, as you correctly diagnosed, is the key in HBase. We
> construct the key very specifically in Metron, so it's unlikely to work
> out of the box with the NiFi processor, unfortunately. The key that we
> use is formed here in the codebase:
> https://github.com/cestella/incubator-metron/blob/master/metron-platform/metron-enrichment/src/main/java/org/apache/metron/enrichment/converter/EnrichmentKey.java#L51
>
> To put that in English, consider the following:
>
> - type - the enrichment type
> - indicator - the indicator to use
> - hash(*) - a Murmur3 128-bit hash function
>
> The key is hash(indicator) + type + indicator.
>
> This hash prefixing is a standard practice in HBase key design that
> allows the keys to be uniformly distributed among the regions and
> prevents hotspotting. Depending on how the PutHBaseJSON processor works,
> if you can construct the key and pass it in, then you might be able to
> either construct the key in NiFi or write a processor to construct the
> key. Ultimately, though, what Carolyn said is true: the easiest approach
> is probably using the flatfile loader.
> If you do get this working in NiFi, however, please do let us know
> and/or consider contributing it back to the project as a PR :)
>
> On Fri, Jun 1, 2018 at 6:26 AM Charles Joynt <charles.jo...@gresearch.co.uk> wrote:
>
> > Hello,
> >
> > I work as a Dev/Ops Data Engineer within the security team at a
> > company in London where we are in the process of implementing Metron.
> > I have been tasked with implementing feeds of network environment data
> > into HBase so that this data can be used as enrichment sources for our
> > security events. First off I wanted to pull in DNS data for an
> > internal domain.
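Casey's key layout - hash(indicator) + type + indicator - can be sketched as follows. Note the heavy caveats: Metron's EnrichmentKey uses a Murmur3 128-bit hash and Hadoop-style length-prefixed string serialization (the `\x00\x05` / `\x00\x0E` bytes visible in the scan output later in this thread), neither of which is in the Python standard library. This sketch substitutes MD5 purely as a stand-in 16-byte digest to show the shape of the key; it will NOT reproduce Metron's real keys byte for byte.

```python
import hashlib

def enrichment_row_key(sensor_type: str, indicator: str) -> bytes:
    """Sketch of Metron's EnrichmentKey layout:
    hash(indicator) + type + indicator.

    CAVEAT: Metron actually uses a Murmur3 128-bit hash and
    length-prefixed string serialization; MD5 is only a stand-in
    16-byte digest here, so real Metron keys will differ.
    """
    prefix = hashlib.md5(indicator.encode("utf-8")).digest()  # 16 bytes
    return prefix + sensor_type.encode("utf-8") + indicator.encode("utf-8")

key = enrichment_row_key("dns", "192.168.0.198")
# The hash prefix spreads keys uniformly across regions (no hotspotting);
# the type and indicator follow so the full key is still reconstructible
# from (type, indicator) at lookup time.
```

The important consequence is that Metron's enrichment lookups recompute this hash from (type, indicator) before issuing the HBase get, which is why a human-readable row id written by PutHBaseJSON would never be found.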
> >
> > I am assuming that I need to write data into HBase in such a way that
> > it exactly matches what I would get from the flatfile_loader.sh
> > script. A colleague of mine has already loaded some DNS data using
> > that script, so I am using that as a reference.
> >
> > I have implemented a flow in NiFi which takes JSON data from an HTTP
> > listener and routes it to a PutHBaseJSON processor. The flow is
> > working, in the sense that data is successfully written to HBase, but
> > despite (naively) specifying "Row Identifier Encoding Strategy =
> > Binary", the results in HBase don't look correct. Comparing the
> > output from HBase scan commands I see:
> >
> > flatfile_loader.sh produced:
> >
> > ROW: \xFF\xFE\xCB\xB8\xEF\x92\xA3\xD9#xC\xF9\xAC\x0Ap\x1E\x00\x05whois\x00\x0E192.168.0.198
> > CELL: column=data:v, timestamp=1516896203840,
> >       value={"clientname":"server.domain.local","clientip":"192.168.0.198"}
> >
> > PutHBaseJSON produced:
> >
> > ROW: server.domain.local
> > CELL: column=dns:v, timestamp=1527778603783,
> >       value={"name":"server.domain.local","type":"A","data":"192.168.0.198"}
> >
> > From the source JSON:
> >
> > {"k":"server.domain.local","v":{"name":"server.domain.local","type":"A","data":"192.168.0.198"}}
> >
> > I know that there are some differences in column family / field
> > names, but my worry is the ROW id. Presumably I need to encode my row
> > key ("k" in the JSON data) in a way that matches how the
> > flatfile_loader.sh script did it.
> >
> > Can anyone explain how I might convert my id to the correct format?
> > -or-
> > Does this matter - can Metron use the human-readable ROW ids?
> >
> > Charlie Joynt

--
simon elliston ball
@sireb
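For completeness: once rows are stored under the Metron key scheme with enrichment type "dns" (as in Charlie's parserConfig), a sensor can consume them through its enrichment configuration. A hedged sketch along the lines of the streaming-enrichment tutorial linked above - the field name `ip_src_addr` is an assumed example, and the exact config shape should be checked against the Metron docs for your version:

```json
{
  "enrichment": {
    "fieldMap": {
      "hbaseEnrichment": ["ip_src_addr"]
    },
    "fieldToTypeMap": {
      "ip_src_addr": ["dns"]
    }
  }
}
```

With this in place, Metron looks up each `ip_src_addr` value against the "dns" enrichment type, constructing the hashed row key internally.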