Do we even have a Jira? If not, maybe Carolyn et al. can write one up that lays out some requirements and context.
On June 13, 2018 at 10:04:27, Casey Stella (ceste...@gmail.com) wrote:

No, sadly we do not.

On Wed, Jun 13, 2018 at 10:01 AM Carolyn Duby <cd...@hortonworks.com> wrote:

> Agreed… streaming enrichment is the right solution for DNS data.
>
> Do we have a web service for writing enrichments?
>
> Carolyn Duby
> Solutions Engineer, Northeast
> cd...@hortonworks.com
> +1.508.965.0584
>
> Join my team!
> Enterprise Account Manager – Boston - http://grnh.se/wepchv1
> Solutions Engineer – Boston - http://grnh.se/8gbxy41
> Need Answers? Try https://community.hortonworks.com <https://community.hortonworks.com/answers/index.html>
>
> On 6/13/18, 6:25 AM, "Charles Joynt" <charles.jo...@gresearch.co.uk> wrote:
>
> > Regarding why I didn't choose to load data with the flatfile loader script...
> >
> > I want to be able to SEND enrichment data to Metron rather than have to set up cron jobs to PULL data. At the moment I'm trying to prove that the process works with a simple data source. In the future we will want enrichment data in Metron that comes from systems (e.g. HR databases) that I won't have access to, hence we will need someone to be able to send us the data.
> >
> > > Carolyn: just call the flat file loader from a script processor...
> >
> > I didn't believe that would work in my environment. I'm pretty sure the script has dependencies on various Metron JARs, not least for the row id hashing algorithm. I suppose this would require at least a partial install of Metron alongside NiFi, and it would introduce additional work on the NiFi cluster for any Metron upgrade. In some (enterprise) environments there might be separation of ownership between NiFi and Metron.
> >
> > I also prefer not to have a Java app calling a bash script which calls a new Java process, with logs or error output that might just get swallowed up invisibly. Somewhere down the line this could hold up effective troubleshooting.
> >
> > > Simon: I have actually written a stellar processor, which applies stellar to all FlowFile attributes...
> >
> > Gulp.
> >
> > > Simon: what didn't you like about the flatfile loader script?
> >
> > The flatfile loader script has worked fine for me when prepping enrichment data in test systems; however, it was a bit of a chore to get the JSON configuration files set up, especially for "wide" data sources that may have 15-20 fields, e.g. Active Directory.
> >
> > More broadly speaking, I want to embrace the streaming data paradigm and to avoid batch jobs. With the DNS example, you might imagine a future where the enrichment data is streamed based on DHCP registrations, DNS update events, etc. In principle this could reduce the window of time in which we might enrich a data source with out-of-date data.
> >
> > Charlie
> >
> > -----Original Message-----
> > From: Carolyn Duby [mailto:cd...@hortonworks.com]
> > Sent: 12 June 2018 20:33
> > To: dev@metron.apache.org
> > Subject: Re: Writing enrichment data directly from NiFi with PutHBaseJSON
> >
> > I like the streaming enrichment solution, but it depends on how you are getting the data in. If you get the data in a CSV file, just call the flat file loader from a script processor. No special NiFi required.
> >
> > If the enrichments don't arrive in bulk, the streaming solution is better.
> >
> > Thanks
> > Carolyn Duby
> > Solutions Engineer, Northeast
> > cd...@hortonworks.com
> > +1.508.965.0584
> >
> > Join my team!
> > Enterprise Account Manager – Boston - http://grnh.se/wepchv1
> > Solutions Engineer – Boston - http://grnh.se/8gbxy41
> > Need Answers? Try https://community.hortonworks.com <https://community.hortonworks.com/answers/index.html>
> >
> > On 6/12/18, 1:08 PM, "Simon Elliston Ball" <si...@simonellistonball.com> wrote:
> >
> > > Good solution. The streaming enrichment writer makes a lot of sense for this, especially if you're not using huge enrichment sources that need the batch-based loaders.
> > >
> > > As it happens, I have written most of a NiFi processor to handle this use case directly - both non-record and Record based, especially for Otto :). The one thing we need to figure out now is where to host that, and how to handle releases of a nifi-metron-bundle. I'll probably get round to putting the code in my GitHub at least in the next few days, while we figure out a more permanent home.
> > >
> > > Charlie, out of curiosity, what didn't you like about the flatfile loader script?
> > >
> > > Simon
> > >
> > > On 12 June 2018 at 18:00, Charles Joynt <charles.jo...@gresearch.co.uk> wrote:
> > >
> > > > Thanks for the responses. I appreciate the willingness to look at creating a NiFi processor. That would be great!
> > > >
> > > > Just to follow up on this (after a week looking after the "ops" side of dev-ops): I really don't want to have to use the flatfile loader script, and I'm not going to be able to write a Metron-style HBase key generator any time soon, but I have had some success with a different approach:
> > > >
> > > > 1. Generate data in CSV format, e.g. "server.domain.local","A","192.168.0.198"
> > > > 2. Send this to an HTTP listener in NiFi.
> > > > 3. Write to a Kafka topic.
> > > >
> > > > I then followed your instructions in this blog:
> > > > https://cwiki.apache.org/confluence/display/METRON/2016/06/16/Metron+Tutorial+-+Fundamentals+Part+6%3A+Streaming+Enrichment
> > > >
> > > > 4. Create a new "dns" sensor in Metron.
> > > > 5. Use the CSVParser and SimpleHbaseEnrichmentWriter, and parserConfig settings to push this into HBase:
> > > >
> > > > {
> > > >   "parserClassName": "org.apache.metron.parsers.csv.CSVParser",
> > > >   "writerClassName": "org.apache.metron.enrichment.writer.SimpleHbaseEnrichmentWriter",
> > > >   "sensorTopic": "dns",
> > > >   "parserConfig": {
> > > >     "shew.table": "dns",
> > > >     "shew.cf": "dns",
> > > >     "shew.keyColumns": "name",
> > > >     "shew.enrichmentType": "dns",
> > > >     "columns": {
> > > >       "name": 0,
> > > >       "type": 1,
> > > >       "data": 2
> > > >     }
> > > >   }
> > > > }
> > > >
> > > > And... it seems to be working. At least, I have data in HBase which looks more like the output of the flatfile loader.
> > > >
> > > > Charlie
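For reference, what the streaming-enrichment setup above ultimately consumes is just one CSV record per message on the "dns" Kafka topic, with the columns in the order declared in the parser config. The sketch below publishes one such record directly with the Kafka client, bypassing the NiFi HTTP listener; the broker address, serializer settings, and class name are placeholders, not part of Charles's flow.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class DnsEnrichmentFeed {
        public static void main(String[] args) {
            // Placeholder broker address; in the flow described above, NiFi's HTTP
            // listener and its Kafka publisher sit in front of this topic instead.
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:6667");
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // One CSV record per message; columns as declared in the config: name, type, data.
                producer.send(new ProducerRecord<>("dns", "server.domain.local,A,192.168.0.198"));
            }
        }
    }

Everything past the topic is then handled by the parser topology: the CSVParser splits the record and the SimpleHbaseEnrichmentWriter writes it to HBase under a Metron-style enrichment key.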
> > > > -----Original Message-----
> > > > From: Casey Stella [mailto:ceste...@gmail.com]
> > > > Sent: 05 June 2018 14:56
> > > > To: dev@metron.apache.org
> > > > Subject: Re: Writing enrichment data directly from NiFi with PutHBaseJSON
> > > >
> > > > The problem, as you correctly diagnosed, is the key in HBase. We construct the key very specifically in Metron, so it's unlikely to work out of the box with the NiFi processor, unfortunately. The key that we use is formed here in the codebase:
> > > > https://github.com/cestella/incubator-metron/blob/master/metron-platform/metron-enrichment/src/main/java/org/apache/metron/enrichment/converter/EnrichmentKey.java#L51
> > > >
> > > > To put that in English, consider the following:
> > > >
> > > > - type: the enrichment type
> > > > - indicator: the indicator to use
> > > > - hash(*): a Murmur3 128-bit hash function
> > > >
> > > > The key is hash(indicator) + type + indicator.
> > > >
> > > > This hash prefixing is a standard practice in HBase key design that allows the keys to be uniformly distributed among the regions and prevents hotspotting. Depending on how the PutHBaseJSON processor works, if you can construct the key and pass it in, then you might be able to either construct the key in NiFi or write a processor to construct it. Ultimately, though, what Carolyn said is true: the easiest approach is probably using the flatfile loader. If you do get this working in NiFi, however, do please let us know and/or consider contributing it back to the project as a PR :)
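To make that key layout concrete, here is a minimal sketch that builds the row key by reusing the EnrichmentKey class linked above rather than re-implementing the Murmur3 salting by hand. It assumes the metron-enrichment jar is on the classpath and that the constructor and toBytes() behave as in the linked source; the type and indicator are example values and the class name is made up.

    import org.apache.metron.enrichment.converter.EnrichmentKey;

    public class EnrichmentRowKey {
        public static void main(String[] args) {
            String type = "dns";                       // the enrichment type
            String indicator = "server.domain.local";  // the value being looked up

            // toBytes() produces the hash(indicator) + type + indicator layout
            // described above, which should match what the flatfile loader writes.
            byte[] rowKey = new EnrichmentKey(type, indicator).toBytes();

            StringBuilder hex = new StringBuilder();
            for (byte b : rowKey) {
                hex.append(String.format("%02x", b & 0xff));
            }
            System.out.println(hex);
        }
    }

In NiFi, the same call could live inside a small custom or scripted processor that computes the row id handed to PutHBaseJSON, which is essentially the approach Casey suggests above.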
> > > > On Fri, Jun 1, 2018 at 6:26 AM Charles Joynt <charles.jo...@gresearch.co.uk> wrote:
> > > >
> > > > > Hello,
> > > > >
> > > > > I work as a Dev/Ops Data Engineer within the security team at a company in London where we are in the process of implementing Metron. I have been tasked with implementing feeds of network environment data into HBase so that this data can be used as enrichment sources for our security events. First off, I wanted to pull in DNS data for an internal domain.
> > > > >
> > > > > I am assuming that I need to write data into HBase in such a way that it exactly matches what I would get from the flatfile_loader.sh script. A colleague of mine has already loaded some DNS data using that script, so I am using that as a reference.
> > > > >
> > > > > I have implemented a flow in NiFi which takes JSON data from an HTTP listener and routes it to a PutHBaseJSON processor. The flow is working, in the sense that data is successfully written to HBase, but despite (naively) specifying "Row Identifier Encoding Strategy = Binary", the results in HBase don't look correct. Comparing the output from HBase scan commands, I see:
> > > > >
> > > > > flatfile_loader.sh produced:
> > > > >
> > > > > ROW: \xFF\xFE\xCB\xB8\xEF\x92\xA3\xD9#xC\xF9\xAC\x0Ap\x1E\x00\x05whois\x00\x0E192.168.0.198
> > > > > CELL: column=data:v, timestamp=1516896203840, value={"clientname":"server.domain.local","clientip":"192.168.0.198"}
> > > > >
> > > > > PutHBaseJSON produced:
> > > > >
> > > > > ROW: server.domain.local
> > > > > CELL: column=dns:v, timestamp=1527778603783, value={"name":"server.domain.local","type":"A","data":"192.168.0.198"}
> > > > >
> > > > > From source JSON:
> > > > >
> > > > > {"k":"server.domain.local","v":{"name":"server.domain.local","type":"A","data":"192.168.0.198"}}
> > > > >
> > > > > I know that there are some differences in column family / field names, but my worry is the ROW id. Presumably I need to encode my row key, "k" in the JSON data, in a way that matches how the flatfile_loader.sh script did it.
> > > > >
> > > > > Can anyone explain how I might convert my ID to the correct format? Or does this matter - can Metron use the human-readable ROW ids?
> > > > >
> > > > > Charlie Joynt
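One way to check whether a row written from NiFi is actually visible under the Metron-style key, rather than the human-readable one, is to compute the key with EnrichmentKey and issue a Get against the enrichment table. This is only a sketch under assumptions: the ZooKeeper quorum is a placeholder, the table name comes from the parser config above, and it presumes the standard HBase client API and the metron-enrichment jar on the classpath.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.metron.enrichment.converter.EnrichmentKey;

    public class EnrichmentLookupCheck {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            conf.set("hbase.zookeeper.quorum", "zk1,zk2,zk3"); // placeholder quorum

            try (Connection conn = ConnectionFactory.createConnection(conf);
                 Table table = conn.getTable(TableName.valueOf("dns"))) { // table from the parser config
                byte[] rowKey = new EnrichmentKey("dns", "server.domain.local").toBytes();
                Result result = table.get(new Get(rowKey));
                // An empty result here likely means the row was written under a different
                // (e.g. human-readable) key, so Metron's enrichment lookups would not find it.
                System.out.println(result.isEmpty()
                    ? "no enrichment row under the Metron-style key"
                    : "found enrichment row: " + result);
            }
        }
    }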
> > > --
> > > simon elliston ball
> > > @sireb