Agreed. Streaming enrichment is the right solution for DNS data. Do we have a web service for writing enrichments?
Carolyn Duby
Solutions Engineer, Northeast
cd...@hortonworks.com

On 6/13/18, 6:25 AM, "Charles Joynt" <charles.jo...@gresearch.co.uk> wrote:

>Regarding why I didn't choose to load data with the flatfile loader script...
>
>I want to be able to SEND enrichment data to Metron rather than have to set up
>cron jobs to PULL data. At the moment I'm trying to prove that the process
>works with a simple data source. In the future we will want enrichment data in
>Metron that comes from systems (e.g. HR databases) that I won't have access
>to, and hence will need someone to be able to send us the data.
>
>> Carolyn: just call the flat file loader from a script processor...
>
>I didn't believe that would work in my environment. I'm pretty sure the script
>has dependencies on various Metron JARs, not least for the row-id hashing
>algorithm. I suppose this would require at least a partial install of Metron
>alongside NiFi, and would introduce additional work on the NiFi cluster for
>any Metron upgrade. In some (enterprise) environments there might be
>separation of ownership between NiFi and Metron.
>
>I also prefer not to have a Java app calling a bash script which launches a new
>Java process, with logs or error output that might just get swallowed up
>invisibly. Somewhere down the line this could hold up effective
>troubleshooting.
>
>> Simon: I have actually written a stellar processor, which applies stellar to
>> all FlowFile attributes...
>
>Gulp.
>
>> Simon: what didn't you like about the flatfile loader script?
>
>The flatfile loader script has worked fine for me when prepping enrichment
>data in test systems; however, it was a bit of a chore to get the JSON
>configuration files set up, especially for "wide" data sources that may have
>15-20 fields, e.g. Active Directory.
>
>More broadly speaking, I want to embrace the streaming data paradigm and have
>tried to avoid batch jobs. With the DNS example, you might imagine a future
>where the enrichment data is streamed based on DHCP registrations, DNS update
>events, etc. In principle this could reduce the window of time in which we
>might enrich a data source with out-of-date data.
>
>Charlie
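For reference, the JSON configuration Charlie mentions is the extractor config passed to flatfile_loader.sh with -e. A minimal sketch for a three-column DNS CSV like the one used later in this thread might look as follows (the column names and file paths are illustrative, not from this thread; the loader's documentation lists the full option set):

  {
    "config": {
      "columns": {
        "name": 0,
        "type": 1,
        "data": 2
      },
      "indicator_column": "name",
      "type": "dns",
      "separator": ","
    },
    "extractor": "CSV"
  }

invoked along the lines of:

  $METRON_HOME/bin/flatfile_loader.sh -i ./dns.csv -t dns -c dns -e ./dns_extractor.json

where -t names the HBase table and -c the column family.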
>-----Original Message-----
>From: Carolyn Duby [mailto:cd...@hortonworks.com]
>Sent: 12 June 2018 20:33
>To: dev@metron.apache.org
>Subject: Re: Writing enrichment data directly from NiFi with PutHBaseJSON
>
>I like the streaming enrichment solution, but it depends on how you are
>getting the data in. If the data arrives in a CSV file, just call the flat
>file loader from a script processor. No special NiFi required.
>
>If the enrichments don't arrive in bulk, the streaming solution is better.
>
>Thanks
>Carolyn Duby
>Solutions Engineer, Northeast
>
>On 6/12/18, 1:08 PM, "Simon Elliston Ball" <si...@simonellistonball.com> wrote:
>
>>Good solution. The streaming enrichment writer makes a lot of sense for
>>this, especially if you're not using huge enrichment sources that need
>>the batch-based loaders.
>>
>>As it happens, I have written most of a NiFi processor to handle this
>>use case directly - both non-record and record-based, especially for Otto :).
>>The one thing we need to figure out now is where to host that, and how
>>to handle releases of a nifi-metron-bundle. I'll probably get round to
>>putting the code on my GitHub at least in the next few days, while we
>>figure out a more permanent home.
>>
>>Charlie, out of curiosity, what didn't you like about the flatfile
>>loader script?
>>
>>Simon
>>
>>On 12 June 2018 at 18:00, Charles Joynt <charles.jo...@gresearch.co.uk>
>>wrote:
>>
>>> Thanks for the responses. I appreciate the willingness to look at
>>> creating a NiFi processor. That would be great!
>>>
>>> Just to follow up on this (after a week looking after the "ops" side of
>>> dev-ops): I really don't want to have to use the flatfile loader
>>> script, and I'm not going to be able to write a Metron-style HBase
>>> key generator any time soon, but I have had some success with a
>>> different approach.
>>>
>>> 1. Generate data in CSV format, e.g. "server.domain.local","A","192.168.0.198"
>>> 2. Send this to an HTTP listener in NiFi
>>> 3. Write to a Kafka topic
>>>
>>> I then followed your instructions in this blog:
>>> https://cwiki.apache.org/confluence/display/METRON/2016/06/16/Metron+Tutorial+-+Fundamentals+Part+6%3A+Streaming+Enrichment
>>>
>>> 4. Create a new "dns" sensor in Metron
>>> 5. Use the CSVParser and SimpleHbaseEnrichmentWriter, with parserConfig
>>> settings to push this into HBase:
>>>
>>> {
>>>   "parserClassName": "org.apache.metron.parsers.csv.CSVParser",
>>>   "writerClassName": "org.apache.metron.enrichment.writer.SimpleHbaseEnrichmentWriter",
>>>   "sensorTopic": "dns",
>>>   "parserConfig": {
>>>     "shew.table": "dns",
>>>     "shew.cf": "dns",
>>>     "shew.keyColumns": "name",
>>>     "shew.enrichmentType": "dns",
>>>     "columns": {
>>>       "name": 0,
>>>       "type": 1,
>>>       "data": 2
>>>     }
>>>   }
>>> }
>>>
>>> And... it seems to be working. At least, I have data in HBase which
>>> looks much more like the output of the flatfile loader.
>>>
>>> Charlie
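To actually consume rows written this way, a sensor's enrichment configuration has to reference the enrichment type. A rough illustration (the message field name "domain" is hypothetical, and the exact schema depends on the Metron version) of mapping a field to the "dns" enrichment type via the HBase enrichment adapter:

  {
    "enrichment": {
      "fieldMap": {
        "hbaseEnrichment": ["domain"]
      },
      "fieldToTypeMap": {
        "domain": ["dns"]
      }
    }
  }

Alternatively, Stellar's ENRICHMENT_GET('dns', domain, 'dns', 'dns') (arguments: enrichment type, indicator, HBase table, column family) can pull the same row on demand.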
>>> -----Original Message-----
>>> From: Casey Stella [mailto:ceste...@gmail.com]
>>> Sent: 05 June 2018 14:56
>>> To: dev@metron.apache.org
>>> Subject: Re: Writing enrichment data directly from NiFi with PutHBaseJSON
>>>
>>> The problem, as you correctly diagnosed, is the key in HBase. We
>>> construct the key very specifically in Metron, so it's unlikely to
>>> work out of the box with the NiFi processor, unfortunately. The key
>>> that we use is formed here in the codebase:
>>> https://github.com/cestella/incubator-metron/blob/master/metron-platform/metron-enrichment/src/main/java/org/apache/metron/enrichment/converter/EnrichmentKey.java#L51
>>>
>>> To put that in English, consider the following:
>>>
>>> - type - the enrichment type
>>> - indicator - the indicator to use
>>> - hash(*) - a Murmur3 128-bit hash function
>>>
>>> The key is hash(indicator) + type + indicator.
>>>
>>> This hash prefixing is a standard practice in HBase key design that
>>> allows the keys to be uniformly distributed among the regions and
>>> prevents hotspotting. Depending on how the PutHBaseJSON processor
>>> works, if you can construct the key and pass it in, then you might be
>>> able to either construct the key in NiFi or write a processor to
>>> construct the key. Ultimately, though, what Carolyn said is true: the
>>> easiest approach is probably using the flatfile loader.
>>> If you do get this working in NiFi, however, do please let us know
>>> and/or consider contributing it back to the project as a PR :)
>>>
>>> On Fri, Jun 1, 2018 at 6:26 AM Charles Joynt <charles.jo...@gresearch.co.uk>
>>> wrote:
>>>
>>> > Hello,
>>> >
>>> > I work as a Dev/Ops data engineer within the security team at a
>>> > company in London, where we are in the process of implementing Metron.
>>> > I have been tasked with implementing feeds of network environment
>>> > data into HBase so that this data can be used as enrichment sources
>>> > for our security events. First off, I wanted to pull in DNS data
>>> > for an internal domain.
>>> >
>>> > I am assuming that I need to write data into HBase in such a way
>>> > that it exactly matches what I would get from the flatfile_loader.sh
>>> > script. A colleague of mine has already loaded some DNS data using
>>> > that script, so I am using that as a reference.
>>> >
>>> > I have implemented a flow in NiFi which takes JSON data from an HTTP
>>> > listener and routes it to a PutHBaseJSON processor. The flow is
>>> > working, in the sense that data is successfully written to HBase,
>>> > but despite (naively) specifying "Row Identifier Encoding Strategy =
>>> > Binary", the results in HBase don't look correct. Comparing the
>>> > output of HBase scan commands, I see:
>>> >
>>> > flatfile_loader.sh produced:
>>> >
>>> > ROW: \xFF\xFE\xCB\xB8\xEF\x92\xA3\xD9#xC\xF9\xAC\x0Ap\x1E\x00\x05whois\x00\x0E192.168.0.198
>>> > CELL: column=data:v, timestamp=1516896203840,
>>> > value={"clientname":"server.domain.local","clientip":"192.168.0.198"}
>>> >
>>> > PutHBaseJSON produced:
>>> >
>>> > ROW: server.domain.local
>>> > CELL: column=dns:v, timestamp=1527778603783,
>>> > value={"name":"server.domain.local","type":"A","data":"192.168.0.198"}
>>> >
>>> > From the source JSON:
>>> >
>>> > {"k":"server.domain.local","v":{"name":"server.domain.local","type":"A","data":"192.168.0.198"}}
>>> >
>>> > I know that there are some differences in column family / field
>>> > names, but my worry is the ROW id. Presumably I need to encode my
>>> > row key, "k" in the JSON data, in a way that matches how the
>>> > flatfile_loader.sh script did it.
>>> >
>>> > Can anyone explain how I might convert my id to the correct format?
>>> > -or-
>>> > Does this matter? Can Metron use the human-readable ROW ids?
>>> >
>>> > Charlie Joynt
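Casey's description of the key, combined with the flatfile_loader.sh scan output above, translates into roughly the Java sketch below. It is an illustration only, assuming Guava on the classpath: the \x00\x05whois and \x00\x0E... bytes in the scan output suggest the type and indicator are each written with a two-byte length prefix after the 16-byte Murmur3 hash, but the authoritative layout is the linked EnrichmentKey.java, and real data should go through Metron's own converter classes rather than this.

  import com.google.common.hash.Hashing;
  import java.io.ByteArrayOutputStream;
  import java.io.DataOutputStream;
  import java.io.IOException;
  import java.nio.charset.StandardCharsets;

  public class EnrichmentRowKeySketch {

    // Sketch of key = hash(indicator) + type + indicator, as Casey describes.
    public static byte[] rowKey(String type, String indicator) throws IOException {
      ByteArrayOutputStream bos = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(bos);

      // 16-byte Murmur3 128-bit hash of the indicator. The hash prefix spreads
      // keys uniformly across HBase regions and prevents hotspotting.
      out.write(Hashing.murmur3_128()
                       .hashString(indicator, StandardCharsets.UTF_8)
                       .asBytes());

      // Length-prefixed type and indicator (assumed framing, mirroring the
      // \x00\x05 and \x00\x0E bytes visible in the scan output above).
      byte[] typeBytes = type.getBytes(StandardCharsets.UTF_8);
      out.writeShort(typeBytes.length);
      out.write(typeBytes);

      byte[] indicatorBytes = indicator.getBytes(StandardCharsets.UTF_8);
      out.writeShort(indicatorBytes.length);
      out.write(indicatorBytes);

      return bos.toByteArray();
    }
  }

A NiFi ExecuteScript step or a custom processor could compute a key like this and hand PutHBaseJSON a pre-built binary row id, which is essentially the "construct the key in NiFi" option Casey mentions.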
>>
>>--
>>simon elliston ball
>>@sireb