Do we, the community, think it would be a good idea to create a PutMetronEnrichment NiFi processor for this use case? It seems that a number of people want to use NiFi to manage and schedule the loading of enrichments, for example.
Simon

On 5 June 2018 at 06:56, Casey Stella <ceste...@gmail.com> wrote:

> The problem, as you correctly diagnosed, is the key in HBase. We construct
> the key very specifically in Metron, so it's unlikely to work out of the
> box with the NiFi processor, unfortunately. The key that we use is formed
> here in the codebase:
> https://github.com/cestella/incubator-metron/blob/master/metron-platform/metron-enrichment/src/main/java/org/apache/metron/enrichment/converter/EnrichmentKey.java#L51
>
> To put that in English, consider the following:
>
> - type - the enrichment type
> - indicator - the indicator to use
> - hash(*) - a murmur3 128-bit hash function
>
> The key is hash(indicator) + type + indicator.
>
> This hash prefixing is a standard practice in HBase key design that allows
> the keys to be uniformly distributed among the regions and prevents
> hotspotting. Depending on how the PutHBaseJSON processor works, if you can
> construct the key and pass it in, then you might be able to either
> construct the key in NiFi or write a processor to construct the key.
> Ultimately, though, what Carolyn said is true: the easiest approach is
> probably using the flatfile loader.
> If you do get this working in NiFi, however, do please let us know and/or
> consider contributing it back to the project as a PR :)
>
> On Fri, Jun 1, 2018 at 6:26 AM Charles Joynt <charles.jo...@gresearch.co.uk>
> wrote:
>
> > Hello,
> >
> > I work as a Dev/Ops Data Engineer within the security team at a company in
> > London where we are in the process of implementing Metron. I have been
> > tasked with implementing feeds of network environment data into HBase so
> > that this data can be used as enrichment sources for our security events.
> > First off, I wanted to pull in DNS data for an internal domain.
> >
> > I am assuming that I need to write data into HBase in such a way that it
> > exactly matches what I would get from the flatfile_loader.sh script. A
> > colleague of mine has already loaded some DNS data using that script, so I
> > am using that as a reference.
> >
> > I have implemented a flow in NiFi which takes JSON data from an HTTP
> > listener and routes it to a PutHBaseJSON processor. The flow is working, in
> > the sense that data is successfully written to HBase, but despite (naively)
> > specifying "Row Identifier Encoding Strategy = Binary", the results in
> > HBase don't look correct. Comparing the output from HBase scan commands, I
> > see:
> >
> > flatfile_loader.sh produced:
> >
> > ROW:
> > \xFF\xFE\xCB\xB8\xEF\x92\xA3\xD9#xC\xF9\xAC\x0Ap\x1E\x00\x05whois\x00\x0E192.168.0.198
> > CELL: column=data:v, timestamp=1516896203840,
> > value={"clientname":"server.domain.local","clientip":"192.168.0.198"}
> >
> > PutHBaseJSON produced:
> >
> > ROW: server.domain.local
> > CELL: column=dns:v, timestamp=1527778603783,
> > value={"name":"server.domain.local","type":"A","data":"192.168.0.198"}
> >
> > From source JSON:
> >
> > {"k":"server.domain.local","v":{"name":"server.domain.local","type":"A","data":"192.168.0.198"}}
> >
> > I know that there are some differences in column family / field names, but
> > my worry is the ROW id. Presumably I need to encode my row key, "k" in the
> > JSON data, in a way that matches how the flatfile_loader.sh script did it.
> >
> > Can anyone explain how I might convert my Id to the correct format?
> > -or-
> > Does this matter? Can Metron use the human-readable ROW ids?
> >
> > Charlie Joynt
> >
> > --------------
> > G-RESEARCH believes the information provided herein is reliable. While
> > every care has been taken to ensure accuracy, the information is furnished
> > to the recipients with no warranty as to the completeness and accuracy of
> > its contents and on condition that any errors or omissions shall not be
> > made the basis of any claim, demand or cause of action.
> > The information in this email is intended only for the named recipient.
> > If you are not the intended recipient please notify us immediately and do
> > not copy, distribute or take action based on this e-mail.
> > All messages sent to and from this e-mail address will be logged by
> > G-RESEARCH and are subject to archival storage, monitoring, review and
> > disclosure.
> > G-RESEARCH is the trading name of Trenchant Limited, 5th Floor,
> > Whittington House, 19-30 Alfred Place, London WC1E 7EA.
> > Trenchant Limited is a company registered in England with company number
> > 08127121.
> > --------------

--
simon elliston ball
@sireb
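To make Casey's description of the row key concrete, here is a minimal sketch in Python of the layout hash(indicator) + type + indicator. It is an illustration, not Metron's implementation, and the function names (`write_utf`, `enrichment_row_key`, `split_row_key`) are hypothetical. Two details are assumptions inferred from the flatfile_loader.sh scan output in Charlie's message rather than confirmed from the Metron source: the type and the indicator each carry a 2-byte big-endian length prefix (the `\x00\x05` before `whois`), as Java's `DataOutputStream.writeUTF` would write them, and the indicator is hashed as UTF-8 bytes. The murmur3 128-bit hash is passed in as a function because its seed and byte order are not given in this thread and have to be taken from EnrichmentKey.java.

```python
import hashlib
import struct
from typing import Callable, Tuple

HASH_PREFIX_SIZE = 16  # a 128-bit hash is 16 bytes


def write_utf(s: str) -> bytes:
    """Approximate java.io.DataOutputStream.writeUTF: 2-byte big-endian length + bytes.

    Plain UTF-8 is used; it matches Java's modified UTF-8 only for strings without
    NUL or non-BMP characters, which covers typical types and indicators.
    """
    data = s.encode("utf-8")
    if len(data) > 0xFFFF:
        raise ValueError("writeUTF cannot encode more than 65535 bytes")
    return struct.pack(">H", len(data)) + data


def enrichment_row_key(enrichment_type: str, indicator: str,
                       hash128: Callable[[bytes], bytes]) -> bytes:
    """Hypothetical helper building hash(indicator) + type + indicator.

    ASSUMPTION: hash128 must be the same 16-byte murmur3 128-bit hash Metron uses
    (seed and byte order to be checked in EnrichmentKey.java).
    """
    prefix = hash128(indicator.encode("utf-8"))
    if len(prefix) != HASH_PREFIX_SIZE:
        raise ValueError(f"expected a {HASH_PREFIX_SIZE}-byte hash, got {len(prefix)}")
    # The hash prefix spreads rows uniformly across regions (avoids hotspotting).
    return prefix + write_utf(enrichment_type) + write_utf(indicator)


def split_row_key(row: bytes) -> Tuple[bytes, str, str]:
    """Inverse of enrichment_row_key: returns (hash prefix, type, indicator)."""
    prefix, rest = row[:HASH_PREFIX_SIZE], row[HASH_PREFIX_SIZE:]
    fields = []
    for _ in range(2):  # first the type, then the indicator
        (length,) = struct.unpack(">H", rest[:2])
        fields.append(rest[2:2 + length].decode("utf-8"))
        rest = rest[2 + length:]
    return prefix, fields[0], fields[1]


if __name__ == "__main__":
    # STAND-IN ONLY: MD5 also yields 16 bytes, but it is NOT murmur3, so these keys
    # will not match rows written by flatfile_loader.sh.
    def md5_stand_in(data: bytes) -> bytes:
        return hashlib.md5(data).digest()

    key = enrichment_row_key("whois", "192.168.0.198", md5_stand_in)
    print(key[HASH_PREFIX_SIZE:])
    print(split_row_key(key)[1:])  # ('whois', '192.168.0.198')
```

The MD5 stand-in only produces a prefix of the right size; rows built with it will not line up with rows written by flatfile_loader.sh until it is replaced with the same murmur3 128-bit hash Metron uses. `split_row_key` may help when comparing a NiFi-built key against rows from an HBase scan.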