Hello,

I work as a Dev/Ops Data Engineer within the security team at a company in 
London where we are in the process of implementing Metron. I have been tasked 
with implementing feeds of network environment data into HBase so that this 
data can be used as enrichment sources for our security events. First off, I
want to pull in DNS data for an internal domain.

I am assuming that I need to write data into HBase in such a way that it 
exactly matches what I would get from the flatfile_loader.sh script. A 
colleague of mine has already loaded some DNS data using that script, so I am 
using that as a reference.

I have implemented a flow in NiFi which takes JSON data from an HTTP listener
and routes it to a PutHBaseJSON processor. The flow is working, in the sense 
that data is successfully written to HBase, but despite (naively) specifying 
"Row Identifier Encoding Strategy = Binary", the results in HBase don't look 
correct. Comparing the output from HBase scan commands I see:

flatfile_loader.sh produced:

ROW:  \xFF\xFE\xCB\xB8\xEF\x92\xA3\xD9#xC\xF9\xAC\x0Ap\x1E\x00\x05whois\x00\x0E192.168.0.198
CELL: column=data:v, timestamp=1516896203840, 
value={"clientname":"server.domain.local","clientip":"192.168.0.198"}

PutHBaseJSON produced:

ROW:  server.domain.local
CELL: column=dns:v, timestamp=1527778603783, 
value={"name":"server.domain.local","type":"A","data":"192.168.0.198"}

From source JSON:

{"k":"server.domain.local","v":{"name":"server.domain.local","type":"A","data":"192.168.0.198"}}
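
To spell out the mapping I am seeing, here is a rough Python sketch of what 
the flow effectively does with each message, judging only from the scan output 
above. The function name and the default "dns:v" column are just for 
illustration; the real work is done by NiFi processors, not by this code.

```python
import json

def to_hbase_cell(message: str, column: str = "dns:v"):
    """Split one source message into (row id, column, cell value).

    Illustrative only: mirrors the PutHBaseJSON result shown above,
    where "k" becomes the row id as plain text and "v" is stored as JSON.
    """
    doc = json.loads(message)
    row_id = doc["k"]  # human-readable, no binary prefix
    # Compact separators reproduce the value as it appears in the scan
    value = json.dumps(doc["v"], separators=(",", ":"))
    return row_id, column, value
```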

I know that there are some differences in column family / field names, but my 
worry is the ROW id. Presumably I need to encode my row key, "k" in the JSON 
data, in a way that matches how the flatfile_loader.sh script did it.
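
Reading the flatfile_loader.sh row key byte by byte, it looks to me like 16 
opaque bytes (presumably some kind of hash, but that is a guess) followed by 
two strings, each preceded by what looks like a 2-byte length field (\x00\x05 
before the five characters of "whois"). Below is a minimal Python sketch of 
that layout as I understand it. The function names are mine, "kind" is my 
guess at what "whois" represents, the prefix is passed in rather than computed 
because I do not know how it is derived, and none of this is taken from the 
Metron source.

```python
import struct

PREFIX_LEN = 16  # leading bytes before "\x00\x05whois" in the scan above

def length_prefixed(text: str) -> bytes:
    """Return a 2-byte big-endian length followed by the UTF-8 bytes."""
    raw = text.encode("utf-8")
    return struct.pack(">H", len(raw)) + raw

def build_row_key(prefix: bytes, kind: str, indicator: str) -> bytes:
    """Assemble prefix + kind + indicator in the observed order.

    ASSUMPTION: `prefix` is supplied by the caller; how the loader
    derives it is not known here.
    """
    if len(prefix) != PREFIX_LEN:
        raise ValueError(f"expected a {PREFIX_LEN}-byte prefix")
    return prefix + length_prefixed(kind) + length_prefixed(indicator)

def split_row_key(key: bytes):
    """Inverse of build_row_key: return (prefix, kind, indicator)."""
    prefix, rest = key[:PREFIX_LEN], key[PREFIX_LEN:]
    fields = []
    for _ in range(2):
        (size,) = struct.unpack(">H", rest[:2])
        fields.append(rest[2:2 + size].decode("utf-8"))
        rest = rest[2 + size:]
    return prefix, fields[0], fields[1]

# Placeholder prefix, NOT a real hash of anything
key = build_row_key(b"\x00" * PREFIX_LEN, "whois", "server.domain.local")
```

If the prefix does turn out to be derived from the indicator, it could be 
computed separately and passed in as the first argument.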

Can anyone explain how I might convert my Id to the correct format?
-or-
Does this matter? Can Metron use the human-readable ROW ids?

Charlie Joynt

