mmiklavc edited a comment on issue #1483: METRON-2217 Migrate current HBase 
client from HTableInterface to Table
URL: https://github.com/apache/metron/pull/1483#issuecomment-525102320
 
 
   ## Test Plan
   
   ### Enrichments
   
   This will cover enrichments, threat intel, and the bulk loading utilities 
that write data to HBase
   
   #### Test basic enrichment
   
   Follow the steps in the [updated] blog series below to get some data into Metron using Squid, along with an enrichment:
   
   1. https://cwiki.apache.org/confluence/display/METRON/2016/04/25/Metron+Tutorial+-+Fundamentals+Part+1%3A+Creating+a+New+Telemetry
   2. https://cwiki.apache.org/confluence/display/METRON/2016/04/28/Metron+Tutorial+-+Fundamentals+Part+2%3A+Creating+a+New+Enrichment
   
   #### Test threat intel
   
   1. https://cwiki.apache.org/confluence/display/METRON/2016/05/02/Metron+Tutorial+-+Fundamentals+Part+4%3A+Pluggable+Threat+Intelligence
   
   #### Test multi-threading
   
   For the final step, we'll deviate from the blog a bit so we can test that the thread pool doesn't cause any deadlocking/threading issues with the new HBase connection approach. The steps are adapted from https://cwiki.apache.org/confluence/display/METRON/2016/06/16/Metron+Tutorial+-+Fundamentals+Part+6%3A+Streaming+Enrichment.
   
   Let's load the original whois list from step 1 as threat intel for added fun. This way we can run multiple enrichments and also trigger threat intel from the same messages. Create a file `blocklist2.csv` with the following contents:
   ```
   [root@node1: ~]
   # cat blocklist2.csv
   aliexpress.com,squidblacklist.org
   pravda.ru,squidblacklist.org
   google.com,squidblacklist.org
   brightsideofthesun.com,squidblacklist.org
   microsoftstore.com,squidblacklist.org
   autonews.com,squidblacklist.org
   facebook.com,squidblacklist.org
   ebay.com,squidblacklist.org
   recruit.jp,squidblacklist.org
   lada.ru,squidblacklist.org
   aliexpress.com,squidblacklist.org
   ```
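
The sample list above repeats `aliexpress.com`. That is probably harmless if the loader writes one HBase row per indicator (an assumption, not verified here), but it is easy to check for repeats before loading. A minimal sketch; `duplicate_indicators` is a hypothetical helper name, not part of Metron:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Metron): print every indicator
# (first CSV column) that appears more than once in the given file.
duplicate_indicators() {
  local csv_file="$1"
  cut -d',' -f1 "$csv_file" | sort | uniq -d
}

# Example usage:
#   duplicate_indicators blocklist2.csv
```

An empty result means every indicator in the file is unique.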
   
   Load the threat intel into HBase
   `${METRON_HOME}/bin/flatfile_loader.sh -i blocklist2.csv -t threatintel -c t -e threatintel_extractor_config.json`
   
   Clear the squid logs
   ```
   rm /var/log/squid/access.log
   touch /var/log/squid/access.log
   chown squid:squid /var/log/squid/access.log
   service squid restart
   ```
   
   Run a new set of squid client commands similar to step 1. Rather than having only a fraction of the records match on domain for the whois enrichment, we'll have them all match for this test.
   ```
   squidclient "https://www.google.com/maps/place/Waterford,+WI/@42.7639877,-88.2867248,12z/data=!4m5!3m4!1s0x88059e67de9a3861:0x2d24f51aad34c80b!8m2!3d42.7630722!4d-88.2142563"
   squidclient "http://www.help.1and1.co.uk/domains-c40986/transfer-domains-c79878"
   squidclient "https://community.cisco.com/t5/technology-and-support/ct-p/technology-support"
   squidclient "https://www.capitalone.com/support-center";
   squidclient "https://www.cnn.com/about";
   squidclient "https://contact.nba.com/";
   squidclient "https://www.espn.com/nfl/team/_/name/cle/cleveland-browns";
   ```
   
   Update your squid.json enrichment to include Stellar enrichments. We're 
going to duplicate the `whois` enrichment multiple times for the sake of 
simplicity.
   
   ```
   # cat $METRON_HOME/config/zookeeper/enrichments/squid.json
   {
     "enrichment" : {
       "fieldMap" : {
         "hbaseEnrichment" : [ "domain_without_subdomains" ],
         "stellar" : {
           "config" : {
             "e1" : {
               "user" : "ENRICHMENT_GET('user', ip_src_addr, 'enrichment', 't')"
             },
             "e2" : {
               "dws1" : "ENRICHMENT_GET('whois', domain_without_subdomains, 'enrichment', 't')"
             },
             "e3" : {
               "dws2" : "ENRICHMENT_GET('whois', domain_without_subdomains, 'enrichment', 't')"
             },
             "e4" : {
               "dws3" : "ENRICHMENT_GET('whois', domain_without_subdomains, 'enrichment', 't')"
             },
             "e5" : {
               "dws4" : "ENRICHMENT_GET('whois', domain_without_subdomains, 'enrichment', 't')"
             },
             "e6" : {
               "dws5" : "ENRICHMENT_GET('whois', domain_without_subdomains, 'enrichment', 't')"
             }
           }
         }
       },
       "fieldToTypeMap" : {
         "domain_without_subdomains" : [ "whois" ]
       },
       "config" : { }
     },
     "threatIntel" : {
       "fieldMap" : {
         "hbaseThreatIntel" : [ "domain_without_subdomains" ]
       },
       "fieldToTypeMap" : {
         "domain_without_subdomains" : [ "squidBlacklist" ]
       },
       "config" : { },
       "triageConfig" : {
         "riskLevelRules" : [ ],
         "aggregator" : "MAX",
         "aggregationConfig" : { }
       }
     },
     "configuration" : { }
   }
   ```
   
   Load the changed enrichment
   ```
   ${METRON_HOME}/bin/zk_load_configs.sh -m PUSH -z $ZOOKEEPER -i ${METRON_HOME}/config/zookeeper
   # verify it loaded
   ${METRON_HOME}/bin/zk_load_configs.sh -m DUMP -z $ZOOKEEPER -c ENRICHMENT -n squid
   ```
   
   Wipe your squid indexes in ES
   ```
   curl -XDELETE "http://node1:9200/squid*"
   ```
   
   Stop the enrichment topology
   
   In Ambari, navigate to Metron > Configs > Enrichment. Make the following 
config adjustments:
   1. Set Unified Enrichment Parallelism to 3
   2. Set Unified Threat Intel Parallelism to 3
   3. Set Unified Enrichment Cache Size to 0 (force cache misses so we hit 
HBase)
   4. Set Unified Threat Intel Cache Size to 0 (force cache misses so we hit 
HBase)
   5. Set Unified Enrichment Thread Pool Size to 5. 
   
   Restart the enrichment topology. You should see a log message in the storm 
worker logs similar to the following:
   ```
   2019-08-26 17:52:40.162 o.a.m.e.b.UnifiedEnrichmentBolt Thread-8-threatIntelBolt-executor[7 7] [INFO] Creating new threadpool of size 5
   ```
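
Rather than eyeballing the worker logs, the check can be scripted. A minimal sketch, assuming you pass in the path to the relevant Storm worker log yourself (the path depends on your install and is not given here); `threadpool_created` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Metron): succeed if the given worker log
# contains the "Creating new threadpool" message for the expected size.
# The trailing group keeps size 5 from matching a line that says size 50.
threadpool_created() {
  local log_file="$1" expected_size="$2"
  grep -Eq "Creating new threadpool of size ${expected_size}([^0-9]|\$)" "$log_file"
}

# Example usage (replace the path with your actual worker log):
#   threadpool_created /path/to/worker.log 5 && echo "thread pool OK"
```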
   
   Import the squid access data to Kafka. Run it multiple times by running the 
following:
   ```
   for i in {1..30}; do cat /var/log/squid/access.log | ${HDP_HOME}/kafka-broker/bin/kafka-console-producer.sh --broker-list $BROKERLIST --topic squid; done
   ```
   
   After a bit of time, you should see new records in the squid index that have the new enrichment and threat intel fields (note the fields `dws1` through `dws5`). You should get 210 records in your squid index, assuming you set up your squid access log with 7 records during the earlier squidclient setup.
   ```
   {
   "_index": "squid_index_2019.08.24.00",
   "_type": "squid_doc",
   "_id": "AWzBEZ7MZrHsl7xo6X-6",
   "_version": 1,
   "_score": 1,
   "_source": {
   "enrichments:hbaseEnrichment:domain_without_subdomains:whois:owner": "ESPN, Inc.",
   "full_hostname": "www.espn.com",
   "dws1:home_country": "US",
   "dws1:domain": "espn.com",
   "dws2:domain": "espn.com",
   "dws3:home_country": "US",
   "dws1:domain_created_timestamp": "781268400000",
   "enrichments:hbaseEnrichment:domain_without_subdomains:whois:home_country": "US",
   "enrichments:hbaseEnrichment:domain_without_subdomains:whois:domain_created_timestamp": "781268400000",
   "dws5:home_country": "US",
   "parallelenricher:enrich:end:ts": "1566607252930",
   "adapter:threatinteladapter:end:ts": "1566607252930",
   "original_string": "1566604971.782 732 127.0.0.1 TCP_MISS/200 331562 GET https://www.espn.com/nfl/team/_/name/cle/cleveland-browns - DIRECT/54.152.255.68 text/html",
   "dws3:registrar": "ESPN, Inc.",
   "dws4:owner": "ESPN, Inc.",
   "action": "TCP_MISS",
   "dws4:domain": "espn.com",
   "dws5:domain": "espn.com",
   "dws3:domain": "espn.com",
   "enrichments:hbaseEnrichment:domain_without_subdomains:whois:registrar": "ESPN, Inc.",
   "dws5:domain_created_timestamp": "781268400000",
   "method": "GET",
   "parallelenricher:enrich:begin:ts": "1566607252928",
   "user:user": "mmiklavcic",
   "adapter:simplehbaseadapter:end:ts": "1566607252925",
   "dws3:domain_created_timestamp": "781268400000",
   "dws2:domain_created_timestamp": "781268400000",
   "user:timestamp": 1566598784187,
   "dws2:registrar": "ESPN, Inc.",
   "user:source:type": "user",
   "dws4:domain_created_timestamp": "781268400000",
   "adapter:threatinteladapter:begin:ts": "1566607252928",
   "guid": "919b421a-b2ec-4e82-951e-3ee031c5a394",
   "dws3:owner": "ESPN, Inc.",
   "dws2:owner": "ESPN, Inc.",
   "code": 200,
   "adapter:stellaradapter:end:ts": "1566607252922",
   "enrichments:hbaseEnrichment:domain_without_subdomains:whois:domain": "espn.com",
   "dws2:home_country": "US",
   "dws4:home_country": "US",
   "dws1:registrar": "ESPN, Inc.",
   "elapsed": 732,
   "source:type": "squid",
   "ip_dst_addr": "54.152.255.68",
   "dws5:registrar": "ESPN, Inc.",
   "domain_without_subdomains": "espn.com",
   "ip_src_addr": "127.0.0.1",
   "timestamp": 1566604971782,
   "adapter:stellaradapter:begin:ts": "1566607252906",
   "url": "https://www.espn.com/nfl/team/_/name/cle/cleveland-browns",
   "dws1:owner": "ESPN, Inc.",
   "parallelenricher:splitter:begin:ts": "1566607252928",
   "dws5:owner": "ESPN, Inc.",
   "user:guid": "d8fb60b7-1670-4f96-a413-cb185afbe0de",
   "bytes": 331562,
   "parallelenricher:splitter:end:ts": "1566607252928",
   "user:original_string": "mmiklavcic,127.0.0.1",
   "dws4:registrar": "ESPN, Inc.",
   "adapter:simplehbaseadapter:begin:ts": "1566607252906"
   }
   }
   ```
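
The 210 figure is just 30 replays times the 7 lines in the access log. If your access log ended up with a different number of lines, the expected count changes accordingly. A small sketch for computing it; `expected_records` is a hypothetical helper name, and the commented `_count` call assumes the same `node1:9200` Elasticsearch endpoint used elsewhere in this plan:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Metron): expected number of indexed
# records = number of replay iterations x number of lines in the access log.
expected_records() {
  local iterations="$1" log_file="$2"
  local lines
  lines=$(wc -l < "$log_file")
  echo $(( iterations * lines ))
}

# Example usage:
#   expected_records 30 /var/log/squid/access.log
# Compare against what Elasticsearch reports for the squid indexes:
#   curl -s "http://node1:9200/squid*/_count?pretty"
```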
   
   #### Test recoverability with HBase down
   
   Now, again clear your squid index. 
   ```
   curl -XDELETE "http://node1:9200/squid*"
   ```
   
   Stop HBase and wait a few moments. Import the squid data again:
   ```
   cat /var/log/squid/access.log | ${HDP_HOME}/kafka-broker/bin/kafka-console-producer.sh --broker-list $BROKERLIST --topic squid
   ```
   
   Wait about a minute and check your squid index. You should not see any new data in the index. Now start HBase again in Ambari. Once HBase is back up, check the squid index again. After some amount of time, the data should flow through enrichments and make it to the squid index.
   
   After completing the above steps you should not see any HBase exceptions or 
errors in the enrichment logs.
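
Since "some amount of time" is open-ended, polling the index with a timeout can make this step repeatable. A minimal sketch; `wait_for_count` and `squid_count` are hypothetical helper names, and `squid_count` assumes the Elasticsearch `_count` response contains a compact `"count":N` field:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Metron): repeatedly run a command that
# prints a number until it reaches the target, or give up after the timeout.
# Usage: wait_for_count <target> <timeout_secs> <interval_secs> <command...>
wait_for_count() {
  local target="$1" timeout_secs="$2" interval_secs="$3"
  shift 3
  local waited=0 count
  while :; do
    # Treat a failing command as "nothing indexed yet".
    count=$("$@" 2>/dev/null) || count=0
    if [ "${count:-0}" -ge "$target" ] 2>/dev/null; then
      return 0
    fi
    if [ "$waited" -ge "$timeout_secs" ]; then
      return 1
    fi
    sleep "$interval_secs"
    waited=$(( waited + interval_secs ))
  done
}

# Hypothetical helper: print the document count across the squid indexes.
# Assumes the compact (non-pretty) response looks like {"count":7,...}.
squid_count() {
  curl -s "http://node1:9200/squid*/_count" | sed -n 's/.*"count":\([0-9]*\).*/\1/p'
}

# Example usage: wait up to 10 minutes for the 7 replayed records to appear.
#   wait_for_count 7 600 10 squid_count && echo "records recovered"
```

Passing the counting command as an argument keeps the polling loop independent of Elasticsearch, so it can be exercised with any command that prints a number.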
   
   ### Profiler
   
   Stop the profiler. In Ambari, set the profiler period duration to 1 minute via the Profiler config section.
   Edit `$METRON_HOME/config/zookeeper/global.json` to set the client period duration to match:
   
   ```
   vim ${METRON_HOME}/config/zookeeper/global.json
   "profiler.client.period.duration" : "1",
   "profiler.client.period.duration.units" : "MINUTES",
   ```
   
   Create `$METRON_HOME/config/zookeeper/profiler.json` and save the following 
contents:
   ```
   {
     "profiles": [
       {
         "profile": "hello-world",
         "onlyif":  "exists(ip_dst_addr)",
         "foreach": "ip_dst_addr",
         "init":    { "count": "0" },
         "update":  { "count": "count + 1" },
         "result":  "count"
       }
     ]
   }
   ```
   
   Modify `${METRON_HOME}/config/zookeeper/enrichments/squid.json` so it pulls 
values from the profiler. Update our previous example to add the following 
Stellar enrichment "e7":
   ```
             "e6" : {
               "dws5" : "ENRICHMENT_GET('whois', domain_without_subdomains, 'enrichment', 't')"
             },
             "e7" : {
               "profile_for_ip_dst_addr" : "PROFILE_GET( 'hello-world', ip_dst_addr, PROFILE_FIXED(2, 'MINUTES'))"
             }
   ```
   
   
   Push your changes to Zookeeper
   ```
   ${METRON_HOME}/bin/zk_load_configs.sh -m PUSH -z $ZOOKEEPER -i ${METRON_HOME}/config/zookeeper
   ```
   
   Start the profiler again.
   
   Clear your squid data
   ```
   curl -XDELETE "http://node1:9200/squid*"
   ```
   
   And publish some squid data to the squid topic for roughly 500 seconds. This 
is a somewhat arbitrary choice, but we want to give the profiles enough time to 
flush in order for the enrichments to start picking up the profile data from 
HBase.
   ```
   for i in {1..100}; do cat /var/log/squid/access.log | ${HDP_HOME}/kafka-broker/bin/kafka-console-producer.sh --broker-list $BROKERLIST --topic squid; sleep 5; done
   ```
   
   Once this process completes, you should note the following:
   1. No errors/exceptions in the profiler or enrichment Storm logs
   2. 700 records get written to the Squid index in ES
   3. Many records (though not all, especially the early ones) are written with non-empty values for the field `profile_for_ip_dst_addr`, e.g.
       ```
       curl -XGET "http://node1:9200/squid*/_search?size=700&pretty=true" | grep -A 2 profile_for_ip_dst_addr
       ```
   
   
   
   
