mmiklavc edited a comment on issue #1368: Metron-2053: Refactor metron-enrichment to decouple Storm dependencies
URL: https://github.com/apache/metron/pull/1368#issuecomment-477798665

# Testing Plan

## Setup Test Environment

* Build full dev: `metron/metron-deployment/development/centos6$ vagrant up`
* Log in to full dev: `ssh root@node1`, password "vagrant"
* Set some environment variables. Note: if using this test script in the future for a later version, set your Metron version accordingly.

```
# version info
export METRON_VERSION=0.7.1
export SOLR_VERSION=6.6.2

# paths
export METRON_HOME=/usr/metron/${METRON_VERSION}
export HDP_HOME=/usr/hdp/current
export KAFKA_HOME=/usr/hdp/current/kafka-broker
export SOLR_HOME=/var/solr/solr-${SOLR_VERSION}
export ELASTIC_HOME=/usr/share/elasticsearch
export KIBANA_HOME=/usr/share/kibana
export STORM_LOGS=/var/log/storm/workers-artifacts

# host info
export METRON_HOST=node1
export ZOOKEEPER=${METRON_HOST}:2181
export BROKERLIST=${METRON_HOST}:6667
export STORM_UI=http://${METRON_HOST}:8744
export ELASTIC=http://${METRON_HOST}:9200
export ES_HOST=http://${METRON_HOST}:9200
export KIBANA=http://${METRON_HOST}:5000
```

## Basics

Verify data is flowing through the system, from parsing to indexing.

1. Open Ambari and navigate to the Metron service: http://node1:8080/#/main/services/METRON/summary
2. Open the Alerts UI
3. *(screenshot: metron_service.png)*
4. Verify alerts show up in the main UI - click the search icon (you may need to wait a moment for them to appear). *(screenshot: alerts_ui.png)*
5. Head back to Ambari and select the Kibana service: http://node1:8080/#/main/services/KIBANA/summary
6. Open the Kibana dashboard via the "Metron UI" option in the quick links
7. *(screenshot: kibana_service.png)*
8. Verify the dashboard is populating
9. *(screenshot: kibana_dashboard.png)*
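Optionally, the same data flow can be spot-checked from the command line before moving on. This is just a sketch, assuming the environment variables from the setup section and that Elasticsearch is the indexer in use (the full dev default):

```
# List the Kafka topics to confirm the broker is up and the sensor topics exist
${KAFKA_HOME}/bin/kafka-topics.sh --zookeeper ${ZOOKEEPER} --list

# Confirm the parser, enrichment, and indexing topologies are reported by the Storm UI
curl -s "${STORM_UI}/api/v1/topology/summary"

# Confirm documents are accumulating in the sensor indices (doc counts should grow over time)
curl -s "${ELASTIC}/_cat/indices?v"
```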
## Profiler

Verify the profiler still works in Storm and the REPL. Pulled from https://github.com/apache/metron/blob/master/metron-analytics/metron-profiler-storm/README.md

1. First, we'll configure the profiler to emit a profile every 1 minute rather than every 15, for expediency:
   * Stop the profiler
   * In Ambari, set the profiler period duration to `1` minute via the Profiler config section. *(screenshot: profiler_config.png)*
   * Pull down the latest global config to the local file system:
     ```
     $METRON_HOME/bin/zk_load_configs.sh -m PULL -o ${METRON_HOME}/config/zookeeper -z $ZOOKEEPER -f
     ```
   * Edit `$METRON_HOME/config/zookeeper/global.json` to set the client period duration:
     ```
     "profiler.client.period.duration" : "1",
     "profiler.client.period.duration.units" : "MINUTES"
     ```
   * Push the changes back up to Zookeeper:
     ```
     $METRON_HOME/bin/zk_load_configs.sh -m PUSH -i $METRON_HOME/config/zookeeper/ -z $ZOOKEEPER
     ```
2. Start the Stellar Shell with the `-z` command line argument so that a connection to Zookeeper is established. This is required when deploying a new profile definition, as shown in the steps below.
   ```
   [root@node1 ~]# source /etc/default/metron
   [root@node1 ~]# $METRON_HOME/bin/stellar -z $ZOOKEEPER
   Stellar, Go!
   [Stellar]>>>
   ```
3. If you haven't already, define your profile.
   ```
   [Stellar]>>> conf := SHELL_EDIT()
   [Stellar]>>> conf
   {
     "profiles": [
       {
         "profile": "hello-world",
         "onlyif": "exists(ip_src_addr)",
         "foreach": "ip_src_addr",
         "init":   { "count": "0" },
         "update": { "count": "count + 1" },
         "result": "count"
       }
     ]
   }
   ```
4. Check what is already deployed. Pushing a new profile configuration is destructive - it will overwrite any existing configuration - so check what you have out there and manually merge the existing configuration with your new profile definition.
   ```
   [Stellar]>>> existing := CONFIG_GET("PROFILER")
   ```
5. Deploy your profile. This will push the configuration to the live, actively running Profiler topology and will overwrite any existing profile definitions.
   ```
   [Stellar]>>> CONFIG_PUT("PROFILER", conf)
   ```
6. Exit the Stellar REPL and restart the profiler.
7. Make sure the sensor stubs are running: `service sensor-stubs start`
8. Wait a few minutes - we want to wait some multiple of the period duration to ensure the profiler has been able to flush data to HBase multiple times before we check in the next step.
9. Check that the profiler is writing to HBase:
   ```
   echo "count 'profiler'" | hbase shell
   HBase Shell; enter 'help<RETURN>' for list of supported commands.
   Type "exit<RETURN>" to leave the HBase Shell
   Version 1.1.2.2.6.5.1050-37, r897822d4dd5956ca186974c10382e9094683fa29, Tue Dec 11 02:04:10 UTC 2018

   count 'profiler'
   24 row(s) in 0.9550 seconds
   ```
10. Start the Stellar REPL back up again:
    ```
    [root@node1 ~]# $METRON_HOME/bin/stellar -z $ZOOKEEPER
    Stellar, Go!
    [Stellar]>>>
    ```
11. Read values from the profiler. We'll first print the help for PROFILE_GET and PROFILE_FIXED for context. The ip_src_addr I'm using below, "192.168.66.1", is pulled from one of the records in the Alerts UI.
    ```
    [Stellar]>>> ?PROFILE_GET
    PROFILE_GET
    Description: Retrieves a series of values from a stored profile.
    Arguments:
        profile - The name of the profile.
        entity - The name of the entity.
        periods - The list of profile periods to fetch. Use PROFILE_WINDOW or PROFILE_FIXED.
        groups - Optional - The groups to retrieve. Must correspond to the 'groupBy' list used during profile creation. Defaults to an empty list, meaning no groups.
        config_overrides - Optional - Map (in curly braces) of name:value pairs, each overriding the global config parameter of the same name. Default is the empty Map, meaning no overrides.
    Returns: The selected profile measurements.

    [Stellar]>>> ?PROFILE_FIXED
    PROFILE_FIXED
    Description: The profiler periods associated with a fixed lookback starting from now.
    Arguments:
        durationAgo - How long ago should values be retrieved from?
        units - The units of 'durationAgo'.
        config_overrides - Optional - Map (in curly braces) of name:value pairs, each overriding the global config parameter of the same name. Default is the empty Map, meaning no overrides.
    Returns: The selected profile measurement periods. These are ProfilePeriod objects.

    [Stellar]>>> PROFILE_GET("hello-world","192.168.66.1",PROFILE_FIXED(30, "MINUTES"))
    [158, 191, 184, 205, 178, 194, 180]
    ```
12. You should see an array of the profile measurements, as indicated in the command output above.
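As an optional follow-up check - a sketch, assuming the environment variables from the setup section and the 1-minute period configured above - confirm the profiler topology is back up and that the HBase row count keeps growing across periods:

```
# Confirm the profiler topology is listed by the Storm UI (check its status in the output)
curl -s "${STORM_UI}/api/v1/topology/summary" | grep profiler

# Watch the row count grow across a few 1-minute periods
for i in 1 2 3; do
  echo "count 'profiler'" | hbase shell 2>/dev/null | grep 'row(s)'
  sleep 60
done
```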
## PCAP

Steps adapted from https://github.com/apache/metron/pull/1157#issuecomment-412972370

Set up the root user HDFS dir (only needed if running commands as root):
```
sudo -u hdfs hdfs dfs -mkdir /user/root
sudo -u hdfs hdfs dfs -chown root:hdfs /user/root
```

Get PCAP data into Metron:

1. Install and set up pycapa (this has been updated in master recently) - https://github.com/apache/metron/blob/master/metron-sensors/pycapa/README.md#centos-6
2. (If using single-node Vagrant) You can kill the enrichment, profiler, indexing, and sensor topologies to free up resources, if needed, via `for i in bro enrichment random_access_indexing batch_indexing yaf snort;do storm kill $i;done`
3. Start the pcap topology if it's not already running via `$METRON_HOME/bin/start_pcap_topology.sh`
4. Start the pycapa packet capture producer on eth1 via `pycapa --producer --kafka-topic pcap --interface eth1 --kafka-broker $BROKERLIST`
5. Watch the topology in the Storm UI and kill the packet capture utility from before once the number of packets ingested is over 3k.
6. Ensure that at least 3 files exist on HDFS by running `hdfs dfs -ls /apps/metron/pcap/input`
7. Choose a file (denoted by $FILE) and dump a few of the contents using the pcap_inspector utility via `$METRON_HOME/bin/pcap_inspector.sh -i $FILE -n 5`
8. Choose one of the lines and note the protocol.
9. Note that when you run the commands below, the resulting file will be placed in the execution directory where you kicked off the job from.

### Run a fixed filter query

1. Run a fixed filter query by executing the following command with the values noted above (match your start_time format to the date format provided - the default is millis since epoch)
2. `$METRON_HOME/bin/pcap_query.sh fixed -st <start_time> -df "yyyyMMdd" -p <protocol_num> -rpf 500`
3. Verify the MR job finishes successfully. Upon completion, you should see multiple files named with relatively current datestamps in your current directory, e.g. pcap-data-20160617160549737+0000.pcap
4. Copy the files to your local machine and verify you can open them in Wireshark. I chose a middle file and the last file. The middle file should have 500 records (per the records_per_file option), and the last one will likely have 500 or fewer records.

## Flatfile loader

Make sure no classpath issues have broken it. Steps adapted from https://github.com/apache/metron/pull/432#issuecomment-276733075

### Preliminaries

* Download the Alexa 1M dataset:
```
wget http://s3.amazonaws.com/alexa-static/top-1m.csv.zip
unzip top-1m.csv.zip
```
* Stage the import file:
```
head -n 10000 top-1m.csv > top-10k.csv
```
* Create an extractor.json for the CSV data by editing `extractor.json` and pasting in these contents:
```
{
  "config" : {
    "columns" : {
      "domain" : 1,
      "rank" : 0
    }
    ,"indicator_column" : "domain"
    ,"type" : "alexa"
    ,"separator" : ","
  },
  "extractor" : "CSV"
}
```
The extractor.json will be used by flatfile_loader.sh in the next step.

### Import from HDFS via MR

```
# truncate hbase
echo "truncate 'enrichment'" | hbase shell

# import data into hbase
$METRON_HOME/bin/flatfile_loader.sh -i /tmp/top-10k.csv -t enrichment -c t -e ./extractor.json -m MR

# count data written and verify it's 10k
echo "count 'enrichment'" | hbase shell
```
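For a closer look than the raw row count, a scan like the sketch below pulls back a handful of rows to spot check. Note that Metron stores the row keys in a serialized form, so they will look binary rather than like plain domain names; the values should carry the remaining CSV columns (the rank).

```
# Spot check the import: pull a few rows back out of the 'enrichment' table
# under column family 't'. Row keys are serialized, so expect binary-looking
# keys; the cell values should include the rank data from the CSV.
echo "scan 'enrichment', {LIMIT => 5}" | hbase shell
```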
