[ https://issues.apache.org/jira/browse/METRON-793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15936552#comment-15936552 ]
ASF GitHub Bot commented on METRON-793:
---------------------------------------
Github user cestella commented on the issue:
https://github.com/apache/incubator-metron/pull/486
# Testing Plan
## Preliminaries
* Please perform the following tests on the `full-dev` vagrant environment.
* Set an environment variable to indicate `METRON_HOME`:
`export METRON_HOME=/usr/metron/0.3.1`
## Ensure Data Flows into the Indices
Ensure that with a basic full-dev we get data into the elasticsearch indices and into HDFS; a quick check is sketched below.
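One way to verify this, a minimal sketch assuming the full-dev defaults (Elasticsearch listening on localhost:9200 and Metron's HDFS root at `/apps/metron`; adjust if your layout differs):
```
# list the Elasticsearch indices and their document counts;
# the bro, snort and yaf indices should be filling up
curl 'http://localhost:9200/_cat/indices?v'
# look for recently written files under the Metron HDFS root
# (the exact subdirectory layout may differ)
hadoop fs -ls -R /apps/metron
```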
## (Optional) Free Up Space on the virtual machine
First, let's free up some headroom on the virtual machine. If you are
running this on a
multinode cluster, you would not have to do this.
* Stop and disable Metron in Ambari
* Kill monit via `service monit stop`
* Kill tcpreplay via `for i in $(ps -ef | grep tcpreplay | grep -v grep | awk '{print $2}'); do kill -9 $i; done`
* Kill yaf via `for i in $(ps -ef | grep yaf | grep -v grep | awk '{print $2}'); do kill -9 $i; done`
* Kill bro via `for i in $(ps -ef | grep bro | grep -v grep | awk '{print $2}'); do kill -9 $i; done`
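Equivalently, if `pkill` is available on the VM (it ships with the procps package on CentOS), each loop collapses to a single command; the `grep -v grep` above is only there to keep the pipeline from targeting its own grep process:
```
# -f matches against the full command line, not just the process name
pkill -9 -f tcpreplay
pkill -9 -f yaf
pkill -9 -f bro
```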
## Test the PCAP topology
The new kafka spout necessitates testing the pcap topology.
### Install and start pycapa
```
# set env vars
export PYCAPA_HOME=/opt/pycapa
export PYTHON27_HOME=/opt/rh/python27/root

# Install these packages via yum (RHEL, CentOS)
yum -y install epel-release centos-release-scl
yum -y install "@Development tools" python27 python27-scldevel python27-python-virtualenv libpcap-devel libselinux-python

# Setup directories
mkdir $PYCAPA_HOME && chmod 755 $PYCAPA_HOME

# Create the virtualenv inside $PYCAPA_HOME so the activate step below finds it
export LD_LIBRARY_PATH="/opt/rh/python27/root/usr/lib64"
cd $PYCAPA_HOME
${PYTHON27_HOME}/usr/bin/virtualenv pycapa-venv

# Copy incubator-metron/metron-sensors/pycapa from the Metron source tree
# into $PYCAPA_HOME on the node you would like to install pycapa on

# Build it
cd ${PYCAPA_HOME}/pycapa
# activate the virtualenv
source ${PYCAPA_HOME}/pycapa-venv/bin/activate
pip install -r requirements.txt
python setup.py install

# Run it
cd ${PYCAPA_HOME}/pycapa-venv/bin
pycapa --producer --topic pcap -i eth1 -k node1:6667
```
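Before moving on, it is worth confirming that packets are actually landing on the `pcap` topic. One option (a sketch using the stock Kafka tooling shipped with HDP) is to watch the topic's latest offsets grow while pycapa runs:
```
# --time -1 requests the latest offset; run this twice a few seconds
# apart and the reported offset should increase while pycapa produces
/usr/hdp/current/kafka-broker/bin/kafka-run-class.sh kafka.tools.GetOffsetShell \
  --broker-list node1:6667 --topic pcap --time -1
```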
### Ensure pycapa can write to HDFS
* Ensure that `/apps/metron/pcap` exists and can be written to by the storm user. If not, then:
```
sudo su - hdfs
hadoop fs -mkdir -p /apps/metron/pcap
hadoop fs -chown metron:hadoop /apps/metron/pcap
hadoop fs -chmod 775 /apps/metron/pcap
```
* Start the pcap topology via `$METRON_HOME/bin/start_pcap_topology.sh`
* Start the pycapa packet capture producer on eth1 via `/usr/bin/pycapa --producer --topic pcap -i eth1 -k node1:6667`
* Watch the topology in the Storm UI and kill the packet capture utility from before once the number of packets ingested is over 3k. Ensure that at least 3 files exist on HDFS by running `hadoop fs -ls /apps/metron/pcap`
* Choose a file (denoted by $FILE) and dump a few of the contents using the pcap_inspector utility via `$METRON_HOME/bin/pcap_inspector.sh -i $FILE -n 5`
* Choose one of the lines and note the protocol.
* Note that when you run the commands below, the resulting file will be placed in the execution directory where you kicked off the job from.
* Run a Stellar filter query by executing a command similar to the following, with the values noted above (match your start_time format to the date format provided - default is to use millis since epoch):
```
$METRON_HOME/bin/pcap_query.sh query -st "20160617" -df "yyyyMMdd" -query "protocol == 6" -rpf 500
```
* Verify the MR job finishes successfully. Upon completion, you should see multiple files named with relatively current datestamps in your current directory, e.g. pcap-data-20160617160549737+0000.pcap
* Copy the files to your local machine and verify you can open them in Wireshark. I chose a middle file and the last file. The middle file should have 500 records (per the records_per_file option), and the last one will likely have a number of records <= 500.
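If you would rather check the record counts from the command line before opening Wireshark, its `capinfos` utility (assuming the Wireshark CLI tools are installed on your local machine) prints the packet count per file:
```
# -c prints the number of packets in each capture file; the middle
# files should report 500 and the last one 500 or fewer
capinfos -c pcap-data-*.pcap
```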
## Test the Profiler
### Setup
* Ensure that Metron is stopped and put in maintenance mode in Ambari
* Create the profiler hbase table
`echo "create 'profiler', 'P'" | hbase shell`
* Open `~/rand_gen.py` and paste the following:
```
#!/usr/bin/python
import random
import sys
import time

def main():
    # mean, standard deviation and emission interval come from the CLI
    mu = float(sys.argv[1])
    sigma = float(sys.argv[2])
    freq_s = int(sys.argv[3])
    while True:
        out = '{ "value" : ' + str(random.gauss(mu, sigma)) + ' }'
        print out
        sys.stdout.flush()
        time.sleep(freq_s)

if __name__ == '__main__':
    main()
```
This will generate random JSON maps with a numeric field called `value` (see the smoke test after this list).
* Set the profiler to use 1 minute tick durations:
  * Edit `$METRON_HOME/config/profiler.properties` to adjust the capture duration by changing `profiler.period.duration=15` to `profiler.period.duration=1`
  * Edit `$METRON_HOME/config/zookeeper/global.json` and add the following properties:
```
"profiler.client.period.duration" : "1",
"profiler.client.period.duration.units" : "MINUTES"
```
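Before wiring the generator to Kafka, a quick local smoke test (the printed values below are illustrative; each run emits different random draws):
```
$ python ~/rand_gen.py 0 1 1
{ "value" : 0.4713175505514164 }
{ "value" : -1.0393127616137571 }
{ "value" : 0.2279878321940387 }
```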
### Deploy the custom parser
* Edit the value parser config at `$METRON_HOME/config/zookeeper/parsers/value.json`:
```
{
  "parserClassName" : "org.apache.metron.parsers.json.JSONMapParser",
  "sensorTopic" : "value",
  "fieldTransformations" : [
    {
      "transformation" : "STELLAR",
      "output" : [ "num_profiles_parser", "mean_parser" ],
      "config" : {
        "num_profiles_parser" : "LENGTH(PROFILE_GET('stat', 'global', PROFILE_WINDOW('5 minute window every 10 minutes starting from 2 minutes ago until 32 minutes ago excluding holidays:us')))",
        "mean_parser" : "STATS_MEAN(STATS_MERGE(PROFILE_GET('stat', 'global', PROFILE_WINDOW('5 minute window every 10 minutes starting from 2 minutes ago until 32 minutes ago excluding holidays:us'))))"
      }
    }
  ]
}
```
* Edit the value enrichment config at `$METRON_HOME/config/zookeeper/enrichments/value.json`:
```
{
  "enrichment" : {
    "fieldMap" : {
      "stellar" : {
        "config" : {
          "num_profiles_enrichment" : "LENGTH(PROFILE_GET('stat', 'global', PROFILE_WINDOW('5 minute window every 10 minutes starting from 2 minutes ago until 32 minutes ago excluding holidays:us')))",
          "mean_enrichment" : "STATS_MEAN(STATS_MERGE(PROFILE_GET('stat', 'global', PROFILE_WINDOW('5 minute window every 10 minutes starting from 2 minutes ago until 32 minutes ago excluding holidays:us'))))"
        }
      }
    }
  }
}
```
* Create the value kafka topic: `/usr/hdp/current/kafka-broker/bin/kafka-topics.sh --zookeeper node1:2181 --create --topic value --partitions 1 --replication-factor 1`
* Push the configs via `$METRON_HOME/bin/zk_load_configs.sh -m PUSH -i $METRON_HOME/config/zookeeper -z node1:2181`
* Start via `$METRON_HOME/bin/start_parser_topology.sh -k node1:6667 -z node1:2181 -s value`
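To verify that the parser topology actually came up (a sketch; the `storm` CLI is already on the PATH in full-dev):
```
# the "value" topology should appear with status ACTIVE
storm list
```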
### Start the profiler
* Edit `$METRON_HOME/config/zookeeper/profiler.json` and paste in the following:
```
{
  "profiles": [
    {
      "profile": "stat",
      "foreach": "'global'",
      "onlyif": "true",
      "init" : { },
      "update": {
        "s": "STATS_ADD(s, value)"
      },
      "result": "s"
    }
  ]
}
```
* Start the profiler via `$METRON_HOME/bin/start_profiler_topology.sh`
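Once a couple of profiler periods have elapsed, one way to sanity-check that profile data is landing in HBase (a sketch using the stock hbase shell):
```
# the row count should be non-zero and grow over time
echo "count 'profiler'" | hbase shell
```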
### Test Case
* Set up a profile to accept some synthetic data with a numeric `value` field and persist a stats summary of the data
* Send some synthetic data directly to the profiler: `python ~/rand_gen.py 0 1 1 | /usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh --broker-list node1:6667 --topic value`
* Wait for at least 32 minutes and execute the following via the Stellar REPL (launched, e.g., via `$METRON_HOME/bin/stellar -z node1:2181`):
```
# Grab the profiles from 1 minute ago to 8 minutes ago
LENGTH(PROFILE_GET('stat', 'global', PROFILE_WINDOW('from 1 minute ago to 8 minutes ago')))
# Looks like 7 were returned, great. Now try something more complex.
# Grab the profiles in 5 minute windows every 10 minutes from 2 minutes ago to 32 minutes ago:
#   32 minutes ago til 27 minutes ago should be 5 profiles
#   22 minutes ago til 17 minutes ago should be 5 profiles
#   12 minutes ago til 7 minutes ago should be 5 profiles
# for a total of 15 profiles
LENGTH(PROFILE_GET('stat', 'global', PROFILE_WINDOW('5 minute window every 10 minutes starting from 2 minutes ago until 32 minutes ago excluding holidays:us')))
```
For me, the following was the result:
```
```
* Delete any value index that currently exists (if any do) via `curl -XDELETE "http://localhost:9200/value*"`
* Wait for a couple of seconds and run `curl "http://localhost:9200/value*/_search?pretty=true&q=*:*" 2> /dev/null`
* You should see values in the index with non-zero fields:
  * `num_profiles_enrichment` should be 15
  * `num_profiles_parser` should be 15
  * `mean_enrichment` should be a non-zero double
  * `mean_parser` should be a non-zero double
For reference, a sample message for me is:
```
```
> Migrate to storm-kafka-client kafka spout from storm-kafka
> ----------------------------------------------------------
>
> Key: METRON-793
> URL: https://issues.apache.org/jira/browse/METRON-793
> Project: Metron
> Issue Type: Improvement
> Reporter: Casey Stella
>
> In order to eventually support kerberos, the suggested path is to migrate to
> the new kafka spout (org.apache.storm:storm-kafka-client) which uses the new
> consumer API.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)