[
https://issues.apache.org/jira/browse/METRON-797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15940551#comment-15940551
]
ASF GitHub Bot commented on METRON-797:
---------------------------------------
Github user cestella commented on the issue:
https://github.com/apache/incubator-metron/pull/490
# Testing Plan
## Preliminaries
* Please perform the following tests on the `full-dev` vagrant environment.
* Set an environment variable to indicate `METRON_HOME`:
`export METRON_HOME=/usr/metron/0.3.1`
## Ensure Data Flows into the Indices
Ensure that with a basic full-dev we get data into the elasticsearch
indices and into HDFS.
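A minimal way to check, as a sketch assuming the default full-dev host, ports, and index paths:
```
# Check that the sensor indices exist in elasticsearch and contain documents
curl -s 'http://node1:9200/_cat/indices?v'
# Check that indexed data is landing on HDFS
hadoop fs -ls -R /apps/metron/indexing/indexed | head -n 20
```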
## (Optional) Free Up Space on the virtual machine
First, let's free up some headroom on the virtual machine. If you are
running this on a
multinode cluster, you would not have to do this.
* Stop and disable Metron in Ambari
* Kill monit via `service monit stop`
* Kill tcpreplay via `for i in $(ps -ef | grep tcpreplay | awk '{print $2}');do kill -9 $i;done`
* Kill yaf via `for i in $(ps -ef | grep yaf | awk '{print $2}');do kill -9 $i;done`
* Kill bro via `for i in $(ps -ef | grep bro | awk '{print $2}');do kill -9 $i;done`
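A quick sanity check that the sensor processes are actually gone (a sketch using `pgrep`, which ships with CentOS):
```
# Each line should report "stopped"
for p in monit tcpreplay yaf bro; do
  pgrep -x "$p" > /dev/null && echo "$p still running" || echo "$p stopped"
done
```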
## Test the PCAP topology
The new kafka spout necessitates testing the pcap topology.
### Install and start pycapa
```
# Set env vars
export PYCAPA_HOME=/opt/pycapa
export PYTHON27_HOME=/opt/rh/python27/root
# Install the required packages via yum (RHEL, CentOS)
yum -y install epel-release centos-release-scl
yum -y install "@Development tools" python27 python27-scldevel \
    python27-python-virtualenv libpcap-devel libselinux-python
# Set up directories
mkdir $PYCAPA_HOME && chmod 755 $PYCAPA_HOME
# Grab pycapa from git
cd ~
git clone https://github.com/apache/incubator-metron.git
cp -R ~/incubator-metron/metron-sensors/pycapa* $PYCAPA_HOME
# Create the virtualenv inside $PYCAPA_HOME so the paths below line up
export LD_LIBRARY_PATH="/opt/rh/python27/root/usr/lib64"
${PYTHON27_HOME}/usr/bin/virtualenv ${PYCAPA_HOME}/pycapa-venv
# Build it
cd ${PYCAPA_HOME}/pycapa
# Activate the virtualenv
source ${PYCAPA_HOME}/pycapa-venv/bin/activate
pip install -r requirements.txt
python setup.py install
# Run it (capture on eth1, produce to the pcap kafka topic)
cd ${PYCAPA_HOME}/pycapa-venv/bin
pycapa --producer --topic pcap -i eth1 -k node1:6667
```
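To verify that pycapa is actually producing, you can watch the `pcap` topic's log-end offset grow (a sketch using the GetOffsetShell tool that ships with the HDP kafka broker):
```
# The partition 0 offset should increase across repeated runs
/usr/hdp/current/kafka-broker/bin/kafka-run-class.sh kafka.tools.GetOffsetShell \
    --broker-list node1:6667 --topic pcap --time -1
```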
### Ensure pycapa can write to HDFS
* Ensure that `/apps/metron/pcap` exists and can be written to by the
storm user. If not, then:
```
sudo su - hdfs
hadoop fs -mkdir -p /apps/metron/pcap
hadoop fs -chown metron:hadoop /apps/metron/pcap
hadoop fs -chmod 775 /apps/metron/pcap
exit
```
* Start the pcap topology via `$METRON_HOME/bin/start_pcap_topology.sh`
* Watch the topology in the Storm UI and kill the packet capture utility
from before once the number of packets ingested is over 3k. Ensure that at
least 3 files exist on HDFS by running `hadoop fs -ls /apps/metron/pcap`
* Choose a file (denoted by $FILE) and dump a few of the contents using the
pcap_inspector utility via `$METRON_HOME/bin/pcap_inspector.sh -i $FILE -n 5`
* Choose one of the lines and note the `ip_dst_port`.
* Note that when you run the commands below, the resulting file will be
placed in the directory from which you kicked off the job.
* Run a Stellar filter query by executing a command similar to the
following, with the values noted above (match your start_time format to the
date format provided; the default is milliseconds since the epoch):
```
$METRON_HOME/bin/pcap_query.sh query -st "20160617" -df "yyyyMMdd" \
  -query "ip_dst_port == 22" -rpf 500
```
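If you stick with the default start-time format, the same query can be expressed in milliseconds since the epoch; as a hypothetical example, 1466121600000 below corresponds to 2016-06-17 00:00:00 UTC (substitute a timestamp and port appropriate to your data):
```
$METRON_HOME/bin/pcap_query.sh query -st 1466121600000 \
  -query "ip_dst_port == 22" -rpf 500
```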
* Note that if your MR job fails because there is no user directory for
`root`, the following will create the directory appropriately:
```
sudo su - hdfs
hadoop fs -mkdir /user/root
hadoop fs -chown root:hadoop /user/root
hadoop fs -chmod 755 /user/root
exit
```
* Verify the MR job finishes successfully. Upon completion, you should see
multiple files named with relatively current datestamps in your current
directory, e.g. `pcap-data-20160617160549737+0000.pcap`
* Copy the files to your local machine and verify you can open them in
Wireshark. Ensure that they contain only packets to the
destination port in question.
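If Wireshark is not handy, `tshark` can perform the same check from the command line (a sketch that assumes the port noted was 22 and reuses the sample filename above):
```
# Count packets NOT destined to TCP port 22; expect 0
tshark -r pcap-data-20160617160549737+0000.pcap -Y '!(tcp.dstport == 22)' | wc -l
```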
## Test the Profiler
### Setup
* Ensure that Metron is stopped and put in maintenance mode in Ambari
* Create the profiler hbase table
`echo "create 'profiler', 'P'" | hbase shell`
* Open `~/rand_gen.py` and paste the following:
```
#!/usr/bin/python
import random
import sys
import time

# args: mean, standard deviation, emit frequency in seconds
def main():
    mu = float(sys.argv[1])
    sigma = float(sys.argv[2])
    freq_s = int(sys.argv[3])
    while True:
        out = '{ "value" : ' + str(random.gauss(mu, sigma)) + ' }'
        print out
        sys.stdout.flush()
        time.sleep(freq_s)

if __name__ == '__main__':
    main()
```
This will generate random JSON maps with a numeric field called `value`.
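A quick local smoke test of the script (mean 0, standard deviation 1, one record per second):
```
# Should print a few JSON records like { "value" : 0.123 }
# (the broken-pipe error when head exits is harmless)
python ~/rand_gen.py 0 1 1 | head -n 3
```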
* From your metron build, copy up the profiler bundle:
```
scp metron-analytics/metron-profiler/target/metron-profiler-0.3.1-archive.tar.gz \
    root@node1:/usr/metron/0.3.1
```
* From `$METRON_HOME` on `node1`:
```
tar xzvf metron-profiler-*.tar.gz
```
* Set the profiler to use 1 minute tick durations:
* Edit `$METRON_HOME/config/profiler.properties` to adjust the capture
duration by changing `profiler.period.duration=15` to
`profiler.period.duration=1`
* Edit `$METRON_HOME/config/zookeeper/global.json` and add the following
properties:
```
"profiler.client.period.duration" : "1",
"profiler.client.period.duration.units" : "MINUTES"
```
### Deploy the custom parser
* Edit the value parser config at
`$METRON_HOME/config/zookeeper/parsers/value.json`:
```
{
  "parserClassName" : "org.apache.metron.parsers.json.JSONMapParser",
  "sensorTopic" : "value",
  "fieldTransformations" : [
    {
      "transformation" : "STELLAR",
      "output" : [ "num_profiles_parser", "mean_parser" ],
      "config" : {
        "num_profiles_parser" : "LENGTH(PROFILE_GET('stat', 'global', PROFILE_WINDOW('from 5 minutes ago')))",
        "mean_parser" : "STATS_MEAN(STATS_MERGE(PROFILE_GET('stat', 'global', PROFILE_WINDOW('from 5 minutes ago'))))"
      }
    }
  ]
}
```
* Edit the value enrichment config at
`$METRON_HOME/config/zookeeper/enrichments/value.json`:
```
{
  "enrichment" : {
    "fieldMap" : {
      "stellar" : {
        "config" : {
          "num_profiles_enrichment" : "LENGTH(PROFILE_GET('stat', 'global', PROFILE_WINDOW('from 5 minutes ago')))",
          "mean_enrichment" : "STATS_MEAN(STATS_MERGE(PROFILE_GET('stat', 'global', PROFILE_WINDOW('from 5 minutes ago'))))"
        }
      }
    }
  }
}
```
* Create the value kafka topic: `/usr/hdp/current/kafka-broker/bin/kafka-topics.sh --zookeeper node1:2181 --create --topic value --partitions 1 --replication-factor 1`
* Push the configs via `$METRON_HOME/bin/zk_load_configs.sh -m PUSH -i $METRON_HOME/config/zookeeper -z node1:2181`
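To confirm the configs actually landed in ZooKeeper, dump them back out with the same script's DUMP mode:
```
# Scan the output for the value parser and enrichment configs
$METRON_HOME/bin/zk_load_configs.sh -m DUMP -z node1:2181
```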
* Start the parser topology via `$METRON_HOME/bin/start_parser_topology.sh -k node1:6667 -z node1:2181 -s value`
* Start the enrichment topology via `$METRON_HOME/bin/start_enrichment_topology.sh`
* Start the indexing topology via `$METRON_HOME/bin/start_elasticsearch_topology.sh`
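Once all three are submitted, you can confirm they are running (assuming the storm client is on the PATH):
```
# Expect the value (parser), enrichment, and indexing topologies to be ACTIVE
storm list
```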
### Start the profiler
* Edit `$METRON_HOME/config/zookeeper/profiler.json` and paste in the
following:
```
{
  "profiles" : [
    {
      "profile" : "stat",
      "foreach" : "'global'",
      "onlyif" : "source.type == 'value'",
      "init" : { },
      "update" : {
        "s" : "STATS_ADD(s, value)"
      },
      "result" : "s"
    }
  ]
}
```
* Start the profiler topology via `$METRON_HOME/bin/start_profiler_topology.sh`
### Test Case
* Set up a profile to accept some synthetic data with a numeric `value`
field and persist a stats summary of the data
* Send some synthetic data directly to the profiler:
`python ~/rand_gen.py 0 1 1 | /usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh --broker-list node1:6667 --topic value`
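Before querying, you can confirm records are flowing and get a Stellar REPL ready; a sketch, assuming the full-dev layout:
```
# Confirm records are arriving on the value topic (Ctrl-C to exit)
/usr/hdp/current/kafka-broker/bin/kafka-console-consumer.sh \
    --zookeeper node1:2181 --topic value

# Start the Stellar REPL used in the next step
$METRON_HOME/bin/stellar -z node1:2181
```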
* Wait for at least 15 minutes and execute the following via the Stellar
REPL:
```
# Grab the profiles from the last 5 minutes
LENGTH(PROFILE_GET('stat', 'global', PROFILE_WINDOW('from 5 minutes ago')))
# Expect 4 or 5 profile periods to be returned
```
For me, the following was the result:
```
Stellar, Go!
Please note that functions are loading lazily in the background and will be unavailable until loaded fully.
{es.clustername=metron, es.ip=node1:9300, es.date.format=yyyy.MM.dd.HH, parser.error.topic=indexing, profiler.client.period.duration=1, profiler.client.period.duration.units=MINUTES}
[Stellar]>>> # Grab the profiles from the last 5 minutes
[Stellar]>>> LENGTH(PROFILE_GET('stat', 'global', PROFILE_WINDOW('from 5 minutes ago')))
Functions loaded, you may refer to functions now...
4
[Stellar]>>> # Looks like 4 or 5 were returned, great
```
* Delete any existing value index via `curl -XDELETE "http://localhost:9200/value*"`
* Wait a few seconds for new documents to be indexed, then run the following:
```
curl -XPOST 'http://localhost:9200/value*/_search?pretty' -d '
{
"_source" : [ "num_profiles_parser", "num_profiles_enrichment",
"mean_enrichment", "mean_parser"]
}
'
```
* You should see values in the index with non-zero fields:
* `num_profiles_enrichment` should be 5
* `num_profiles_parser` should be 5
* `mean_enrichment` should be a non-zero double
* `mean_parser` should be a non-zero double
For reference, a sample message for me is:
```
{
"num_profiles_parser" : 5,
"mean_enrichment" : 0.004850856309056547,
"num_profiles_enrichment" : 5,
"mean_parser" : 0.004850856309056547
}
```
## Test Enrichment Loading
* Download the Alexa top 1m data set
```
wget http://s3.amazonaws.com/alexa-static/top-1m.csv.zip
unzip top-1m.csv.zip
```
* Stage import file
```
head -n 10000 top-1m.csv > top-10k.csv
head -n 10 top-1m.csv > top-10.csv
hadoop fs -put top-10k.csv /tmp
```
* Create an `extractor.json` for the CSV data by pasting in these contents:
```
{
  "config" : {
    "zk_quorum" : "node1:2181",
    "columns" : {
      "rank" : 0,
      "domain" : 1
    },
    "value_transform" : {
      "domain" : "DOMAIN_REMOVE_TLD(domain)",
      "port" : "es.port"
    },
    "value_filter" : "LENGTH(domain) > 0",
    "indicator_column" : "domain",
    "indicator_transform" : {
      "indicator" : "DOMAIN_REMOVE_TLD(indicator)"
    },
    "indicator_filter" : "LENGTH(indicator) > 0",
    "type" : "top_domains",
    "separator" : ","
  },
  "extractor" : "CSV"
}
```
### Test Flat File
* Truncate, load, and count via the command below. You should see 9275 records in HBase, fewer than the 10k you might expect: entries whose domain is empty after TLD removal are filtered out, and duplicate indicators collapse into a single HBase row:
`echo "truncate 'enrichment'" | hbase shell && $METRON_HOME/bin/flatfile_loader.sh -i ./top-10k.csv -t enrichment -c t -e ./extractor.json -p 5 -b 128 && echo "count 'enrichment'" | hbase shell`
### Test MR Job
* Run the same load as an MR job against the copy on HDFS and count again; you should see the same 9275 records in HBase:
`echo "truncate 'enrichment'" | hbase shell && $METRON_HOME/bin/flatfile_loader.sh -i /tmp/top-10k.csv -t enrichment -c t -e ./extractor.json -m MR && echo "count 'enrichment'" | hbase shell`
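As a final spot check, a few of the loaded rows can be scanned directly from the HBase shell:
```
# Show a handful of the loaded enrichment rows
echo "scan 'enrichment', {LIMIT => 5}" | hbase shell
```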
> Pass security.protocol and enable auto-renew for the storm topologies
> ---------------------------------------------------------------------
>
> Key: METRON-797
> URL: https://issues.apache.org/jira/browse/METRON-797
> Project: Metron
> Issue Type: Improvement
> Reporter: Casey Stella
>
> METRON-793 migrated the storm topologies to the storm-kafka-client spout,
> which supports kerberos. To complete the kerberos work on the existing
> topologies, we need to be able to enable the spouts and kafka writers to use
> security protocols other than PLAINTEXT. Also, enabling auto-renew plugins
> for storm will enable the topologies to run for extended durations in a
> kerberized cluster.
> NOTE: This does not encompass MPack changes to enable kerberos or fix the
> sensors to work with a kerberized kafka. That would be follow-on work.