[
https://issues.apache.org/jira/browse/METRON-690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15872220#comment-15872220
]
ASF GitHub Bot commented on METRON-690:
---------------------------------------
Github user cestella commented on the issue:
https://github.com/apache/incubator-metron/pull/450
Testing Instructions beyond the normal smoke test (i.e. letting data
flow through to the indices and checking them).
## Free Up Space on the virtual machine
First, let's free up some headroom on the virtual machine. If you are running this on a multinode cluster, you would not have to do this.
* Kill monit via `service monit stop`
* Kill tcpreplay via `for i in $(ps -ef | grep tcpreplay | awk '{print $2}');do kill -9 $i;done`
* Kill existing parser topologies via
  * `storm kill snort`
  * `storm kill bro`
* Kill yaf via `for i in $(ps -ef | grep yaf | awk '{print $2}');do kill -9 $i;done`
* Kill bro via `for i in $(ps -ef | grep bro | awk '{print $2}');do kill -9 $i;done`
## Preliminaries
* Set an environment variable to indicate `METRON_HOME`:
`export METRON_HOME=/usr/metron/0.3.1`
* Create the profiler hbase table
`echo "create 'profiler', 'P'" | hbase shell`
* Open `~/rand_gen.py` and paste the following:
```
#!/usr/bin/python
import random
import sys
import time

def main():
    mu = float(sys.argv[1])
    sigma = float(sys.argv[2])
    freq_s = int(sys.argv[3])
    while True:
        out = '{ "value" : ' + str(random.gauss(mu, sigma)) + ' }'
        print out
        sys.stdout.flush()
        time.sleep(freq_s)

if __name__ == '__main__':
    main()
```
This will generate random JSON maps with a numeric field called `value`
* Set the profiler to use 1 minute tick durations:
  * Edit `$METRON_HOME/config/profiler.properties` to adjust the capture duration by changing `profiler.period.duration=15` to `profiler.period.duration=1`
  * Edit `$METRON_HOME/config/zookeeper/global.json` and add the following properties:
```
"profiler.client.period.duration" : "1",
"profiler.client.period.duration.units" : "MINUTES"
```
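With 1-minute periods on both the profiler and the client, each minute of wall-clock time yields one profile measurement, so the counts returned by `PROFILE_GET` are easy to predict. A back-of-the-envelope check in plain Python (illustrative only, not Metron code):

```python
# profiler.period.duration=1 (MINUTES) => one profile entry per minute.
period_minutes = 1

# A query window 'from 1 minute ago to 8 minutes ago' spans 7 minutes,
# so PROFILE_GET over that window should return 7 entries.
window_minutes = 8 - 1
expected = window_minutes // period_minutes
print(expected)  # 7
```

This is the same arithmetic used to sanity-check the REPL results in the test case below.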
## Deploy the custom parser
* Edit the value parser config at
`$METRON_HOME/config/zookeeper/parsers/value.json`:
```
{
  "parserClassName" : "org.apache.metron.parsers.json.JSONMapParser",
  "sensorTopic" : "value",
  "fieldTransformations" : [
    {
      "transformation" : "STELLAR",
      "output" : [ "num_profiles_parser", "mean_parser" ],
      "config" : {
        "num_profiles_parser" : "LENGTH(PROFILE_GET('stat', 'global', PROFILE_WINDOW('5 minute window every 10 minutes starting from 2 minutes ago until 32 minutes ago excluding holidays:us')))",
        "mean_parser" : "STATS_MEAN(STATS_MERGE(PROFILE_GET('stat', 'global', PROFILE_WINDOW('5 minute window every 10 minutes starting from 2 minutes ago until 32 minutes ago excluding holidays:us'))))"
      }
    }
  ]
}
```
* Edit the value enrichment config at
`$METRON_HOME/config/zookeeper/enrichments/value.json`:
```
{
  "enrichment" : {
    "fieldMap" : {
      "stellar" : {
        "config" : {
          "num_profiles_enrichment" : "LENGTH(PROFILE_GET('stat', 'global', PROFILE_WINDOW('5 minute window every 10 minutes starting from 2 minutes ago until 32 minutes ago excluding holidays:us')))",
          "mean_enrichment" : "STATS_MEAN(STATS_MERGE(PROFILE_GET('stat', 'global', PROFILE_WINDOW('5 minute window every 10 minutes starting from 2 minutes ago until 32 minutes ago excluding holidays:us'))))"
        }
      }
    }
  }
}
```
* Create the value Kafka topic:
  `/usr/hdp/current/kafka-broker/bin/kafka-topics.sh --zookeeper node1:2181 --create --topic value --partitions 1 --replication-factor 1`
* Push the configs via `$METRON_HOME/bin/zk_load_configs.sh -m PUSH -i $METRON_HOME/config/zookeeper -z node1:2181`
* Start via `$METRON_HOME/bin/start_parser_topology.sh -k node1:6667 -z node1:2181 -s value`
## Start the profiler
* Edit `$METRON_HOME/config/zookeeper/profiler.json` and paste in the
following:
```
{
  "profiles": [
    {
      "profile": "stat",
      "foreach": "'global'",
      "onlyif": "true",
      "init" : { },
      "update": {
        "s": "STATS_ADD(s, value)"
      },
      "result": "s"
    }
  ]
}
```
* `$METRON_HOME/bin/start_profiler_topology.sh`
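For intuition: each one-minute period's `s` is a statistical summary of the values seen in that period, `STATS_MERGE` combines the per-period summaries, and `STATS_MEAN` takes the mean of the merged summary. A rough Python analogue using plain lists (illustrative only; Metron's STATS functions operate on streaming summaries, not raw values):

```python
import random
import statistics

random.seed(42)  # deterministic for the example

# 15 one-minute periods, each accumulating ~60 values drawn from the
# same N(0, 1) distribution that rand_gen.py produces with mu=0, sigma=1.
periods = [[random.gauss(0, 1) for _ in range(60)] for _ in range(15)]

# Analogue of STATS_MERGE: pool every period's values together.
merged = [v for p in periods for v in p]

# Analogue of STATS_MEAN over the merged summary: should be near 0.
mean = statistics.mean(merged)
print(len(merged), round(mean, 3))
```

This mirrors why the `mean_parser`/`mean_enrichment` values observed later are small but non-zero.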
## Test Case
* Set up a profile to accept some synthetic data with a numeric `value`
field and persist a stats summary of the data
* Send some synthetic data directly to the profiler:
  `python ~/rand_gen.py 0 1 1 | /usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh --broker-list node1:6667 --topic value`
* Wait for at least 32 minutes and execute the following via the Stellar
REPL:
```
# Grab the profiles from 1 minute ago to 8 minutes ago
LENGTH(PROFILE_GET('stat', 'global', PROFILE_WINDOW('from 1 minute ago to 8 minutes ago')))
# Looks like 7 were returned, great. Now try something more complex.
# Grab the profiles in 5 minute windows every 10 minutes from 2 minutes ago to 32 minutes ago:
#   32 minutes ago til 27 minutes ago should be 5 profiles
#   22 minutes ago til 17 minutes ago should be 5 profiles
#   12 minutes ago til 7 minutes ago should be 5 profiles
# for a total of 15 profiles
LENGTH(PROFILE_GET('stat', 'global', PROFILE_WINDOW('5 minute window every 10 minutes starting from 2 minutes ago until 32 minutes ago excluding holidays:us')))
```
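The expected count of 15 is just window arithmetic: a 30-minute lookback stridden every 10 minutes yields 3 windows, and each 5-minute window holds 5 one-minute profile periods. Sketched in plain Python (illustrative only):

```python
# '5 minute window every 10 minutes starting from 2 minutes ago
#  until 32 minutes ago'
window_minutes = 5
stride_minutes = 10
start_ago, end_ago = 2, 32

# Number of 10-minute strides at which a 5-minute window still fits
# inside the 30-minute lookback: offsets 2, 12 and 22 minutes ago.
num_windows = len(range(start_ago, end_ago - window_minutes + 1, stride_minutes))

# Each window contains one profile entry per 1-minute period.
total_profiles = num_windows * window_minutes
print(num_windows, total_profiles)  # 3 15
```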
For me, the following was the result:
```
Stellar, Go!
Please note that functions are loading lazily in the background and will be unavailable until loaded fully.
{es.clustername=metron, es.ip=node1, es.port=9300, es.date.format=yyyy.MM.dd.HH, profiler.client.period.duration=1, profiler.client.period.duration.units=MINUTES}
[Stellar]>>> # Grab the profiles from 1 minute ago to 8 minutes ago
[Stellar]>>> LENGTH(PROFILE_GET('stat', 'global', PROFILE_WINDOW('from 1 minute ago to 8 minutes ago')))
Functions loaded, you may refer to functions now...
7
[Stellar]>>> # Looks like 7 were returned, great.
[Stellar]>>> # Grab the profiles from 2 minutes ago to 32 minutes ago
[Stellar]>>> LENGTH(PROFILE_GET('stat', 'global', PROFILE_WINDOW('from 2 minutes ago to 32 minutes ago')))
30
[Stellar]>>> # Looks like 30 were returned, great.
[Stellar]>>> # Now try something more complex
[Stellar]>>> # Grab the profiles in 5 minute windows every 10 minutes from 2 minutes ago to 32 minutes ago:
[Stellar]>>> # 32 minutes ago til 27 minutes ago should be 5 profiles
[Stellar]>>> # 22 minutes ago til 17 minutes ago should be 5 profiles
[Stellar]>>> # 12 minutes ago til 7 minutes ago should be 5 profiles
[Stellar]>>> # for a total of 15 profiles
[Stellar]>>> LENGTH(PROFILE_GET('stat', 'global', PROFILE_WINDOW('5 minute window every 10 minutes starting from 2 minutes ago until 32 minutes ago excluding holidays:us')))
15
[Stellar]>>> STATS_MEAN(STATS_MERGE(PROFILE_GET('stat', 'global', PROFILE_WINDOW('5 minute window every 10 minutes starting from 2 minutes ago until 32 minutes ago excluding holidays:us'))))
0.028019658637877063
[Stellar]>>>
```
* Delete any value index that currently exists (if any do) via `curl -XDELETE "http://localhost:9200/value*"`
* Wait for a couple of seconds and run `curl "http://localhost:9200/value*/_search?pretty=true&q=*:*" 2> /dev/null`
* You should see values in the index with non-zero fields:
  * `num_profiles_enrichment` should be 15
  * `num_profiles_parser` should be 15
  * `mean_enrichment` should be a non-zero double
  * `mean_parser` should be a non-zero double
For reference, a sample message for me is:
```
"_index" : "value_index_2017.02.17.18",
"_type" : "value_doc",
"_id" : "AVpNPI8JQV00TRR_I4zn",
"_score" : 1.0,
"_source" : {
  "adapter:stellaradapter:end:ts" : "1487354498620",
  "threatinteljoinbolt:joiner:ts" : "1487354498628",
  "enrichmentsplitterbolt:splitter:end:ts" : "1487354498576",
  "num_profiles_parser" : 15,
  "enrichmentsplitterbolt:splitter:begin:ts" : "1487354498571",
  "enrichmentjoinbolt:joiner:ts" : "1487354498622",
  "mean_enrichment" : 0.025770908095283665,
  "adapter:stellaradapter:begin:ts" : "1487354498578",
  "source:type" : "value",
  "original_string" : "{ \"value\" : -0.274471660322 }",
  "threatintelsplitterbolt:splitter:begin:ts" : "1487354498625",
  "num_profiles_enrichment" : 15,
  "threatintelsplitterbolt:splitter:end:ts" : "1487354498625",
  "value" : -0.274471660322,
  "mean_parser" : 0.025770908095283665,
  "timestamp" : 1487354498529
}
```
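The non-zero checks above can be automated; the sketch below runs the same assertions against a trimmed copy of the sample `_source` (in practice you would pull `_source` out of the search response instead of hard-coding it):

```python
import json

# Trimmed copy of the sample document's _source from above.
doc = json.loads('''{
  "num_profiles_parser" : 15,
  "num_profiles_enrichment" : 15,
  "mean_parser" : 0.025770908095283665,
  "mean_enrichment" : 0.025770908095283665,
  "value" : -0.274471660322
}''')

# The four derived fields should be present and non-zero.
assert doc["num_profiles_parser"] == 15
assert doc["num_profiles_enrichment"] == 15
assert doc["mean_parser"] != 0.0
assert doc["mean_enrichment"] != 0.0
print("derived fields look sane")
```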
Here we've validated that the new window function can be called from the relevant topologies as well as the REPL, and gives consistent results that make sense.
> Create a DSL-based timestamp lookup for profiler to enable sparse windows
> -------------------------------------------------------------------------
>
> Key: METRON-690
> URL: https://issues.apache.org/jira/browse/METRON-690
> Project: Metron
> Issue Type: New Feature
> Reporter: Casey Stella
>
> I propose that we support the following features:
> * A starting point that is not current time
> * Sparse bins (i.e. the last hour for every tuesday for the last month)
> * The ability to skip events (e.g. weekends, holidays)
> This would result in a new function with the following arguments:
> * from - The lookback starting point (default to now)
> * fromUnits - The units for the lookback starting point
> * to - The ending point for the lookback window (default to from + binSize)
> * toUnits - The units for the lookback ending point
> * including - A list of conditions which we would include:
>   * weekend
>   * holiday
>   * sunday through saturday
> * excluding - A list of conditions which we would skip:
>   * weekend
>   * holiday
>   * sunday through saturday
> * binSize - The size of the lookback bin
> * binUnits - The units of the lookback bin
> Given the number of arguments, their complexity, and the fact that many
> are optional, PROFILE_LOOKBACK should accept a string backed by a DSL to
> express these criteria.
> Base Case: A lookback of 1 hour ago
> PROFILE_LOOKBACK( '1 hour bins from now')
> Example 1: The same time window every tuesday for the last month starting one
> hour ago
> Just to make this as clear as possible, if this is run at 3PM on Monday
> January 23rd, 2017, it would include the following bins:
> January 17th, 2PM - 3PM
> January 10th, 2PM - 3PM
> January 3rd, 2PM - 3PM
> December 27th, 2PM - 3PM
> PROFILE_LOOKBACK( '1 hour bins from 1 hour to 1 month including tuesdays')
> Example 2: The same time window every sunday for the last month starting one
> hour ago skipping holidays
> Just to make this as clear as possible, if this is run at 3PM on Monday
> January 22nd, 2017, it would include the following bins:
> January 16th, 2PM - 3PM
> January 9th, 2PM - 3PM
> January 2nd, 2PM - 3PM
> NOT December 25th
> PROFILE_LOOKBACK( '1 hour bins from 1 hour to 1 month including sundays
> excluding holidays')
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)