[
https://issues.apache.org/jira/browse/METRON-684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15850429#comment-15850429
]
ASF GitHub Bot commented on METRON-684:
---------------------------------------
Github user cestella commented on the issue:
https://github.com/apache/incubator-metron/pull/435
Testing Instructions beyond the normal smoke test (i.e. letting data
flow through to the indices and checking them).
## Preliminaries
* Set an environment variable to indicate `METRON_HOME`:
`export METRON_HOME=/usr/metron/0.3.0`
* Create the profiler hbase table
`echo "create 'profiler', 'P'" | hbase shell`
* Open `~/rand_gen.py` and paste the following:
```
#!/usr/bin/python
import random
import sys
import time

def main():
    mu = float(sys.argv[1])
    sigma = float(sys.argv[2])
    freq_s = int(sys.argv[3])
    while True:
        out = '{ "value" : ' + str(random.gauss(mu, sigma)) + ' }'
        print out
        sys.stdout.flush()
        time.sleep(freq_s)

if __name__ == '__main__':
    main()
```
This will generate random JSON maps with a numeric field called `value` (a quick sanity check is sketched just after this list).
* Set the profiler to use 1 minute tick durations:
* Edit `$METRON_HOME/config/profiler.properties` to adjust the capture
duration by changing `profiler.period.duration=15` to
`profiler.period.duration=1`
* Edit `$METRON_HOME/config/zookeeper/global.json` and add the following
properties:
```
"profiler.client.period.duration" : "1",
"profiler.client.period.duration.units" : "MINUTES"
```
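As an optional sanity check before moving on (a minimal sketch; it assumes the generator script above was saved to `~/rand_gen.py` and that `python` is Python 2, as on the quick-dev VM), you can confirm the generator behaves as expected:
```
# Emit mean-0, stddev-1 values once per second; Ctrl-C after a few lines
python ~/rand_gen.py 0 1 1
# Expected output is one JSON map per line, e.g.:
#   { "value" : 0.4713218... }
```
The three positional arguments are the mean, the standard deviation, and the emit frequency in seconds.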
## Free Up Space on the virtual machine
First, let's free up some headroom on the virtual machine. If you are running this on a multinode cluster, you would not have to do this. (A quick verification sketch follows this list.)
* Kill monit via `service monit stop`
* Kill tcpreplay via `for i in $(ps -ef | grep tcpreplay | awk '{print $2}');do kill -9 $i;done`
* Kill existing parser topologies via
* `storm kill snort`
* `storm kill bro`
* We won't need the enrichment or indexing topologies for this test, so you
can kill them via:
* `storm kill enrichment`
* `storm kill indexing`
* Kill yaf via `for i in $(ps -ef | grep yaf | awk '{print $2}');do kill -9 $i;done`
* Kill bro via `for i in $(ps -ef | grep bro | awk '{print $2}');do kill -9 $i;done`
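To confirm the cleanup took effect (optional; `storm list` and the process check below are generic commands, not specific to this test):
```
# No parser, enrichment, or indexing topologies should remain
storm list
# No tcpreplay, yaf, or bro processes should remain
ps -ef | grep -E 'tcpreplay|yaf|bro' | grep -v grep
```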
## Start the profiler
* `$METRON_HOME/bin/start_profiler_topology.sh`
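Optionally, verify the topology deployed before continuing (a small sketch; it assumes the start script deploys a topology named `profiler`):
```
# The profiler topology should be listed with status ACTIVE
storm list | grep -i profiler
```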
## Test Case
* Set up a profile to accept some synthetic data with a numeric `value`
field and persist a stats summary of the data
* Edit `$METRON_HOME/config/zookeeper/profiler.json` and paste in the following (see the note after the Stellar block below about pushing this config to ZooKeeper):
```
{
"profiles": [
{
"profile": "stat",
"foreach": "'global'",
"onlyif": "true",
"init" : {
},
"update": {
"s": "STATS_ADD(s, value)"
},
"result": "s"
}
]
}
```
* Send some synthetic data directly to the profiler:
`python ~/rand_gen.py 0 1 1 | /usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh --broker-list node1:6667 --topic indexing`
* Wait for at least 10 minutes and execute the following via the Stellar REPL (a note on launching the REPL follows this block):
```
# Grab the last 10 minutes worth of timestamps
PROFILE_FIXED( 10, 'MINUTES')
# Looks like 10 were returned, great. Now, validate that I get 10 profile
measurements back
PROFILE_GET('stat', 'global', PROFILE_FIXED( 10, 'MINUTES' ) )
# Ok, now look at the mean across the distribution
STATS_MEAN( STATS_MERGE(PROFILE_GET('stat', 'global', PROFILE_FIXED( 10, 'MINUTES' ) )))
```
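Two details the steps above take for granted, sketched here under the assumption of a standard Metron 0.3.x quick-dev install (the ZooKeeper quorum `node1:2181` and the exact flags may differ in your environment): the edited `profiler.json` and `global.json` must be pushed to ZooKeeper before the profiler topology and the Stellar client will see them, and the Stellar REPL is launched from the Metron install:
```
# Push the edited ZooKeeper configs (profiler.json and global.json)
$METRON_HOME/bin/zk_load_configs.sh -m PUSH -i $METRON_HOME/config/zookeeper -z node1:2181

# (Optional) confirm the synthetic data is arriving on the topic the profiler reads from
# (older-style consumer flag; adjust for your Kafka version; Ctrl-C to stop)
/usr/hdp/current/kafka-broker/bin/kafka-console-consumer.sh --zookeeper node1:2181 --topic indexing

# Launch the Stellar REPL against the same ZooKeeper quorum
$METRON_HOME/bin/stellar -z node1:2181
```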
For me, the following was the result:
```
Stellar, Go!
Please note that functions are loading lazily in the background and will be
unavailable until loaded fully.
{es.clustername=metron, es.ip=node1, es.port=9300,
es.date.format=yyyy.MM.dd.HH, profiler.client.period.duration=1,
profiler.client.period.duration.units=MINUTES}
[Stellar]>>> # Grab the last 10 minutes worth of timestamps
[Stellar]>>> PROFILE_FIXED( 10, 'MINUTES')
Functions loaded, you may refer to functions now...
[24767772, 24767773, 24767774, 24767775, 24767776, 24767777, 24767778,
24767779, 24767780, 24767781, 24767782]
[Stellar]>>> # Looks like 10 were returned, great. Now, validate that I
get 10 profile measurements back
[Stellar]>>> PROFILE_GET('stat', 'global', PROFILE_FIXED( 10, 'MINUTES' ) )
[org.apache.metron.statistics.OnlineStatisticsProvider@44749031,
org.apache.metron.statistics.OnlineStatisticsProvider@d2a7fbb9,
org.apache.metron.statistics.OnlineStatisticsProvider@a217cfd7,
org.apache.metron.statistics.OnlineStatisticsProvider@c5e42aed,
org.apache.metron.statistics.OnlineStatisticsProvider@c4f4753d,
org.apache.metron.statistics.OnlineStatisticsProvider@87a1606a,
org.apache.metron.statistics.OnlineStatisticsProvider@e1b4c8dc,
org.apache.metron.statistics.OnlineStatisticsProvider@fdb7b8d8]
[Stellar]>>> # Ok, now look at the mean across the distribution
[Stellar]>>> STATS_MEAN( STATS_MERGE(PROFILE_GET('stat', 'global',
PROFILE_FIXED( 10, 'MINUTES' ) )))
-0.0077433441069769265
[Stellar]>>>
```
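As a final optional check (a sketch assuming the `profiler` HBase table created in the preliminaries), confirm that the profiler actually persisted rows:
```
# Each flushed profile period should appear as a row in the 'profiler' table
echo "scan 'profiler', {LIMIT => 5}" | hbase shell
```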
> Decouple Timestamp calculation from PROFILE_GET
> -----------------------------------------------
>
> Key: METRON-684
> URL: https://issues.apache.org/jira/browse/METRON-684
> Project: Metron
> Issue Type: Improvement
> Reporter: Casey Stella
> Assignee: Casey Stella
>
> Currently PROFILE_GET only supports a static lookback of a fixed duration.
> As we have more complicated, potentially sparse, lookbacks (e.g. the same
> time slice every tuesday for a month), it would be nice to decouple the
> construction of timestamps from PROFILE_GET into its own set of functions.