On Thursday, 27 October 2016 21:46:08 UTC+5:30, [email protected] wrote:
> Clarification from Amith:
>
> Hi Nathaniel,
>
>
> Thanks a lot for your quick reply. What is confusing me here is how
> Morgoth calculated the anomalyScore field, whose value turned out to be
> 0.9897172236503856, and how this is being used to detect an anomaly.
> How does this particular node function?
>
>
>
> …
>
> @morgoth()
> .field(field)
> .scoreField(scoreField)
> .minSupport(minSupport)
> .errorTolerance(errorTolerance)
> .consensus(consensus)
> // Configure a single Sigma fingerprinter
> .sigma(sigmas)
>
>
> You can choose some arbitrary data to help me understand this. :)
> Thanks,
> Amith
>
>
> My response:
>
>
> The `anomalyScore` is `1 - averageSupport`, where averageSupport is the
> average of the support values returned from each of the fingerprinters. In
> your case you only have one fingerprinter, `sigma`, so from the anomalyScore
> of ~ `0.99` we can determine that the sigma fingerprinter returned a support
> of ~ `0.01`. Support is defined as `count / total`, where count is the number
> of times a specific event has been seen and total is the total number of
> events seen. The support can be interpreted as a frequency percentage, i.e.
> an event like the most recent window has only been seen 1% of the time.
> Since 0.01 is < 0.05 (the min support defined), an anomaly was triggered.
> Taking this back to the anomaly score, it can be interpreted as: 99% of the
> time we do not see an event like this one.
>
>
> Remember that Morgoth distinguishes different windows as different events
> using the fingerprinters. In your case the sigma function is computing the
> standard deviation and mean of the windows it receives. If a window arrives
> that is more than 3 standard deviations away from the mean, then it is not
> considered the same event and is treated as a unique event.
>
>
> Putting all of that together, receiving an anomaly score of 99% from
> Morgoth for your setup can be interpreted as follows: you have sent several
> 1m windows to Morgoth, and the window that triggered the anomaly event is
> only similar to ~1% of those windows, where similar is defined as being
> within 3 standard deviations.
>
>
>
>
> On Thursday, October 27, 2016 at 9:30:13 AM UTC-6, [email protected] wrote:
>
>
>
> In short there are two parts to Morgoth.
>
>
> 1. A system that counts the frequency of different kinds of events. This is
> the lossy counting part
> 2. A system that determines if a window of data is the same as an existing
> event being tracked or something new. This is the fingerprinting part.
>
>
>
> Here is a quick read through for those concepts
> http://docs.morgoth.io/docs/detection_framework/
>
>
>
> It's a little hard to tell if Morgoth has done anything unexpected without
> more detail. Can you share some of the data that led to this alert, so I can
> speak to the specifics of what is going on? Or maybe you could ask a more
> specific question about which part is confusing?
>
>
>
>
> On Thursday, October 27, 2016 at 6:47:02 AM UTC-6, [email protected] wrote:
> Hi All,
> I am trying to run Morgoth as a child process of Kapacitor, but I am failing
> to understand how Morgoth functions. Below is the sample TICKscript I tried
> from the Morgoth docs. It is generating some alerts, but I am unable to
> figure out whether they are supposed to be triggered the way they have been.
> I am pasting a snippet of an alert as well.
> I basically want to understand how Morgoth works through this example.
> Alert
> ===================================================================
> {
> "id":"cpu:cpu=cpu-total,host=ip-10-121-48-24.ec2.internal,",
> "message":"cpu:cpu=cpu-total,host=ip-10-121-48-24.ec2.internal, is CRITICAL",
> "details":"",
> "time":"2016-10-27T11:33:00Z",
> "duration":21780000000000,
> "level":"CRITICAL",
> "data":{
> "series":[
> {
> "name":"cpu",
> "tags":{
> "cpu":"cpu-total",
> "host":"ip-10-121-48-24.ec2.internal"
> },
> "columns":[
> "time",
> "anomalyScore",
> "usage_guest",
> "usage_guest_nice",
> "usage_idle",
> "usage_iowait",
> "usage_irq",
> "usage_nice",
> "usage_softirq",
> "usage_steal",
> "usage_system",
> "usage_user"
> ],
> "values":[
> [
> "2016-10-27T11:33:00Z",
> 0.9897172236503856,
> 0,
> 0,
> 99.49748743708487,
> 0,
> 0,
> 0,
> 0,
> 0,
> 0.5025125628122904,
> 0
> ]
> ===================================================================
> // The measurement to analyze
> var measurement = 'cpu'
> // Optional group by dimensions
> var groups = [*]
> // Optional where filter
> var whereFilter = lambda: TRUE
> // The amount of data to window at once
> var window = 1m
> // The field to process
> var field = 'usage_idle'
> // The name for the anomaly score field
> var scoreField = 'anomalyScore'
> // The minimum support
> var minSupport = 0.05
> // The error tolerance
> var errorTolerance = 0.01
> // The consensus
> var consensus = 0.5
> // Number of sigmas allowed for normal window deviation
> var sigmas = 3.0
> stream
> // Select the data we want
> |from()
> .measurement(measurement)
> .groupBy(groups)
> .where(whereFilter)
> // Window the data for a certain amount of time
> |window()
> .period(window)
> .every(window)
> .align()
> // Send each window to Morgoth
> @morgoth()
> .field(field)
> .scoreField(scoreField)
> .minSupport(minSupport)
> .errorTolerance(errorTolerance)
> .consensus(consensus)
> // Configure a single Sigma fingerprinter
> .sigma(sigmas)
> // Morgoth returns any anomalous windows
> |alert()
> .details('')
> .crit(lambda: TRUE)
> .log('/tmp/cpu_alert.log')
Thanks a lot, Nathaniel, for your explanation of Morgoth. I have come back with
a new example and its set of alerts. Here is a brief description of what I am
trying to achieve.
Below is a set of data with the count of errors (eventcount) that occurred for
a particular error code in the IIS logs. I want to run Morgoth on the field
eventcount to detect whether a value is anomalous.
time                      app          eventcount  status     tech
2016-11-07T11:31:28.261Z  "OTSI"       586         "Success"  "IIS"
2016-11-07T11:32:03.254Z  "OTSI"       1           "Failure"  "IIS"
2016-11-07T11:33:03.243Z  "OTSI"       8           "Success"  "IIS"
2016-11-07T11:33:23.259Z  "ANALYTICS"  158         "Success"  "IIS"
2016-11-07T11:33:23.26Z   "ANALYTICS"  24          "Failure"  "IIS"
My TICKscript:
// The measurement to analyze
var measurement = 'eventflow_IIS'
// The amount of data to window at once
var window = 1m
// The field to process
var field = 'eventcount'
// The name for the anomaly score field
var scoreField = 'anomalyScore'
// The minimum support
var minSupport = 0.05
// The error tolerance
var errorTolerance = 0.01
// The consensus
var consensus = 0.5
// Number of sigmas allowed for normal window deviation
var sigmas = 3.0
batch
    |query('''
        SELECT *
        FROM "statistics"."autogen"."eventflow_IIS"
    ''')
        .period(1m)
        .every(1m)
        .groupBy(*)
    // |.where(lambda: TRUE)
    @morgoth()
        .field(field)
        .scoreField(scoreField)
        .minSupport(minSupport)
        .errorTolerance(errorTolerance)
        .consensus(consensus)
        // Configure a single Sigma fingerprinter
        .sigma(sigmas)
    // Morgoth returns any anomalous windows
    |alert()
        .details('Count is anomalous')
        .id('kapacitor/{{ .TaskName }}/{{ .Name }}/{{ .Group }}')
        .message('{{ .ID }} is at level {{ .Level }} Errorcount is:{{ index .Fields "eventcount" }}')
        .crit(lambda: TRUE)
        .log('/tmp/morgothbb.log')
    |influxDBOut()
        .database('anomaly')
        .retentionPolicy('autogen')
        .flushInterval(1s)
        .measurement('Anomaly')
        // .tag('eventcount','field')
        // .tag('AnomalyScore','scoreField')
        // .tag('Time','time')
        // .tag('Status','status')
        .precision('u')
Below are the alerts it generated, written to a table.
time                            anomalyScore        app          eventcount  status     tech
2016-11-08T09:34:40.169285533Z  0.95                "OTSI"       296         "Success"  "IIS"
2016-11-08T09:35:40.171285533Z  0.9523809523809523  "OTSI"       28          "Success"  "IIS"
2016-11-08T09:36:40.170285533Z  0.9545454545454546  "OTSI"       12          "Success"  "IIS"
2016-11-08T09:37:40.169285533Z  0.9565217391304348  "OTSI"       20          "Success"  "IIS"
2016-11-08T09:38:40.170285533Z  0.9583333333333334  "OTSI"       249         "Success"  "IIS"
2016-11-08T09:39:40.167285533Z  0.96                "OTSI"       70          "Success"  "IIS"
2016-11-08T09:43:00.167285533Z  0.9615384615384616  "ANALYTICS"  1           "Success"  "IIS"
2016-11-08T09:43:40.164285533Z  0.962962962962963   "OTSI"       24          "Success"  "IIS"
2016-11-08T09:52:00.160285533Z  0.9642857142857143  "ANALYTICS"  1           "Success"  "IIS"
My question is:
How should I interpret the anomaly scores generated here (~0.95) together with
the counts for which Morgoth triggered an anomaly? Going by our earlier
discussion, the support here turns out to be ~0.05 (1 - anomaly score), and an
anomaly is triggered when support < min support. In this case that gives
0.05 < 0.05, which should not be true, yet an anomaly is still being triggered
almost every minute. Could you please help me understand this?
Also, let me know if e, M, N need to be tweaked for this particular data
sample to generate meaningful alerts from it.
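
For reference, the arithmetic behind the question can be sketched in a few lines of Python. This is only an illustration, not Morgoth's code: it uses the definitions quoted earlier in the thread (`anomalyScore = 1 - support` with a single fingerprinter, `support = count / total`). The helper names and the `count=1` default are assumptions made for the example; nothing in the alerts confirms what count Morgoth actually recorded.

```python
# Anomaly scores copied from the alert table above, in order.
scores = [
    0.95, 0.9523809523809523, 0.9545454545454546, 0.9565217391304348,
    0.9583333333333334, 0.96, 0.9615384615384616, 0.962962962962963,
    0.9642857142857143,
]


def support_from_score(score):
    """Invert anomalyScore = 1 - support (single fingerprinter case)."""
    return 1.0 - score


def implied_total(score, count=1):
    """Total number of events implied by support = count / total.

    count=1 is a hypothesis: it assumes the window was fingerprinted as an
    event that had been seen exactly once.
    """
    return round(count / support_from_score(score))


rows = [(s, support_from_score(s), implied_total(s)) for s in scores]
for score, support, total in rows:
    print(f"score={score:.4f}  support={support:.4f}  implied total={total}")
```

Under the `count=1` assumption the implied totals are the consecutive integers 20 through 28, one more for each alert row. That would be consistent with each of these windows being counted as an event seen only once, but only Morgoth's internal state can confirm whether that is what happened.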