[influxdb] Re: Understanding Morgoth

amith . hegde Tue, 08 Nov 2016 06:47:55 -0800

On Thursday, 27 October 2016 21:46:08 UTC+5:30, [email protected]  wrote:
> Clarification from Amith:
> 
> 
> 
> 
> 
> 
> Hi Nathaniel,
> 
> 
> Thanks a lot for your quick reply. What confuses me is how Morgoth 
> calculated the anomalyScore field, whose value turned out to be 
> 0.9897172236503856, and how that value is used to detect an anomaly.
> How does this particular node function?
> 
> 
> 
> …
> 
>   @morgoth()
>      .field(field)
>      .scoreField(scoreField)
>      .minSupport(minSupport)
>      .errorTolerance(errorTolerance)
>      .consensus(consensus)
>      // Configure a single Sigma fingerprinter
> 
> 
> 
> 
>      .sigma(sigmas).
> 
> 
> You can choose some arbitrary data to help me understand this. :)
> Thanks,
> Amith
> 
> 
> My response:
> 
> 
> The `anomalyScore` is `1 - averageSupport`, where averageSupport is the 
> average of the support values returned from each of the fingerprinters. In 
> your case you only have one fingerprinter, `sigma`, so using the anomalyScore 
> of ~ `0.99` we can determine that the sigma fingerprinter returned a support 
> of ~ `0.01`. Support is defined as `count / total`, where count is the number 
> of times a specific event has been seen and total is the total number of 
> events seen. The support can be interpreted as a frequency percentage, i.e. 
> windows like the most recent one have only been seen 1% of the time. Since 
> 0.01 is < 0.05 (the min support defined), an anomaly was triggered. Taking 
> this back to the anomaly score, it can be interpreted that 99% of the time we 
> do not see an event like this one.
> 
> 
> Remember that Morgoth distinguishes different windows as different events 
> using the fingerprinters. In your case the sigma function is computing the 
> std deviation and mean of the windows it receives. If a window arrives that 
> is more than 3 stddevs away from the mean, then it is not considered the same 
> event and is a unique event.
> 
> 
> Putting all of that together, receiving an anomaly score of 99% out of 
> Morgoth for your setup can be interpreted as: You have sent several 1m 
> windows to Morgoth. The window that triggered the anomaly event is only 
> similar to ~1% of those windows, where similar is defined as being within 3 
> std deviations.
> 
> 
> 
> 
> On Thursday, October 27, 2016 at 9:30:13 AM UTC-6, [email protected] wrote:
> 
> 
> 
> In short there are two parts to Morgoth.
> 
> 
> 1. A system that counts the frequency of different kinds of events. This is 
> the lossy counting part
> 2. A system that determines if a window of data is the same as an existing 
> event being tracked or something new. This is the fingerprinting part.
> 
> 
> 
> Here is a quick read through for those concepts 
> http://docs.morgoth.io/docs/detection_framework/
> 
> 
> 
> It's a little hard to tell if Morgoth has done anything unexpected without 
> more detail. Can you share some of the data that led to this alert, so I can 
> talk to the specifics of what is going on? Or maybe you could ask a more 
> specific question about which part is confusing?
> 
> 
> 
> 
> On Thursday, October 27, 2016 at 6:47:02 AM UTC-6, [email protected] 
> wrote:
> Hi All,
> I am trying to run Morgoth as a child process of Kapacitor, but I am failing 
> to understand how Morgoth functions. Below is the sample TICKscript I tried 
> from the Morgoth docs. It is generating some alerts, but I am unable to 
> figure out whether they are supposed to be triggered the way they have been. 
> I am pasting a snippet of an alert as well.
> I basically want to understand the functioning of Morgoth through this 
> example. 
> Alert
> ===================================================================
> {
> "id":"cpu:cpu=cpu-total,host=ip-10-121-48-24.ec2.internal,",
> "message":"cpu:cpu=cpu-total,host=ip-10-121-48-24.ec2.internal, is CRITICAL",
> "details":"",
> "time":"2016-10-27T11:33:00Z",
> "duration":21780000000000,
> "level":"CRITICAL",
> "data":{
> "series":[
> {
> "name":"cpu",
> "tags":{
> "cpu":"cpu-total",
> "host":"ip-10-121-48-24.ec2.internal"
> },
> "columns":[
> "time",
> "anomalyScore",
> "usage_guest",
> "usage_guest_nice",
> "usage_idle",
> "usage_iowait",
> "usage_irq",
> "usage_nice",
> "usage_softirq",
> "usage_steal",
> "usage_system",
> "usage_user"
> ],
> "values":[
> [
> "2016-10-27T11:33:00Z",
> 0.9897172236503856,
> 0,
> 0,
> 99.49748743708487,
> 0,
> 0,
> 0,
> 0,
> 0,
> 0.5025125628122904,
> 0
> ]
> ===================================================================
> // The measurement to analyze
> var measurement = 'cpu'
> // Optional group by dimensions
> var groups = [*]
> // Optional where filter
> var whereFilter = lambda: TRUE
> // The amount of data to window at once
> var window = 1m
> // The field to process
> var field = 'usage_idle'
> // The name for the anomaly score field
> var scoreField = 'anomalyScore'
> // The minimum support
> var minSupport = 0.05
> // The error tolerance
> var errorTolerance = 0.01
> // The consensus
> var consensus = 0.5
> // Number of sigmas allowed for normal window deviation
> var sigmas = 3.0
> stream
>   // Select the data we want
>   |from()
>       .measurement(measurement)
>       .groupBy(groups)
>       .where(whereFilter)
>   // Window the data for a certain amount of time
>   |window()
>      .period(window)
>      .every(window)
>      .align()
>   // Send each window to Morgoth
>   @morgoth()
>      .field(field)
>      .scoreField(scoreField)
>      .minSupport(minSupport)
>      .errorTolerance(errorTolerance)
>      .consensus(consensus)
>      // Configure a single Sigma fingerprinter
>      .sigma(sigmas)
>   // Morgoth returns any anomalous windows
>   |alert()
>      .details('')
>      .crit(lambda: TRUE)
>      .log('/tmp/cpu_alert.log')
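
For reference, the arithmetic described in the quoted reply can be written out as a small Python sketch. It only restates what is said there (support = count / total, anomalyScore = 1 - average support, anomaly when the support is below minSupport). The function names are made up for illustration, this is not Morgoth's actual code, and consensus voting across several fingerprinters is not modeled.

```python
def support(count, total):
    """Fraction of all windows seen so far that matched this event."""
    return count / total


def anomaly_score(supports):
    """1 - average support, with one support value per fingerprinter."""
    return 1 - sum(supports) / len(supports)


def is_anomalous(supports, min_support):
    """Reading taken from the reply: anomalous if the support is below minSupport."""
    return sum(supports) / len(supports) < min_support


# Score from the first alert in this thread, produced with a single sigma
# fingerprinter, so the average support is just that fingerprinter's support.
score = 0.9897172236503856
implied_support = 1 - score  # roughly 0.0103, i.e. about 1% of windows

print(implied_support)
print(is_anomalous([implied_support], min_support=0.05))
```

With minSupport = 0.05 as in the script above, an implied support of about 0.01 falls below the threshold, which matches the explanation given for the CPU alert.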


Thanks a lot, Nathaniel, for your explanation of Morgoth. I have come back with 
a new example and its set of alerts. I will briefly describe what I am trying 
to achieve here. 

Below is a set of data with the count of errors (eventcount) that occurred for 
a particular error code in IIS logs. I want to run Morgoth on the eventcount 
field to detect whether it is anomalous.

time                      app          eventcount  status     tech
2016-11-07T11:31:28.261Z  "OTSI"       586         "Success"  "IIS"
2016-11-07T11:32:03.254Z  "OTSI"       1           "Failure"  "IIS"
2016-11-07T11:33:03.243Z  "OTSI"       8           "Success"  "IIS"
2016-11-07T11:33:23.259Z  "ANALYTICS"  158         "Success"  "IIS"
2016-11-07T11:33:23.26Z   "ANALYTICS"  24          "Failure"  "IIS"

My TICKscript:

// The measurement to analyze
var measurement = 'eventflow_IIS'

// The amount of data to window at once
var window = 1m

// The field to process
var field = 'eventcount'

// The name for the anomaly score field
var scoreField = 'anomalyScore'

// The minimum support
var minSupport = 0.05

// The error tolerance
var errorTolerance = 0.01

// The consensus
var consensus = 0.5

// Number of sigmas allowed for normal window deviation
var sigmas = 3.0

batch
    |query('''
        SELECT *
        FROM "statistics"."autogen"."eventflow_IIS"
    ''')
        .period(1m)
        .every(1m)
        .groupBy(*)
    // |.where(lambda: TRUE)
    @morgoth()
        .field(field)
        .scoreField(scoreField)
        .minSupport(minSupport)
        .errorTolerance(errorTolerance)
        .consensus(consensus)
        // Configure a single Sigma fingerprinter
        .sigma(sigmas)
    // Morgoth returns any anomalous windows
    |alert()
        .details('Count is anomalous')
        .id('kapacitor/{{ .TaskName }}/{{ .Name }}/{{ .Group }}')
        .message('{{ .ID }} is at level {{ .Level }} Errorcount is:{{ index .Fields "eventcount" }}')
        .crit(lambda: TRUE)
        .log('/tmp/morgothbb.log')
    |influxDBOut()
        .database('anomaly')
        .retentionPolicy('autogen')
        .flushInterval(1s)
        .measurement('Anomaly')
        // .tag('eventcount','field')
        // .tag('AnomalyScore','scoreField')
        // .tag('Time','time')
        // .tag('Status','status')
        .precision('u')

Below are the alerts it generated, written into a table.

time                            anomalyScore        app          eventcount  status     tech
2016-11-08T09:34:40.169285533Z  0.95                "OTSI"       296         "Success"  "IIS"
2016-11-08T09:35:40.171285533Z  0.9523809523809523  "OTSI"       28          "Success"  "IIS"
2016-11-08T09:36:40.170285533Z  0.9545454545454546  "OTSI"       12          "Success"  "IIS"
2016-11-08T09:37:40.169285533Z  0.9565217391304348  "OTSI"       20          "Success"  "IIS"
2016-11-08T09:38:40.170285533Z  0.9583333333333334  "OTSI"       249         "Success"  "IIS"
2016-11-08T09:39:40.167285533Z  0.96                "OTSI"       70          "Success"  "IIS"
2016-11-08T09:43:00.167285533Z  0.9615384615384616  "ANALYTICS"  1           "Success"  "IIS"
2016-11-08T09:43:40.164285533Z  0.962962962962963   "OTSI"       24          "Success"  "IIS"
2016-11-08T09:52:00.160285533Z  0.9642857142857143  "ANALYTICS"  1           "Success"  "IIS"

My question is:

How should I interpret the anomaly scores generated here (~0.95), given the 
counts for which Morgoth triggered an anomaly? Going by our earlier discussion, 
support here turns out to be ~0.05 (1 - anomaly score), and an anomaly is 
triggered when support < min support. In this case that is 0.05 < 0.05, which 
should not be true, yet an anomaly is still being triggered almost every 
minute. Could you please help me understand this?

Also, let me know if e, M, N need to be tweaked for this particular data 
sample to generate meaningful alerts from it.
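
As a rough check on the scores in the alert table, the Python sketch below recovers the implied support of each alert as a fraction. It assumes the relationship from the earlier reply holds (a single fingerprinter, so support = 1 - anomalyScore, and support = count / total). The helper name `implied_support` is hypothetical and the denominator bound of 1000 is an arbitrary choice for this small sample.

```python
from fractions import Fraction

# anomalyScore values copied from the alert table, in order.
scores = [
    0.95,
    0.9523809523809523,
    0.9545454545454546,
    0.9565217391304348,
    0.9583333333333334,
    0.96,
    0.9615384615384616,
    0.962962962962963,
    0.9642857142857143,
]


def implied_support(score, max_denominator=1000):
    """Return 1 - score as the nearest fraction with a small denominator."""
    return Fraction(1 - score).limit_denominator(max_denominator)


supports = [implied_support(s) for s in scores]
for score, sup in zip(scores, supports):
    print(score, "-> support", sup)
```

Under that assumption the supports come out as 1/20, 1/21, and so on up to 1/28. That would be consistent with each alerting window being counted as an event seen once among a total that grows by one per alert, although this is only an inference from the numbers and not confirmed Morgoth behavior. The first value, 1/20, is exactly equal to minSupport, so whether it should alert depends on how Morgoth compares the two values.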
