1. Yes, the srpl files are just a gzipped line protocol file. The brpl files are a zip of several files containing the JSON data for the recording.
2. In my previous post I explained how average support is computed, and linked to the docs on the lossy counting algorithm, which is its origin.
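Since a stream recording (.srpl) is just gzip-compressed line protocol, it can be inspected with any gzip tool. Here is a minimal Python sketch; the file name and the sample point are made up for illustration (a real recording would come from `kapacitor record`):

```python
import gzip

# Hypothetical path: a Kapacitor stream recording (.srpl) is just
# gzip-compressed InfluxDB line protocol. (A batch recording, .brpl,
# is a zip of JSON files and can be opened with zipfile instead.)
path = "recording.srpl"

# Write a tiny example recording so this sketch is self-contained.
sample = b"cpu,host=server01 usage_idle=99.5 1477568940000000000\n"
with gzip.open(path, "wb") as f:
    f.write(sample)

# Inspect the recording: each line is one point in line protocol.
with gzip.open(path, "rt") as f:
    for line in f:
        print(line.strip())
```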
On Wednesday, November 9, 2016 at 9:38:56 AM UTC-7, amith hegde wrote:
>
> Thank you for your advice, I am working on this piece. In the meanwhile I have a couple of questions, if you can help me with them:
> 1. Is it possible to take a look at the recording to see what data it holds? If yes, how can we do that?
> 2. How is the anomaly score determined? What is the formula to calculate anomalyScore? If it is (1 - averageSupport), even average support is not a defined value.
>
> Thanks,
> Amith
>
> On Nov 8, 2016 9:33 PM, <[email protected]> wrote:
>
>> The actual comparison is <=, which is why you received the alert. But if your tolerances are tight enough that <= matters over <, then you are probably too tight on your tolerances.
>>
>> I would first recommend that you tweak the sigmas value, maybe increase it to 3.5 or 4. To iterate quickly on these tests I recommend that you create a recording of the data set, then tweak a value, replay the recording, check the results, and repeat until you have something you like. If you share your recording with me I would be willing to take a quick look as well. As it is, it's a little hard to give good advice based on a handful of data points.
>>
>> On Tuesday, November 8, 2016 at 7:47:39 AM UTC-7, [email protected] wrote:
>>>
>>> On Thursday, 27 October 2016 21:46:08 UTC+5:30, [email protected] wrote:
>>> > Clarification from Amith:
>>> >
>>> > Hi Nathaniel,
>>> >
>>> > Thanks a lot for your quick reply. What is confusing for me here is how Morgoth calculated the anomalyScore field, whose value has turned out to be 0.9897172236503856, and how this is being used to detect an anomaly.
>>> > How does this particular node function?
>>> >
>>> > @morgoth()
>>> >     .field(field)
>>> >     .scoreField(scoreField)
>>> >     .minSupport(minSupport)
>>> >     .errorTolerance(errorTolerance)
>>> >     .consensus(consensus)
>>> >     // Configure a single Sigma fingerprinter
>>> >     .sigma(sigmas)
>>> >
>>> > You can choose some arbitrary data to help me understand this. :)
>>> > Thanks,
>>> > Amith
>>> >
>>> > My response:
>>> >
>>> > The `anomalyScore` is `1 - averageSupport`, where averageSupport is the average of the support values returned from each of the fingerprinters. In your case you only have one fingerprinter, `sigma`, so using the anomalyScore of ~`0.99` we can determine that the sigma fingerprinter returned a support of ~`0.01`. Support is defined as `count / total`, where count is the number of times a specific event has been seen and total is the total number of events seen. The support can be interpreted as a frequency percentage, i.e. the most recent window has only been seen 1% of the time. Since 0.01 is < 0.05 (the min support defined) an anomaly was triggered. Taking this back to the anomaly score, it can be interpreted that 99% of the time we do not see an event like this one.
>>> >
>>> > Remember that Morgoth distinguishes different windows as different events using the fingerprinters. In your case the sigma function is computing the std deviation and mean of the windows it receives. If a window arrives that is more than 3 stddevs away from the mean, then it is not considered the same event and is a unique event.
>>> >
>>> > Taking all of that and putting it together, receiving an anomaly score of 99% out of Morgoth for your setup can be interpreted as: you have sent several 1m windows to Morgoth.
>>> > The window that triggered the anomaly event is only similar to ~1% of those windows, where similar is defined as being within 3 std deviations.
>>> >
>>> > On Thursday, October 27, 2016 at 9:30:13 AM UTC-6, [email protected] wrote:
>>> >
>>> > In short there are two parts to Morgoth:
>>> >
>>> > 1. A system that counts the frequency of different kinds of events. This is the lossy counting part.
>>> > 2. A system that determines if a window of data is the same as an existing event being tracked or something new. This is the fingerprinting part.
>>> >
>>> > Here is a quick read-through for those concepts: http://docs.morgoth.io/docs/detection_framework/
>>> >
>>> > It's a little hard to tell if Morgoth has done anything unexpected without more detail. Can you share some of the data that led to this alert, so I can talk to the specifics of what is going on? Or maybe you could ask a more specific question about which part is confusing?
>>> >
>>> > On Thursday, October 27, 2016 at 6:47:02 AM UTC-6, [email protected] wrote:
>>> > Hi All,
>>> > I am trying to run Morgoth as a child process to Kapacitor, but I am failing to understand how Morgoth functions. Below is the sample TICKscript I tried out of the Morgoth docs. It is generating some alerts but I am unable to figure out if they are supposed to get triggered the way they have. Pasting a snippet of an alert as well.
>>> > I basically want to understand the functioning of Morgoth through this example.
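The scoring logic described above (support = count / total, anomalyScore = 1 - averageSupport, and the sigma matching rule) can be sketched as follows. This is an illustrative model only, not Morgoth's actual Go implementation, and all function names are made up:

```python
# Illustrative sketch of Morgoth's scoring, as described in the thread
# (not the real implementation). A sigma "fingerprinter" decides whether
# a window matches a known event; support is count / total; the anomaly
# score is 1 - support.
import statistics

def matches(event_values, window_values, sigmas=3.0):
    """A window matches a known event if its mean lies within
    `sigmas` standard deviations of the event's mean."""
    if len(event_values) < 2:
        # Not enough data for a stddev; only an exact repeat matches.
        return event_values == window_values
    mean = statistics.mean(event_values)
    std = statistics.stdev(event_values)
    return abs(statistics.mean(window_values) - mean) <= sigmas * std

def anomaly_score(count, total):
    support = count / total   # frequency of this event among all windows
    return 1.0 - support      # anomalyScore = 1 - averageSupport

# An event seen once out of ~100 windows has support ~0.01, so the
# anomaly score comes out near 0.99, as in the alert in this thread.
print(anomaly_score(1, 100))  # 0.99
```

With one fingerprinter, averageSupport is just that fingerprinter's support, which is why a score of ~0.99 pins the support at ~0.01, below the 0.05 minSupport.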
>>> > Alert
>>> > ===================================================================
>>> > {
>>> >   "id": "cpu:cpu=cpu-total,host=ip-10-121-48-24.ec2.internal,",
>>> >   "message": "cpu:cpu=cpu-total,host=ip-10-121-48-24.ec2.internal, is CRITICAL",
>>> >   "details": "",
>>> >   "time": "2016-10-27T11:33:00Z",
>>> >   "duration": 21780000000000,
>>> >   "level": "CRITICAL",
>>> >   "data": {
>>> >     "series": [
>>> >       {
>>> >         "name": "cpu",
>>> >         "tags": {
>>> >           "cpu": "cpu-total",
>>> >           "host": "ip-10-121-48-24.ec2.internal"
>>> >         },
>>> >         "columns": [
>>> >           "time", "anomalyScore", "usage_guest", "usage_guest_nice",
>>> >           "usage_idle", "usage_iowait", "usage_irq", "usage_nice",
>>> >           "usage_softirq", "usage_steal", "usage_system", "usage_user"
>>> >         ],
>>> >         "values": [
>>> >           [
>>> >             "2016-10-27T11:33:00Z", 0.9897172236503856, 0, 0,
>>> >             99.49748743708487, 0, 0, 0, 0, 0, 0.5025125628122904, 0
>>> >           ]
>>> > ===================================================================
>>> > // The measurement to analyze
>>> > var measurement = 'cpu'
>>> > // Optional group by dimensions
>>> > var groups = [*]
>>> > // Optional where filter
>>> > var whereFilter = lambda: TRUE
>>> > // The amount of data to window at once
>>> > var window = 1m
>>> > // The field to process
>>> > var field = 'usage_idle'
>>> > // The name for the anomaly score field
>>> > var scoreField = 'anomalyScore'
>>> > // The minimum support
>>> > var minSupport = 0.05
>>> > // The error tolerance
>>> > var errorTolerance = 0.01
>>> > // The consensus
>>> > var consensus = 0.5
>>> > // Number of sigmas allowed for normal window deviation
>>> > var sigmas = 3.0
>>> >
>>> > stream
>>> >     // Select the data we want
>>> >     |from()
>>> >         .measurement(measurement)
>>> >         .groupBy(groups)
>>> >         .where(whereFilter)
>>> >     // Window the data for a certain amount of time
>>> >     |window()
>>> >         .period(window)
>>> >         .every(window)
>>> >         .align()
>>> >     // Send each window to Morgoth
>>> >     @morgoth()
>>> >         .field(field)
>>> >         .scoreField(scoreField)
>>> >         .minSupport(minSupport)
>>> >         .errorTolerance(errorTolerance)
>>> >         .consensus(consensus)
>>> >         // Configure a single Sigma fingerprinter
>>> >         .sigma(sigmas)
>>> >     // Morgoth returns any anomalous windows
>>> >     |alert()
>>> >         .details('')
>>> >         .crit(lambda: TRUE)
>>> >         .log('/tmp/cpu_alert.log')
>>>
>>> Thanks a lot Nathaniel for your explanation of Morgoth. I have come back with a new example and its set of alerts. I will briefly explain what I am trying to achieve here.
>>>
>>> Below is a set of data with counts of errors (eventcount) that occurred for a particular error code out of IIS logs. I want to run Morgoth on the field eventcount to detect if it is an anomaly.
>>>
>>> time                      app          eventcount  status     tech
>>> 2016-11-07T11:31:28.261Z  "OTSI"       586         "Success"  "IIS"
>>> 2016-11-07T11:32:03.254Z  "OTSI"       1           "Failure"  "IIS"
>>> 2016-11-07T11:33:03.243Z  "OTSI"       8           "Success"  "IIS"
>>> 2016-11-07T11:33:23.259Z  "ANALYTICS"  158         "Success"  "IIS"
>>> 2016-11-07T11:33:23.26Z   "ANALYTICS"  24          "Failure"  "IIS"
>>>
>>> My TICKscript:
>>>
>>> // The measurement to analyze
>>> var measurement = 'eventflow_IIS'
>>>
>>> // The amount of data to window at once
>>> var window = 1m
>>>
>>> // The field to process
>>> var field = 'eventcount'
>>>
>>> // The name for the anomaly score field
>>> var scoreField = 'anomalyScore'
>>>
>>> // The minimum support
>>> var minSupport = 0.05
>>>
>>> // The error tolerance
>>> var errorTolerance = 0.01
>>>
>>> // The consensus
>>> var consensus = 0.5
>>>
>>> // Number of sigmas allowed for normal window deviation
>>> var sigmas = 3.0
>>>
>>> batch
>>>     |query('''
>>>         SELECT *
>>>         FROM "statistics"."autogen"."eventflow_IIS"
>>>     ''')
>>>         .period(1m)
>>>         .every(1m)
>>>         .groupBy(*)
>>>     // |.where(lambda: TRUE)
>>>     @morgoth()
>>>         .field(field)
>>>         .scoreField(scoreField)
>>>         .minSupport(minSupport)
>>>         .errorTolerance(errorTolerance)
>>>         .consensus(consensus)
>>>         // Configure a single Sigma fingerprinter
>>>         .sigma(sigmas)
>>>     // Morgoth returns any anomalous windows
>>>     |alert()
>>>         .details('Count is anomalous')
>>>         .id('kapacitor/{{ .TaskName }}/{{ .Name }}/{{ .Group }}')
>>>         .message('{{ .ID }} is at level {{ .Level }} Errorcount is:{{ index .Fields "eventcount" }}')
>>>         .crit(lambda: TRUE)
>>>         .log('/tmp/morgothbb.log')
>>>     |influxDBOut()
>>>         .database('anomaly')
>>>         .retentionPolicy('autogen')
>>>         .flushInterval(1s)
>>>         .measurement('Anomaly')
>>>         // .tag('eventcount','field')
>>>         // .tag('AnomalyScore','scoreField')
>>>         // .tag('Time','time')
>>>         // .tag('Status','status')
>>>         .precision('u')
>>>
>>> Below are the alerts it has generated, pumped into a table:
>>>
>>> time                            anomalyScore        app          eventcount  status     tech
>>> 2016-11-08T09:34:40.169285533Z  0.95                "OTSI"       296         "Success"  "IIS"
>>> 2016-11-08T09:35:40.171285533Z  0.9523809523809523  "OTSI"       28          "Success"  "IIS"
>>> 2016-11-08T09:36:40.170285533Z  0.9545454545454546  "OTSI"       12          "Success"  "IIS"
>>> 2016-11-08T09:37:40.169285533Z  0.9565217391304348  "OTSI"       20          "Success"  "IIS"
>>> 2016-11-08T09:38:40.170285533Z  0.9583333333333334  "OTSI"       249         "Success"  "IIS"
>>> 2016-11-08T09:39:40.167285533Z  0.96                "OTSI"       70          "Success"  "IIS"
>>> 2016-11-08T09:43:00.167285533Z  0.9615384615384616  "ANALYTICS"  1           "Success"  "IIS"
>>> 2016-11-08T09:43:40.164285533Z  0.962962962962963   "OTSI"       24          "Success"  "IIS"
>>> 2016-11-08T09:52:00.160285533Z  0.9642857142857143  "ANALYTICS"  1           "Success"  "IIS"
>>>
>>> My question is:
>>>
>>> How do I interpret the anomaly score generated here, ~0.95, with the counts for which Morgoth has triggered an anomaly? Going by our earlier discussion, support here turns out to be ~0.05 (1 - anomalyScore).
>>> And an anomaly gets triggered when (support < minSupport), so in this case it turns out to be 0.05 < 0.05, which should not be true. But still an anomaly is getting triggered almost every minute. Could you please help me understand this?
>>>
>>> Also let me know if e, M, N need to be tweaked here for this particular data sample to generate meaningful alerts out of it.

To view this discussion on the web visit https://groups.google.com/d/msgid/influxdb/3031d86e-a536-4e7c-a713-f2aa1d706743%40googlegroups.com.
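The reply earlier in the thread resolves this: the actual comparison is support <= minSupport, not <. A quick illustrative check also shows that the table's scores follow the pattern 1 - 1/total, i.e. every window is fingerprinting as a brand-new event (count = 1):

```python
# Illustrative check of the anomaly scores in the table above:
# each score is 1 - 1/total, meaning every window was counted as a
# brand-new event (count = 1), and the trigger condition is
# support <= minSupport (note: <=, not <).
min_support = 0.05

for total in range(20, 29):
    support = 1 / total                 # each event seen exactly once
    score = 1 - support                 # anomalyScore = 1 - support
    triggered = support <= min_support  # the actual comparison Morgoth uses
    print(f"total={total} score={score:.16g} triggered={triggered}")

# total=20 gives score 0.95 with support exactly 0.05:
# 0.05 < 0.05 is False, but 0.05 <= 0.05 is True -- hence the alert.
# total=21 gives 0.9523809523809523, matching the second table row.
```

So the first alert (score 0.95) fired precisely because of the <= boundary case, and every subsequent window being treated as unique kept the support at 1/total, below the threshold.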
