If the data isn't in /var/lib/kapacitor/replay, check the value of the `[replay] dir` config option to see where the data is stored.
On Wednesday, November 9, 2016 at 11:59:57 PM UTC-7, [email protected] wrote:

Hi Nathan,

I do not find these files under the dir /var/lib/kapacitor/replay. But when I run 'kapacitor list recordings', I see the following:

ID                                    Type    Status    Size    Date
2aa4cc3b-964d-4956-85ef-77f671fded6f  batch   finished  6.4 kB  09 Nov 16 01:18 EST
1562b674-cbff-497b-be34-781da1ae9d4f  batch   finished  5.9 kB  08 Nov 16 23:41 EST
823efccb-3241-40d0-8b19-6b55a3f147ee  batch   finished  5.8 kB  08 Nov 16 22:50 EST
b07bac9f-0d6e-4324-9301-f7114834135e  batch   finished  1.8 kB  08 Nov 16 07:56 EST
0d3cb557-e993-4656-bf55-80b403ad7228  stream  finished  23 B    08 Nov 16 07:42 EST
6d8820d2-d674-448d-92de-cef0a2494267  batch   finished  271 B   08 Nov 16 06:26 EST
5f4988a0-58dd-4926-965c-dd98d7492b8f  batch   finished  622 B   08 Nov 16 04:35 EST
6e8eb49f-32fa-4aa5-ba75-736658dd326d  batch   finished  3.7 kB  08 Nov 16 04:10 EST
3139e958-54fb-441d-bc8b-11b1c8c5a6a3  batch   finished  121 B   08 Nov 16 02:49 EST
4cc14c6e-842d-41a3-be9b-0298f24cbc3

How can I access these files to read their contents?

Thanks,
Amith

On Wednesday, 9 November 2016 22:25:14 UTC+5:30, [email protected] wrote:

1. Yes, the srpl files are just a gzipped line protocol file. The brpl files are a zip of several files containing the JSON data for the recording.

2. In my previous post I explained how average support is computed, and linked to the docs on the lossy counting algorithm from which it originates.
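For reference, a minimal Python sketch of one way to peek inside these recording files, based on the formats just described; it assumes the files in the replay dir are named after the recording ID, with .srpl for the stream recording and .brpl for batch recordings, which may not match your setup exactly:

# Minimal sketch for inspecting recording files, based on the formats described
# above: .srpl = gzipped line protocol, .brpl = zip archive of JSON files.
# The directory and file names are assumptions; use your `[replay] dir` setting
# and the IDs reported by `kapacitor list recordings`.
import gzip
import zipfile

replay_dir = "/var/lib/kapacitor/replay"

# Stream recording: decompress and print the raw line protocol.
with gzip.open(f"{replay_dir}/0d3cb557-e993-4656-bf55-80b403ad7228.srpl", "rt") as f:
    for line in f:
        print(line.rstrip())

# Batch recording: list the archive members and dump the first JSON file.
with zipfile.ZipFile(f"{replay_dir}/2aa4cc3b-964d-4956-85ef-77f671fded6f.brpl") as z:
    print(z.namelist())
    with z.open(z.namelist()[0]) as member:
        print(member.read().decode())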
On Wednesday, November 9, 2016 at 9:38:56 AM UTC-7, amith hegde wrote:

Thank you for your advice, I am working on this piece. In the meanwhile I have a couple of questions, if you can help me with them.

1. Is it possible to take a look at the recording to see what data it holds? If yes, how can we do that?

2. How is the anomaly score determined? What is the formula to calculate anomalyScore? If it is (1 - averageSupport), even average support is not a defined value.

Thanks,
Amith

On Nov 8, 2016 9:33 PM, <[email protected]> wrote:

The actual comparison is <=, which is why you received the alert. But if your tolerances are tight enough that <= matters over <, then you are probably too tight on your tolerances.

I would first recommend that you tweak the sigmas value, maybe increase it to 3.5 or 4. To iterate quickly on these tests I recommend that you create a recording of the data set, then tweak values, replay the recording, check the results, and repeat until you have something you like. If you share your recording with me I would be willing to take a quick look as well. As it is, it's a little hard to give good advice based on a handful of data points.

On Tuesday, November 8, 2016 at 7:47:39 AM UTC-7, [email protected] wrote:

On Thursday, 27 October 2016 21:46:08 UTC+5:30, [email protected] wrote:

Clarification from Amith:

Hi Nathaniel,

Thanks a lot for your quick reply. What is confusing for me here is how Morgoth calculated the anomalyScore field, whose value turned out to be 0.9897172236503856, and how this is being used to detect an anomaly.

How does this particular node function?

...
@morgoth()
    .field(field)
    .scoreField(scoreField)
    .minSupport(minSupport)
    .errorTolerance(errorTolerance)
    .consensus(consensus)
    // Configure a single Sigma fingerprinter
    .sigma(sigmas)

You can choose some arbitrary data to help me understand this. :)

Thanks,
Amith

My response:

The `anomalyScore` is `1 - averageSupport`, where averageSupport is the average of the support values returned from each of the fingerprinters. In your case you only have one fingerprinter, `sigma`, so using the anomalyScore of ~`0.99` we can determine that the sigma fingerprinter returned a support of ~`0.01`. Support is defined as `count / total`, where count is the number of times a specific event has been seen and total is the total number of events seen. The support can be interpreted as a frequency percentage, i.e. the most recent window has only been seen 1% of the time. Since 0.01 is < 0.05 (the min support defined), an anomaly was triggered. Taking this back to the anomaly score, it can be interpreted that 99% of the time we do not see an event like this one.

Remember that Morgoth distinguishes different windows as different events using the fingerprinters. In your case the sigma fingerprinter is computing the standard deviation and mean of the windows it receives. If a window arrives that is more than 3 stddevs away from the mean, then it is not considered the same event and is treated as a unique event.

Taking all of that and putting it together, receiving an anomaly score of 99% out of Morgoth for your setup can be interpreted as: you have sent several 1m windows to Morgoth, and the window that triggered the anomaly event is only similar to ~1% of those windows, where similar is defined as being within 3 std deviations.
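To make that arithmetic concrete, here is a small worked sketch in plain Python (not Morgoth code); the counts are hypothetical, chosen to give a support of ~0.01 like the alert discussed above:

# Worked example of the score calculation described above (illustrative only;
# these counts are made up and are not taken from the recording).
count = 1       # times this kind of window has been seen so far (hypothetical)
total = 100     # total events (windows) counted so far (hypothetical)
min_support = 0.05

support = count / total              # 0.01: this event occurs ~1% of the time
average_support = support            # only one fingerprinter (sigma) is configured
anomaly_score = 1 - average_support  # 0.99, close to the 0.9897... seen in the alert

triggered = support <= min_support   # the actual comparison is <=, per the reply above
print(anomaly_score, triggered)      # 0.99 True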
On Thursday, October 27, 2016 at 9:30:13 AM UTC-6, [email protected] wrote:

In short, there are two parts to Morgoth:

1. A system that counts the frequency of different kinds of events. This is the lossy counting part.
2. A system that determines if a window of data is the same as an existing event being tracked or something new. This is the fingerprinting part.

Here is a quick read-through for those concepts: http://docs.morgoth.io/docs/detection_framework/

It's a little hard to tell if Morgoth has done anything unexpected without more detail. Can you share some of the data that led to this alert, so I can talk to the specifics of what is going on? Or maybe you could ask a more specific question about which part is confusing?

On Thursday, October 27, 2016 at 6:47:02 AM UTC-6, [email protected] wrote:

Hi All,

I am trying to run Morgoth as a child process of Kapacitor, but I am failing to understand how Morgoth functions. Below is the sample TICKscript I tried, taken from the Morgoth docs. It is generating some alerts, but I am unable to figure out whether they are supposed to be triggered the way they have been. I am pasting a snippet of one alert as well. I basically want to understand the functioning of Morgoth through this example.

Alert
===================================================================
{
    "id":"cpu:cpu=cpu-total,host=ip-10-121-48-24.ec2.internal,",
    "message":"cpu:cpu=cpu-total,host=ip-10-121-48-24.ec2.internal, is CRITICAL",
    "details":"",
    "time":"2016-10-27T11:33:00Z",
    "duration":21780000000000,
    "level":"CRITICAL",
    "data":{
        "series":[
            {
                "name":"cpu",
                "tags":{
                    "cpu":"cpu-total",
                    "host":"ip-10-121-48-24.ec2.internal"
                },
                "columns":[
                    "time",
                    "anomalyScore",
                    "usage_guest",
                    "usage_guest_nice",
                    "usage_idle",
                    "usage_iowait",
                    "usage_irq",
                    "usage_nice",
                    "usage_softirq",
                    "usage_steal",
                    "usage_system",
                    "usage_user"
                ],
                "values":[
                    [
                        "2016-10-27T11:33:00Z",
                        0.9897172236503856,
                        0,
                        0,
                        99.49748743708487,
                        0,
                        0,
                        0,
                        0,
                        0,
                        0.5025125628122904,
                        0
                    ]
===================================================================

// The measurement to analyze
var measurement = 'cpu'

// Optional group by dimensions
var groups = [*]

// Optional where filter
var whereFilter = lambda: TRUE

// The amount of data to window at once
var window = 1m

// The field to process
var field = 'usage_idle'

// The name for the anomaly score field
var scoreField = 'anomalyScore'

// The minimum support
var minSupport = 0.05

// The error tolerance
var errorTolerance = 0.01

// The consensus
var consensus = 0.5

// Number of sigmas allowed for normal window deviation
var sigmas = 3.0

stream
    // Select the data we want
    |from()
        .measurement(measurement)
        .groupBy(groups)
        .where(whereFilter)
    // Window the data for a certain amount of time
    |window()
        .period(window)
        .every(window)
        .align()
    // Send each window to Morgoth
    @morgoth()
        .field(field)
        .scoreField(scoreField)
        .minSupport(minSupport)
        .errorTolerance(errorTolerance)
        .consensus(consensus)
        // Configure a single Sigma fingerprinter
        .sigma(sigmas)
    // Morgoth returns any anomalous windows
    |alert()
        .details('')
        .crit(lambda: TRUE)
        .log('/tmp/cpu_alert.log')
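As a rough illustration of the sigma fingerprinter behaviour described earlier in the thread (a window more than `sigmas` standard deviations away from the mean of what has been seen is treated as a new, distinct event), here is a simplified Python sketch; it is not Morgoth's implementation, and comparing window means against a history of window means is an assumption made for the example:

# Simplified sketch of a sigma-style fingerprint check as described in the thread:
# a window whose mean is more than `sigmas` standard deviations away from the
# mean of previously seen windows is treated as a new kind of event.
# Illustrative only; Morgoth's real fingerprinter differs in its details.
import statistics

def matches_existing_event(window_values, seen_window_means, sigmas=3.0):
    """Return True if this window looks like the events seen so far."""
    if len(seen_window_means) < 2:
        return True  # not enough history to compute a deviation yet (assumption)
    mean = statistics.mean(seen_window_means)
    stddev = statistics.pstdev(seen_window_means)
    window_mean = statistics.mean(window_values)
    if stddev == 0:
        return window_mean == mean
    return abs(window_mean - mean) <= sigmas * stddev

# Example: mostly-idle CPU windows, then a window whose usage_idle drops sharply.
history = [99.4, 99.5, 99.6, 99.5, 99.4]
print(matches_existing_event([99.5, 99.6], history))  # True: same kind of event
print(matches_existing_event([60.0, 55.0], history))  # False: a new, rare event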
Thanks a lot Nathaniel for your explanation of Morgoth. I have come back with a new example and its set of alerts. Briefly, here is what I am trying to achieve.

Below is a set of data with counts of errors (eventcount) that occurred for a particular error code, taken from IIS logs. I want to run Morgoth on the field eventcount to detect whether it is an anomaly.

time                      app          eventcount  status     tech
2016-11-07T11:31:28.261Z  "OTSI"       586         "Success"  "IIS"
2016-11-07T11:32:03.254Z  "OTSI"       1           "Failure"  "IIS"
2016-11-07T11:33:03.243Z  "OTSI"       8           "Success"  "IIS"
2016-11-07T11:33:23.259Z  "ANALYTICS"  158         "Success"  "IIS"
2016-11-07T11:33:23.26Z   "ANALYTICS"  24          "Failure"  "IIS"

My TICKscript:

// The measurement to analyze
var measurement = 'eventflow_IIS'

// The amount of data to window at once
var window = 1m

// The field to process
var field = 'eventcount'

// The name for the anomaly score field
var scoreField = 'anomalyScore'

// The minimum support
var minSupport = 0.05

// The error tolerance
var errorTolerance = 0.01

// The consensus
var consensus = 0.5

// Number of sigmas allowed for normal window deviation
var sigmas = 3.0

batch
    |query('''
        SELECT *
        FROM "statistics"."autogen"."eventflow_IIS"
    ''')
        .period(1m)
        .every(1m)
        .groupBy(*)
        // |.where(lambda: TRUE)
    @morgoth()
        .field(field)
        .scoreField(scoreField)
        .minSupport(minSupport)
        .errorTolerance(errorTolerance)
        .consensus(consensus)
        // Configure a single Sigma fingerprinter
        .sigma(sigmas)
    // Morgoth returns any anomalous windows
    |alert()
        .details('Count is anomalous')
        .id('kapacitor/{{ .TaskName }}/{{ .Name }}/{{ .Group }}')
        .message('{{ .ID }} is at level {{ .Level }} Errorcount is:{{ index .Fields "eventcount" }}')
        .crit(lambda: TRUE)
        .log('/tmp/morgothbb.log')
    |influxDBOut()
        .database('anomaly')
        .retentionPolicy('autogen')
        .flushInterval(1s)
        .measurement('Anomaly')
        // .tag('eventcount','field')
        // .tag('AnomalyScore','scoreField')
        // .tag('Time','time')
        // .tag('Status','status')
        .precision('u')

Below are the alerts it has generated, pumped into a table:

time                            anomalyScore        app          eventcount  status     tech
2016-11-08T09:34:40.169285533Z  0.95                "OTSI"       296         "Success"  "IIS"
2016-11-08T09:35:40.171285533Z  0.9523809523809523  "OTSI"       28          "Success"  "IIS"
2016-11-08T09:36:40.170285533Z  0.9545454545454546  "OTSI"       12          "Success"  "IIS"
2016-11-08T09:37:40.169285533Z  0.9565217391304348  "OTSI"       20          "Success"  "IIS"
2016-11-08T09:38:40.170285533Z  0.9583333333333334  "OTSI"       249         "Success"  "IIS"
2016-11-08T09:39:40.167285533Z  0.96                "OTSI"       70          "Success"  "IIS"
2016-11-08T09:43:00.167285533Z  0.9615384615384616  "ANALYTICS"  1           "Success"  "IIS"
2016-11-08T09:43:40.164285533Z  0.962962962962963   "OTSI"       24          "Success"  "IIS"
2016-11-08T09:52:00.160285533Z  0.9642857142857143  "ANALYTICS"  1           "Success"  "IIS"

My question is: how do I interpret the anomaly score of ~0.95 generated here for the counts on which Morgoth has triggered an anomaly? Going by our earlier discussion, the support here turns out to be ~0.05 (1 - anomaly score), and an anomaly gets triggered when (support < minSupport), so in this case the check becomes 0.05 < 0.05, which should not be true.
But an anomaly is still getting triggered almost every minute. Could you please help me understand this?

Also, let me know if e, M, N need to be tweaked here for this particular data sample to generate meaningful alerts out of it.
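For what it's worth, here is a small Python sketch of how a run of scores like those in the table can arise under the <= comparison mentioned earlier in the thread; it assumes every incoming window fingerprints as a brand-new event (count stays at 1), which is a guess rather than something confirmed above:

# Sketch of how repeated ~0.95 anomaly scores can occur with a <= comparison
# (as noted earlier in the thread) if every window fingerprints as a new event.
# The starting total and the count-stays-at-1 assumption are hypothetical.
min_support = 0.05

total = 19  # events counted before the first row of the table (hypothetical)
for _ in range(5):
    total += 1
    count = 1                           # each window treated as a never-seen event
    support = count / total
    anomaly_score = 1 - support
    triggered = support <= min_support  # 0.05 <= 0.05 still fires the alert
    print(anomaly_score, triggered)

# The printed scores start at 0.95 (support = 1/20) and climb toward 1.0, which
# lines up with the first few rows of the table above. Raising sigmas, as
# suggested earlier, lets more windows fingerprint as the same event, which
# raises their support over time.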
