Hi Nathan, I do not find these files under the directory /var/lib/kapacitor/replay. But when I run 'kapacitor list recordings', I see the following.
ID                                   Type    Status    Size    Date
2aa4cc3b-964d-4956-85ef-77f671fded6f batch   finished  6.4 kB  09 Nov 16 01:18 EST
1562b674-cbff-497b-be34-781da1ae9d4f batch   finished  5.9 kB  08 Nov 16 23:41 EST
823efccb-3241-40d0-8b19-6b55a3f147ee batch   finished  5.8 kB  08 Nov 16 22:50 EST
b07bac9f-0d6e-4324-9301-f7114834135e batch   finished  1.8 kB  08 Nov 16 07:56 EST
0d3cb557-e993-4656-bf55-80b403ad7228 stream  finished  23 B    08 Nov 16 07:42 EST
6d8820d2-d674-448d-92de-cef0a2494267 batch   finished  271 B   08 Nov 16 06:26 EST
5f4988a0-58dd-4926-965c-dd98d7492b8f batch   finished  622 B   08 Nov 16 04:35 EST
6e8eb49f-32fa-4aa5-ba75-736658dd326d batch   finished  3.7 kB  08 Nov 16 04:10 EST
3139e958-54fb-441d-bc8b-11b1c8c5a6a3 batch   finished  121 B   08 Nov 16 02:49 EST
4cc14c6e-842d-41a3-be9b-0298f24cbc3

How can I access these files to read their contents?

Thanks,
Amith

On Wednesday, 9 November 2016 22:25:14 UTC+5:30, [email protected] wrote:
> 1. Yes, the .srpl files are just gzipped line protocol files. The .brpl
> files are a zip of several files containing the JSON data for the
> recording.
> 2. In my previous post I explained how average support is computed, and
> linked to docs on the lossy counting algorithm from which it originates.
>
> On Wednesday, November 9, 2016 at 9:38:56 AM UTC-7, amith hegde wrote:
> Thank you for your advice; I am working on this piece. In the meanwhile I
> have a couple of questions, if you can help me with them.
>
> 1. Is it possible to take a look at a recording to see what data it holds?
> If yes, how can we do that?
>
> 2. How is the anomaly score determined? What is the formula to calculate
> anomalyScore? If it is (1 - averageSupport), even average support is not a
> defined value.
>
> Thanks,
> Amith
>
> On Nov 8, 2016 9:33 PM, <[email protected]> wrote:
>
> > The actual comparison is <=, which is why you received the alert. But if
> > your tolerances are tight enough that <= matters over <, then your
> > tolerances are probably too tight.
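Given the description above (.srpl is gzipped line protocol, .brpl is a zip of JSON files), recordings can in principle be opened with ordinary tools once located on disk. Below is a minimal Python sketch of that idea; the path in the usage comment is hypothetical, and the replay directory may differ from /var/lib/kapacitor/replay depending on the Kapacitor configuration.

```python
# Sketch: read Kapacitor recording files, assuming the formats described
# above (.srpl = gzipped line protocol, .brpl = zip of JSON members).
import gzip
import zipfile

def read_stream_recording(path):
    """Return the line-protocol text of a stream (.srpl) recording."""
    with gzip.open(path, mode="rt") as f:
        return f.read()

def read_batch_recording(path):
    """Return {member_name: json_text} for a batch (.brpl) recording."""
    out = {}
    with zipfile.ZipFile(path) as zf:
        for name in zf.namelist():
            out[name] = zf.read(name).decode("utf-8")
    return out

# Hypothetical usage (substitute your own replay dir and recording ID):
# print(read_stream_recording(
#     "/var/lib/kapacitor/replay/0d3cb557-e993-4656-bf55-80b403ad7228.srpl"))
```

This only covers reading the files; finding them requires knowing where your Kapacitor instance stores replays.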
> > I would first recommend that you tweak the sigmas value, maybe increasing
> > it to 3.5 or 4. To iterate quickly on these tests, I recommend that you
> > create a recording of the data set, then tweak a value, replay the
> > recording, check the results, and repeat until you have something you
> > like. If you share your recording with me I would be willing to take a
> > quick look as well. As it is, it's a little hard to give good advice
> > based off a handful of data points.
> >
> > On Tuesday, November 8, 2016 at 7:47:39 AM UTC-7, [email protected] wrote:
> > On Thursday, 27 October 2016 21:46:08 UTC+5:30, [email protected] wrote:
> > > Clarification from Amith:
> > >
> > > Hi Nathaniel,
> > >
> > > Thanks a lot for your quick reply. What is confusing for me here is how
> > > Morgoth calculated the anomalyScore field, whose value has turned out
> > > to be 0.9897172236503856, and how this is being used to detect an
> > > anomaly. How does this particular node function?
> > >
> > > …
> > >
> > > @morgoth()
> > >     .field(field)
> > >     .scoreField(scoreField)
> > >     .minSupport(minSupport)
> > >     .errorTolerance(errorTolerance)
> > >     .consensus(consensus)
> > >     // Configure a single Sigma fingerprinter
> > >     .sigma(sigmas)
> > >
> > > You can choose some arbitrary data to help me understand this. :)
> > > Thanks,
> > > Amith
> > >
> > > My response:
> > >
> > > The `anomalyScore` is `1 - averageSupport`, where averageSupport is the
> > > average of the support values returned from each of the fingerprinters.
> > > In your case you only have one fingerprinter, `sigma`, so from the
> > > anomalyScore of ~`0.99` we can determine that the sigma fingerprinter
> > > returned a support of ~`0.01`. Support is defined as `count / total`,
> > > where count is the number of times a specific event has been seen and
> > > total is the total number of events seen. The support can be
> > > interpreted as a frequency percentage, i.e.
> > > the most recent window has only been seen 1% of the time. Since 0.01 is
> > > < 0.05 (the minSupport defined), an anomaly was triggered. Taking this
> > > back to the anomaly score, it can be interpreted as: 99% of the time we
> > > do not see an event like this one.
> > >
> > > Remember that Morgoth distinguishes different windows as different
> > > events using the fingerprinters. In your case the sigma fingerprinter
> > > is computing the standard deviation and mean of the windows it
> > > receives. If a window arrives that is more than 3 standard deviations
> > > away from the mean, then it is not considered the same event and is
> > > counted as a new, unique event.
> > >
> > > Putting all of that together, receiving an anomaly score of 99% out of
> > > Morgoth for your setup can be interpreted as: you have sent several 1m
> > > windows to Morgoth, and the window that triggered the anomaly event is
> > > only similar to ~1% of those windows, where similar is defined as
> > > being within 3 standard deviations.
> > >
> > > On Thursday, October 27, 2016 at 9:30:13 AM UTC-6, [email protected] wrote:
> > >
> > > In short, there are two parts to Morgoth.
> > >
> > > 1. A system that counts the frequency of different kinds of events.
> > > This is the lossy counting part.
> > > 2. A system that determines if a window of data is the same as an
> > > existing event being tracked or something new. This is the
> > > fingerprinting part.
> > >
> > > Here is a quick read-through of those concepts:
> > > http://docs.morgoth.io/docs/detection_framework/
> > >
> > > It's a little hard to tell if Morgoth has done anything unexpected
> > > without more detail. Can you share some of the data that led to this
> > > alert, so I can speak to the specifics of what is going on? Or maybe
> > > you could ask a more specific question about which part is confusing?
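The two-part framework described above (fingerprinting plus frequency counting) can be sketched roughly in Python. This is a simplified toy, not Morgoth's actual implementation: real Morgoth uses the lossy counting algorithm rather than the exact counts below, and the matching rule here is reduced to comparing window means against a stored mean and standard deviation.

```python
import math

class SigmaFingerprinter:
    """Toy sigma fingerprinter: a new window matches a tracked event if its
    mean is within `sigmas` standard deviations of that event's mean."""
    def __init__(self, sigmas=3.0):
        self.sigmas = sigmas
        self.events = []  # (mean, stddev, count) per tracked event

    def observe(self, window):
        mean = sum(window) / len(window)
        std = math.sqrt(sum((x - mean) ** 2 for x in window) / len(window))
        for i, (m, s, c) in enumerate(self.events):
            if abs(mean - m) <= self.sigmas * s:
                self.events[i] = (m, s, c + 1)
                return c + 1  # count for the matched event
        self.events.append((mean, std, 1))
        return 1  # brand-new event

def anomaly_score(count, total):
    support = count / total  # frequency of this kind of window
    return 1 - support       # anomalyScore = 1 - averageSupport

# Feed 99 "normal" windows and one outlier.
fp = SigmaFingerprinter(sigmas=3.0)
total = 0
for w in [[10, 11, 9, 10]] * 99 + [[100, 101, 99, 100]]:
    total += 1
    count = fp.observe(w)
score = anomaly_score(count, total)  # score for the last (outlier) window
print(round(score, 2))  # prints 0.99
```

The outlier fingerprints as a new event seen once out of 100 windows, so support is 0.01 and the anomaly score is 0.99, mirroring the explanation above.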
> > > On Thursday, October 27, 2016 at 6:47:02 AM UTC-6, [email protected] wrote:
> > >
> > > Hi All,
> > > I am trying to run Morgoth as a child process of Kapacitor, but I am
> > > failing to understand how Morgoth functions. Below is the sample
> > > TICKscript I tried out of the Morgoth docs. It is generating some
> > > alerts, but I am unable to figure out if they are supposed to be
> > > triggered the way they have been. I am pasting a snippet of an alert
> > > as well.
> > > I basically want to understand the functioning of Morgoth through this
> > > example.
> > >
> > > Alert
> > > ===================================================================
> > > {
> > >    "id":"cpu:cpu=cpu-total,host=ip-10-121-48-24.ec2.internal,",
> > >    "message":"cpu:cpu=cpu-total,host=ip-10-121-48-24.ec2.internal, is CRITICAL",
> > >    "details":"",
> > >    "time":"2016-10-27T11:33:00Z",
> > >    "duration":21780000000000,
> > >    "level":"CRITICAL",
> > >    "data":{
> > >       "series":[
> > >          {
> > >             "name":"cpu",
> > >             "tags":{
> > >                "cpu":"cpu-total",
> > >                "host":"ip-10-121-48-24.ec2.internal"
> > >             },
> > >             "columns":[
> > >                "time",
> > >                "anomalyScore",
> > >                "usage_guest",
> > >                "usage_guest_nice",
> > >                "usage_idle",
> > >                "usage_iowait",
> > >                "usage_irq",
> > >                "usage_nice",
> > >                "usage_softirq",
> > >                "usage_steal",
> > >                "usage_system",
> > >                "usage_user"
> > >             ],
> > >             "values":[
> > >                [
> > >                   "2016-10-27T11:33:00Z",
> > >                   0.9897172236503856,
> > >                   0,
> > >                   0,
> > >                   99.49748743708487,
> > >                   0,
> > >                   0,
> > >                   0,
> > >                   0,
> > >                   0,
> > >                   0.5025125628122904,
> > >                   0
> > >                ]
> > > ===================================================================
> > >
> > > // The measurement to analyze
> > > var measurement = 'cpu'
> > > // Optional group by dimensions
> > > var groups = [*]
> > > // Optional where filter
> > > var whereFilter = lambda: TRUE
> > > // The amount of data to window at once
> > > var window = 1m
> > > // The field to process
> > > var field = 'usage_idle'
> > > // The name for the anomaly score field
> > > var scoreField = 'anomalyScore'
> > > // The minimum support
> > > var minSupport = 0.05
> > > // The error tolerance
> > > var errorTolerance = 0.01
> > > // The consensus
> > > var consensus = 0.5
> > > // Number of sigmas allowed for normal window deviation
> > > var sigmas = 3.0
> > >
> > > stream
> > >     // Select the data we want
> > >     |from()
> > >         .measurement(measurement)
> > >         .groupBy(groups)
> > >         .where(whereFilter)
> > >     // Window the data for a certain amount of time
> > >     |window()
> > >         .period(window)
> > >         .every(window)
> > >         .align()
> > >     // Send each window to Morgoth
> > >     @morgoth()
> > >         .field(field)
> > >         .scoreField(scoreField)
> > >         .minSupport(minSupport)
> > >         .errorTolerance(errorTolerance)
> > >         .consensus(consensus)
> > >         // Configure a single Sigma fingerprinter
> > >         .sigma(sigmas)
> > >     // Morgoth returns any anomalous windows
> > >     |alert()
> > >         .details('')
> > >         .crit(lambda: TRUE)
> > >         .log('/tmp/cpu_alert.log')
> >
> > Thanks a lot Nathaniel for your explanation of Morgoth. I have come back
> > with a new example and its set of alerts. I will brief you on what I am
> > trying to achieve here.
> >
> > Below is a set of data with the count of errors (eventcount) that
> > occurred for a particular error code, out of IIS logs. I want to run
> > Morgoth on the field eventcount to detect if it's an anomaly.
> > time                      app          eventcount  status     tech
> > 2016-11-07T11:31:28.261Z  "OTSI"       586         "Success"  "IIS"
> > 2016-11-07T11:32:03.254Z  "OTSI"       1           "Failure"  "IIS"
> > 2016-11-07T11:33:03.243Z  "OTSI"       8           "Success"  "IIS"
> > 2016-11-07T11:33:23.259Z  "ANALYTICS"  158         "Success"  "IIS"
> > 2016-11-07T11:33:23.26Z   "ANALYTICS"  24          "Failure"  "IIS"
> >
> > My TICKscript:
> >
> > // The measurement to analyze
> > var measurement = 'eventflow_IIS'
> >
> > // The amount of data to window at once
> > var window = 1m
> >
> > // The field to process
> > var field = 'eventcount'
> >
> > // The name for the anomaly score field
> > var scoreField = 'anomalyScore'
> >
> > // The minimum support
> > var minSupport = 0.05
> >
> > // The error tolerance
> > var errorTolerance = 0.01
> >
> > // The consensus
> > var consensus = 0.5
> >
> > // Number of sigmas allowed for normal window deviation
> > var sigmas = 3.0
> >
> > batch
> >     |query('''
> >         SELECT *
> >         FROM "statistics"."autogen"."eventflow_IIS"
> >     ''')
> >         .period(1m)
> >         .every(1m)
> >         .groupBy(*)
> >         // .where(lambda: TRUE)
> >     @morgoth()
> >         .field(field)
> >         .scoreField(scoreField)
> >         .minSupport(minSupport)
> >         .errorTolerance(errorTolerance)
> >         .consensus(consensus)
> >         // Configure a single Sigma fingerprinter
> >         .sigma(sigmas)
> >     // Morgoth returns any anomalous windows
> >     |alert()
> >         .details('Count is anomalous')
> >         .id('kapacitor/{{ .TaskName }}/{{ .Name }}/{{ .Group }}')
> >         .message('{{ .ID }} is at level {{ .Level }} Errorcount is:{{ index .Fields "eventcount" }}')
> >         .crit(lambda: TRUE)
> >         .log('/tmp/morgothbb.log')
> >     |influxDBOut()
> >         .database('anomaly')
> >         .retentionPolicy('autogen')
> >         .flushInterval(1s)
> >         .measurement('Anomaly')
> >         // .tag('eventcount','field')
> >         // .tag('AnomalyScore','scoreField')
> >         // .tag('Time','time')
> >         // .tag('Status','status')
> >         .precision('u')
> >
> > Below are the alerts it has generated, pumped into a table.
> > time                            anomalyScore        app          eventcount  status     tech
> > 2016-11-08T09:34:40.169285533Z  0.95                "OTSI"       296         "Success"  "IIS"
> > 2016-11-08T09:35:40.171285533Z  0.9523809523809523  "OTSI"       28          "Success"  "IIS"
> > 2016-11-08T09:36:40.170285533Z  0.9545454545454546  "OTSI"       12          "Success"  "IIS"
> > 2016-11-08T09:37:40.169285533Z  0.9565217391304348  "OTSI"       20          "Success"  "IIS"
> > 2016-11-08T09:38:40.170285533Z  0.9583333333333334  "OTSI"       249         "Success"  "IIS"
> > 2016-11-08T09:39:40.167285533Z  0.96                "OTSI"       70          "Success"  "IIS"
> > 2016-11-08T09:43:00.167285533Z  0.9615384615384616  "ANALYTICS"  1           "Success"  "IIS"
> > 2016-11-08T09:43:40.164285533Z  0.962962962962963   "OTSI"       24          "Success"  "IIS"
> > 2016-11-08T09:52:00.160285533Z  0.9642857142857143  "ANALYTICS"  1           "Success"  "IIS"
> >
> > My question is:
> >
> > How do I interpret the anomaly score generated here (~0.95) together with
> > the counts for which Morgoth has triggered an anomaly? Going by our
> > earlier discussion, support here turns out to be ~0.05 (1 - anomalyScore),
> > and an anomaly gets triggered when (support < minSupport), so in this
> > case it turns out to be 0.05 < 0.05, which should not be true. But still
> > an anomaly is getting triggered almost every minute. Could you please
> > help me understand this?
> >
> > Also let me know if e, M, N need to be tweaked here for this particular
> > data sample to generate meaningful alerts out of it.

--
Remember to include the version number!
---
You received this message because you are subscribed to the Google Groups "InfluxData" group.
To view this discussion on the web visit https://groups.google.com/d/msgid/influxdb/3584d632-73cd-43f4-8375-3a7c6efeca26%40googlegroups.com.
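One way to sanity-check the scores in the table above: if every 1m window fingerprints as a brand-new event, then the n-th window has support 1/n and an anomaly score of 1 - 1/n. The consecutive scores in the table line up exactly with n = 20, 21, 22, ..., which is consistent with each window being treated as unique, and with the comparison being <= rather than < (as noted earlier in the thread), support of exactly 1/20 = 0.05 would still trigger. This is an observation about the numbers, not a statement about Morgoth's internals.

```python
# Check the consecutive anomalyScore values from the alert table against
# 1 - 1/n, assuming each window is counted as a new, unique event.
observed = [
    0.95,
    0.9523809523809523,
    0.9545454545454546,
    0.9565217391304348,
    0.9583333333333334,
    0.96,
]

for n, score in enumerate(observed, start=20):
    predicted = 1 - 1 / n
    assert abs(predicted - score) < 1e-12, (n, predicted, score)

print("scores match 1 - 1/n for n = 20..25")
```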
