Thank you for your advice; I am working on this. In the meantime I have a couple of questions, if you can help me with them:

1. Is it possible to take a look at the recording to see what data it holds? If yes, how can we do that?
2. How is the anomaly score determined? What is the formula used to calculate anomalyScore? If it is (1 - averageSupport), even averageSupport itself is not a value I can see defined anywhere.
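To make question 2 concrete, here is how I currently read the scoring from your earlier explanation, written as a rough Python sketch. The function and variable names are my own, not Morgoth's internals; I only want to confirm whether this is the right formula.

# My reading: each fingerprinter reports support = count / total for the
# current window, averageSupport is the mean of those supports, and
# anomalyScore = 1 - averageSupport.

def anomaly_score(supports):
    # supports: one support value (count / total) per configured fingerprinter
    average_support = sum(supports) / len(supports)
    return 1.0 - average_support

# With the single sigma fingerprinter from the earlier thread returning a
# support of ~0.01, this gives an anomalyScore of ~0.99:
print(anomaly_score([0.01]))  # 0.99

# And, per your reply below, the per-fingerprinter anomaly check is
# support <= minSupport (not strictly <).

If that is roughly right, then my confusion is only about averageSupport itself, since it never appears as a field in the output.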
Thanks,
Amith

On Nov 8, 2016 9:33 PM, <[email protected]> wrote:
> The actual comparison is <=, which is why you received the alert. But if your tolerances are tight enough that <= matters over <, then you are probably too tight on your tolerances.
>
> I would first recommend that you tweak the sigmas value, maybe increase it to 3.5 or 4. To iterate quickly on these tests I recommend that you create a recording of the data set, then tweak a value, replay the recording, check the results, and repeat until you have something you like. If you share your recording with me I would be willing to take a quick look as well. As it is, it is a little hard to give good advice based off a handful of data points.
>
> On Tuesday, November 8, 2016 at 7:47:39 AM UTC-7, [email protected] wrote:
>>
>> On Thursday, 27 October 2016 21:46:08 UTC+5:30, [email protected] wrote:
>>> Clarification from Amith:
>>>
>>> Hi Nathaniel,
>>>
>>> Thanks a lot for your quick reply. What is confusing for me here is how Morgoth calculated the anomalyScore field, whose value turned out to be 0.9897172236503856, and how this is being used to detect an anomaly. How does this particular node function?
>>>
>>> …
>>> @morgoth()
>>>     .field(field)
>>>     .scoreField(scoreField)
>>>     .minSupport(minSupport)
>>>     .errorTolerance(errorTolerance)
>>>     .consensus(consensus)
>>>     // Configure a single Sigma fingerprinter
>>>     .sigma(sigmas)
>>>
>>> You can choose some arbitrary data to help me understand this. :)
>>> Thanks,
>>> Amith
>>>
>>> My response:
>>>
>>> The `anomalyScore` is `1 - averageSupport`, where averageSupport is the average of the support values returned from each of the fingerprinters. In your case you only have one fingerprinter, `sigma`, so using the anomalyScore of ~`0.99` we can determine that the sigma fingerprinter returned a support of ~`0.01`. Support is defined as `count / total`, where count is the number of times a specific event has been seen and total is the total number of events seen. The support can be interpreted as a frequency percentage, i.e. the most recent window has only been seen 1% of the time. Since 0.01 is < 0.05 (the min support defined), an anomaly was triggered. Taking this back to the anomaly score, it can be interpreted as: 99% of the time we do not see an event like this one.
>>>
>>> Remember that Morgoth distinguishes different windows as different events using the fingerprinters. In your case the sigma fingerprinter is computing the std deviation and mean of the windows it receives. If a window arrives that is more than 3 stddevs away from the mean, then it is not considered the same event and is treated as a unique event.
>>>
>>> Taking all of that and putting it together, receiving an anomaly score of 99% out of Morgoth for your setup can be interpreted as: you have sent several 1m windows to Morgoth, and the window that triggered the anomaly event is only similar to ~1% of those windows, where "similar" is defined as being within 3 std deviations.
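Just to check my understanding of the sigma fingerprinter described above: my mental model is roughly the sketch below. The function name, the idea of comparing window means, and the example numbers are all my own guesses for illustration, not the actual implementation.

import statistics

def matches_fingerprint(window_values, fingerprint_mean, fingerprint_stddev, sigmas=3.0):
    # Treat a window as the "same" event when its mean lies within
    # `sigmas` standard deviations of the fingerprint's mean.
    window_mean = statistics.mean(window_values)
    return abs(window_mean - fingerprint_mean) <= sigmas * fingerprint_stddev

# Made-up numbers: a window averaging ~99.5 (e.g. usage_idle) checked against a
# fingerprint with mean 60.0 and stddev 10.0 is more than 3 sigmas away, so it
# would be counted as a new, unique event rather than a repeat of that one.
print(matches_fingerprint([99.4, 99.5, 99.6], 60.0, 10.0))  # False

Please correct me if the fingerprint looks at more than the mean of the window.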
>>>
>>> On Thursday, October 27, 2016 at 9:30:13 AM UTC-6, [email protected] wrote:
>>>
>>> In short there are two parts to Morgoth:
>>>
>>> 1. A system that counts the frequency of different kinds of events. This is the lossy counting part.
>>> 2. A system that determines if a window of data is the same as an existing event being tracked or something new. This is the fingerprinting part.
>>>
>>> Here is a quick read-through for those concepts: http://docs.morgoth.io/docs/detection_framework/
>>>
>>> It is a little hard to tell whether Morgoth has done anything unexpected without more detail. Can you share some of the data that led to this alert, so I can talk to the specifics of what is going on? Or maybe you could ask a more specific question about which part is confusing?
>>>
>>> On Thursday, October 27, 2016 at 6:47:02 AM UTC-6, [email protected] wrote:
>>>
>>> Hi All,
>>> I am trying to run Morgoth as a child process to Kapacitor, but I am failing to understand how Morgoth functions. Below is the sample TICKscript I tried out of the Morgoth docs. It is generating some alerts, but I am unable to figure out whether they are supposed to be triggered the way they have been. I am pasting a snippet of one of the alerts as well. I basically want to understand the functioning of Morgoth through this example.
>>>
>>> Alert
>>> ===================================================================
>>> {
>>>    "id":"cpu:cpu=cpu-total,host=ip-10-121-48-24.ec2.internal,",
>>>    "message":"cpu:cpu=cpu-total,host=ip-10-121-48-24.ec2.internal, is CRITICAL",
>>>    "details":"",
>>>    "time":"2016-10-27T11:33:00Z",
>>>    "duration":21780000000000,
>>>    "level":"CRITICAL",
>>>    "data":{
>>>       "series":[
>>>          {
>>>             "name":"cpu",
>>>             "tags":{
>>>                "cpu":"cpu-total",
>>>                "host":"ip-10-121-48-24.ec2.internal"
>>>             },
>>>             "columns":[
>>>                "time",
>>>                "anomalyScore",
>>>                "usage_guest",
>>>                "usage_guest_nice",
>>>                "usage_idle",
>>>                "usage_iowait",
>>>                "usage_irq",
>>>                "usage_nice",
>>>                "usage_softirq",
>>>                "usage_steal",
>>>                "usage_system",
>>>                "usage_user"
>>>             ],
>>>             "values":[
>>>                [
>>>                   "2016-10-27T11:33:00Z",
>>>                   0.9897172236503856,
>>>                   0,
>>>                   0,
>>>                   99.49748743708487,
>>>                   0,
>>>                   0,
>>>                   0,
>>>                   0,
>>>                   0,
>>>                   0.5025125628122904,
>>>                   0
>>>                ]
>>> ===================================================================
>>> // The measurement to analyze
>>> var measurement = 'cpu'
>>> // Optional group by dimensions
>>> var groups = [*]
>>> // Optional where filter
>>> var whereFilter = lambda: TRUE
>>> // The amount of data to window at once
>>> var window = 1m
>>> // The field to process
>>> var field = 'usage_idle'
>>> // The name for the anomaly score field
>>> var scoreField = 'anomalyScore'
>>> // The minimum support
>>> var minSupport = 0.05
>>> // The error tolerance
>>> var errorTolerance = 0.01
>>> // The consensus
>>> var consensus = 0.5
>>> // Number of sigmas allowed for normal window deviation
>>> var sigmas = 3.0
>>>
>>> stream
>>>     // Select the data we want
>>>     |from()
>>>         .measurement(measurement)
>>>         .groupBy(groups)
>>>         .where(whereFilter)
>>>     // Window the data for a certain amount of time
>>>     |window()
>>>         .period(window)
>>>         .every(window)
>>>         .align()
>>>     // Send each window to Morgoth
>>>     @morgoth()
>>>         .field(field)
>>>         .scoreField(scoreField)
>>>         .minSupport(minSupport)
>>>         .errorTolerance(errorTolerance)
>>>         .consensus(consensus)
>>>         // Configure a single Sigma fingerprinter
>>>         .sigma(sigmas)
>>>     // Morgoth returns any anomalous windows
>>>     |alert()
>>>         .details('')
>>>         .crit(lambda: TRUE)
>>>         .log('/tmp/cpu_alert.log')
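Plugging the numbers from that alert into the formula above, the arithmetic I believe happened is the following (again just a sketch of my reading, using only the values from the alert JSON and the TICKscript):

# Values taken from the alert JSON and the TICKscript above.
anomaly_score = 0.9897172236503856  # the anomalyScore field in the alert
min_support = 0.05                  # minSupport in the TICKscript

# With a single sigma fingerprinter, averageSupport is just that
# fingerprinter's own support, so:
support = 1.0 - anomaly_score
print(support)                 # ~0.0103, i.e. this kind of window was seen ~1% of the time
print(support <= min_support)  # True -> the window is anomalous and the alert fires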
>>
>> Thanks a lot Nathaniel for your explanation of Morgoth. I have come back with a new example and its set of alerts. Let me briefly describe what I am trying to achieve here.
>>
>> Below is a set of data with the count of errors (eventcount) that occurred for a particular error code out of IIS logs. I want to run Morgoth on the field eventcount to detect whether it is an anomaly.
>>
>> time                        app          eventcount  status     tech
>> 2016-11-07T11:31:28.261Z    "OTSI"       586         "Success"  "IIS"
>> 2016-11-07T11:32:03.254Z    "OTSI"       1           "Failure"  "IIS"
>> 2016-11-07T11:33:03.243Z    "OTSI"       8           "Success"  "IIS"
>> 2016-11-07T11:33:23.259Z    "ANALYTICS"  158         "Success"  "IIS"
>> 2016-11-07T11:33:23.26Z     "ANALYTICS"  24          "Failure"  "IIS"
>>
>> My TICKscript:
>>
>> // The measurement to analyze
>> var measurement = 'eventflow_IIS'
>>
>> // The amount of data to window at once
>> var window = 1m
>>
>> // The field to process
>> var field = 'eventcount'
>>
>> // The name for the anomaly score field
>> var scoreField = 'anomalyScore'
>>
>> // The minimum support
>> var minSupport = 0.05
>>
>> // The error tolerance
>> var errorTolerance = 0.01
>>
>> // The consensus
>> var consensus = 0.5
>>
>> // Number of sigmas allowed for normal window deviation
>> var sigmas = 3.0
>>
>> batch
>>     |query('''
>>         SELECT *
>>         FROM "statistics"."autogen"."eventflow_IIS"
>>     ''')
>>         .period(1m)
>>         .every(1m)
>>         .groupBy(*)
>>         // |.where(lambda: TRUE)
>>     @morgoth()
>>         .field(field)
>>         .scoreField(scoreField)
>>         .minSupport(minSupport)
>>         .errorTolerance(errorTolerance)
>>         .consensus(consensus)
>>         // Configure a single Sigma fingerprinter
>>         .sigma(sigmas)
>>     // Morgoth returns any anomalous windows
>>     |alert()
>>         .details('Count is anomalous')
>>         .id('kapacitor/{{ .TaskName }}/{{ .Name }}/{{ .Group }}')
>>         .message('{{ .ID }} is at level {{ .Level }} Errorcount is:{{ index .Fields "eventcount" }}')
>>         .crit(lambda: TRUE)
>>         .log('/tmp/morgothbb.log')
>>     |influxDBOut()
>>         .database('anomaly')
>>         .retentionPolicy('autogen')
>>         .flushInterval(1s)
>>         .measurement('Anomaly')
>>         // .tag('eventcount','field')
>>         // .tag('AnomalyScore','scoreField')
>>         // .tag('Time','time')
>>         // .tag('Status','status')
>>         .precision('u')
>>
>> Below are the alerts it has generated, pumped into a table:
>>
>> time                              anomalyScore        app          eventcount  status     tech
>> 2016-11-08T09:34:40.169285533Z    0.95                "OTSI"       296         "Success"  "IIS"
>> 2016-11-08T09:35:40.171285533Z    0.9523809523809523  "OTSI"       28          "Success"  "IIS"
>> 2016-11-08T09:36:40.170285533Z    0.9545454545454546  "OTSI"       12          "Success"  "IIS"
>> 2016-11-08T09:37:40.169285533Z    0.9565217391304348  "OTSI"       20          "Success"  "IIS"
>> 2016-11-08T09:38:40.170285533Z    0.9583333333333334  "OTSI"       249         "Success"  "IIS"
>> 2016-11-08T09:39:40.167285533Z    0.96                "OTSI"       70          "Success"  "IIS"
>> 2016-11-08T09:43:00.167285533Z    0.9615384615384616  "ANALYTICS"  1           "Success"  "IIS"
>> 2016-11-08T09:43:40.164285533Z    0.962962962962963   "OTSI"       24          "Success"  "IIS"
>> 2016-11-08T09:52:00.160285533Z    0.9642857142857143  "ANALYTICS"  1           "Success"  "IIS"
>>
>> My question is: how do I interpret the anomaly score generated here (~0.95) together with the counts for which Morgoth has triggered an anomaly? Going by our earlier discussion, support here turns out to be ~0.05 (1 - anomalyScore), and an anomaly gets triggered when support < minSupport, so in this case that comes out as 0.05 < 0.05, which should not be true. Yet an anomaly is still being triggered almost every minute. Could you please help me understand this?
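Adding to that question: the thing that caught my eye is the pattern in the anomalyScore column above. Each value looks like 1 - 1/N for an N that increases by one per window (0.95 = 1 - 1/20, 0.9523809523809523 = 1 - 1/21, and so on), which, if my reading is right, would mean every window is being treated as a brand-new event with a count of 1. A small check of that arithmetic, with the comparison done as <= per your reply above; the count of 1 is my assumption:

# If each window is a new event (count 1) and `total` grows by one per window,
# the scores in the table above fall out exactly:
min_support = 0.05
for total in range(20, 29):
    support = 1.0 / total        # count / total, with count assumed to be 1
    score = 1.0 - support
    fires = support <= min_support
    print(total, score, fires)

# total=20 gives a support of exactly 0.05, and 0.05 <= 0.05 is True, which
# matches the first alert (anomalyScore 0.95); every later window has support
# strictly below 0.05, so those fire under either comparison.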
>>
>> Also let me know whether e, M, N need to be tweaked here for this particular data sample to generate meaningful alerts out of it.
