1. Yes, the srpl files are just a gzipped line protocol file. The brpl files are a zip of several files containing the JSON data for the recording.
2. In my previous post I explained how average support is computed, and linked to the docs on the lossy counting algorithm, which is its origin.
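Since a stream recording (.srpl) is just gzip-compressed line protocol, it can be inspected with any gzip tool. Here is a minimal Python sketch; the file name and the sample point are made up for illustration (a real recording would come from `kapacitor record`):

```python
import gzip

# Hypothetical path: a Kapacitor stream recording (.srpl) is just
# gzip-compressed InfluxDB line protocol. (A batch recording, .brpl,
# is a zip of JSON files and can be opened with zipfile instead.)
path = "recording.srpl"

# Write a tiny example recording so this sketch is self-contained.
sample = b"cpu,host=server01 usage_idle=99.5 1477568940000000000\n"
with gzip.open(path, "wb") as f:
    f.write(sample)

# Inspect the recording: each line is one point in line protocol.
with gzip.open(path, "rt") as f:
    for line in f:
        print(line.strip())
```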
On Wednesday, November 9, 2016 at 9:38:56 AM UTC-7, amith hegde wrote:
>
> Thank you for your advice, I am working on this piece. In the meanwhile I have a couple of questions, if you can help me with them:
> 1. Is it possible to take a look at the recording to see what data it holds? If yes, how can we do that?
> 2. How is the anomaly score determined? What is the formula to calculate anomalyScore? If it is (1 - averageSupport), even average support is not a defined value.
>
> Thanks,
> Amith
>
> On Nov 8, 2016 9:33 PM, <[email protected]> wrote:
>
>> The actual comparison is <=, which is why you received the alert. But if your tolerances are tight enough that <= matters over <, then you are probably too tight on your tolerances.
>>
>> I would first recommend that you tweak the sigmas value, maybe increase it to 3.5 or 4. To iterate quickly on these tests I recommend that you create a recording of the data set, then tweak a value, replay the recording, check the results, and repeat until you have something you like. If you share your recording with me I would be willing to take a quick look as well. As it is, it's a little hard to give good advice based on a handful of data points.
>>
>> On Tuesday, November 8, 2016 at 7:47:39 AM UTC-7, [email protected] wrote:
>>>
>>> On Thursday, 27 October 2016 21:46:08 UTC+5:30, [email protected] wrote:
>>> > Clarification from Amith:
>>> >
>>> > Hi Nathaniel,
>>> >
>>> > Thanks a lot for your quick reply. What is confusing for me here is how Morgoth calculated the anomalyScore field, whose value has turned out to be 0.9897172236503856, and how this is being used to detect an anomaly.
>>> > How does this particular node function?
>>> >
>>> > @morgoth()
>>> >     .field(field)
>>> >     .scoreField(scoreField)
>>> >     .minSupport(minSupport)
>>> >     .errorTolerance(errorTolerance)
>>> >     .consensus(consensus)
>>> >     // Configure a single Sigma fingerprinter
>>> >     .sigma(sigmas)
>>> >
>>> > You can choose some arbitrary data to help me understand this. :)
>>> > Thanks,
>>> > Amith
>>> >
>>> > My response:
>>> >
>>> > The `anomalyScore` is `1 - averageSupport`, where averageSupport is the average of the support values returned from each of the fingerprinters. In your case you only have one fingerprinter, `sigma`, so using the anomalyScore of ~`0.99` we can determine that the sigma fingerprinter returned a support of ~`0.01`. Support is defined as `count / total`, where count is the number of times a specific event has been seen and total is the total number of events seen. The support can be interpreted as a frequency percentage, i.e. the most recent window has only been seen 1% of the time. Since 0.01 is < 0.05 (the min support defined) an anomaly was triggered. Taking this back to the anomaly score, it can be interpreted that 99% of the time we do not see an event like this one.
>>> >
>>> > Remember that Morgoth distinguishes different windows as different events using the fingerprinters. In your case the sigma function is computing the std deviation and mean of the windows it receives. If a window arrives that is more than 3 stddevs away from the mean, then it is not considered the same event and is a unique event.
>>> >
>>> > Taking all of that and putting it together, receiving an anomaly score of 99% out of Morgoth for your setup can be interpreted as: you have sent several 1m windows to Morgoth.
>>> > The window that triggered the anomaly event is only similar to ~1% of those windows, where similar is defined as being within 3 std deviations.
>>> >
>>> > On Thursday, October 27, 2016 at 9:30:13 AM UTC-6, [email protected] wrote:
>>> >
>>> > In short there are two parts to Morgoth:
>>> >
>>> > 1. A system that counts the frequency of different kinds of events. This is the lossy counting part.
>>> > 2. A system that determines if a window of data is the same as an existing event being tracked or something new. This is the fingerprinting part.
>>> >
>>> > Here is a quick read-through for those concepts: http://docs.morgoth.io/docs/detection_framework/
>>> >
>>> > It's a little hard to tell if Morgoth has done anything unexpected without more detail. Can you share some of the data that led to this alert, so I can talk to the specifics of what is going on? Or maybe you could ask a more specific question about which part is confusing?
>>> >
>>> > On Thursday, October 27, 2016 at 6:47:02 AM UTC-6, [email protected] wrote:
>>> > Hi All,
>>> > I am trying to run Morgoth as a child process to Kapacitor, but I am failing to understand how Morgoth functions. Below is the sample TICKscript I tried out of the Morgoth docs. It is generating some alerts but I am unable to figure out if they are supposed to get triggered the way they have. Pasting a snippet of an alert as well.
>>> > I basically want to understand the functioning of Morgoth through this example.
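The scoring logic described above (support = count / total, anomalyScore = 1 - averageSupport, and the sigma matching rule) can be sketched as follows. This is an illustrative model only, not Morgoth's actual Go implementation, and all function names are made up:

```python
# Illustrative sketch of Morgoth's scoring, as described in the thread
# (not the real implementation). A sigma "fingerprinter" decides whether
# a window matches a known event; support is count / total; the anomaly
# score is 1 - support.
import statistics

def matches(event_values, window_values, sigmas=3.0):
    """A window matches a known event if its mean lies within
    `sigmas` standard deviations of the event's mean."""
    if len(event_values) < 2:
        # Not enough data for a stddev; only an exact repeat matches.
        return event_values == window_values
    mean = statistics.mean(event_values)
    std = statistics.stdev(event_values)
    return abs(statistics.mean(window_values) - mean) <= sigmas * std

def anomaly_score(count, total):
    support = count / total   # frequency of this event among all windows
    return 1.0 - support      # anomalyScore = 1 - averageSupport

# An event seen once out of ~100 windows has support ~0.01, so the
# anomaly score comes out near 0.99, as in the alert in this thread.
print(anomaly_score(1, 100))  # 0.99
```

With one fingerprinter, averageSupport is just that fingerprinter's support, which is why a score of ~0.99 pins the support at ~0.01, below the 0.05 minSupport.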
>>> > Alert
>>> > ===================================================================
>>> > {
>>> >   "id": "cpu:cpu=cpu-total,host=ip-10-121-48-24.ec2.internal,",
>>> >   "message": "cpu:cpu=cpu-total,host=ip-10-121-48-24.ec2.internal, is CRITICAL",
>>> >   "details": "",
>>> >   "time": "2016-10-27T11:33:00Z",
>>> >   "duration": 21780000000000,
>>> >   "level": "CRITICAL",
>>> >   "data": {
>>> >     "series": [
>>> >       {
>>> >         "name": "cpu",
>>> >         "tags": {
>>> >           "cpu": "cpu-total",
>>> >           "host": "ip-10-121-48-24.ec2.internal"
>>> >         },
>>> >         "columns": [
>>> >           "time", "anomalyScore", "usage_guest", "usage_guest_nice",
>>> >           "usage_idle", "usage_iowait", "usage_irq", "usage_nice",
>>> >           "usage_softirq", "usage_steal", "usage_system", "usage_user"
>>> >         ],
>>> >         "values": [
>>> >           [
>>> >             "2016-10-27T11:33:00Z", 0.9897172236503856, 0, 0,
>>> >             99.49748743708487, 0, 0, 0, 0, 0, 0.5025125628122904, 0
>>> >           ]
>>> > ===================================================================
>>> > // The measurement to analyze
>>> > var measurement = 'cpu'
>>> > // Optional group by dimensions
>>> > var groups = [*]
>>> > // Optional where filter
>>> > var whereFilter = lambda: TRUE
>>> > // The amount of data to window at once
>>> > var window = 1m
>>> > // The field to process
>>> > var field = 'usage_idle'
>>> > // The name for the anomaly score field
>>> > var scoreField = 'anomalyScore'
>>> > // The minimum support
>>> > var minSupport = 0.05
>>> > // The error tolerance
>>> > var errorTolerance = 0.01
>>> > // The consensus
>>> > var consensus = 0.5
>>> > // Number of sigmas allowed for normal window deviation
>>> > var sigmas = 3.0
>>> >
>>> > stream
>>> >     // Select the data we want
>>> >     |from()
>>> >         .measurement(measurement)
>>> >         .groupBy(groups)
>>> >         .where(whereFilter)
>>> >     // Window the data for a certain amount of time
>>> >     |window()
>>> >         .period(window)
>>> >         .every(window)
>>> >         .align()
>>> >     // Send each window to Morgoth
>>> >     @morgoth()
>>> >         .field(field)
>>> >         .scoreField(scoreField)
>>> >         .minSupport(minSupport)
>>> >         .errorTolerance(errorTolerance)
>>> >         .consensus(consensus)
>>> >         // Configure a single Sigma fingerprinter
>>> >         .sigma(sigmas)
>>> >     // Morgoth returns any anomalous windows
>>> >     |alert()
>>> >         .details('')
>>> >         .crit(lambda: TRUE)
>>> >         .log('/tmp/cpu_alert.log')
>>>
>>> Thanks a lot Nathaniel for your explanation of Morgoth. I have come back with a new example and its set of alerts. I will briefly explain what I am trying to achieve here.
>>>
>>> Below is a set of data with counts of errors (eventcount) that occurred for a particular error code out of IIS logs. I want to run Morgoth on the field eventcount to detect if it is an anomaly.
>>>
>>> time                      app          eventcount  status     tech
>>> 2016-11-07T11:31:28.261Z  "OTSI"       586         "Success"  "IIS"
>>> 2016-11-07T11:32:03.254Z  "OTSI"       1           "Failure"  "IIS"
>>> 2016-11-07T11:33:03.243Z  "OTSI"       8           "Success"  "IIS"
>>> 2016-11-07T11:33:23.259Z  "ANALYTICS"  158         "Success"  "IIS"
>>> 2016-11-07T11:33:23.26Z   "ANALYTICS"  24          "Failure"  "IIS"
>>>
>>> My TICKscript:
>>>
>>> // The measurement to analyze
>>> var measurement = 'eventflow_IIS'
>>>
>>> // The amount of data to window at once
>>> var window = 1m
>>>
>>> // The field to process
>>> var field = 'eventcount'
>>>
>>> // The name for the anomaly score field
>>> var scoreField = 'anomalyScore'
>>>
>>> // The minimum support
>>> var minSupport = 0.05
>>>
>>> // The error tolerance
>>> var errorTolerance = 0.01
>>>
>>> // The consensus
>>> var consensus = 0.5
>>>
>>> // Number of sigmas allowed for normal window deviation
>>> var sigmas = 3.0
>>>
>>> batch
>>>     |query('''
>>>         SELECT *
>>>         FROM "statistics"."autogen"."eventflow_IIS"
>>>     ''')
>>>         .period(1m)
>>>         .every(1m)
>>>         .groupBy(*)
>>>     // |.where(lambda: TRUE)
>>>     @morgoth()
>>>         .field(field)
>>>         .scoreField(scoreField)
>>>         .minSupport(minSupport)
>>>         .errorTolerance(errorTolerance)
>>>         .consensus(consensus)
>>>         // Configure a single Sigma fingerprinter
>>>         .sigma(sigmas)
>>>     // Morgoth returns any anomalous windows
>>>     |alert()
>>>         .details('Count is anomalous')
>>>         .id('kapacitor/{{ .TaskName }}/{{ .Name }}/{{ .Group }}')
>>>         .message('{{ .ID }} is at level {{ .Level }} Errorcount is:{{ index .Fields "eventcount" }}')
>>>         .crit(lambda: TRUE)
>>>         .log('/tmp/morgothbb.log')
>>>     |influxDBOut()
>>>         .database('anomaly')
>>>         .retentionPolicy('autogen')
>>>         .flushInterval(1s)
>>>         .measurement('Anomaly')
>>>         // .tag('eventcount','field')
>>>         // .tag('AnomalyScore','scoreField')
>>>         // .tag('Time','time')
>>>         // .tag('Status','status')
>>>         .precision('u')
>>>
>>> Below are the alerts it has generated, pumped into a table:
>>>
>>> time                            anomalyScore        app          eventcount  status     tech
>>> 2016-11-08T09:34:40.169285533Z  0.95                "OTSI"       296         "Success"  "IIS"
>>> 2016-11-08T09:35:40.171285533Z  0.9523809523809523  "OTSI"       28          "Success"  "IIS"
>>> 2016-11-08T09:36:40.170285533Z  0.9545454545454546  "OTSI"       12          "Success"  "IIS"
>>> 2016-11-08T09:37:40.169285533Z  0.9565217391304348  "OTSI"       20          "Success"  "IIS"
>>> 2016-11-08T09:38:40.170285533Z  0.9583333333333334  "OTSI"       249         "Success"  "IIS"
>>> 2016-11-08T09:39:40.167285533Z  0.96                "OTSI"       70          "Success"  "IIS"
>>> 2016-11-08T09:43:00.167285533Z  0.9615384615384616  "ANALYTICS"  1           "Success"  "IIS"
>>> 2016-11-08T09:43:40.164285533Z  0.962962962962963   "OTSI"       24          "Success"  "IIS"
>>> 2016-11-08T09:52:00.160285533Z  0.9642857142857143  "ANALYTICS"  1           "Success"  "IIS"
>>>
>>> My question is:
>>>
>>> How do I interpret the anomaly score generated here, ~0.95, with the counts for which Morgoth has triggered an anomaly? Going by our earlier discussion, support here turns out to be ~0.05 (1 - anomalyScore).
>>> And an anomaly gets triggered when (support < minSupport), so in this case it turns out to be 0.05 < 0.05, which should not be true. But still an anomaly is getting triggered almost every minute. Could you please help me understand this?
>>>
>>> Also let me know if e, M, N need to be tweaked here for this particular data sample to generate meaningful alerts out of it.

To view this discussion on the web visit https://groups.google.com/d/msgid/influxdb/3031d86e-a536-4e7c-a713-f2aa1d706743%40googlegroups.com.
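The reply earlier in the thread resolves this: the actual comparison is support <= minSupport, not <. A quick illustrative check also shows that the table's scores follow the pattern 1 - 1/total, i.e. every window is fingerprinting as a brand-new event (count = 1):

```python
# Illustrative check of the anomaly scores in the table above:
# each score is 1 - 1/total, meaning every window was counted as a
# brand-new event (count = 1), and the trigger condition is
# support <= minSupport (note: <=, not <).
min_support = 0.05

for total in range(20, 29):
    support = 1 / total                 # each event seen exactly once
    score = 1 - support                 # anomalyScore = 1 - support
    triggered = support <= min_support  # the actual comparison Morgoth uses
    print(f"total={total} score={score:.16g} triggered={triggered}")

# total=20 gives score 0.95 with support exactly 0.05:
# 0.05 < 0.05 is False, but 0.05 <= 0.05 is True -- hence the alert.
# total=21 gives 0.9523809523809523, matching the second table row.
```

So the first alert (score 0.95) fired precisely because of the <= boundary case, and every subsequent window being treated as unique kept the support at 1/total, below the threshold.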
