Hi Nathan,

I want to understand the role of Sigma, the fingerprinter, here in Morgoth.

1. Where are the fingerprints stored internally for Morgoth to work on each new window of data that comes through?
2. How does varying the sigma value (I am currently using 3.0) change the way an anomaly is detected, given that the Sigma fingerprint is essentially the mean, standard deviation and count?
3. What other fingerprinters can I use with Morgoth?

Looking forward to a reply from you.

Thanks,
Amith

On Thursday, 10 November 2016 22:10:38 UTC+5:30, [email protected] wrote:
> If the data isn't in /var/lib/kapacitor/replay then check the value of the
> `[replay] dir` config option for where the data is stored.
>
> On Wednesday, November 9, 2016 at 11:59:57 PM UTC-7, [email protected] wrote:
> Hi Nathan,
>
> I do not find these files under the dir /var/lib/kapacitor/replay. But when I
> run 'kapacitor list recordings', I find the below.
>
> ID                                    Type    Status    Size    Date
> 2aa4cc3b-964d-4956-85ef-77f671fded6f  batch   finished  6.4 kB  09 Nov 16 01:18 EST
> 1562b674-cbff-497b-be34-781da1ae9d4f  batch   finished  5.9 kB  08 Nov 16 23:41 EST
> 823efccb-3241-40d0-8b19-6b55a3f147ee  batch   finished  5.8 kB  08 Nov 16 22:50 EST
> b07bac9f-0d6e-4324-9301-f7114834135e  batch   finished  1.8 kB  08 Nov 16 07:56 EST
> 0d3cb557-e993-4656-bf55-80b403ad7228  stream  finished  23 B    08 Nov 16 07:42 EST
> 6d8820d2-d674-448d-92de-cef0a2494267  batch   finished  271 B   08 Nov 16 06:26 EST
> 5f4988a0-58dd-4926-965c-dd98d7492b8f  batch   finished  622 B   08 Nov 16 04:35 EST
> 6e8eb49f-32fa-4aa5-ba75-736658dd326d  batch   finished  3.7 kB  08 Nov 16 04:10 EST
> 3139e958-54fb-441d-bc8b-11b1c8c5a6a3  batch   finished  121 B   08 Nov 16 02:49 EST
> 4cc14c6e-842d-41a3-be9b-0298f24cbc3
>
> How can I access these files to read their contents?
>
> Thanks,
> Amith
>
> On Wednesday, 9 November 2016 22:25:14 UTC+5:30, [email protected] wrote:
> 1. Yes, the srpl files are just gzipped line protocol files. The brpl
> files are a zip of several files containing the JSON data for the recording.
> 2. In my previous post I explained how average support is computed, and
> linked to docs on the lossy counting algorithm from which it originates.
>
> On Wednesday, November 9, 2016 at 9:38:56 AM UTC-7, amith hegde wrote:
> Thank you for your advice, I am working on this piece. In the meantime I
> have a couple of questions, if you can help me with them.
>
> 1. Is it possible to take a look at the recording to see what data it
> holds? If yes, how can we do that?
> 2. How is the anomaly score determined? What is the formula to calculate
> anomalyScore? If it is (1 - averageSupport), even average support is not a
> defined value.
>
> Thanks,
> Amith
>
> On Nov 8, 2016 9:33 PM, <[email protected]> wrote:
> The actual comparison is <=, which is why you received the alert. But if
> your tolerances are tight enough that <= matters over <, then you are
> probably too tight on your tolerances.
>
> I would first recommend that you tweak the sigmas value, maybe increase it to
> 3.5 or 4. To iterate quickly on these tests I recommend that you create
> a recording of the data set, then tweak the value, replay the recording, check
> the results, and repeat until you have something you like. If you share
> your recording with me I would be willing to take a quick look as well. As
> it is, it's a little hard to give good advice based off a handful of data
> points.
>
> On Tuesday, November 8, 2016 at 7:47:39 AM UTC-7, [email protected] wrote:
> On Thursday, 27 October 2016 21:46:08 UTC+5:30, [email protected] wrote:
> Clarification from Amith:
>
> Hi Nathaniel,
>
> Thanks a lot for your quick reply. What is confusing for me here is how
> Morgoth calculated the anomalyScore field, whose value has turned out to be
> 0.9897172236503856, and how this is being used to detect an anomaly.
> How does this particular node function?
>
> …
> @morgoth()
>     .field(field)
>     .scoreField(scoreField)
>     .minSupport(minSupport)
>     .errorTolerance(errorTolerance)
>     .consensus(consensus)
>     // Configure a single Sigma fingerprinter
>     .sigma(sigmas)
>
> You can choose some arbitrary data to help me understand this. :)
>
> Thanks,
> Amith
>
> My response:
>
> The `anomalyScore` is `1 - averageSupport`, where averageSupport is the
> average of the support values returned from each of the fingerprinters.
> In your case you only have one fingerprinter, `sigma`, so using the
> anomalyScore of ~ `0.99` we can determine that the sigma fingerprinter
> returned a support of ~ `0.01`. Support is defined as `count / total`,
> where count is the number of times a specific event has been seen and
> total is the total number of events seen. The support can be interpreted as
> a frequency percentage, i.e. a window like the most recent one has only been
> seen 1% of the time. Since 0.01 is < 0.05 (the min support defined) an anomaly
> was triggered. Taking this back to the anomaly score, it can be
> interpreted that 99% of the time we do not see an event like this one.
>
> Remember that Morgoth distinguishes different windows as different events
> using the fingerprinters. In your case the sigma function is computing
> the std deviation and mean of the windows it receives. If a window
> arrives that is more than 3 stddevs away from the mean then it is not
> considered the same event and is a unique event.
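
The arithmetic in the reply above can be sketched in a few lines of Python. This is only an illustration of the description in this thread, not Morgoth's actual code: the function names are hypothetical, and the similarity test (comparing window means against `sigmas` standard deviations of the first window) is an assumption about what "within 3 std deviations" means.

```python
from statistics import mean, pstdev


def support(count, total):
    """Support as defined in the thread: count / total."""
    return count / total


def anomaly_score(supports):
    """1 - average support over all configured fingerprinters."""
    return 1.0 - sum(supports) / len(supports)


def same_event(window_a, window_b, sigmas):
    """Hypothetical similarity test (an assumption, not Morgoth's code):
    two windows are the same event when their means differ by at most
    `sigmas` standard deviations of window_a."""
    threshold = sigmas * pstdev(window_a)
    return abs(mean(window_a) - mean(window_b)) <= threshold


# One fingerprinter reporting a support of 0.01 gives a score of about 0.99.
score = anomaly_score([support(1, 100)])

# A window with a similar mean matches; a very different one does not.
baseline = [99.0, 99.5, 100.0]
close_match = same_event(baseline, [99.4, 99.6], 3.0)
far_off = same_event(baseline, [50.0, 51.0], 3.0)
```

Under this reading, raising `sigmas` widens the threshold, so more windows are treated as the same event and fewer unique, low-support events appear.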
>
> Taking all of that and putting it together, receiving an anomaly score of
> 99% out of Morgoth for your setup can be interpreted as: You have sent
> several 1m windows to Morgoth. The window that triggered the anomaly
> event is only similar to ~1% of those windows, where similar is defined
> as being within 3 std deviations.
>
> On Thursday, October 27, 2016 at 9:30:13 AM UTC-6, [email protected] wrote:
> In short there are two parts to Morgoth.
>
> 1. A system that counts the frequency of different kinds of events. This
> is the lossy counting part.
> 2. A system that determines if a window of data is the same as an
> existing event being tracked or something new. This is the fingerprinting
> part.
>
> Here is a quick read-through for those concepts:
> http://docs.morgoth.io/docs/detection_framework/
>
> It's a little hard to tell if Morgoth has done anything unexpected without
> more detail. Can you share some of the data that led to this alert, so I
> can talk to the specifics of what is going on? Or maybe you could ask a
> more specific question about which part is confusing?
>
> On Thursday, October 27, 2016 at 6:47:02 AM UTC-6, [email protected] wrote:
> Hi All,
>
> I am trying to run Morgoth as a child process of Kapacitor, but I am
> failing to understand how Morgoth functions. Below is the sample TICKscript
> I tried out of the Morgoth docs. This is generating some alerts but I am
> unable to figure out if they are supposed to get triggered the way they have.
> Pasting a snippet out of the alert as well.
> I basically want to understand the functioning of Morgoth through this
> example.
>
> Alert
> ===================================================================
> {
>   "id":"cpu:cpu=cpu-total,host=ip-10-121-48-24.ec2.internal,",
>   "message":"cpu:cpu=cpu-total,host=ip-10-121-48-24.ec2.internal, is CRITICAL",
>   "details":"",
>   "time":"2016-10-27T11:33:00Z",
>   "duration":21780000000000,
>   "level":"CRITICAL",
>   "data":{
>     "series":[
>       {
>         "name":"cpu",
>         "tags":{
>           "cpu":"cpu-total",
>           "host":"ip-10-121-48-24.ec2.internal"
>         },
>         "columns":[
>           "time",
>           "anomalyScore",
>           "usage_guest",
>           "usage_guest_nice",
>           "usage_idle",
>           "usage_iowait",
>           "usage_irq",
>           "usage_nice",
>           "usage_softirq",
>           "usage_steal",
>           "usage_system",
>           "usage_user"
>         ],
>         "values":[
>           [
>             "2016-10-27T11:33:00Z",
>             0.9897172236503856,
>             0,
>             0,
>             99.49748743708487,
>             0,
>             0,
>             0,
>             0,
>             0,
>             0.5025125628122904,
>             0
>           ]
> ===================================================================
> // The measurement to analyze
> var measurement = 'cpu'
> // Optional group by dimensions
> var groups = [*]
> // Optional where filter
> var whereFilter = lambda: TRUE
> // The amount of data to window at once
> var window = 1m
> // The field to process
> var field = 'usage_idle'
> // The name for the anomaly score field
> var scoreField = 'anomalyScore'
> // The minimum support
> var minSupport = 0.05
> // The error tolerance
> var errorTolerance = 0.01
> // The consensus
> var consensus = 0.5
> // Number of sigmas allowed for normal window deviation
> var sigmas = 3.0
> stream
>     // Select the data we want
>     |from()
>         .measurement(measurement)
>         .groupBy(groups)
>         .where(whereFilter)
>     // Window the data for a certain amount of time
>     |window()
>         .period(window)
>         .every(window)
>         .align()
>     // Send each window to Morgoth
>     @morgoth()
>         .field(field)
>         .scoreField(scoreField)
>         .minSupport(minSupport)
>         .errorTolerance(errorTolerance)
>         .consensus(consensus)
>         // Configure a single Sigma fingerprinter
>         .sigma(sigmas)
>     // Morgoth returns any anomalous windows
>     |alert()
>         .details('')
>         .crit(lambda: TRUE)
>         .log('/tmp/cpu_alert.log')
>
> Thanks a lot Nathaniel for your explanation of Morgoth. I have come back
> with a new example and its set of alerts. I will briefly describe what I am
> trying to achieve here.
>
> Below is a set of data with the count of errors (eventcount) that occurred for a
> particular error code in the IIS logs. I want to run Morgoth on the field
> eventcount to detect whether it is an anomaly.
>
> time                      app          eventcount  status     tech
> 2016-11-07T11:31:28.261Z  "OTSI"       586         "Success"  "IIS"
> 2016-11-07T11:32:03.254Z  "OTSI"       1           "Failure"  "IIS"
> 2016-11-07T11:33:03.243Z  "OTSI"       8           "Success"  "IIS"
> 2016-11-07T11:33:23.259Z  "ANALYTICS"  158         "Success"  "IIS"
> 2016-11-07T11:33:23.26Z   "ANALYTICS"  24          "Failure"  "IIS"
>
> My TICKscript:
>
> // The measurement to analyze
> var measurement = 'eventflow_IIS'
>
> // The amount of data to window at once
> var window = 1m
>
> // The field to process
> var field = 'eventcount'
>
> // The name for the anomaly score field
> var scoreField = 'anomalyScore'
>
> // The minimum support
> var minSupport = 0.05
>
> // The error tolerance
> var errorTolerance = 0.01
>
> // The consensus
> var consensus = 0.5
>
> // Number of sigmas allowed for normal window deviation
> var sigmas = 3.0
>
> batch
>     |query('''
>         SELECT *
>         FROM "statistics"."autogen"."eventflow_IIS"
>     ''')
>         .period(1m)
>         .every(1m)
>         .groupBy(*)
>     // |.where(lambda: TRUE)
>     @morgoth()
>         .field(field)
>         .scoreField(scoreField)
>         .minSupport(minSupport)
>         .errorTolerance(errorTolerance)
>         .consensus(consensus)
>         // Configure a single Sigma fingerprinter
>         .sigma(sigmas)
>     // Morgoth returns any anomalous windows
>     |alert()
>         .details('Count is anomalous')
>         .id('kapacitor/{{ .TaskName }}/{{ .Name }}/{{ .Group }}')
>         .message('{{ .ID }} is at level {{ .Level }} Errorcount is:{{ index .Fields "eventcount" }}')
>         .crit(lambda: TRUE)
>         .log('/tmp/morgothbb.log')
>     |influxDBOut()
>         .database('anomaly')
>         .retentionPolicy('autogen')
>         .flushInterval(1s)
>         .measurement('Anomaly')
>         // .tag('eventcount','field')
>         // .tag('AnomalyScore','scoreField')
>         // .tag('Time','time')
>         // .tag('Status','status')
>         .precision('u')
>
> Below are the alerts it has generated, pumped into a table.
>
> time                            anomalyScore        app          eventcount  status     tech
> 2016-11-08T09:34:40.169285533Z  0.95                "OTSI"       296         "Success"  "IIS"
> 2016-11-08T09:35:40.171285533Z  0.9523809523809523  "OTSI"       28          "Success"  "IIS"
> 2016-11-08T09:36:40.170285533Z  0.9545454545454546  "OTSI"       12          "Success"  "IIS"
> 2016-11-08T09:37:40.169285533Z  0.9565217391304348  "OTSI"       20          "Success"  "IIS"
> 2016-11-08T09:38:40.170285533Z  0.9583333333333334  "OTSI"       249         "Success"  "IIS"
> 2016-11-08T09:39:40.167285533Z  0.96                "OTSI"       70          "Success"  "IIS"
> 2016-11-08T09:43:00.167285533Z  0.9615384615384616  "ANALYTICS"  1           "Success"  "IIS"
> 2016-11-08T09:43:40.164285533Z  0.962962962962963   "OTSI"       24          "Success"  "IIS"
> 2016-11-08T09:52:00.160285533Z  0.9642857142857143  "ANALYTICS"  1           "Success"  "IIS"
>
> My question is:
>
> How should I interpret the anomaly score generated here (~0.95) together with
> the counts for which Morgoth has triggered an anomaly? Going by our earlier
> discussion, support here turns out to be ~0.05 (1 - anomaly score). An anomaly
> gets triggered when (support < min support), so in this case it turns out
> 0.05 < 0.05, which should not be true. But an anomaly is still getting triggered
> almost every minute. Could you please help me understand this?
>
> Also let me know if e, M, N need to be tweaked here for this particular data
> sample to generate meaningful alerts out of it.
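
One way to read the alert table quoted in this thread is sketched below in Python. It rests on an assumption the thread does not confirm: that each alerting window was counted as a brand-new event (count = 1), so that support = 1 / total. Under that assumption the listed scores correspond to totals of 20 through 28, and the first row sits exactly on the `minSupport` boundary, which is where the `<=` comparison mentioned in Nathaniel's reply makes the difference. The variable and function names are hypothetical.

```python
# Anomaly scores copied from the alert table in this thread.
scores = [
    0.95, 0.9523809523809523, 0.9545454545454546,
    0.9565217391304348, 0.9583333333333334, 0.96,
    0.9615384615384616, 0.962962962962963, 0.9642857142857143,
]

MIN_SUPPORT = 0.05  # minSupport from the TICKscript


def implied_total(score):
    """Total number of events implied by a score, ASSUMING count == 1,
    i.e. score = 1 - 1/total."""
    return round(1.0 / (1.0 - score))


totals = [implied_total(s) for s in scores]

# Recompute support as 1/total to avoid the rounding error of 1 - score.
supports = [1 / t for t in totals]

# Comparison described in the thread (<=) versus a strict comparison (<).
alerts_le = [s <= MIN_SUPPORT for s in supports]
alerts_lt = [s < MIN_SUPPORT for s in supports]

for t, le, lt in zip(totals, alerts_le, alerts_lt):
    print(f"total={t:2d} support={1 / t:.4f} '<=' alert={le} '<' alert={lt}")
```

If this reading holds, every alerting window was fingerprinted as a unique event, which is consistent with the advice in the thread to loosen the sigmas value and iterate with a recording.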
