Hi Nathan, I do not find these files under the directory /var/lib/kapacitor/replay. But when I run 'kapacitor list recordings', I see the following.
ID                                   Type    Status    Size    Date
2aa4cc3b-964d-4956-85ef-77f671fded6f batch   finished  6.4 kB  09 Nov 16 01:18 EST
1562b674-cbff-497b-be34-781da1ae9d4f batch   finished  5.9 kB  08 Nov 16 23:41 EST
823efccb-3241-40d0-8b19-6b55a3f147ee batch   finished  5.8 kB  08 Nov 16 22:50 EST
b07bac9f-0d6e-4324-9301-f7114834135e batch   finished  1.8 kB  08 Nov 16 07:56 EST
0d3cb557-e993-4656-bf55-80b403ad7228 stream  finished  23 B    08 Nov 16 07:42 EST
6d8820d2-d674-448d-92de-cef0a2494267 batch   finished  271 B   08 Nov 16 06:26 EST
5f4988a0-58dd-4926-965c-dd98d7492b8f batch   finished  622 B   08 Nov 16 04:35 EST
6e8eb49f-32fa-4aa5-ba75-736658dd326d batch   finished  3.7 kB  08 Nov 16 04:10 EST
3139e958-54fb-441d-bc8b-11b1c8c5a6a3 batch   finished  121 B   08 Nov 16 02:49 EST
4cc14c6e-842d-41a3-be9b-0298f24cbc3

How can I access these files to read their contents?

Thanks,
Amith

On Wednesday, 9 November 2016 22:25:14 UTC+5:30, [email protected] wrote:
> 1. Yes, the .srpl files are just gzipped line protocol files. The .brpl
> files are a zip of several files containing the JSON data for the
> recording.
> 2. In my previous post I explained how average support is computed, and
> linked to docs on the lossy counting algorithm from which it originates.
>
> On Wednesday, November 9, 2016 at 9:38:56 AM UTC-7, amith hegde wrote:
> Thank you for your advice; I am working on this piece. In the meanwhile I
> have a couple of questions, if you can help me with them.
>
> 1. Is it possible to take a look at a recording to see what data it holds?
> If yes, how can we do that?
>
> 2. How is the anomaly score determined? What is the formula to calculate
> anomalyScore? If it is (1 - averageSupport), even average support is not a
> defined value.
>
> Thanks,
> Amith
>
> On Nov 8, 2016 9:33 PM, <[email protected]> wrote:
>
> > The actual comparison is <=, which is why you received the alert. But if
> > your tolerances are tight enough that <= matters over <, then your
> > tolerances are probably too tight.
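Given the description above (.srpl is gzipped line protocol, .brpl is a zip of JSON files), recordings can in principle be opened with ordinary tools once located on disk. Below is a minimal Python sketch of that idea; the path in the usage comment is hypothetical, and the replay directory may differ from /var/lib/kapacitor/replay depending on the Kapacitor configuration.

```python
# Sketch: read Kapacitor recording files, assuming the formats described
# above (.srpl = gzipped line protocol, .brpl = zip of JSON members).
import gzip
import zipfile

def read_stream_recording(path):
    """Return the line-protocol text of a stream (.srpl) recording."""
    with gzip.open(path, mode="rt") as f:
        return f.read()

def read_batch_recording(path):
    """Return {member_name: json_text} for a batch (.brpl) recording."""
    out = {}
    with zipfile.ZipFile(path) as zf:
        for name in zf.namelist():
            out[name] = zf.read(name).decode("utf-8")
    return out

# Hypothetical usage (substitute your own replay dir and recording ID):
# print(read_stream_recording(
#     "/var/lib/kapacitor/replay/0d3cb557-e993-4656-bf55-80b403ad7228.srpl"))
```

This only covers reading the files; finding them requires knowing where your Kapacitor instance stores replays.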
> > I would first recommend that you tweak the sigmas value, maybe increasing
> > it to 3.5 or 4. To iterate quickly on these tests, I recommend that you
> > create a recording of the data set, then tweak a value, replay the
> > recording, check the results, and repeat until you have something you
> > like. If you share your recording with me I would be willing to take a
> > quick look as well. As it is, it's a little hard to give good advice
> > based off a handful of data points.
> >
> > On Tuesday, November 8, 2016 at 7:47:39 AM UTC-7, [email protected] wrote:
> > On Thursday, 27 October 2016 21:46:08 UTC+5:30, [email protected] wrote:
> > > Clarification from Amith:
> > >
> > > Hi Nathaniel,
> > >
> > > Thanks a lot for your quick reply. What is confusing for me here is how
> > > Morgoth calculated the anomalyScore field, whose value has turned out
> > > to be 0.9897172236503856, and how this is being used to detect an
> > > anomaly. How does this particular node function?
> > >
> > > …
> > >
> > > @morgoth()
> > >     .field(field)
> > >     .scoreField(scoreField)
> > >     .minSupport(minSupport)
> > >     .errorTolerance(errorTolerance)
> > >     .consensus(consensus)
> > >     // Configure a single Sigma fingerprinter
> > >     .sigma(sigmas)
> > >
> > > You can choose some arbitrary data to help me understand this. :)
> > > Thanks,
> > > Amith
> > >
> > > My response:
> > >
> > > The `anomalyScore` is `1 - averageSupport`, where averageSupport is the
> > > average of the support values returned from each of the fingerprinters.
> > > In your case you only have one fingerprinter, `sigma`, so from the
> > > anomalyScore of ~`0.99` we can determine that the sigma fingerprinter
> > > returned a support of ~`0.01`. Support is defined as `count / total`,
> > > where count is the number of times a specific event has been seen and
> > > total is the total number of events seen. The support can be
> > > interpreted as a frequency percentage, i.e.
> > > the most recent window has only been seen 1% of the time. Since 0.01 is
> > > < 0.05 (the minSupport defined), an anomaly was triggered. Taking this
> > > back to the anomaly score, it can be interpreted as: 99% of the time we
> > > do not see an event like this one.
> > >
> > > Remember that Morgoth distinguishes different windows as different
> > > events using the fingerprinters. In your case the sigma fingerprinter
> > > is computing the standard deviation and mean of the windows it
> > > receives. If a window arrives that is more than 3 standard deviations
> > > away from the mean, then it is not considered the same event and is
> > > counted as a new, unique event.
> > >
> > > Putting all of that together, receiving an anomaly score of 99% out of
> > > Morgoth for your setup can be interpreted as: you have sent several 1m
> > > windows to Morgoth, and the window that triggered the anomaly event is
> > > only similar to ~1% of those windows, where similar is defined as
> > > being within 3 standard deviations.
> > >
> > > On Thursday, October 27, 2016 at 9:30:13 AM UTC-6, [email protected] wrote:
> > >
> > > In short, there are two parts to Morgoth.
> > >
> > > 1. A system that counts the frequency of different kinds of events.
> > > This is the lossy counting part.
> > > 2. A system that determines if a window of data is the same as an
> > > existing event being tracked or something new. This is the
> > > fingerprinting part.
> > >
> > > Here is a quick read-through of those concepts:
> > > http://docs.morgoth.io/docs/detection_framework/
> > >
> > > It's a little hard to tell if Morgoth has done anything unexpected
> > > without more detail. Can you share some of the data that led to this
> > > alert, so I can speak to the specifics of what is going on? Or maybe
> > > you could ask a more specific question about which part is confusing?
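The two-part framework described above (fingerprinting plus frequency counting) can be sketched roughly in Python. This is a simplified toy, not Morgoth's actual implementation: real Morgoth uses the lossy counting algorithm rather than the exact counts below, and the matching rule here is reduced to comparing window means against a stored mean and standard deviation.

```python
import math

class SigmaFingerprinter:
    """Toy sigma fingerprinter: a new window matches a tracked event if its
    mean is within `sigmas` standard deviations of that event's mean."""
    def __init__(self, sigmas=3.0):
        self.sigmas = sigmas
        self.events = []  # (mean, stddev, count) per tracked event

    def observe(self, window):
        mean = sum(window) / len(window)
        std = math.sqrt(sum((x - mean) ** 2 for x in window) / len(window))
        for i, (m, s, c) in enumerate(self.events):
            if abs(mean - m) <= self.sigmas * s:
                self.events[i] = (m, s, c + 1)
                return c + 1  # count for the matched event
        self.events.append((mean, std, 1))
        return 1  # brand-new event

def anomaly_score(count, total):
    support = count / total  # frequency of this kind of window
    return 1 - support       # anomalyScore = 1 - averageSupport

# Feed 99 "normal" windows and one outlier.
fp = SigmaFingerprinter(sigmas=3.0)
total = 0
for w in [[10, 11, 9, 10]] * 99 + [[100, 101, 99, 100]]:
    total += 1
    count = fp.observe(w)
score = anomaly_score(count, total)  # score for the last (outlier) window
print(round(score, 2))  # prints 0.99
```

The outlier fingerprints as a new event seen once out of 100 windows, so support is 0.01 and the anomaly score is 0.99, mirroring the explanation above.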
> > > On Thursday, October 27, 2016 at 6:47:02 AM UTC-6, [email protected] wrote:
> > >
> > > Hi All,
> > > I am trying to run Morgoth as a child process of Kapacitor, but I am
> > > failing to understand how Morgoth functions. Below is the sample
> > > TICKscript I tried out of the Morgoth docs. It is generating some
> > > alerts, but I am unable to figure out if they are supposed to be
> > > triggered the way they have been. I am pasting a snippet of an alert
> > > as well.
> > > I basically want to understand the functioning of Morgoth through this
> > > example.
> > >
> > > Alert
> > > ===================================================================
> > > {
> > >    "id":"cpu:cpu=cpu-total,host=ip-10-121-48-24.ec2.internal,",
> > >    "message":"cpu:cpu=cpu-total,host=ip-10-121-48-24.ec2.internal, is CRITICAL",
> > >    "details":"",
> > >    "time":"2016-10-27T11:33:00Z",
> > >    "duration":21780000000000,
> > >    "level":"CRITICAL",
> > >    "data":{
> > >       "series":[
> > >          {
> > >             "name":"cpu",
> > >             "tags":{
> > >                "cpu":"cpu-total",
> > >                "host":"ip-10-121-48-24.ec2.internal"
> > >             },
> > >             "columns":[
> > >                "time",
> > >                "anomalyScore",
> > >                "usage_guest",
> > >                "usage_guest_nice",
> > >                "usage_idle",
> > >                "usage_iowait",
> > >                "usage_irq",
> > >                "usage_nice",
> > >                "usage_softirq",
> > >                "usage_steal",
> > >                "usage_system",
> > >                "usage_user"
> > >             ],
> > >             "values":[
> > >                [
> > >                   "2016-10-27T11:33:00Z",
> > >                   0.9897172236503856,
> > >                   0,
> > >                   0,
> > >                   99.49748743708487,
> > >                   0,
> > >                   0,
> > >                   0,
> > >                   0,
> > >                   0,
> > >                   0.5025125628122904,
> > >                   0
> > >                ]
> > > ===================================================================
> > >
> > > // The measurement to analyze
> > > var measurement = 'cpu'
> > > // Optional group by dimensions
> > > var groups = [*]
> > > // Optional where filter
> > > var whereFilter = lambda: TRUE
> > > // The amount of data to window at once
> > > var window = 1m
> > > // The field to process
> > > var field = 'usage_idle'
> > > // The name for the anomaly score field
> > > var scoreField = 'anomalyScore'
> > > // The minimum support
> > > var minSupport = 0.05
> > > // The error tolerance
> > > var errorTolerance = 0.01
> > > // The consensus
> > > var consensus = 0.5
> > > // Number of sigmas allowed for normal window deviation
> > > var sigmas = 3.0
> > >
> > > stream
> > >     // Select the data we want
> > >     |from()
> > >         .measurement(measurement)
> > >         .groupBy(groups)
> > >         .where(whereFilter)
> > >     // Window the data for a certain amount of time
> > >     |window()
> > >         .period(window)
> > >         .every(window)
> > >         .align()
> > >     // Send each window to Morgoth
> > >     @morgoth()
> > >         .field(field)
> > >         .scoreField(scoreField)
> > >         .minSupport(minSupport)
> > >         .errorTolerance(errorTolerance)
> > >         .consensus(consensus)
> > >         // Configure a single Sigma fingerprinter
> > >         .sigma(sigmas)
> > >     // Morgoth returns any anomalous windows
> > >     |alert()
> > >         .details('')
> > >         .crit(lambda: TRUE)
> > >         .log('/tmp/cpu_alert.log')
> >
> > Thanks a lot Nathaniel for your explanation of Morgoth. I have come back
> > with a new example and its set of alerts. I will brief you on what I am
> > trying to achieve here.
> >
> > Below is a set of data with the count of errors (eventcount) that
> > occurred for a particular error code, out of IIS logs. I want to run
> > Morgoth on the field eventcount to detect if it's an anomaly.
> > time                      app          eventcount  status     tech
> > 2016-11-07T11:31:28.261Z  "OTSI"       586         "Success"  "IIS"
> > 2016-11-07T11:32:03.254Z  "OTSI"       1           "Failure"  "IIS"
> > 2016-11-07T11:33:03.243Z  "OTSI"       8           "Success"  "IIS"
> > 2016-11-07T11:33:23.259Z  "ANALYTICS"  158         "Success"  "IIS"
> > 2016-11-07T11:33:23.26Z   "ANALYTICS"  24          "Failure"  "IIS"
> >
> > My TICKscript:
> >
> > // The measurement to analyze
> > var measurement = 'eventflow_IIS'
> >
> > // The amount of data to window at once
> > var window = 1m
> >
> > // The field to process
> > var field = 'eventcount'
> >
> > // The name for the anomaly score field
> > var scoreField = 'anomalyScore'
> >
> > // The minimum support
> > var minSupport = 0.05
> >
> > // The error tolerance
> > var errorTolerance = 0.01
> >
> > // The consensus
> > var consensus = 0.5
> >
> > // Number of sigmas allowed for normal window deviation
> > var sigmas = 3.0
> >
> > batch
> >     |query('''
> >         SELECT *
> >         FROM "statistics"."autogen"."eventflow_IIS"
> >     ''')
> >         .period(1m)
> >         .every(1m)
> >         .groupBy(*)
> >         // .where(lambda: TRUE)
> >     @morgoth()
> >         .field(field)
> >         .scoreField(scoreField)
> >         .minSupport(minSupport)
> >         .errorTolerance(errorTolerance)
> >         .consensus(consensus)
> >         // Configure a single Sigma fingerprinter
> >         .sigma(sigmas)
> >     // Morgoth returns any anomalous windows
> >     |alert()
> >         .details('Count is anomalous')
> >         .id('kapacitor/{{ .TaskName }}/{{ .Name }}/{{ .Group }}')
> >         .message('{{ .ID }} is at level {{ .Level }} Errorcount is:{{ index .Fields "eventcount" }}')
> >         .crit(lambda: TRUE)
> >         .log('/tmp/morgothbb.log')
> >     |influxDBOut()
> >         .database('anomaly')
> >         .retentionPolicy('autogen')
> >         .flushInterval(1s)
> >         .measurement('Anomaly')
> >         // .tag('eventcount','field')
> >         // .tag('AnomalyScore','scoreField')
> >         // .tag('Time','time')
> >         // .tag('Status','status')
> >         .precision('u')
> >
> > Below are the alerts it has generated, pumped into a table.
> > time                            anomalyScore        app          eventcount  status     tech
> > 2016-11-08T09:34:40.169285533Z  0.95                "OTSI"       296         "Success"  "IIS"
> > 2016-11-08T09:35:40.171285533Z  0.9523809523809523  "OTSI"       28          "Success"  "IIS"
> > 2016-11-08T09:36:40.170285533Z  0.9545454545454546  "OTSI"       12          "Success"  "IIS"
> > 2016-11-08T09:37:40.169285533Z  0.9565217391304348  "OTSI"       20          "Success"  "IIS"
> > 2016-11-08T09:38:40.170285533Z  0.9583333333333334  "OTSI"       249         "Success"  "IIS"
> > 2016-11-08T09:39:40.167285533Z  0.96                "OTSI"       70          "Success"  "IIS"
> > 2016-11-08T09:43:00.167285533Z  0.9615384615384616  "ANALYTICS"  1           "Success"  "IIS"
> > 2016-11-08T09:43:40.164285533Z  0.962962962962963   "OTSI"       24          "Success"  "IIS"
> > 2016-11-08T09:52:00.160285533Z  0.9642857142857143  "ANALYTICS"  1           "Success"  "IIS"
> >
> > My question is:
> >
> > How do I interpret the anomaly score generated here (~0.95) together with
> > the counts for which Morgoth has triggered an anomaly? Going by our
> > earlier discussion, support here turns out to be ~0.05 (1 - anomalyScore),
> > and an anomaly gets triggered when (support < minSupport), so in this
> > case it turns out to be 0.05 < 0.05, which should not be true. But still
> > an anomaly is getting triggered almost every minute. Could you please
> > help me understand this?
> >
> > Also let me know if e, M, N need to be tweaked here for this particular
> > data sample to generate meaningful alerts out of it.

--
Remember to include the version number!
---
You received this message because you are subscribed to the Google Groups "InfluxData" group.
To view this discussion on the web visit https://groups.google.com/d/msgid/influxdb/3584d632-73cd-43f4-8375-3a7c6efeca26%40googlegroups.com.
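One way to sanity-check the scores in the table above: if every 1m window fingerprints as a brand-new event, then the n-th window has support 1/n and an anomaly score of 1 - 1/n. The consecutive scores in the table line up exactly with n = 20, 21, 22, ..., which is consistent with each window being treated as unique, and with the comparison being <= rather than < (as noted earlier in the thread), support of exactly 1/20 = 0.05 would still trigger. This is an observation about the numbers, not a statement about Morgoth's internals.

```python
# Check the consecutive anomalyScore values from the alert table against
# 1 - 1/n, assuming each window is counted as a new, unique event.
observed = [
    0.95,
    0.9523809523809523,
    0.9545454545454546,
    0.9565217391304348,
    0.9583333333333334,
    0.96,
]

for n, score in enumerate(observed, start=20):
    predicted = 1 - 1 / n
    assert abs(predicted - score) < 1e-12, (n, predicted, score)

print("scores match 1 - 1/n for n = 20..25")
```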
