If the data isn't in /var/lib/kapacitor/replay, check the value of the `[replay] dir` config option to see where the data is stored.
On Wednesday, November 9, 2016 at 11:59:57 PM UTC-7, [email protected] wrote:

Hi Nathan,

I do not find these files under the dir /var/lib/kapacitor/replay. But when I run 'kapacitor list recordings', I see the following:

ID                                    Type    Status    Size    Date
2aa4cc3b-964d-4956-85ef-77f671fded6f  batch   finished  6.4 kB  09 Nov 16 01:18 EST
1562b674-cbff-497b-be34-781da1ae9d4f  batch   finished  5.9 kB  08 Nov 16 23:41 EST
823efccb-3241-40d0-8b19-6b55a3f147ee  batch   finished  5.8 kB  08 Nov 16 22:50 EST
b07bac9f-0d6e-4324-9301-f7114834135e  batch   finished  1.8 kB  08 Nov 16 07:56 EST
0d3cb557-e993-4656-bf55-80b403ad7228  stream  finished  23 B    08 Nov 16 07:42 EST
6d8820d2-d674-448d-92de-cef0a2494267  batch   finished  271 B   08 Nov 16 06:26 EST
5f4988a0-58dd-4926-965c-dd98d7492b8f  batch   finished  622 B   08 Nov 16 04:35 EST
6e8eb49f-32fa-4aa5-ba75-736658dd326d  batch   finished  3.7 kB  08 Nov 16 04:10 EST
3139e958-54fb-441d-bc8b-11b1c8c5a6a3  batch   finished  121 B   08 Nov 16 02:49 EST
4cc14c6e-842d-41a3-be9b-0298f24cbc3

How can I access these files to read their contents?

Thanks,
Amith

On Wednesday, 9 November 2016 22:25:14 UTC+5:30, [email protected] wrote:

1. Yes, the srpl files are just a gzipped line protocol file. The brpl files are a zip of several files containing the JSON data for the recording.

2. In my previous post I explained how average support is computed, and linked to the docs on the lossy counting algorithm from which it originates.
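For reference, a minimal Python sketch of one way to peek inside these recording files, based on the formats just described; it assumes the files in the replay dir are named after the recording ID, with .srpl for the stream recording and .brpl for batch recordings, which may not match your setup exactly:

# Minimal sketch for inspecting recording files, based on the formats described
# above: .srpl = gzipped line protocol, .brpl = zip archive of JSON files.
# The directory and file names are assumptions; use your `[replay] dir` setting
# and the IDs reported by `kapacitor list recordings`.
import gzip
import zipfile

replay_dir = "/var/lib/kapacitor/replay"

# Stream recording: decompress and print the raw line protocol.
with gzip.open(f"{replay_dir}/0d3cb557-e993-4656-bf55-80b403ad7228.srpl", "rt") as f:
    for line in f:
        print(line.rstrip())

# Batch recording: list the archive members and dump the first JSON file.
with zipfile.ZipFile(f"{replay_dir}/2aa4cc3b-964d-4956-85ef-77f671fded6f.brpl") as z:
    print(z.namelist())
    with z.open(z.namelist()[0]) as member:
        print(member.read().decode())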
On Wednesday, November 9, 2016 at 9:38:56 AM UTC-7, amith hegde wrote:

Thank you for your advice, I am working on this piece. In the meanwhile I have a couple of questions, if you can help me with them.

1. Is it possible to take a look at the recording to see what data it holds? If yes, how can we do that?

2. How is the anomaly score determined? What is the formula to calculate anomalyScore? If it is (1 - averageSupport), even average support is not a defined value.

Thanks,
Amith

On Nov 8, 2016 9:33 PM, <[email protected]> wrote:

The actual comparison is <=, which is why you received the alert. But if your tolerances are tight enough that <= matters over <, then you are probably too tight on your tolerances.

I would first recommend that you tweak the sigmas value, maybe increase it to 3.5 or 4. To iterate quickly on these tests I recommend that you create a recording of the data set, then tweak values, replay the recording, check the results, and repeat until you have something you like. If you share your recording with me I would be willing to take a quick look as well. As it is, it's a little hard to give good advice based on a handful of data points.

On Tuesday, November 8, 2016 at 7:47:39 AM UTC-7, [email protected] wrote:

On Thursday, 27 October 2016 21:46:08 UTC+5:30, [email protected] wrote:

Clarification from Amith:

Hi Nathaniel,

Thanks a lot for your quick reply. What is confusing for me here is how Morgoth calculated the anomalyScore field, whose value turned out to be 0.9897172236503856, and how this is being used to detect an anomaly.

How does this particular node function?

...
@morgoth()
    .field(field)
    .scoreField(scoreField)
    .minSupport(minSupport)
    .errorTolerance(errorTolerance)
    .consensus(consensus)
    // Configure a single Sigma fingerprinter
    .sigma(sigmas)

You can choose some arbitrary data to help me understand this. :)

Thanks,
Amith

My response:

The `anomalyScore` is `1 - averageSupport`, where averageSupport is the average of the support values returned from each of the fingerprinters. In your case you only have one fingerprinter, `sigma`, so using the anomalyScore of ~`0.99` we can determine that the sigma fingerprinter returned a support of ~`0.01`. Support is defined as `count / total`, where count is the number of times a specific event has been seen and total is the total number of events seen. The support can be interpreted as a frequency percentage, i.e. the most recent window has only been seen 1% of the time. Since 0.01 is < 0.05 (the min support defined), an anomaly was triggered. Taking this back to the anomaly score, it can be interpreted that 99% of the time we do not see an event like this one.

Remember that Morgoth distinguishes different windows as different events using the fingerprinters. In your case the sigma fingerprinter is computing the standard deviation and mean of the windows it receives. If a window arrives that is more than 3 stddevs away from the mean, then it is not considered the same event and is treated as a unique event.

Taking all of that and putting it together, receiving an anomaly score of 99% out of Morgoth for your setup can be interpreted as: you have sent several 1m windows to Morgoth, and the window that triggered the anomaly event is only similar to ~1% of those windows, where similar is defined as being within 3 std deviations.
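To make that arithmetic concrete, here is a small worked sketch in plain Python (not Morgoth code); the counts are hypothetical, chosen to give a support of ~0.01 like the alert discussed above:

# Worked example of the score calculation described above (illustrative only;
# these counts are made up and are not taken from the recording).
count = 1       # times this kind of window has been seen so far (hypothetical)
total = 100     # total events (windows) counted so far (hypothetical)
min_support = 0.05

support = count / total              # 0.01: this event occurs ~1% of the time
average_support = support            # only one fingerprinter (sigma) is configured
anomaly_score = 1 - average_support  # 0.99, close to the 0.9897... seen in the alert

triggered = support <= min_support   # the actual comparison is <=, per the reply above
print(anomaly_score, triggered)      # 0.99 True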
On Thursday, October 27, 2016 at 9:30:13 AM UTC-6, [email protected] wrote:

In short, there are two parts to Morgoth:

1. A system that counts the frequency of different kinds of events. This is the lossy counting part.
2. A system that determines if a window of data is the same as an existing event being tracked or something new. This is the fingerprinting part.

Here is a quick read-through for those concepts: http://docs.morgoth.io/docs/detection_framework/

It's a little hard to tell if Morgoth has done anything unexpected without more detail. Can you share some of the data that led to this alert, so I can talk to the specifics of what is going on? Or maybe you could ask a more specific question about which part is confusing?

On Thursday, October 27, 2016 at 6:47:02 AM UTC-6, [email protected] wrote:

Hi All,

I am trying to run Morgoth as a child process of Kapacitor, but I am failing to understand how Morgoth functions. Below is the sample TICKscript I tried, taken from the Morgoth docs. It is generating some alerts, but I am unable to figure out whether they are supposed to be triggered the way they have been. I am pasting a snippet of one alert as well. I basically want to understand the functioning of Morgoth through this example.

Alert
===================================================================
{
    "id":"cpu:cpu=cpu-total,host=ip-10-121-48-24.ec2.internal,",
    "message":"cpu:cpu=cpu-total,host=ip-10-121-48-24.ec2.internal, is CRITICAL",
    "details":"",
    "time":"2016-10-27T11:33:00Z",
    "duration":21780000000000,
    "level":"CRITICAL",
    "data":{
        "series":[
            {
                "name":"cpu",
                "tags":{
                    "cpu":"cpu-total",
                    "host":"ip-10-121-48-24.ec2.internal"
                },
                "columns":[
                    "time",
                    "anomalyScore",
                    "usage_guest",
                    "usage_guest_nice",
                    "usage_idle",
                    "usage_iowait",
                    "usage_irq",
                    "usage_nice",
                    "usage_softirq",
                    "usage_steal",
                    "usage_system",
                    "usage_user"
                ],
                "values":[
                    [
                        "2016-10-27T11:33:00Z",
                        0.9897172236503856,
                        0,
                        0,
                        99.49748743708487,
                        0,
                        0,
                        0,
                        0,
                        0,
                        0.5025125628122904,
                        0
                    ]
===================================================================

// The measurement to analyze
var measurement = 'cpu'

// Optional group by dimensions
var groups = [*]

// Optional where filter
var whereFilter = lambda: TRUE

// The amount of data to window at once
var window = 1m

// The field to process
var field = 'usage_idle'

// The name for the anomaly score field
var scoreField = 'anomalyScore'

// The minimum support
var minSupport = 0.05

// The error tolerance
var errorTolerance = 0.01

// The consensus
var consensus = 0.5

// Number of sigmas allowed for normal window deviation
var sigmas = 3.0

stream
    // Select the data we want
    |from()
        .measurement(measurement)
        .groupBy(groups)
        .where(whereFilter)
    // Window the data for a certain amount of time
    |window()
        .period(window)
        .every(window)
        .align()
    // Send each window to Morgoth
    @morgoth()
        .field(field)
        .scoreField(scoreField)
        .minSupport(minSupport)
        .errorTolerance(errorTolerance)
        .consensus(consensus)
        // Configure a single Sigma fingerprinter
        .sigma(sigmas)
    // Morgoth returns any anomalous windows
    |alert()
        .details('')
        .crit(lambda: TRUE)
        .log('/tmp/cpu_alert.log')
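As a rough illustration of the sigma fingerprinter behaviour described earlier in the thread (a window more than `sigmas` standard deviations away from the mean of what has been seen is treated as a new, distinct event), here is a simplified Python sketch; it is not Morgoth's implementation, and comparing window means against a history of window means is an assumption made for the example:

# Simplified sketch of a sigma-style fingerprint check as described in the thread:
# a window whose mean is more than `sigmas` standard deviations away from the
# mean of previously seen windows is treated as a new kind of event.
# Illustrative only; Morgoth's real fingerprinter differs in its details.
import statistics

def matches_existing_event(window_values, seen_window_means, sigmas=3.0):
    """Return True if this window looks like the events seen so far."""
    if len(seen_window_means) < 2:
        return True  # not enough history to compute a deviation yet (assumption)
    mean = statistics.mean(seen_window_means)
    stddev = statistics.pstdev(seen_window_means)
    window_mean = statistics.mean(window_values)
    if stddev == 0:
        return window_mean == mean
    return abs(window_mean - mean) <= sigmas * stddev

# Example: mostly-idle CPU windows, then a window whose usage_idle drops sharply.
history = [99.4, 99.5, 99.6, 99.5, 99.4]
print(matches_existing_event([99.5, 99.6], history))  # True: same kind of event
print(matches_existing_event([60.0, 55.0], history))  # False: a new, rare event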
Thanks a lot Nathaniel for your explanation of Morgoth. I have come back with a new example and its set of alerts. Briefly, here is what I am trying to achieve.

Below is a set of data with counts of errors (eventcount) that occurred for a particular error code, taken from IIS logs. I want to run Morgoth on the field eventcount to detect whether it is an anomaly.

time                      app          eventcount  status     tech
2016-11-07T11:31:28.261Z  "OTSI"       586         "Success"  "IIS"
2016-11-07T11:32:03.254Z  "OTSI"       1           "Failure"  "IIS"
2016-11-07T11:33:03.243Z  "OTSI"       8           "Success"  "IIS"
2016-11-07T11:33:23.259Z  "ANALYTICS"  158         "Success"  "IIS"
2016-11-07T11:33:23.26Z   "ANALYTICS"  24          "Failure"  "IIS"

My TICKscript:

// The measurement to analyze
var measurement = 'eventflow_IIS'

// The amount of data to window at once
var window = 1m

// The field to process
var field = 'eventcount'

// The name for the anomaly score field
var scoreField = 'anomalyScore'

// The minimum support
var minSupport = 0.05

// The error tolerance
var errorTolerance = 0.01

// The consensus
var consensus = 0.5

// Number of sigmas allowed for normal window deviation
var sigmas = 3.0

batch
    |query('''
        SELECT *
        FROM "statistics"."autogen"."eventflow_IIS"
    ''')
        .period(1m)
        .every(1m)
        .groupBy(*)
        // |.where(lambda: TRUE)
    @morgoth()
        .field(field)
        .scoreField(scoreField)
        .minSupport(minSupport)
        .errorTolerance(errorTolerance)
        .consensus(consensus)
        // Configure a single Sigma fingerprinter
        .sigma(sigmas)
    // Morgoth returns any anomalous windows
    |alert()
        .details('Count is anomalous')
        .id('kapacitor/{{ .TaskName }}/{{ .Name }}/{{ .Group }}')
        .message('{{ .ID }} is at level {{ .Level }} Errorcount is:{{ index .Fields "eventcount" }}')
        .crit(lambda: TRUE)
        .log('/tmp/morgothbb.log')
    |influxDBOut()
        .database('anomaly')
        .retentionPolicy('autogen')
        .flushInterval(1s)
        .measurement('Anomaly')
        // .tag('eventcount','field')
        // .tag('AnomalyScore','scoreField')
        // .tag('Time','time')
        // .tag('Status','status')
        .precision('u')

Below are the alerts it has generated, pumped into a table:

time                            anomalyScore        app          eventcount  status     tech
2016-11-08T09:34:40.169285533Z  0.95                "OTSI"       296         "Success"  "IIS"
2016-11-08T09:35:40.171285533Z  0.9523809523809523  "OTSI"       28          "Success"  "IIS"
2016-11-08T09:36:40.170285533Z  0.9545454545454546  "OTSI"       12          "Success"  "IIS"
2016-11-08T09:37:40.169285533Z  0.9565217391304348  "OTSI"       20          "Success"  "IIS"
2016-11-08T09:38:40.170285533Z  0.9583333333333334  "OTSI"       249         "Success"  "IIS"
2016-11-08T09:39:40.167285533Z  0.96                "OTSI"       70          "Success"  "IIS"
2016-11-08T09:43:00.167285533Z  0.9615384615384616  "ANALYTICS"  1           "Success"  "IIS"
2016-11-08T09:43:40.164285533Z  0.962962962962963   "OTSI"       24          "Success"  "IIS"
2016-11-08T09:52:00.160285533Z  0.9642857142857143  "ANALYTICS"  1           "Success"  "IIS"

My question is: how do I interpret the anomaly score of ~0.95 generated here for the counts on which Morgoth has triggered an anomaly? Going by our earlier discussion, the support here turns out to be ~0.05 (1 - anomaly score), and an anomaly gets triggered when (support < minSupport), so in this case the check becomes 0.05 < 0.05, which should not be true.
But an anomaly is still getting triggered almost every minute. Could you please help me understand this?

Also, let me know if e, M, N need to be tweaked here for this particular data sample to generate meaningful alerts out of it.
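For what it's worth, here is a small Python sketch of how a run of scores like those in the table can arise under the <= comparison mentioned earlier in the thread; it assumes every incoming window fingerprints as a brand-new event (count stays at 1), which is a guess rather than something confirmed above:

# Sketch of how repeated ~0.95 anomaly scores can occur with a <= comparison
# (as noted earlier in the thread) if every window fingerprints as a new event.
# The starting total and the count-stays-at-1 assumption are hypothetical.
min_support = 0.05

total = 19  # events counted before the first row of the table (hypothetical)
for _ in range(5):
    total += 1
    count = 1                           # each window treated as a never-seen event
    support = count / total
    anomaly_score = 1 - support
    triggered = support <= min_support  # 0.05 <= 0.05 still fires the alert
    print(anomaly_score, triggered)

# The printed scores start at 0.95 (support = 1/20) and climb toward 1.0, which
# lines up with the first few rows of the table above. Raising sigmas, as
# suggested earlier, lets more windows fingerprint as the same event, which
# raises their support over time.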
