Hey Alex,

Thanks for the feedback.
So, based on that approach, to establish ground truth all I have to do is select the anomaly points that mark the start of an anomalous pattern and then write them to the JSON file as [start, end] = [anom_pos - window/2, anom_pos + window/2], where window is calculated with the heuristic you mentioned. Is that correct?

My one remaining issue is how to produce anomaly scores from my custom detector in a way that makes them as comparable as possible to those of online detectors. By that I mean finding a way to use a window-based anomaly detector in NAB, which is tailored to online, point-by-point anomaly detection. The window-based method I'm using reacts to how periodic the pattern within the window is: a regular, periodic pattern indicates normal breathing, while flat lines and irregular patterns indicate abnormal breathing. So assigning an anomaly score of 1 to all points within an anomalous window is reasonable (I think), as long as the frame/window size is kept small enough not to spill over into normal patterns. If I were to do that, the detected anomaly would be taken to be the middle point of the window. Would that really be that unfair to the other detectors?

Also, one more point of concern. The documentation suggests that the user should use the format in Appendix F to create a file that can be scored (using path III). That format (timestamp, value, label) seems to be wrong. The only way I've managed to get it to work is by including an anomaly_score column before the label column. I looked at the scorer.py code, and it definitely requires that field, since it is extracted from the pandas object. So... if I were to use the approach above, I'd have to create a CSV file with the amended format (timestamp, value, anomaly_score, label), where label is populated with 1s inside the ground-truth windows specified in the JSON file, and anomaly_score is populated with 1s inside the anomalous windows detected by the DUT. Correct?

I do want to thank you for the prompt replies.
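A minimal Python sketch of the file preparation described above, assuming integer anomaly indices, timestamps stored as strings, and the four-column layout (timestamp, value, anomaly_score, label) reported to work with scorer.py. All function names here are hypothetical helpers, not part of NAB, and `detected_windows` stands in for whatever index ranges the custom detector flags.

```python
import csv
import json


def window_length(data_size, num_anomalies):
    # Heuristic quoted in the thread: 0.1 * data_size / numOfAnomalies.
    return int(0.1 * data_size / num_anomalies)


def centered_windows(n, anomaly_positions):
    # Index window [pos - w/2, pos + w/2] around each labeled anomaly,
    # clamped so it never runs past either end of the series.
    half = window_length(n, len(anomaly_positions)) // 2
    return [(max(p - half, 0), min(p + half, n - 1)) for p in anomaly_positions]


def in_any_window(i, windows):
    # True if index i falls inside any inclusive (start, end) index window.
    return any(start <= i <= end for start, end in windows)


def write_ground_truth_json(path, data_file, timestamps, windows):
    # Dictionary keyed by data file name; each window is a [start, end]
    # pair of timestamps, matching the structure Alex describes.
    payload = {data_file: [[timestamps[s], timestamps[e]] for s, e in windows]}
    with open(path, "w") as f:
        json.dump(payload, f, indent=2)


def write_scored_csv(path, timestamps, values, truth_windows, detected_windows):
    # Column order reported to work with scorer.py: anomaly_score before label.
    # label = 1 inside ground-truth windows; anomaly_score = 1 inside
    # windows flagged by the detector (hypothetical input).
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["timestamp", "value", "anomaly_score", "label"])
        for i, (ts, v) in enumerate(zip(timestamps, values)):
            score = 1 if in_any_window(i, detected_windows) else 0
            label = 1 if in_any_window(i, truth_windows) else 0
            writer.writerow([ts, v, score, label])
```

For example, with 1000 points and labeled anomalies at indices 300 and 700, the heuristic gives a 50-point window, so the ground-truth windows cover indices 275 to 325 and 675 to 725.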
You've been really helpful, and given that my defense is in a week, I really appreciate you taking time even on a weekend to help out. Thanks!

Nick

> On Apr 20, 2015, at 1:58 AM, Alex Lavin <[email protected]> wrote:
>
> Nick, yes, the combined anomaly JSON file contains anomaly windows, which
> define the periods within which detections are true positives. The file is a
> dictionary where the keys and values are the file names and lists of windows,
> respectively. Each window is specified by two timestamps in a list.
>
> You are correct that window sizes are calculated with
> 0.1*data_size/numOfAnomalies. I would recommend against defining anomaly
> windows in the method you described, for two main reasons:
>
> 1. The NAB scoring function relies on the fact that a given anomaly starts
>    precisely at the center of the window. It is a scaled sigmoid, where true
>    positives early in the window score higher than those later; we can assign
>    appropriate values to earlier/later detections.
> 2. Merely checking windows for the existence of an anomaly, as in your method,
>    ignores the value of making detections as early as possible; you may as well
>    count the total true/false positives/negatives. Scoring in this way tells us
>    very little about the performance of an algorithm as it attempts to detect
>    anomalies in real time.
>
> Best,
> Alex
>
> Alexander Lavin
> Software Engineer
> Numenta
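To illustrate Alex's point that earlier detections within a window should earn more, here is a minimal Python sketch of a scaled-sigmoid weighting. It assumes a detection's position is mapped to -1 at the window's left edge and 0 at its right edge, a steepness constant of 5, and that only the earliest detection in a window is credited; these conventions and the helper names are assumptions for illustration, not a copy of NAB's scorer, and no false-positive or missed-window penalty is modeled.

```python
import math


def relative_position(index, window_start, window_end):
    # Map a detection index to [-1, 0]: -1 at the window's left edge,
    # 0 at its right edge (assumed convention). Guard against zero width.
    width = max(window_end - window_start, 1)
    return (index - window_end) / width


def scaled_sigmoid(rel_pos, steepness=5.0):
    # 2 * sigmoid(-steepness * x) - 1: close to +1 for early detections,
    # exactly 0 at the right edge. steepness=5.0 is an assumed constant.
    return 2.0 / (1.0 + math.exp(steepness * rel_pos)) - 1.0


def first_detection_score(window, detections):
    # Credit only the earliest detection inside the (start, end) index window;
    # later detections in the same window add nothing in this sketch.
    start, end = window
    hits = [i for i in detections if start <= i <= end]
    if not hits:
        return 0.0  # any miss penalty is outside the scope of this sketch
    return scaled_sigmoid(relative_position(min(hits), start, end))
```

Under these assumptions, a detection at the window's left edge earns about 0.99, one at the center about 0.85, and one at the right edge 0. Because only the earliest flagged point counts here, a block of 1s is scored exactly like its first point, so where a detected block starts matters more than where its middle falls.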
