Hey Alex, 

Thanks for the feedback. 

So based on that approach, to establish ground truth, all I have to do is 
select the anomaly points that mark the start of an anomalous pattern and then 
write each one to the JSON file as [start, end] = [anom_pos - window/2, 
anom_pos + window/2], where window is calculated with the heuristic you 
mentioned. Is that correct?
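
To make the question concrete, here is a minimal sketch of how I picture 
building the windows. It assumes anomalies are given as integer row indices 
and that the window length is 0.1*data_size/numOfAnomalies. The function 
names and the file key "my_data.csv" are placeholders of mine, not anything 
from NAB itself.

```python
import json


def anomaly_windows(anomaly_positions, data_size, window_fraction=0.1):
    """Center one window on each anomaly index, clipped to the data range."""
    if not anomaly_positions:
        return []
    # Heuristic: window length = 0.1 * data_size / number_of_anomalies
    window = round(window_fraction * data_size / len(anomaly_positions))
    half = window // 2
    return [
        [max(0, pos - half), min(data_size - 1, pos + half)]
        for pos in anomaly_positions
    ]


def windows_to_json(file_name, windows, timestamps):
    """Map index windows to [start_ts, end_ts] pairs keyed by file name."""
    entry = [[timestamps[start], timestamps[end]] for start, end in windows]
    return json.dumps({file_name: entry}, indent=2)


# Hypothetical example: 1000 rows, anomalies at rows 200 and 700.
windows = anomaly_windows([200, 700], data_size=1000)
timestamps = ["t%d" % i for i in range(1000)]  # stand-in timestamp strings
print(windows_to_json("my_data.csv", windows, timestamps))
```

In practice the placeholder timestamps would be replaced by the timestamp 
strings from the data file.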

My one remaining issue is how to produce anomaly scores from my custom detector 
in a way that makes it as comparable as possible to online detectors. By that I 
mean finding a way to use a window-based anomaly detector in NAB, which is 
tailored to online point-by-point anomaly detection. 

The window-based anomaly detection method I’m using reacts to how periodic the 
pattern within the window is: a regular, periodic pattern indicates normal 
breathing, while flat lines and irregular patterns indicate abnormal breathing. 
So assigning an anomaly score of 1 to all points within an anomalous window is 
reasonable (I think), as long as the frame/window size is kept small enough not 
to spill over into normal patterns. If I were to do that, the anomaly would be 
chosen as the middle point of the window. Would that really be that unfair to 
the other detectors?
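
As a rough sketch of what I mean, assuming my detector works on fixed-size, 
non-overlapping frames and reports the indices of the frames it considers 
anomalous (the function and parameter names here are just mine):

```python
def frame_flags_to_point_scores(n_points, frame_size, anomalous_frames):
    """Give every point in an anomalous frame a score of 1.0, others 0.0."""
    scores = [0.0] * n_points
    for frame in anomalous_frames:
        start = frame * frame_size
        end = min(start + frame_size, n_points)  # last frame may be short
        for i in range(start, end):
            scores[i] = 1.0
    return scores


# Hypothetical example: 10 points, frames of 3 points, frames 1 and 3 flagged.
print(frame_flags_to_point_scores(10, 3, [1, 3]))
```

One caveat I am aware of: the verdict for a frame is only known once its last 
point has arrived, so the scores for the earlier points in the frame are 
filled in after the fact.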

Also, one more point of concern. The documentation suggests that the user 
should use the format in appendix F to create a file that can be scored (using 
path III). The format given there (timestamp, value, label) seems to be wrong. 
The only way I’ve managed to get it to work is by including an anomaly score 
column before the label column. I looked in the scorer.py code and it 
definitely requires that field, as it’s extracted from the pandas object. 

So… if I were to use the approach mentioned above, I’d have to create a csv 
file with the amended format, where label is populated with 1s inside the 
windows specified in the ground truth anomalies of the JSON file, and 
anomaly_score is populated with 1s within the anomalous windows detected by 
the DUT. Correct?
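
Here is a small sketch of the file I have in mind, with the column order 
timestamp, value, anomaly_score, label that worked for me. The helper names 
and the index-based windows are my own assumptions for illustration.

```python
import csv
import io

FIELDS = ["timestamp", "value", "anomaly_score", "label"]


def in_any_window(index, windows):
    """True if index falls inside any inclusive [start, end] window."""
    return any(start <= index <= end for start, end in windows)


def build_rows(timestamps, values, truth_windows, detected_windows):
    """One row per data point: score from the detector, label from ground truth."""
    rows = []
    for i, (ts, value) in enumerate(zip(timestamps, values)):
        rows.append({
            "timestamp": ts,
            "value": value,
            "anomaly_score": 1.0 if in_any_window(i, detected_windows) else 0.0,
            "label": 1 if in_any_window(i, truth_windows) else 0,
        })
    return rows


def write_csv(rows, handle):
    """Write the rows with a header line in the amended column order."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


# Hypothetical example: ground truth covers rows 1-2, detector flags rows 2-3.
rows = build_rows(["t0", "t1", "t2", "t3"], [1.0, 2.0, 3.0, 4.0],
                  truth_windows=[(1, 2)], detected_windows=[(2, 3)])
buffer = io.StringIO()
write_csv(rows, buffer)
print(buffer.getvalue())
```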

I do want to thank you for the prompt replies. You’ve been really helpful and 
given the fact that my defense is in a week, I really appreciate you taking 
time even on a weekend to help out. 

Thanks!
Nick



> On Apr 20, 2015, at 1:58 AM, Alex Lavin <[email protected]> wrote:
> 
> Nick, yes the combined anomaly json file contains anomaly windows, which 
> define the periods within which detections are true positives. The file is a 
> dictionary where the keys and values are the file names and lists of windows, 
> respectively. Each window is specified by two timestamps in a list.
> 
> You are correct that window sizes are calculated with 
> 0.1*data_size/numOfAnomalies. I would recommend against defining anomaly 
> windows in the method you described, for two main reasons:
> 1. The NAB scoring function relies on the fact that a given anomaly starts 
>    precisely at the center of the window. It is a scaled sigmoid, where true 
>    positives early in the window score higher than those later; we can assign 
>    appropriate values to earlier/later detections.
> 2. Merely checking windows for the existence of an anomaly, as in your 
>    method, ignores the value of making detections as early as possible; you 
>    may as well count the total true/false positives/negatives. Scoring in 
>    this way tells us very little about the performance of an algorithm as it 
>    attempts to detect anomalies in real-time.
> Best,
> Alex
> 
> Alexander Lavin
> Software Engineer
> Numenta
