Hi Mark,
I'd like to point you to NAB [1], our benchmark for anomaly detection in 
streaming data. The corpus includes 17 data files representing a variety 
of server metrics; we selected these files for NAB specifically because 
they test detectors on the problems you described.

I've plotted a few examples you may be interested in [2-4], where the red dots 
represent the starting point of true anomalies, and the diamonds mark 
detections by the HTM anomaly detection algorithm (green and red are true and 
false positives, respectively).

On your previous questions...
- We typically say HTM needs 1000 data instances to learn the temporal 
patterns well enough to start making reliable predictions (and anomaly 
detections). You'll notice the anomaly scores are relatively high at the 
beginning of a data stream, but settle down once HTM has learned the 
sequences well.
- A very noisy stream will result in false positive (FP) detections, but this 
is true of any anomaly detection algorithm. To decrease the number of false 
positives, you can increase the threshold on the anomaly likelihood. Fewer 
data points will then be flagged as anomalous, but this may come at the cost 
of more false negatives.
- The temporal memory has a large capacity for storing patterns of sequences, 
so this depends on what you mean by "prolonged use". The anomaly likelihood 
estimation uses several parameters [5] related to how much previous data is 
used to reestimate the distribution, but tweaking these generally has little 
effect on the resulting detections.
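
To make the threshold trade-off concrete, here is a minimal sketch in plain 
Python. It does not call NuPIC or NAB: the function names, the toy likelihood 
values, and the exact-index matching against labels are all assumptions made 
for illustration (NAB itself scores detections against anomaly windows). The 
learning period is shortened to 3 records here so the toy data stays small; 
the default of 1000 reflects the figure mentioned above.

```python
def flag_anomalies(likelihoods, threshold, learning_period=1000):
    """Return indices flagged as anomalous (hypothetical helper).

    Records inside the learning period are never flagged, since scores
    tend to be high before the model has learned the sequences.
    """
    return [
        i for i, p in enumerate(likelihoods)
        if i >= learning_period and p >= threshold
    ]


def count_errors(flagged, true_anomalies):
    """Return (false_positives, false_negatives) using exact index matching.

    This is a simplification for illustration, not NAB's windowed scoring.
    """
    flagged, truth = set(flagged), set(true_anomalies)
    return len(flagged - truth), len(truth - flagged)


# Toy anomaly likelihoods (made up) and hand-labeled true anomaly indices.
likelihoods = [0.99, 0.97, 0.95, 0.20, 0.92, 0.999, 0.30, 0.96, 0.9999, 0.10]
true_anomalies = [5, 8]

for threshold in (0.9, 0.99, 0.9995):
    flagged = flag_anomalies(likelihoods, threshold, learning_period=3)
    fp, fn = count_errors(flagged, true_anomalies)
    print("threshold=%s flagged=%s FP=%d FN=%d" % (threshold, flagged, fp, fn))
```

With this toy data, raising the threshold from 0.9 to 0.99 removes both false 
positives, and raising it further to 0.9995 misses one of the two true 
anomalies.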

[1] https://github.com/numenta/NAB
[2] https://plot.ly/~alavin/3151/anomaly-detections-for-realawscloudwatchec2-cpu-utilization-5f5533csv/
[3] https://plot.ly/~alavin/3187/anomaly-detections-for-realawscloudwatchelb-request-count-8c0756csv/
[4] https://plot.ly/~alavin/3199/anomaly-detections-for-realawscloudwatchrds-cpu-utilization-e47b3bcsv/
[5] https://github.com/numenta/nupic/blob/master/src/nupic/algorithms/anomaly_likelihood.py#L84-106

Cheers,
Alex
