Hi Mark,

I'd like to point you to NAB [1], our benchmark for anomaly detection in streaming data. The corpus includes 17 data files representing a variety of server metrics; we selected these files for NAB specifically because they test detectors on the problems you described.
I've plotted a few examples you may be interested in [2-4], where the red dots mark the starting points of true anomalies, and the diamonds mark detections by the HTM anomaly detection algorithm (green and red are true and false positives, respectively).

On your previous questions:

- We typically say HTM needs about 1000 data instances to learn the temporal patterns well enough to start reliably making predictions (and anomaly detections). You'll notice the anomaly scores are relatively high at the beginning of a data stream, but settle down after HTM has learned the sequences well.

- A very noisy stream will result in false-positive detections, but this is true of any anomaly detection algorithm. To decrease the number of false positives, you can raise the threshold on the anomaly likelihood: fewer data points will be flagged as anomalous, but this may come at the cost of more false negatives.

- The temporal memory has a large capacity for storing sequence patterns, so the answer depends on what you mean by "prolonged use". The anomaly likelihood estimation uses several parameters [5] that control how much previous data is used to re-estimate the distribution, but tweaking these generally has little effect on the resulting detections.

[1] https://github.com/numenta/NAB
[2] https://plot.ly/~alavin/3151/anomaly-detections-for-realawscloudwatchec2-cpu-utilization-5f5533csv/
[3] https://plot.ly/~alavin/3187/anomaly-detections-for-realawscloudwatchelb-request-count-8c0756csv/
[4] https://plot.ly/~alavin/3199/anomaly-detections-for-realawscloudwatchrds-cpu-utilization-e47b3bcsv/
[5] https://github.com/numenta/nupic/blob/master/src/nupic/algorithms/anomaly_likelihood.py#L84-106

Cheers,
Alex
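P.S. To make the threshold trade-off concrete, here's a minimal, self-contained sketch. It is not the NuPIC implementation (anomaly_likelihood.py does considerably more); it just fits a rolling Gaussian to the recent raw anomaly scores and thresholds the resulting tail probability, which is enough to show why raising the likelihood threshold trades false positives for false negatives.

```python
import math
from collections import deque

def anomaly_likelihood_stream(scores, window=50):
    """Toy anomaly likelihood: for each raw anomaly score, estimate how
    surprising it is under a Gaussian fit to a sliding window of recent
    scores.  Illustrative only, not NuPIC's anomaly_likelihood.py."""
    history = deque(maxlen=window)
    likelihoods = []
    for s in scores:
        if len(history) >= 2:
            mean = sum(history) / len(history)
            var = sum((x - mean) ** 2 for x in history) / len(history)
            std = math.sqrt(var) or 1e-6  # guard against zero variance
            z = (s - mean) / std
            # Gaussian CDF: values near 1.0 mean the score is unusually
            # high relative to the recent window.
            likelihood = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
        else:
            likelihood = 0.5  # not enough history yet
        likelihoods.append(likelihood)
        history.append(s)
    return likelihoods

def flag_anomalies(likelihoods, threshold=0.99):
    """Raising `threshold` flags fewer points: fewer false positives,
    at the risk of more false negatives."""
    return [i for i, p in enumerate(likelihoods) if p >= threshold]
```

For example, on a flat stream with one spike in the raw anomaly scores, a threshold of 0.99 flags only the spike, while a threshold of 0.5 flags nearly everything; the real anomaly-likelihood code gives you the same knob with a better-behaved distribution estimate.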
