Hi Karin,

Measuring performance in anomaly detection is indeed quite tricky. We recently released NAB, a benchmark for anomaly detection that includes a novel way to compute accuracy [1]. The corresponding paper is also available [2].

NAB contains a lot of data files you can look at. In terms of size, for NuPIC we usually say that 500 to 1000 records are required before anomaly detection is reliable. In terms of seasonality, one issue is that the system has to see several repeating occurrences of a pattern before it can reliably make predictions. So if it's an annual pattern, you would need several years of data. If it is a daily or weekly pattern, it should be pretty straightforward to learn if you have several days or weeks of data.

--Subutai

[1] https://github.com/numenta/NAB
[2] http://arxiv.org/abs/1510.03336

On Wed, Nov 18, 2015 at 6:13 AM, Karin Valisova <[email protected]> wrote:
> Hello guys!
>
> I am working on a time series analysis thing that has one dimensional data
> series as an input and focuses mainly on spotting anomalies.
> I'm using nupic, but I want to have a backup plan for situations where
> the data are not appropriate for the network, just to do simple analysis
> like detection of the most obvious outliers - ideally before learning the
> whole network (which would be easy as I can take a look at various metrics
> and draw pretty good conclusions from that).
> So I need a set of conditions, based purely on the dataset, to decide if
> nupic is usable. The question in fact lies a bit deeper - what are the
> necessary attributes of the data, if we want to use nupic in general? I can
> think of the size of the data sample, which should be large enough; how
> about the degree of seasonality?
> I was thinking about the measurement of seasonality for most common
> patterns - like daily and weekly periods and if it's too low then dismiss
> the network - but maybe the HTM is able to spot something not obvious? Or
> do I expect too much from the algorithm?
>
> I do realize that the whole concept of performance in the field of
> anomaly detection dealing with real time series is a bit hazy, but I would
> be really happy to hear your insights and empirical observations on the
> matter.
>
> Thank you!
> Karin
>
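
For what it's worth, the two rules of thumb in the reply (500 to 1000 records, and several repetitions of any seasonal pattern) could be turned into a quick pre-check on a dataset. The following is only a rough sketch, not anything from NuPIC or NAB: the function and class names are made up for illustration, and reading "several" as "at least 3 cycles" is my assumption.

```python
from dataclasses import dataclass

MIN_RECORDS = 500  # lower end of the 500-1000 range mentioned in the reply
MIN_CYCLES = 3     # assumption: "several" repetitions read as at least 3


@dataclass
class SufficiencyReport:
    """Hypothetical container for the outcome of the pre-check."""
    n_records: int
    cycles_seen: float
    enough_records: bool
    enough_cycles: bool

    @property
    def ok(self) -> bool:
        return self.enough_records and self.enough_cycles


def check_sufficiency(n_records, sample_interval_s, period_s,
                      min_records=MIN_RECORDS, min_cycles=MIN_CYCLES):
    """Compare a series' length against the two rules of thumb.

    n_records         -- number of samples in the series
    sample_interval_s -- seconds between consecutive samples
    period_s          -- length of the suspected seasonal period, in seconds
    """
    if sample_interval_s <= 0 or period_s <= 0:
        raise ValueError("sample_interval_s and period_s must be positive")
    # Time covered by the series, from the first sample to the last.
    span_s = max(n_records - 1, 0) * sample_interval_s
    cycles = span_s / period_s
    return SufficiencyReport(
        n_records=n_records,
        cycles_seen=cycles,
        enough_records=n_records >= min_records,
        enough_cycles=cycles >= min_cycles,
    )


# Two weeks of 5-minute samples, checked against a daily and a yearly period.
n = 14 * 24 * 12
daily = check_sufficiency(n, 300, 24 * 3600)
yearly = check_sufficiency(n, 300, 365 * 24 * 3600)
print(daily.ok, yearly.ok)  # True False
```

This only checks how much data there is, not whether a pattern is actually present, so it is a necessary condition at best. The thresholds are parameters so they can be tightened toward the 1000-record end of the range.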
