Re: Necessary conditions of the datasets

Subutai Ahmad Wed, 18 Nov 2015 09:29:06 -0800

Hi Karin,

Measuring performance in anomaly detection is indeed quite tricky. We
recently released a benchmark for anomaly detection, NAB, that includes a
novel way to compute accuracy [1]. The corresponding paper is also
available [2].


NAB contains many data files you can look at. In terms of size, for
NuPIC we usually say that 500 to 1000 records are required before anomaly
detection is reliable.

In terms of seasonality, one issue is that the system has to see several
repeating occurrences before it can reliably make predictions. So if it's
an annual pattern, you would need several years of data. If it is a daily or
weekly pattern, learning it should be pretty straightforward if you have
several days/weeks of data.
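
The two rules of thumb above can be sketched as a quick pre-check on a
dataset. This is only an illustration, not part of NuPIC or NAB: the
function name check_dataset, the constant names, and the reading of
"several" as at least 3 repetitions are all assumptions made for the
example.

```python
from datetime import timedelta

# Lower end of the "500 to 1000 records" rule of thumb quoted above.
MIN_RECORDS = 500
# Assumption: "several" repetitions is read here as at least 3 full cycles.
MIN_CYCLES = 3


def check_dataset(num_records, sampling_interval, period):
    """Return (ok, reasons) for a regularly sampled one-dimensional series.

    num_records       -- number of data points available
    sampling_interval -- timedelta between consecutive records
    period            -- timedelta of the seasonal pattern of interest
    """
    reasons = []
    if num_records < MIN_RECORDS:
        reasons.append(
            "only %d records, fewer than %d" % (num_records, MIN_RECORDS)
        )
    # Total time covered by the data, expressed in seasonal cycles.
    cycles = (sampling_interval * num_records) / period
    if cycles < MIN_CYCLES:
        reasons.append(
            "data covers %.1f cycles, fewer than %d" % (cycles, MIN_CYCLES)
        )
    return (not reasons, reasons)


# 2000 records at 5-minute intervals cover about 6.9 days:
# enough for a daily pattern, not enough for a weekly one.
daily_ok, _ = check_dataset(2000, timedelta(minutes=5), timedelta(days=1))
weekly_ok, weekly_reasons = check_dataset(
    2000, timedelta(minutes=5), timedelta(days=7)
)
```

The thresholds are deliberately kept as named constants so they can be
tuned against your own data.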

--Subutai

[1] https://github.com/numenta/NAB
[2] http://arxiv.org/abs/1510.03336


On Wed, Nov 18, 2015 at 6:13 AM, Karin Valisova <[email protected]> wrote:

> Hello guys!
>
> I am working on a time series analysis thing that takes a one-dimensional
> data series as input and focuses mainly on spotting anomalies.
> I'm using nupic, but I want to have a backup plan for situations where
> the data are not appropriate for the network, just to do simple analysis
> like detection of the most obvious outliers - ideally before learning the
> whole network (which would be easy as I can take a look at various metrics
> and draw pretty good conclusions from that).
> So I need a set of conditions, based purely on the dataset, to decide if
> nupic is usable. The question in fact lies a bit deeper: what are the
> necessary attributes of the data if we want to use nupic in general? I can
> think of the size of the data sample, which should be large enough, but how
> about the degree of seasonality?
> I was thinking about measuring seasonality for the most common patterns,
> like daily and weekly periods, and dismissing the network if it's too
> low - but maybe the HTM is able to spot something not obvious? Or
> do I expect too much from the algorithm?
>
> I do realize that the whole concept of performance in the field of
> anomaly detection dealing with real time series is a bit hazy, but I would
> be really happy to hear your insights and empirical observations on the
> matter.
>
> Thank you!
> Karin
>
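
One simple way to measure the daily or weekly seasonality mentioned in the
question is the sample autocorrelation at the lag matching the period. The
sketch below is a generic statistical check, not something NuPIC provides;
the function name autocorrelation and the synthetic hourly series are made
up for illustration. A low value only says the pattern is weak at that
particular lag, so it does not by itself show that HTM would find nothing.

```python
import math


def autocorrelation(values, lag):
    """Sample autocorrelation of a sequence at a positive integer lag."""
    n = len(values)
    if lag <= 0 or lag >= n:
        raise ValueError("lag must be between 1 and len(values) - 1")
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values)
    if variance == 0:
        return 0.0  # a constant series has no seasonality to measure
    covariance = sum(
        (values[i] - mean) * (values[i + lag] - mean) for i in range(n - lag)
    )
    return covariance / variance


# Synthetic hourly series with a clean 24-hour cycle over 10 days.
hourly = [math.sin(2 * math.pi * t / 24) for t in range(240)]

daily_strength = autocorrelation(hourly, 24)  # lag of one full day
half_day = autocorrelation(hourly, 12)        # lag of half a cycle
```

For hourly data a weekly check would use a lag of 168. Where to put the
cut-off for "too low" is a judgment call that has to be made per dataset.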
