Hi Mark,

Thanks for your answers! They're really helping me grasp the possibilities
of anomaly detection.

My use case is to improve the current monitoring capabilities, which are
mostly based on threshold values for things like server metrics and
response times, plus some limited insight into system load. However, some
parts of the environment that affect its behavior can't be monitored at
all. Another problem is that not all monitors correspond to clearly
defined software requirements and vice versa, but that falls outside the
scope of this effort.

I hope that despite this limitation (and a few others) I can improve the
monitoring capabilities by incorporating some variation of the NuPIC
framework. The current problems, as you might guess, are an abundance of
useless automatically generated notifications, a severe lack of trend
monitoring, and the risk of complications when an issue is noticed too
late.

I see that with Grok it should be possible to implement this use case for
at least the server metrics, as it immediately provides functions that
simply don't exist in the current setup. However, I'd also like to explore
detecting less obvious anomalies by combining metrics; to my knowledge,
that's currently beyond the scope of Grok.
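
To make that concrete, here is a rough sketch of the kind of fusion step I
have in mind - my own hypothetical addition, not an existing NuPIC or Grok
feature - assuming one HTM model per metric, each producing an anomaly
likelihood:

    import math

    def combined_surprise(likelihoods):
        # Naively treat the per-metric anomaly likelihoods as independent
        # and sum their log-surprise, so that several moderately unusual
        # metrics can add up to a single combined alert.
        return sum(-math.log(1.0 - p + 1e-10) for p in likelihoods)

    # e.g. CPU, response time and load each slightly unusual at once:
    print(combined_surprise([0.95, 0.97, 0.93]))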

Assuming the data is pretty noisy and complex, and that it would take
1000-3000 samples for anomaly likelihood to work, that would probably
amount to about 4 days' worth of samples if I were to use Grok.
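
As a sanity check on that estimate, assuming Grok polls its metrics at
CloudWatch's usual 5-minute granularity (an assumption on my part):

    # samples per day at an assumed 5-minute polling interval
    samples_per_day = 24 * 60 // 5        # 288
    low, high = 1000, 3000                # the burn-in range you mentioned
    print(low / samples_per_day, high / samples_per_day)
    # -> roughly 3.5 to 10.4 days, so ~4 days at the low end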

My doubt about prolonged use of NuPIC is that 'old' patterns might
themselves be unwanted in some cases: once the model has learned them, it
would no longer flag them as anomalous, even though the behavior is
unwanted.
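
One mitigation I'm considering is to freeze learning once the system is
known to behave well, so unwanted-but-recurring patterns can't be absorbed
any further. A minimal sketch using the OPF learning toggle (import path
and method names may differ between NuPIC versions):

    from nupic.frameworks.opf.modelfactory import ModelFactory

    # MODEL_PARAMS is a placeholder for your OPF model-parameter dict
    model = ModelFactory.create(MODEL_PARAMS)
    # ... train on a period of known-good behavior ...
    model.disableLearning()   # stop absorbing new patterns, keep detecting
    # ... run inference-only anomaly detection; recurring unwanted
    #     behavior now keeps showing up as anomalous ...
    model.enableLearning()    # re-enable once the new regime is trusted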

It'd be nice to hear other people's views!

Kind regards,

Casper Rooker
casper.roo...@gmail.com

On Wed, Oct 14, 2015 at 4:07 PM, Marek Otahal <markota...@gmail.com> wrote:

> Hi Casper,
>
> Great to see your interest in HTM! Some answers below...
>
> On Wed, Oct 14, 2015 at 2:56 PM, Cas <casper.roo...@gmail.com> wrote:
>
>> Hi,
>>
>> I'm working on a proof of concept for enabling anomaly detection in a
>> monitored environment. I've decided to incorporate NuPIC in the PoC, and
>> I'm trying to gain insight into the pros and cons of using the
>> framework. I'm particularly interested in the workings of the anomaly
>> score and anomaly likelihood.
>>
>> I have some questions:
>>
>> What part of the framework outputs anomaly scores and anomaly likelihood?
>>
> The Anomaly class, in src/nupic/algorithms/anomaly.py, and
> anomaly_likelihood.py for the likelihood.
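>
> A minimal sketch of wiring the two together (function and class names as
> in the NuPIC source; please verify against your version):
>
>     from nupic.algorithms.anomaly import computeRawAnomalyScore
>     from nupic.algorithms.anomaly_likelihood import AnomalyLikelihood
>
>     likelihood_helper = AnomalyLikelihood()
>
>     def process_sample(active_columns, prev_predicted_columns, value, ts):
>         # raw score: fraction of active columns that were not predicted
>         score = computeRawAnomalyScore(active_columns,
>                                        prev_predicted_columns)
>         # likelihood: how unusual that score is vs. the recent history
>         prob = likelihood_helper.anomalyProbability(value, score, ts)
>         return score, prob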
>
>
>> What factors determine the minimum sample size for anomaly likelihood?
>>
> There is a burn-in period before the likelihood can evaluate its
> estimates, but this is not limited to the anomaly code; all parts of HTM
> need some time to settle - the SpatialPooler needs it to form stable,
> "high quality" SDR representations, and the same goes for the TP. In my
> experience, on complex data this is about 1000-3000 samples.
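>
> The burn-in on the likelihood side is configurable; a sketch using the
> AnomalyLikelihood constructor parameters (names as in current NuPIC -
> check your version):
>
>     from nupic.algorithms.anomaly_likelihood import AnomalyLikelihood
>
>     likelihood = AnomalyLikelihood(
>         learningPeriod=600,      # samples ignored while the model settles
>         estimationSamples=100)   # samples used to fit the score distribution
>     # until roughly learningPeriod + estimationSamples records have been
>     # seen, anomalyProbability() returns a neutral 0.5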
>
>> Is it possible to judge if an input stream is too noisy for useful
>> anomaly detection?
>>
> IMHO you should almost always be fine (I assume you are talking about
> point anomalies?). Unless they happen significantly often, HTM will
> detect them; if they do happen significantly often, it is not an anomaly
> anymore. The likelihood model can, e.g., train to "become used to" white
> noise, and then a constant line would be considered anomalous for a
> period of time.
>
>
>> Are there insights on the accuracy and reliability of anomaly detection
>> after prolonged use? I imagine HTM regions could become too saturated with
>> pattern variations to provide consistent anomaly detection over time.
>>
> In my experience your accuracy would improve - see the example above.
> What will get worse is the speed, as the HTM implementation slows down as
> more segments etc. are created.
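>
> If the slowdown matters, growth can be capped via the model parameters; a
> sketch (the field names differ between NuPIC versions, so treat the exact
> keys as illustrative):
>
>     # MODEL_PARAMS is a placeholder for your OPF model-parameter dict
>     MODEL_PARAMS["modelParams"]["tpParams"].update({
>         "maxSegmentsPerCell": 128,      # cap distal segment growth
>         "maxSynapsesPerSegment": 32,    # cap synapses per segment
>     })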
>
> Btw, can you share some insight in what your use-case is?
>
> Cheers, Mark
>
>>
>> I hope you can help me out.
>>
>> With regards,
>>
>> Casper Rooker
>> casper.roo...@gmail.com
>>
>
>
>
> --
> Marek Otahal :o)
>
