Tom, I'm curious about the statistical models you used to detect the
anomaly correctly. Could you explain a bit more about which models you
used, how exactly you extracted an anomaly score from them, etc.?

Thanks!
Pedro.


On Thu, Dec 12, 2013 at 6:26 AM, Tom Tan <[email protected]> wrote:

> Hi Fergal and Mark,
>
> Thanks very much for your pointers and insights.  I still would like to
> see if CLA can do better in discerning patterns and further detecting
> anomalies in these patterns.
>
> Let’s look at the hot gym anomaly example again.  It is obvious that gym
> consumption is low during 2-4 AM.  I grepped the 2 AM energy consumption
> for every day from rec-center-hourly.csv and put the results into
> rec-2AM.csv (attached).  Then I plotted the 2 AM consumption (attached
> graph).  As the plot shows, on the vast majority of days the energy
> consumption hovered around 5.  Some days it went up, but it never dipped
> below 4.5.  One could reasonably assume that 4.5 to 10, maybe even up to
> 15, is the normal range.  Days on which consumption went above 20 or
> below 4.5 should be considered abnormal.
>
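A minimal sketch of the check described above: pull out the readings
for one hour of the day and flag those outside a fixed range. It assumes
"timestamp,value" rows in the m/d/yy H:MM form seen in
rec-center-hourly.csv; the function names and the sample rows are made
up for illustration, and the 4.5 and 20 bounds are the ones quoted above.

```python
from datetime import datetime


def readings_at_hour(lines, hour):
    """Return (timestamp, value) pairs recorded at the given hour of day."""
    selected = []
    for line in lines:
        parts = line.strip().split(",")
        if len(parts) != 2:
            continue
        try:
            stamp = datetime.strptime(parts[0], "%m/%d/%y %H:%M")
            value = float(parts[1])
        except ValueError:
            continue  # header rows or malformed lines are skipped
        if stamp.hour == hour:
            selected.append((stamp, value))
    return selected


def outside_range(readings, low, high):
    """Keep only the readings that fall below `low` or above `high`."""
    return [(stamp, value) for stamp, value in readings
            if value < low or value > high]


# Illustrative rows only, not the real data set.
sample = [
    "timestamp,kw_energy_consumption",
    "10/12/10 2:00,5.1",
    "10/13/10 2:00,25.0",
    "10/13/10 3:00,4.8",
    "12/31/10 2:00,0",
]
two_am = readings_at_hour(sample, 2)
flagged = outside_range(two_am, low=4.5, high=20.0)
for stamp, value in flagged:
    print(stamp, value)
```

The fixed bounds are exactly the part that does not scale to many
series, which is the concern raised further down.
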
> The CLA, however, didn’t give a high anomaly score in either the >20 or
> the <4.5 case.  The highest anomaly score was merely 0.3, when
> consumption (~25) was 5 times the normal level.
>
> Your suggested approach appears valid for the zero consumption case, but I
> am not sure how it deals with the aforementioned scenario, where the normal
> pattern is highly skewed.  Furthermore, the suggested approach assumes we
> know the normal range beforehand (and hence are able to create an encoder
> to deal with 0).  Imagine we were dealing not with one hot gym but with
> tens of thousands of servers in a data center.  Having a human know the
> normal range for each server during low-activity hours is not practical.
> So the question is: can CLA learn the range and pick a suitable encoder
> automatically?
>
>
> Regards,
> Tom
>
> INFO:__main__:Anomaly detected at [2010-10-13 02:00:00]. Anomaly score: 0.300000
>
>
>
>
> On Dec 10, 2013, at 10:53 PM, Tom Tan <[email protected]> wrote:
>
> Hi,
>
> I am very intrigued by the NuPIC CLA and its potential.  I was trying to
> use the CLA algorithm to perform anomaly detection.  My data set is similar
> to that of the hotgym example: the usage is high during the day/business
> hours and low, but never zero, during night/non-business hours (sorry, I
> can’t share my data set).  Zero usage means an outage and should be
> considered an anomaly regardless of when it happens.  The problem is that
> CLA failed to raise the anomaly score when an outage (zero usage) happened
> during non-business hours.
>
> I managed to reproduce the problem using the hot gym anomaly example.
>
> I made the following change to "extra/hotgym/rec-center-hourly.csv":
>
> 4373,4374c4373,4374
> < 12/31/10 1:00,0
> < 12/31/10 2:00,0
> ---
> > 12/31/10 1:00,4.9
> > 12/31/10 2:00,5
>
> That means zero energy usage at 1 AM and 2 AM, which should be
> abnormal.  Yet the corresponding CLA anomaly scores are 0 (shown below):
>
> INFO:__main__:Anomaly detected at [2010-12-31 01:00:00]. Anomaly score: 0.000000.
> INFO:__main__:Anomaly detected at [2010-12-31 02:00:00]. Anomaly score: 0.000000.
>
> When I used 24 “traditional” statistical models, one for each hour of the
> day, I was able to detect the zero usage and report it as an anomaly.  CLA
> doesn’t appear to be superior in this case.
>
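For reference, one way 24 per-hour models could be built is sketched
below. This is only an assumption, not necessarily what was used here:
each hour gets a median and a median absolute deviation (MAD), and the
score is the distance from the median in robust standard deviations.
The function names, the `min_spread` floor, and the sample history are
all hypothetical.

```python
from collections import defaultdict
from statistics import median


def fit_hourly_baselines(history):
    """Build one (median, MAD) baseline per hour from (hour, value) pairs."""
    by_hour = defaultdict(list)
    for hour, value in history:
        by_hour[hour].append(value)
    baselines = {}
    for hour, values in by_hour.items():
        center = median(values)
        mad = median([abs(v - center) for v in values])
        baselines[hour] = (center, mad)
    return baselines


def anomaly_score(baselines, hour, value, min_spread=0.1):
    """Distance from the hour's median, in robust standard deviations."""
    center, mad = baselines[hour]
    # 1.4826 * MAD estimates the standard deviation for normal data;
    # min_spread (hypothetical) avoids dividing by zero on flat series.
    spread = max(1.4826 * mad, min_spread)
    return abs(value - center) / spread


# Made-up history: low night usage at 2 AM, high usage at 2 PM.
history = [(2, v) for v in (4.9, 5.0, 5.1, 5.0, 5.2, 4.8, 5.0)]
history += [(14, v) for v in (40.0, 42.0, 38.0, 45.0, 41.0)]
baselines = fit_hourly_baselines(history)

print(anomaly_score(baselines, 2, 0.0))   # zero usage at night
print(anomaly_score(baselines, 2, 5.0))   # typical night reading
```

Because each hour is compared only against its own history, a zero at
2 AM stands out even though it is numerically close to the normal
night-time level. The baselines are learned from the data, so nobody
has to supply a normal range per series.
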
> Can the CLA model be tuned to account for scenarios like this?
>
> Regards,
> Tom
>
>
>
> _______________________________________________
> nupic mailing list
> [email protected]
> http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
>
>


-- 
Pedro Tabacof,
Unicamp - Eng. de Computação 08.