Tom, I'm curious about the statistical models you used that detected the anomaly correctly. Could you explain a bit more about which models you used, how exactly you extracted an anomaly score from them, and so on?
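To make the question concrete, here is a guess at what "one model per hour of the day" could look like. It is only a sketch under assumptions of my own: each hourly model is just a mean and a standard deviation, and the anomaly score is the absolute z-score. The names `fit_hourly_models` and `anomaly_score` are made up for illustration, and none of this is necessarily what you actually did.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, stdev


def fit_hourly_models(records):
    """Fit one (mean, std) pair per hour of the day.

    `records` is an iterable of (datetime, value) pairs.
    Hours with fewer than two samples are skipped, because a
    sample standard deviation needs at least two points.
    """
    by_hour = defaultdict(list)
    for ts, value in records:
        by_hour[ts.hour].append(value)
    return {
        hour: (mean(values), stdev(values))
        for hour, values in by_hour.items()
        if len(values) >= 2
    }


def anomaly_score(models, ts, value, min_std=1e-6):
    """Absolute z-score of `value` against the model for its hour.

    `min_std` (an assumed floor) avoids division by zero when an
    hour has had perfectly constant readings.
    """
    mu, sigma = models[ts.hour]
    return abs(value - mu) / max(sigma, min_std)


# Illustrative 2 AM readings (invented numbers, not the hotgym data).
history = [
    (datetime(2010, 12, day, 2, 0), value)
    for day, value in enumerate([4.9, 5.0, 5.1, 5.0, 4.8, 5.2], start=1)
]
models = fit_hourly_models(history)
zero_score = anomaly_score(models, datetime(2010, 12, 31, 2, 0), 0.0)
normal_score = anomaly_score(models, datetime(2010, 12, 30, 2, 0), 5.0)
```

With a model like this, a zero reading at 2 AM scores far above any sensible threshold (say 3), while a typical reading scores close to zero. A plain z-score would probably handle the skewed 2 AM distribution poorly, so I would be interested to hear whether you used something more robust.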
Thanks!
Pedro.

On Thu, Dec 12, 2013 at 6:26 AM, Tom Tan <[email protected]> wrote:

> Hi Fergal and Mark,
>
> Thanks very much for your pointers and insights. I would still like to
> see whether CLA can do better at discerning patterns and then detecting
> anomalies in those patterns.
>
> Let’s look at the hot gym anomaly example again. It is obvious that gym
> consumption is low during 2-4 AM. I grepped each day's 2 AM energy
> consumption from rec-center-hourly.csv and put the values into
> rec-2AM.csv (attached). Then I plotted the 2 AM consumption (attached
> graph). As the plot shows, on the vast majority of days the energy
> consumption hovered around 5. Some days it went up, but it never dipped
> below 4.5. One would reasonably assume that 4.5 to 10, maybe even up to
> 15, is the normal range. Days on which consumption went up to >20 or
> down to <4.5 should be considered abnormal.
>
> The CLA, however, didn’t give a high anomaly score in either the >20 or
> the <4.5 case. The highest anomaly score was merely 0.3, when
> consumption (~25) was 5 times the normal level.
>
> Your suggested approach appears valid for the zero-consumption case, but
> I am not sure how it deals with the aforementioned scenario, where the
> normal pattern is highly skewed. Furthermore, the suggested approach
> assumes we know the normal range beforehand (and hence are able to
> create an encoder that deals with 0). Imagine we were dealing not with
> one hot gym but with tens of thousands of servers in a data center.
> Having a human know the normal range for each server during low-activity
> hours is not practical. So the question is: can CLA learn the range and
> pick a suitable encoder automatically?
>
> Regards,
> Tom
>
> INFO:__main__:Anomaly detected at [2010-10-13 02:00:00]. Anomaly score: 0.300000
>
> On Dec 10, 2013, at 10:53 PM, Tom Tan <[email protected]> wrote:
>
> Hi,
>
> I am very intrigued by NuPIC CLA and its potential. I was trying to use
> the CLA algorithm to perform anomaly detection. My data set is similar
> to that of the hotgym example: usage is high during the day/business
> hours and low, but never zero, during night/non-business hours (sorry, I
> can’t share my data set). Zero usage means an outage and should be
> considered an anomaly regardless of when it happens. The problem is that
> CLA failed to raise the anomaly score when an outage (zero usage)
> happened during non-business hours.
>
> I managed to reproduce the problem using the hot gym anomaly example.
>
> I made the following change to "extra/hotgym/rec-center-hourly.csv":
>
> 4373,4374c4373,4374
> < 12/31/10 1:00,0
> < 12/31/10 2:00,0
> ---
> > 12/31/10 1:00,4.9
> > 12/31/10 2:00,5
>
> That means zero energy usage at 1 and 2 AM, which should be abnormal.
> The corresponding CLA scores are 0 (shown below):
>
> INFO:__main__:Anomaly detected at [2010-12-31 01:00:00]. Anomaly score: 0.000000.
> INFO:__main__:Anomaly detected at [2010-12-31 02:00:00]. Anomaly score: 0.000000.
>
> When I used 24 “traditional” statistical models, one for each hour of
> the day, I was able to detect the zero usage and report it as an
> anomaly. CLA doesn’t appear to be superior in this case.
>
> Can the CLA model be tuned to account for scenarios like this?
>
> Regards,
> Tom
>
> _______________________________________________
> nupic mailing list
> [email protected]
> http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org

--
Pedro Tabacof,
Unicamp - Eng. de Computação 08.
