Hi Tom, have a look at all the available settings here: https://github.com/numenta/nupic/blob/master/py/nupic/encoders/scalar.py#L157
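(For anyone following along: the gist of those settings can be seen with a toy re-implementation of the scalar-encoding idea. This is only a sketch under my own simplifying assumptions — not the actual nupic code — but the parameter names mirror the ones in scalar.py.)

```python
# Toy sketch of a scalar encoder: a contiguous block of `w` active bits
# whose position depends on the input value. NOT the real nupic code --
# just an illustration of how w, resolution, minval and maxval interact.

def encode(value, w=7, resolution=0.1, minval=0.0, maxval=10.0):
    """Return a list of 0/1 bits containing `w` consecutive ones."""
    n_buckets = int(round((maxval - minval) / resolution)) + 1
    n_bits = n_buckets + w - 1                     # total output width
    bucket = int(round((value - minval) / resolution))
    bucket = max(0, min(bucket, n_buckets - 1))    # clip out-of-range input
    return [1 if bucket <= i < bucket + w else 0 for i in range(n_bits)]

a = encode(4.5)
b = encode(4.6)
overlap = sum(x & y for x, y in zip(a, b))
# With resolution=0.1, 4.5 and 4.6 land in adjacent buckets, so their
# encodings share w-1 = 6 active bits: similar, but distinguishable.
```

Shrinking the resolution (or the min/max range) moves nearby values into buckets that are further apart, so their encodings share fewer bits.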
First, you'll want to set w reasonably high; something like 31 or more should be OK. Then I'd focus on either the radius or the resolution parameter. Setting radius=0.1 will make 4.5 and 4.6 completely different patterns:

111111100000000000000000000000000000000000000000000
000000111111100000000000000000000000000000000000000

while resolution=0.1 will make them distinguishable but overlapping:

111111100000000000000000000000000000000000000000000
000111111100000000000000000000000000000000000000000

Also, setting the range (minValue, maxValue) to reasonable (as tight as possible) values will help. That is, if 4..15 are common values and 4.5 or 25 is already an outlier, I'd set the range to something like 4..30. Depending on how you use the CLA, these parameters are set either in the constructor of the encoder or in the OPF model file.

Hope you get some nice results from the anomaly detection, let us know :)

Cheers, Mark

On Sat, Dec 14, 2013 at 7:16 PM, Tom Tan <[email protected]> wrote:

> Hi Mark,
>
> This is great help. I used ScalarEncoder. Could you point me to where to
> adjust the scalar encoder? This is my try with Nupic. Please pardon my
> ignorance.
>
> I'll keep you posted on the results.
>
> Regards,
> Tom
>
> On Dec 13, 2013, at 4:20 AM, Marek Otahal <[email protected]> wrote:
>
> Tom,
>
> I like the way it's going, the questions are getting more and more
> interesting.
>
> To me, it looks more like a question of principle: is the CLA supposed to
> handle that? How can we force it to do so?
>
> Imagine your cat: you see it every day, but when it loses 1-200 hairs
> you wouldn't notice (because our eyes don't work at that level of detail).
> On the other hand, I think it's correct to expect to detect the anomaly in
> the scenario you describe. Have you given it enough (varied) data points,
> so that the values >20 are truly unexpected? Maybe it would be useful to
> create a simplified dataset and experiment with it?
>
> Now to the problem where you never see an anomaly score above 0.3.
> I believe this has to do with the "sensor and its available detail", that
> is, the encoder and its settings in Nupic. You're probably going with the
> ScalarEncoder, right?
>
> Now you'd want to set resolution=0.1 to be able to differentiate 5,
> 4.5 and 4.8.
>
> I didn't study the anomaly code much, but AFAIK the anomaly score is
> computed as the Hamming distance between the predicted and actual output
> (scaled to the 0..1 range). So if your representations of 5 and 4.5 differ
> in just 3 bits, you can't expect an anomaly score higher than, say, 30%.
>
> There's another parameter in the scalar encoder to set the number of bits
> by which two different patterns should differ. You could experiment with
> that as well.
>
> Cheers, Mark
>
> On Thu, Dec 12, 2013 at 9:26 AM, Tom Tan <[email protected]> wrote:
>
>> Hi Fergal and Mark,
>>
>> Thanks very much for your pointers and insights. I still would like to
>> see if the CLA can do better at discerning patterns and further detecting
>> anomalies in those patterns.
>>
>> Let's look at the hot gym anomaly example again. It is obvious that gym
>> consumption is low during 2-4 AM. I grepped every day's 2 AM energy
>> consumption from rec-center-hourly.csv and put it into rec-2AM.csv
>> (attached). Then I plotted the 2 AM consumption (attached graph). As the
>> plot shows, on the vast majority of days the energy consumption hovered
>> around 5. Some days it went up, but it never dipped below 4.5. One would
>> reasonably assume 4.5 to 10, maybe even to 15, is the normal range. Days
>> when consumption went up to >20 or down to <4.5 should be considered
>> abnormal.
>>
>> The CLA, however, didn't give a high anomaly score in either the >20 or
>> the <4.5 case. The highest anomaly score was merely 0.3, when consumption
>> (~25) was 5 times the normal.
>>
>> Your suggested approach appears valid for the zero-consumption case, but
>> I am not sure how it deals with the aforementioned scenario, where the
>> normal pattern is highly skewed.
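(Mark's Hamming-distance point above can be made concrete with a small sketch. This is a simplification I'm assuming for illustration — one minus the fraction of active bits that were predicted — not the actual NuPIC computation.)

```python
# Sketch of the score ceiling Mark describes: if the predicted and actual
# encodings share most of their active bits, the anomaly score stays low.
# score = 1 - (shared active bits / active bits) -- an assumed
# simplification, not NuPIC's actual implementation.

def anomaly_score(predicted_bits, actual_bits):
    active = [i for i, bit in enumerate(actual_bits) if bit]
    if not active:
        return 0.0
    shared = sum(1 for i in active if predicted_bits[i])
    return 1.0 - shared / len(active)

# Two encodings with w=10 active bits, shifted by only 3 positions:
predicted = [1] * 10 + [0] * 10
actual = [0] * 3 + [1] * 10 + [0] * 7
print(round(anomaly_score(predicted, actual), 2))  # 0.3
```

With only a 3-bit shift between the two representations, the score tops out around 0.3 — exactly the ceiling Tom observed.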
>> Furthermore, the suggested approach assumes we know the normal range
>> beforehand (and hence are able to create an encoder that deals with 0).
>> Imagine we were dealing not with one hot gym but with tens of thousands
>> of servers in a data center. Having a human know the normal range for
>> each server during low-activity hours is not practical. So the question
>> is: can the CLA learn the range and pick a suitable encoder
>> automatically?
>>
>> Regards,
>> Tom
>>
>> INFO:__main__:Anomaly detected at [2010-10-13 02:00:00]. Anomaly score:
>> 0.300000
>>
>> On Dec 10, 2013, at 10:53 PM, Tom Tan <[email protected]> wrote:
>>
>> Hi,
>>
>> I am very intrigued by Nupic's CLA and its potential. I was trying to
>> use the CLA algorithm to perform anomaly detection. My data set is
>> similar to that of the hotgym example - usage is high during the
>> day/business hours and low, but never zero, during night/non-business
>> hours (sorry, I can't share my data set). Zero usage means an outage and
>> should be considered an anomaly regardless of when it happens. The
>> problem is that the CLA failed to raise the anomaly score when the
>> outage/zero usage happened during non-business hours.
>>
>> I managed to reproduce the problem using the hot gym anomaly example.
>>
>> I made the following change to "extra/hotgym/rec-center-hourly.csv":
>>
>> 4373,4374c4373,4374
>> < 12/31/10 1:00,0
>> < 12/31/10 2:00,0
>> ---
>> > 12/31/10 1:00,4.9
>> > 12/31/10 2:00,5
>>
>> That means zero energy usage during 1 and 2 AM, which should be
>> abnormal. The corresponding CLA scores are 0 (shown below):
>>
>> INFO:__main__:Anomaly detected at [2010-12-31 01:00:00]. Anomaly score:
>> 0.000000.
>> INFO:__main__:Anomaly detected at [2010-12-31 02:00:00]. Anomaly score:
>> 0.000000.
>>
>> When I used 24 "traditional" statistical models, one for each hour of
>> the day, I was able to detect the zero usage and report it as an
>> anomaly. The CLA doesn't appear to be superior in this case.
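(The 24-per-hour baseline Tom mentions can be sketched in a few lines: one mean/std pair per hour of day, flagging readings more than k standard deviations out. The helper names and the k=3 threshold are my own illustrative assumptions, not his actual setup.)

```python
# Sketch of the "24 traditional statistical models" baseline: keep one
# mean/std per hour of day and flag readings more than k sigma away.
# The threshold k=3 is illustrative, not taken from Tom's setup.
from collections import defaultdict
from statistics import mean, stdev

def fit_hourly(records):
    """records: iterable of (hour_of_day, consumption) pairs."""
    by_hour = defaultdict(list)
    for hour, value in records:
        by_hour[hour].append(value)
    return {h: (mean(v), stdev(v)) for h, v in by_hour.items() if len(v) > 1}

def is_anomaly(model, hour, value, k=3.0):
    mu, sigma = model[hour]
    return abs(value - mu) > k * max(sigma, 1e-9)

# 2 AM consumption hovering around 5, as in the plot Tom describes:
history = [(2, 5.0), (2, 4.9), (2, 5.1), (2, 4.8), (2, 5.2), (2, 5.0)]
model = fit_hourly(history)
# A 0 reading (outage) or a ~25 reading at 2 AM is flagged immediately;
# 5.1 is not.
```

This baseline catches both the zero-usage outage and the ~25 spike because each hour's normal range is learned independently, which is exactly the behavior Tom is asking whether the CLA can match.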
>>
>> Can the CLA model be tuned to account for scenarios like this?
>>
>> Regards,
>> Tom
>
> --
> Marek Otahal :o)

--
Marek Otahal :o)
_______________________________________________
nupic mailing list
[email protected]
http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
