On Tue, Nov 3, 2015 at 1:23 AM, Wakan Tanka <[email protected]> wrote:
> Hello NuPIC,
>
> Here
>
> http://lists.numenta.org/pipermail/nupic_lists.numenta.org/2015-November/012139.html
>
> is a discussion about the correct interpretation of NuPIC output which I
> would like to extend. First I will provide a short summary and then ask
> another question.
>
> Consider the following output:
>
> step,original,prediction,anomaly score
> 175,0,0.0,0.32500000000000001
> 176,62,52.0,0.65000000000000002
> 177,402,0.0,1.0
> 178,0,0.0,0.125
> 179,402,0.0,1.0
> 180,0,0.0,0.0
> 181,3,402.0,0.050000000000000003
> 182,50,52.0,0.10000000000000001
> 183,68,13.0,0.90000000000000002
>
> This is the output of one-step-ahead prediction without using the
> inference shifter. It basically means that the prediction made at step N
> is for step N+1. In other words, if the prediction is perfectly right,
> then the prediction value at step N should correspond to the original
> value at step N+1.
>
> The anomaly score can be viewed as the confidence of the prediction. For
> example, NuPIC might only be 23% confident in the best prediction it
> gives, in which case the anomaly score could be very high. This is the
> case at step 179, where the prediction is 0 and the original value at
> step 180 is 0. Note that the anomaly score at step 179 is 1.0. It means
> that NuPIC was not confident in the prediction, despite the prediction
> being correct.

The anomaly score (any form of it) is not a good estimate of confidence.
The CLA Classifier gives multiple predictions with associated
probabilities; you should use those instead. The percentage it gives is
based on what it has seen before, so if it has only seen some state once,
it will predict the value that followed with 100% confidence. In reality
you probably want a much lower confidence, since you've only seen that
value once. Still, the classifier confidence will be a much better
indicator than the anomaly score.
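The counting idea behind that confidence can be sketched in plain Python. This is only an illustration of frequency-based probabilities, not NuPIC's actual CLA Classifier code; the class and its names are made up for the example:

```python
from collections import Counter, defaultdict

class FrequencyPredictor(object):
    """Toy next-value predictor: confidence is just the fraction of
    times each successor followed a given context."""

    def __init__(self):
        # context -> Counter of observed next values
        self.counts = defaultdict(Counter)

    def learn(self, context, next_value):
        self.counts[context][next_value] += 1

    def predict(self, context):
        # Probability of each successor = its count / total observations
        successors = self.counts[context]
        total = float(sum(successors.values()))
        return {value: n / total for value, n in successors.items()}

p = FrequencyPredictor()
p.learn("ABC", "D")
print(p.predict("ABC"))   # seen once, yet reported as 100% confident
p.learn("ABC", "E")
p.learn("ABC", "F")
print(p.predict("ABC"))   # each of D, E, F now has probability 1/3
```

After a single observation the predictor reports 100% confidence, which is exactly the over-confidence described above; only with more data do the probabilities become meaningful.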
Imagine these sequences:

  ABCD
  ABCE
  ABCF

If each of these is learned, then when you see ABC, the temporal memory
will be perfectly predicted so far, yielding an anomaly score of 0.0, and
will be predicting D, E, and F simultaneously (in the temporal memory).
The classifier will also be predicting those values and will have somewhat
different probabilities for each. Whichever value (D, E, or F) happens to
have the highest probability will be the "best prediction" but will have
only 33-40% confidence.

> The opposite situation happens at step 180, where the prediction is 0
> and the original value at step 181 is 3. Note that the anomaly score at
> step 180 is 0. That means that NuPIC was quite confident in the
> prediction, but it was not correct.
>
> Questions:
> 1. Does the anomaly score on a given line also count the original value
> on that line? For example, does the anomaly score on this line
>
> 181,3,402.0,0.050000000000000003
>
> take into account that 3 is the original value? Or is it computed
> without respect to this value?

The anomaly score will tell you whether or not the original value on the
same line was expected. But even if it is low, it doesn't mean that the
prediction has high confidence (see the explanation above).

> 2. Is it possible to compute some kind of debug information regarding
> prediction and anomaly score? I mean something like this from NuPIC's
> perspective:
>
> I'm 23% sure that next value will be 10
> I'm 27% sure that next value will be 20
> I'm 50% sure that next value will be 30

The model results from the CLAModel will have all predictions and
associated probabilities. I think that is what you want. What are you
running to generate these results? If I can see how you are generating
them, I can show you how to get this extra information.

> 3. Is it OK to predict data zero steps forward if I'm just interested in
> the prediction accuracy?

I don't follow. Predicting 0 steps ahead means that you want to predict
the value you just got?

> 4.
> Does NuPIC do some kind of look-back? I mean, if NuPIC was confident at
> step 180 that the next value would be 0, but it later turns out that
> this was a mistake, does NuPIC somehow recompute the anomaly score from
> step 180 for further data processing? Or is this done automatically in
> HTM?

Everything is constantly learning, so it will perform better on future
values, but it doesn't go back and give updated anomaly scores when it
has more information.

> PS: I've cross-posted this question on SO to reach more people:
> http://stackoverflow.com/questions/33495388/how-to-correctly-interpret-nupic-output-vol-2
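To make the answer to question 2 above concrete, the per-value probabilities live in the inferences of the result returned by the model's `run()` call. A hedged sketch follows; the `multiStepPredictions` / `multiStepBestPredictions` key names assume NuPIC's OPF conventions, and a plain dict stands in here for a real model result so the example is self-contained:

```python
# Stand-in for result.inferences from a CLAModel run (assumed OPF key
# names): "multiStepPredictions" maps steps-ahead to a
# {predicted_value: probability} dict.
inferences = {
    "multiStepPredictions": {1: {10: 0.23, 20: 0.27, 30: 0.50}},
    "multiStepBestPredictions": {1: 30},
    "anomalyScore": 0.125,
}

# Print the debug lines the question asked for, one per candidate value.
for value, prob in sorted(inferences["multiStepPredictions"][1].items()):
    print("I'm %.0f%% sure that next value will be %s" % (prob * 100, value))
```

The best prediction is just the highest-probability entry of that distribution, which is what ends up in the single "prediction" column of the CSV output.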
