Hello Scott,
On 11/03/2015 08:08 PM, Scott Purdy wrote:
On Tue, Nov 3, 2015 at 1:23 AM, Wakan Tanka <[email protected]
<mailto:[email protected]>> wrote:
Hello NuPIC,
Here
http://lists.numenta.org/pipermail/nupic_lists.numenta.org/2015-November/012139.html
is discussion about correct interpretation of NuPIC output which I
would like to extend. First I will provide short summary and then
ask another question.
Consider following output:
step,original,prediction,anomaly score
175,0,0.0,0.32500000000000001
176,62,52.0,0.65000000000000002
177,402,0.0,1.0
178,0,0.0,0.125
179,402,0.0,1.0
180,0,0.0,0.0
181,3,402.0,0.050000000000000003
182,50,52.0,0.10000000000000001
183,68,13.0,0.90000000000000002
This is the output of one-step-ahead prediction without using the
inference shifter. It basically means that the prediction made at step
N is for step N+1. Or, in other words, if the prediction is perfectly
right, then the prediction value at step N should correspond to the
original value at step N+1.
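A minimal Python sketch of that alignment, using the first rows of the output above (the OPF's InferenceShifter automates this shift):

```python
# Toy illustration of one-step-ahead alignment: the prediction emitted
# at step N should be compared with the original value at step N+1.
rows = [  # (step, original, prediction) taken from the output above
    (175, 0, 0.0),
    (176, 62, 52.0),
    (177, 402, 0.0),
    (178, 0, 0.0),
]

# Pair each prediction with the NEXT row's original value.
aligned = [
    (step, prediction, next_original)
    for (step, _, prediction), (_, next_original, _) in zip(rows, rows[1:])
]
# e.g. the 52.0 predicted at step 176 is evaluated against 402 at step 177
print(aligned)
```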
The anomaly score can be viewed as the confidence of the prediction.
For example, NuPIC might only be 23% confident in the best prediction
it gives, in which case the anomaly score could be very high. This is
the case at step 179, where the prediction is 0 and the original value
at step 180 is 0. Note that the anomaly score at step 179 is 1.0. It
means that NuPIC was not confident in the prediction, even though the
prediction was correct.
The anomaly score (any form of it) is not a good estimate of confidence.
The CLA Classifier gives multiple predictions with associated
probabilities; you should use that instead. But the percentage it gives
is based on what it has seen before, so if it only saw some state one
time before, it will predict the value it saw next with 100%
confidence, when in reality you probably want a much lower confidence
since you've only seen the value once. Still, the classifier confidence
will be a much better indicator than the anomaly score. Imagine these
sequences:
ABCD
ABCE
ABCF
If each of these is learned, then when you see ABC, the temporal memory
will be perfectly predicted so far, yielding an anomaly score of 0.0,
and will be predicting D, E, and F simultaneously (in the temporal
memory). The classifier will also be predicting those values and will
have some differing probabilities. Whichever value (D, E, or F) happens
to have the highest probability will be the "best prediction" but will
have only 33-40% confidence.
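A toy sketch of that behavior, using simple next-symbol counting rather than the real CLAClassifier (which works on SDR buckets, not raw symbols):

```python
from collections import defaultdict

# Count which symbol followed each 3-symbol context, then turn counts
# into probabilities. This mimics the idea of predicting D, E and F
# with ~33% confidence each after learning ABCD, ABCE, ABCF.
counts = defaultdict(lambda: defaultdict(int))

def learn(sequence):
    for i in range(3, len(sequence)):
        context = sequence[i - 3:i]
        counts[context][sequence[i]] += 1

for seq in ("ABCD", "ABCE", "ABCF"):
    learn(seq)

def predict(context):
    total = sum(counts[context].values())
    return {sym: n / total for sym, n in counts[context].items()}

print(predict("ABC"))  # D, E and F each with probability 1/3
```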
Clear. Just one question: does the encoding of D, E or F in the SDR
also play a role in this step? If D and E are encoded as semantically
similar in the SDR, and NuPIC sees the sequence DDDDDD and then
predicts E, will it have a lower anomaly score than if it sees the
sequence DDDDDD and predicts F? Or, if it sees DDDDDD, is that
considered similar (from the semantic encoding point of view) to
seeing the sequence DDDDDE?
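For reference, here is a minimal sketch of what "semantically similar in SDR" means for a scalar encoder (a toy encoder, not NuPIC's actual ScalarEncoder): nearby values share active bits, so their SDRs overlap.

```python
def encode(value, w=5, n=100):
    # Toy scalar encoder: a run of w active bits whose position tracks
    # the value, so nearby values share active bits.
    start = int(value) % (n - w)
    return set(range(start, start + w))

def overlap(a, b):
    # Number of active bits two encodings have in common.
    return len(a & b)

# 10 and 12 are "semantically similar": their encodings share bits.
print(overlap(encode(10), encode(12)))  # 3 shared bits
# 10 and 50 are not: no shared bits.
print(overlap(encode(10), encode(50)))  # 0 shared bits
```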
Is the "classifier confidence" term that you're using the same as the
"anomaly likelihood" which Subutai Ahmad uses in Anomaly Detection
Using the CLA?
The opposite situation happens at step 180, where the prediction is 0
and the original value at step 181 is 3. Note that the anomaly score
at step 180 is 0. That means that NuPIC was quite confident in the
prediction, but it was not correct.
Questions:
1. Does the anomaly score on a given line also take into account the
original value on that line? For example, does the anomaly score on
this line
181,3,402.0,0.050000000000000003
take into account that 3 is the original value? Or is it computed
without respect to that value?
The anomaly score tells you whether or not the original value on the
same line was expected. But even if it is low, that doesn't mean the
prediction had high confidence (see the explanation above).
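Concretely, my understanding is that the raw anomaly score is the fraction of the currently active columns that were not predicted at the previous step; a rough sketch:

```python
def anomaly_score(active_columns, predicted_columns):
    # Raw anomaly score: fraction of currently active columns that
    # were NOT among the columns predicted at the previous step.
    active = set(active_columns)
    if not active:
        return 0.0
    return len(active - set(predicted_columns)) / len(active)

# Every active column was predicted -> 0.0 (input fully expected),
# regardless of how many OTHER columns were also predicted.
print(anomaly_score({1, 2, 3, 4}, {1, 2, 3, 4, 9}))  # 0.0
# Half of the active columns were unpredicted -> 0.5
print(anomaly_score({1, 2, 3, 4}, {1, 2}))           # 0.5
```

Note that the score only measures how surprising the current input was; it says nothing about how confident the best prediction was.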
Clear. Just out of curiosity: is there any other case, besides the one
you've described (the best prediction from the previous step happened
to be one with relatively low confidence), in which this can happen?
2. Is it possible to compute some kind of debug information reading
prediction and anomaly score? I mean something like this from NuPIC
perspective:
I'm 23% sure that next value will be 10
I'm 27% sure that next value will be 20
I'm 50% sure that next value will be 30
The model results from the CLAModel will have all predictions and
associated probabilities. I think that is what you will want. What are
you running to generate these results? If I see how you are generating
it I can show how to get this extra information.
I've been following Matt's tutorials, which are based on the OPF (the
field name "value" below is just an example):
model = ModelFactory.create(modelParams)
model.enableInference({"predictedField": "value"})
result = model.run({"value": value})  # ModelResult; see result.inferences
How many prediction probabilities are there in one step? Is that
number constant or does it vary?
3. Is it OK to predict data zero steps ahead if I'm just interested in
the prediction accuracy?
I don't follow. Predicted 0 steps ahead means that you want to predict
the value you just got?
Yes, just for some debugging purposes, to see if NuPIC is able to
understand the data. Maybe using a prediction some steps ahead with the
inference shifter would be a better idea. This leads me to the question
of whether it is more difficult for NuPIC to predict more steps ahead
than one step. I suppose it is more difficult only from a system
resources perspective, because when you have a prediction for 10 steps
ahead it means that NuPIC also needs to know what is happening 1-9
steps ahead?
4. Does NuPIC do some kind of look-back? I mean, if NuPIC was
confident at step 180 that the next value would be 0, but it later
turns out that this was a mistake, does NuPIC somehow recompute the
anomaly score from step 180 for further data processing? Or is this
done automatically in HTM?
Everything is constantly learning, so it will perform better on future
values, but it doesn't go back and give updated anomaly scores when it
has more information.
I expressed myself badly, sorry. What I mean is: during constant
learning, does NuPIC also incorporate past predictions vs. real values
into future predictions? Suppose the following example: NuPIC has
predicted that the next value will be
A with 10%
B with 30%
C with 23%
D with 37%
(Is the sum always 100%?)
Based on this, the next prediction is the one with the highest
confidence, in this case D. Suppose two cases:
1. the next value is A (the one with the lowest confidence)
2. the next value is E (one that was not predicted at all).
In the first case, does NuPIC somehow account for those 10% as it
continues to learn? Or is it simply considered a bad prediction during
further learning, no matter how much confidence it had in the past?
Thank you very much