Hello NuPIC,
I would like to ask a few questions:

1a. In the "Predicting Sine Waves with NuPIC" video, at 30:00, Matt says that the predicted data are slightly shifted compared to the input data (I suppose this is because the prediction step was set to 1). Matt also says that as NuPIC sees more and more data, the predicted wave should have fewer bumps and the anomaly score should have fewer spikes. In my experiments I get a very high anomaly score at the beginning (nearly 1.0 for the whole first period of the sine wave). This seems logical: NuPIC needs one full period before it can see a repeating pattern, so everything within the first period is new to it, and the high anomaly score reflects that. But how do you explain the following? In my experiments I often see a high anomaly score together with a smooth predicted wave (no bumps), and vice versa: parts where the predicted wave does not follow the input data well and has a lot of bumps get a low anomaly score. Is this OK, and if so, why? How does the anomaly score actually work, and what is the relationship between the anomaly score and the predicted wave?
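To make the "shifted" observation concrete, here is a minimal sketch of how a one-step-ahead prediction could be lined up with the input before comparing the two. It assumes the shift is nothing more than the prediction step of 1; the helper `align_one_step` and the "perfect predictor" data are hypothetical illustrations, not part of the NuPIC API or of the video's code.

```python
import math

def align_one_step(inputs, predictions, steps=1):
    """Pair the prediction made at time t with the input seen at t + steps.

    Hypothetical helper; steps must be >= 1.
    """
    return list(zip(inputs[steps:], predictions[:-steps]))

# Hypothetical data: a sine input and an idealized 1-step-ahead predictor.
inputs = [math.sin(2 * math.pi * t / 100) for t in range(200)]
predictions = [math.sin(2 * math.pi * (t + 1) / 100) for t in range(200)]

# Compared row by row, each prediction sits one step ahead of its input,
# so the two curves look shifted and the error is non-zero.
unaligned_error = sum(abs(a - p) for a, p in zip(inputs, predictions)) / len(inputs)

# After shifting by one step, each prediction is compared with the value
# it was actually predicting.
pairs = align_one_step(inputs, predictions)
aligned_error = sum(abs(a - p) for a, p in pairs) / len(pairs)
```

If the shift in the plot really is only the prediction step, the aligned error should be much smaller than the unaligned one; any bumps that remain after alignment would then be genuine prediction errors.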

1b. I have also read somewhere (or seen in a video) that if NuPIC does not know what to predict (or does not have enough data), it simply repeats the input. Can I somehow tell when NuPIC is repeating and when it is predicting? Should I worry about this?

1c. In my sine experiments I see a good prediction at the beginning (I suppose NuPIC is repeating), then some bumps (I suppose NuPIC is predicting), then a good prediction again, and then some bumps again. My (very simplified) understanding of this is shown below in ASCII art:

In the first quarter of the period everything is new to NuPIC (so the anomaly score is high), and it repeats what it sees:
/

Then, in the second quarter of the period, it starts to see a repeating pattern:
/\

After the first period, it becomes more confident in what it sees (the same pattern twice):
/\/

Because it is constantly learning, it does not know at any given moment what the data will look like next (so it still reports some anomaly score). Maybe the repeating pattern is actually the following: one period at frequency N followed by one period at frequency N/2, as shown here:
   __
/\/  \__/
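The pattern drawn above can be sketched as a small test signal. This is only an illustration of the hypothetical "one period at frequency N, then one period at frequency N/2" sequence; the function name `mixed_frequency_wave` and its parameters are made up for this example and do not come from NuPIC.

```python
import math

def mixed_frequency_wave(samples_per_period=100, repeats=3):
    """Return a list of samples: one sine period at the base frequency,
    followed by one sine period at half that frequency, repeated."""
    values = []
    for _ in range(repeats):
        # One full period at the base frequency N.
        for i in range(samples_per_period):
            values.append(math.sin(2 * math.pi * i / samples_per_period))
        # One full period at frequency N/2, which spans twice as many samples.
        for i in range(2 * samples_per_period):
            values.append(math.sin(2 * math.pi * i / (2 * samples_per_period)))
    return values

wave = mixed_frequency_wave()
```

Feeding such a signal to a model would show whether the anomaly score rises again at each switch between the two frequencies, or settles once the longer combined pattern has repeated a few times.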

This leads me to the following conclusion: there will never be enough data for NuPIC to say with 100% certainty that something has an anomaly score of 0 and was totally expected (the frequency change might happen after the 2nd, 3rd, ... or 1000th period, who knows). There will always be some unpredictability. Yet in my sine experiments I often see an anomaly score equal to 0. My understanding is that the anomaly score was computed only after the whole dataset had been seen. But if that were the case, why isn't the anomaly score 0 throughout the whole file, except for the first period? So how and when is it computed?


1d. I found this interesting article
https://github.com/subutai/nupic.subutai/blob/master/swarm_examples/README.md
which refers to Jeff Hawkins's email about improving prediction
http://lists.numenta.org/pipermail/nupic_lists.numenta.org/2013-June/000327.html
It is two years old. In it, Jeff mentions data encoding (sampling rate, noise, ...) and also an evolutionary algorithm that would experiment with different parameters until it found a set that worked well for the exact problem specified. He said: "It is our intention to put this PSO code into NuPIC but we haven't been able to do that yet", so I would like to ask whether this has been done and PSO is in NuPIC now. Isn't the swarming process (finding the best model parameters) what he calls PSO? Should I worry about tuning the prediction (and if so, how), or is that handled by NuPIC nowadays?

1e. Do I need to know the details of HTM or other low-level internals to fully understand these questions?

Thank you

PS: I'm not a native English speaker, so please bear with me ;)


Best Regards

Name: Wakan Tanka a.k.a. Wakatana a.k.a. MackoP00h
Location: Europe
Contact:
[email protected]
http://stackoverflow.com/users/1616488/wakan-tanka
https://github.com/wakatana
https://twitter.com/MackoP00h
