Hello NuPIC,
How do you evaluate the correctness and accuracy of a prediction? Or, if
you have multiple predictions for the same data, how do you compare which
prediction is more accurate? I've seen that there is NAB [1], but to be
honest I have not looked into it deeply, so I do not know whether it
would help.
AFAIK, correlation should work fine for this kind of comparison, in this
case the correlation between the original and predicted data. But
correlation only works when the data is linear; it would not work e.g.
on the hotgym example, where you have repeating cycles, peaks, maybe
random events on particular days, etc. So my intuitive approach was to
calculate
the absolute difference [2] between each original and predicted value
and then take the mean of those differences. The lower the mean, the
better the prediction. Then I realized that the standard deviation [3]
can also be calculated from those absolute differences. The next step
would be to pick out all values whose absolute difference between
original and predicted value is:
1. above mean + standard deviation
2. below mean - standard deviation
This should give me an overview of how many values fall inside this
interval and how many do not. The dataset where more values fall inside
the interval is the dataset with the better prediction.
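To make the idea concrete, here is a minimal Python sketch of the steps
described above. The function name error_summary and the sample numbers
are made up for illustration (they are not from NuPIC or hotgym), and it
assumes the population standard deviation (statistics.pstdev); using the
sample standard deviation instead would shift the band slightly.

```python
from statistics import mean, pstdev


def error_summary(actual, predicted):
    """Summarize absolute prediction errors (hypothetical helper)."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")

    # Absolute difference between each original and predicted value.
    abs_errors = [abs(a - p) for a, p in zip(actual, predicted)]

    mae = mean(abs_errors)   # mean of the absolute differences
    sd = pstdev(abs_errors)  # assumption: population standard deviation

    # Band of mean +/- one standard deviation around the mean error.
    low, high = mae - sd, mae + sd
    inside = sum(1 for e in abs_errors if low <= e <= high)

    return {
        "mae": mae,
        "std": sd,
        "inside": inside,
        "outside": len(abs_errors) - inside,
    }


# Toy data, invented purely for this example.
actual = [10.0, 12.0, 15.0, 11.0, 30.0]
model_a = [11.0, 12.5, 14.0, 10.0, 22.0]
model_b = [10.5, 11.0, 15.5, 12.0, 29.0]

for name, pred in (("A", model_a), ("B", model_b)):
    print(name, error_summary(actual, pred))
```

On this toy data, prediction B has the lower mean error (0.8 versus 2.3),
yet prediction A has more values inside its band (4 versus 3), because
each band is computed from that prediction's own mean and standard
deviation. So the count is probably best read together with the mean
rather than on its own.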
Does this make sense?
[1] http://numenta.com/blog/nab-a-benchmark-for-streaming-anomaly-detection.html
[2] https://en.wikipedia.org/wiki/Absolute_difference
[3] http://www.mathsisfun.com/data/standard-deviation.html