I'll answer what I can...

On Fri, Nov 7, 2014 at 9:56 AM, Nicholas Mitri <[email protected]> wrote:
> 1. Why does prediction lag instead of lead until a large number of samples
> has been processed? I remember reading about that in the ML and it having
> something to do with HTM passing through observed values as-is when it can’t
> predict well. Can someone elaborate on that please both in terms of how its
> implemented and the rationale behind it? It tends to produce very misleading
> plots especially when the anomaly score isn’t usually high enough to indicate
> the pass-through events.

For the plot, the prediction results are shifted by one step so that they align by timestamp. Before the model learns enough to make decent predictions, you are right that it usually just predicts the value it just saw. It looks like it is lagging because the plots are aligned, and the prediction line is just showing the last value it saw. Once predictions get better, the line gets closer and closer to being completely aligned. Perfect predictions would show up on the plot as both lines perfectly overlapping each other.

There are more sophisticated ways one could plot this. For example, you could change the plot to show the most recent prediction out in front of the data. For simplicity's sake, I didn't do this for the tutorial.

> 2. Why is the timestamp included as an encoded field and passed to the
> network in the gym example? Is it processed in the same way as the
> consumption field or is it only used to align predictions with their
> corresponding inputs? For cases with uniform sampling (like the sine
> example), can we simply ignore that field and only encode the equivalent of
> the consumption field?

For the hotgym example, there are daily and weekly temporal patterns. The datetime must be encoded along with the other input to capture characteristics of time like "time of day" and "day of week". If we didn't encode the time like this, the model would not recognize these patterns, because they were not encoded along with the other input.

For the sine example, there are no true time-based patterns (no hourly, daily, or weekly patterns, etc.), so there is no need to encode time in the input. It is a sequential pattern, but adding an encoded timestamp to the input wouldn't help with predictions, because there are no time patterns. The only pattern is the sine cycle itself.

------
Matt Taylor
OS Community Flag-Bearer
Numenta
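
P.S. To make the plot alignment above concrete, here is a minimal sketch in plain Python. It is not the tutorial's plotting code; the function name align_predictions and the sample values are made up for illustration. It assumes predicted[i] is the one-step-ahead prediction made right after seeing actual[i].

```python
def align_predictions(actual, predicted):
    """Pair each actual value with the prediction that was made for it.

    Assumption (hypothetical layout, not the tutorial's code):
    predicted[i] is produced after seeing actual[i] and targets
    actual[i + 1]. Shifting the predictions forward by one step lines
    each one up with the timestamp of the value it was predicting.
    """
    # The first actual value has no prediction, so pad with None and
    # drop the final prediction, whose target has not arrived yet.
    shifted = [None] + list(predicted[:-1])
    return list(zip(actual, shifted))


actual = [1, 3, 2, 5, 4]

# An untrained model that just echoes the value it last saw.
echo = list(actual)
echo_rows = align_predictions(actual, echo)

# A perfect model: every prediction equals the next actual value.
# The trailing 0 is a placeholder for a prediction beyond the data.
perfect = actual[1:] + [0]
perfect_rows = align_predictions(actual, perfect)

for value, prediction in echo_rows:
    print(value, prediction)
```

With the echoing model, the aligned prediction column is the actual series delayed by one step, which is the lag visible in the plot. With the perfect model, the two columns match from the second row onward, so the lines would overlap.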
