Thanks Matt! That’s very helpful.
> On Nov 7, 2014, at 8:10 PM, Matthew Taylor <[email protected]> wrote:
>
> I'll answer what I can...
>
> On Fri, Nov 7, 2014 at 9:56 AM, Nicholas Mitri <[email protected]> wrote:
>> 1. Why does prediction lag instead of lead until a large number of samples
>> has been processed? I remember reading about that in the ML and it having
>> something to do with HTM passing through observed values as-is when it can't
>> predict well. Can someone elaborate on that please, both in terms of how it is
>> implemented and the rationale behind it? It tends to produce very misleading
>> plots, especially when the anomaly score isn't usually high enough to
>> indicate the pass-through events.
>
> For the plot, the prediction results are shifted by 1 so that they
> align by timestamp. Before the model learns enough to make decent
> predictions, you are right that it usually just predicts the value it
> just saw. It looks like it is lagging because the plots are aligned,
> and the prediction line is just showing the last value it saw. Once
> predictions get better, the line gets closer and closer to being
> completely aligned. Perfect predictions would show up on the plot as
> both lines perfectly overlapping each other.
>
> There are more sophisticated ways one could plot this; for example, you
> could change the plot to show the most recent prediction out in front
> of the data. For simplicity's sake, I didn't do this for the tutorial.
>
>> 2. Why is the timestamp included as an encoded field and passed to the
>> network in the gym example? Is it processed in the same way as the
>> consumption field, or is it only used to align predictions with their
>> corresponding inputs? For cases with uniform sampling (like the sine
>> example), can we simply ignore that field and only encode the equivalent of
>> the consumption field?
>
> For the hotgym example, there are daily and weekly temporal patterns.
> The datetime must be encoded along with the other input to get
> characteristics of time like "time of day" and "day of week". If we
> didn't encode the time like this, the model would not recognize these
> patterns because they were not encoded along with the other input.
>
> For the sine example, there are no true time-based patterns (no daily,
> hourly, weekly patterns, etc.), so there is no need to encode time in
> the input. It is a sequential pattern, but adding an encoded timestamp
> to the input wouldn't help with predictions, because there are no time
> patterns. The only pattern is the sine cycle itself.
>
> ------
> Matt Taylor
> OS Community Flag-Bearer
> Numenta
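For anyone else following along, here is a minimal sketch of the shift-by-1 alignment Matt describes for question 1. It is not the actual plotting code from the tutorial, just toy data illustrating why an echoing model looks like it lags once predictions are paired up by timestamp:

```python
# Toy actual values at steps 0..4.
actuals = [10.0, 12.0, 11.0, 13.0, 12.5]

# Early in learning the model effectively echoes its input:
# the prediction it emits at step t (a guess for step t+1)
# is just the value it saw at step t.
predictions = list(actuals)

# For plotting, shift predictions by 1 so each prediction is
# paired with the timestamp it was predicting. The first slot
# has no prediction yet.
aligned = [None] + predictions[:-1]

print(aligned)  # [None, 10.0, 12.0, 11.0, 13.0]
```

Plotted against `actuals`, the `aligned` curve trails by exactly one step, which reads as "lag" even though the model is simply passing through the last value it saw.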

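And for question 2, a rough pure-Python sketch of the kind of "time of day" and "day of week" characteristics a datetime encoding exposes (this is not NuPIC's actual encoder, just an illustration of the features involved; the function name is made up):

```python
from datetime import datetime

def time_features(ts):
    # "Time of day" as hours since midnight, and "day of week"
    # (0 = Monday ... 6 = Sunday). Encoding these alongside the
    # consumption value is what lets the model pick up daily and
    # weekly patterns in the hotgym data.
    time_of_day = ts.hour + ts.minute / 60.0
    day_of_week = ts.weekday()
    return time_of_day, day_of_week

tod, dow = time_features(datetime(2014, 11, 7, 20, 30))
# Nov 7, 2014 was a Friday, 8:30 PM -> tod == 20.5, dow == 4
```

If the raw timestamp were dropped (as in the sine example), these characteristics would never reach the model, so it could not learn any pattern tied to them.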