On Thu, Aug 29, 2013 at 6:53 PM, Jeff Hawkins <[email protected]> wrote:
> Ok, but I suspect it is beyond most people’s interest level, I don’t want
> to confuse anyone. But for those that are interested….

Quite interesting to me, at least. Thanks! Having spent a number of evenings now implementing pieces of the plumbing of this thing in Java, to understand the problem space better, I'd noticed that the information about why a cell was active (input or predictive) had no way to propagate between layers, and thought that might be one of those things that doesn't come out in the wash.

> The neurons in the CLA can be in a “predictive state”. Biologically this
> is a cell that is depolarized.
>
> The neurons in the CLA can be in an “active state”. Biologically this is
> equivalent to firing or generating one or more spikes.
>
> These two states are sufficient for learning sequences, but not for
> temporal pooling.
>
> The addition of temporal pooling requires a third state, which I don’t like
> because it is a little tricky to make it work with real neurons.

On the up-side, if you're representing no-state, predictive-state, and active-state, you're already using two bits, and you have one unused combination of those two bits available to represent that last state.

> When we first implemented the CLA we started with sequence memory and
> everything worked fine. After a bunch of testing we added temporal
> pooling. With temporal pooling the cells learn to predict their feed
> forward activation earlier and earlier. It works like this. First a cell
> becomes active due to a feed forward input. It then forms synapses that
> allow it to predict its activity one step in advance. Later it becomes
> active one step in advance and then forms synapses that allow it to predict
> its activity two steps in advance, and so on. (The system doesn’t require
> discrete steps but it is easier to think about it that way.) Over repeated
> training, a cell learns to be active over longer and longer sequences of
> patterns.
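For concreteness, here's roughly how I'm picturing that two-bit point in my Java experiments. All names here are invented for illustration; this is not from any actual Numenta code, just a sketch of the encoding:

```java
// Sketch: two bits give four combinations, three of which map onto the
// states Jeff describes, leaving one spare combination for the pooling
// state. Names are made up for illustration only.
public class CellState {
    // bit 0: depolarized (predictive), bit 1: spiking (active)
    public static final byte INACTIVE   = 0b00; // no state
    public static final byte PREDICTIVE = 0b01; // depolarized, not firing
    public static final byte ACTIVE     = 0b10; // firing from feed-forward input
    public static final byte POOLING    = 0b11; // spare combination: firing
                                                // because of temporal pooling

    public static void main(String[] args) {
        // The spare combination is what lets us distinguish "active from
        // real input" from "active because the cell is pooling ahead of
        // its input" -- the distinction downstream layers never see today.
        System.out.println(CellState.ACTIVE != CellState.POOLING); // true
    }
}
```

The point being that the extra state costs nothing in storage; the cost is all in whether real neurons (and the sequence-memory rules) can honor the distinction.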
> This is cool for a number of reasons. A cell will learn to be
> active for as much time as it can correctly predict its future activity.
> If the world consists of a few long repeatable sequences then cells will be
> active over long periods of time. The data determines how much pooling a
> cell can do. The more pooling that can be done at one level of the
> hierarchy, the easier the job of the next level. It also suggests why we
> can learn new tasks very quickly (i.e. learn a new sequence) but to master
> something, to make it second nature, requires many repetitions. I
> mentioned this in On Intelligence when I said that with practice, knowledge
> gets represented lower and lower in the hierarchy. As a region gets better
> at temporal pooling it frees the memory in the next region for more
> advanced inference.
>
> The problem is that cells that are pooling over time must be active/spiking,
> not just depolarized as in sequence learning. When cells become active by
> pooling in advance of feed forward activation, it messes up the sequence
> memory. The CLA can’t tell the difference between activation because of a
> real-world feed forward input and activation because of pooling. What
> happens is the CLA doesn’t wait for real input, and sequences run away
> forward in time.

Are we talking about within one layer, or across layers here? It makes sense that two interconnected layers don't have a way to share that info, since the activated state from one layer becomes a single bit of information to the next. I don't know that I can step through enough of what happens in my head yet to see how that happens in a single layer. But what does look like a hole is this: say cell A is active due to input, and cell B is predictively activated, not because the immediate next input will activate it, but because the input *after that* will; there is no way to differentiate a prediction for further in the future than the next step from a wrong prediction about the next step.
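To check my own understanding of the runaway behavior, here's a deliberately tiny toy (nothing like the real CLA, no SDRs, one "cell" per element): if a predicted element is treated exactly like an activated one, the memory advances on every tick whether or not matching input actually arrived.

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration of sequences "running away forward in time": when a
// prediction is indistinguishable from an activation, the memory acts as
// if the predicted input really occurred, every tick, without waiting.
public class RunawayDemo {
    static List<Integer> runaway(int[] sequence, int ticks) {
        List<Integer> seen = new ArrayList<>();
        int position = 0; // element we currently believe is "now"
        for (int t = 0; t < ticks; t++) {
            int predicted = sequence[(position + 1) % sequence.length];
            // No way to tell predicted from active, so just advance:
            position = (position + 1) % sequence.length;
            seen.add(predicted);
        }
        return seen;
    }

    public static void main(String[] args) {
        // One real input (the 3) starts things off; everything after is
        // the memory chasing its own predictions with no further input.
        System.out.println(runaway(new int[]{3, 1, 4, 1, 5}, 4)); // [1, 4, 1, 5]
    }
}
```

Obviously the real system has distributed representations and learned segments, but the failure mode seems to be the same shape: the single active bit erases the input-vs-pooling distinction.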
Is that the issue, or am I just off in outer space?

That actually brings up a question I was wondering about: when we talk about making predictions several steps in advance, how is that actually done? By snapshotting the state, hypothetically feeding the system's predictions back into itself to get further predictions, and then restoring the state? Or is it simply that you end up with cells which will only be predictively active if several future iterations of input are likely to *all* activate them?

I've been drinking from the firehose on this topic, but I'm quite new to it, so if I'm horribly misunderstanding something, feel free to thwack me over the head :-)

<snip>

> If you have followed all of this you see that the mini-burst hypothesis
> solves the issues of pooling in a hierarchy and it is supported by a lot of
> biological evidence. It is a pretty cool explanation for why we see
> mini-bursts in layer 5 cells. My only worry is that the evidence for
> mini-bursting in layer 3 cells is spotty. If everyone said all layer 3
> cells are intrinsically bursting like forward-projecting layer 5 cells I
> would be much happier. All in all the theory holds together remarkably
> well and I don’t have another one, so I am sticking with it for now.
>
> Of course none of this matters for the SW implementation, but I have found
> over and over again that if you stray from the biology you will get lost.

Seems like it ought to be possible to model this in software with and without mini-bursting in layer three, and at least be able to say if one approach definitely doesn't work. I don't think that would answer a question of biology, but it would be a way to concretely poke around in a solution space that has some chance of being fruitful.

Thanks,
-Tim
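P.S. In case it clarifies my multi-step question above, the first alternative I described (snapshot, feed predictions back in as input, restore) would look something like this generic rollout. Again, toy code with invented names; I have no idea whether the real mechanics work this way, which is exactly what I'm asking:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Sketch of multi-step prediction by rollout: copy the state, repeatedly
// feed the model's own one-step prediction back in as if it were input,
// collect the predictions, then throw the copy away so the real state is
// untouched.
public class RolloutSketch {
    static List<Integer> rollout(int state, UnaryOperator<Integer> step, int horizon) {
        List<Integer> predictions = new ArrayList<>();
        int snapshot = state;                // work on a copy of the state
        for (int i = 0; i < horizon; i++) {
            snapshot = step.apply(snapshot); // prediction becomes next "input"
            predictions.add(snapshot);
        }
        return predictions;                  // the caller's state was never touched
    }

    public static void main(String[] args) {
        // A trivially predictable "world": the next value is current + 1.
        System.out.println(rollout(10, s -> s + 1, 3)); // [11, 12, 13]
    }
}
```

The second alternative (cells that are only predictively active if several future inputs would all activate them) wouldn't need any of this machinery, which is why I'm curious which one the theory intends.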
_______________________________________________
nupic mailing list
[email protected]
http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
