(Disclaimer: Jeff's not a huge fan of this idea, due to its implementation complexity, but it seems relevant.)
Non-binary activation

There is a confusing part of the CLA in that we talk about "firing" but are not necessarily equating this directly to spikes. For example, a cell in the SP either turns on or not, depending on overlap, global competition, etc. This doesn't quite mimic the behavior of a cell and its receptive field, where you see random low-level firing much of the time, then increased firing when a relevant stimulus enters the field, and decreased firing as it leaves. I like to simplify much of what is going on as feature detectors reporting a degree of confidence that the feature has been detected, and this is naturally a scalar value.

There are a couple of important benefits to scalar activation:

1. You can propagate a signal at different intensities, so you can set off a chain of "imagined" or predicted states without crossing a threshold for motor output, or strongly reinforcing connections.

2. You can allow low-intensity signals to reverberate through the system, allowing for loops and feedback.

In my mind there will be more and more reasons to have different "activation states", which could all boil down to a need for scalar activations. Of course this has significant computational and memory impacts, which is why (IIRC) it's done as it is today.

Ian

On Fri, Aug 30, 2013 at 2:31 PM, Jeff Hawkins <[email protected]> wrote:

> The problem is cells that are pooling over time must be active/spiking,
> not just depolarized as in sequence learning. When cells become active by
> pooling in advance of feed-forward activation, it messes up the sequence
> memory. The CLA can’t tell the difference between activation because of a
> real-world feed-forward input and activation because of pooling. What
> happens is the CLA doesn’t wait for real input, and sequences run away
> forward in time.
>
> Are we talking about within one layer, or across layers here?
>
> >> Within one layer.
> The CLA models just one layer of cells.
>
> I don't know that I can step through enough of what happens in my head yet
> to see how that happens in a single layer. But what does look like a hole
> is: say cell A is active due to input, and cell B is predictively
> activated, not because the immediate next input will activate it, but
> because the input *after that* will; there is no way to differentiate a
> prediction for further in the future than the next step from a wrong
> prediction about the next step. Is that the issue, or am I just off in
> outer space?
>
> >> Sorry, I couldn’t follow your question.
>
> It might be useful to step back and remind ourselves what problems we are
> trying to solve. Within each region of the cortex we want to do the
> following.
>
> 1) Inference of sequences.
>
> We know the cortex recognizes sequences of patterns. All audition (e.g.
> spoken language, music) is only recognized when played in the correct
> order. The same is true for vision and touch; they are temporal inference
> problems. (We can recognize some still images, but vision is mostly
> temporal.)
>
> 2) Prediction while inferring.
>
> We know that the cortex is constantly predicting what is going to happen
> next. We know this because we recognize when things change. It appears we
> make multiple predictions simultaneously. The properties of SDRs solve the
> multiple-prediction problem beautifully. The cortex needs to detect
> anomalies so it can direct attention to unpredicted input.
>
> 3) Temporal pooling.
>
> This is when cells learn to fire continuously during part or all of a
> sequence. We know there are cells that do this for multiple phenomena in
> vision and audition. Temporal pooling is not as widely understood, but many
> people have reached the same hypothesis: that temporal pooling plays a large
> part in how we build invariant representations.
> We don’t know if temporal pooling is occurring everywhere in the cortex,
> but I work on the assumption that it does. Temporal pooling makes the
> hierarchy work much better too. It means each level in the hierarchy can
> learn as much spatial and temporal context as it can, freeing up the next
> level to work on more complex patterns. If we didn’t have temporal pooling,
> then the output of layer 3 would be changing at the same rate as the input;
> we need to get to slower, more stable concepts as we ascend the hierarchy.
>
> The CLA is a great model of how a layer of cells does 1 and 2. It only
> requires that the active cells are spiking and the predictive cells are
> depolarized. I am confident the CLA is close to how real cells do inference
> and prediction.
>
> Things get tricky when we try to add temporal pooling, to get a layer of
> cells to do 1, 2, and 3. My assumption is that a single layer, layer 3 at
> a minimum, has to do inference, prediction, and temporal pooling. This
> could be an incorrect assumption. Some scientists divide layer 3 into 3a
> and 3b. Some scientists designate a separate layer 2 and some don’t.
> Perhaps these sublayers are doing different things: some inference and
> others pooling. So when discussing temporal pooling we need to keep
> in mind that the attempt to do it all in one layer of cells might be
> wrong. It is speculative. I think it would be simpler and more elegant if
> we can show that a single layer of cells can infer, predict, and do
> temporal pooling. That is what I am trying to do, but that might be wrong.
>
> So how does the “proposal” for temporal pooling work? Say we have a
> sequence A-B-C-D-E that repeats. When the E cells first become active they
> form synapses to the cells that were just active during D. So now the E
> cells will become predictively active when they see the D cells again.
> However, when the E cells became predictively active due to the D cells,
> pattern C was just active, so the E cells now form synapses (on a different
> segment) with the C cells. After this, the E cells become predictively
> active when they see either the C or D cells. The process can repeat. At
> the end, a particular E cell has a dendrite segment that recognizes D,
> another dendrite segment that recognizes C, another segment that recognizes
> B, etc. So now the E cells will be predictively active when they see A, B,
> C, or D. We would see the E cells fire continuously during the A-B-C-D-E
> sequence, but only at E is the cell responding to its feed-forward
> receptive field.
>
> As I said earlier, this all looks good except that it requires us to
> change the “predictive state” from being depolarized to steady firing, and
> the active state from being steady firing to mini-bursting. That is a
> relatively fine distinction that makes me uncomfortable. There are other
> weirdnesses that I haven’t resolved. For example, this would say the
> mini-burst occurs at the end of a sequence, whereas there is more evidence
> that they occur at the beginning.
>
> A TOTALLY DIFFERENT APPROACH
>
> One time I was talking to Murray Sherman and he suggested a completely
> different approach to temporal pooling. It is much, much simpler but dumber
> (not Murray, he is smart).
>
> Excitatory synapses can be divided into ionotropic and metabotropic. The
> former only involve the flow of ions across the cell membrane. They are
> quick to start and quick to stop. The latter invoke a metabolic pathway
> (chemistry, proteins, stuff like that). They are slower. A metabotropic
> synapse will depolarize a cell for up to a third or half of a second for
> one incoming spike, whereas the effect of a spike at an ionotropic synapse
> lasts just a few milliseconds. The synapses near the cell body (the inputs
> to a region, the SP synapses) are the slow type.
> The synapses on the distal dendrites (our TP synapses) are the fast type.
> What this means is that a fast-changing input to layer 3 will result in a
> slower-changing response. This will definitely lead to slower and slower
> responses as you ascend the hierarchy.
>
> This approach is dumb because it involves no learning. It pools anything
> and everything that occurs even once in sequence. It is also hard to see
> how we can get cells that stay active for several seconds during
> sequences.
>
> CONCLUSION
>
> Because the temporal pooling question remains unresolved, I prefer to
> ignore it for now. We chose to work on problems of prediction and anomaly
> detection in a single region that don’t require temporal pooling. But I
> don’t want to discourage others from working on it.
>
> That actually brings up a question I was wondering about: when we talk
> about making predictions several steps in advance, how is that actually
> done - by snapshotting the state, then hypothetically feeding the system's
> predictions into itself to get further predictions and then restoring the
> state; or is it simply that you end up with cells which will only be
> predictively active if several future iterations of input are likely to
> *all* activate them?
>
> >> When NuPIC makes predictions multiple steps in advance, we use a
> separate classifier, external to the CLA. We store the state of the CLA in
> a buffer for N steps and then classify those states when the correct data
> point arrives N steps later.
>
> Jeff
>
> _______________________________________________
> nupic mailing list
> [email protected]
> http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
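Ian's scalar-activation idea at the top of the thread could be sketched roughly like this. The threshold, weights, and function names are invented for illustration; this is not NuPIC code:

```python
# Rough sketch of scalar activation: cells report a confidence in [0, 1]
# instead of a binary on/off state, so weak signals can propagate as
# "imagined" states without crossing the motor-output threshold.
# MOTOR_THRESHOLD and the damping factor are hypothetical values.

MOTOR_THRESHOLD = 0.7

def propagate(activations, weights, damping=0.5):
    """One step of scalar propagation: each downstream cell receives a
    damped weighted sum of upstream confidences, clipped to [0, 1]."""
    return [min(1.0, max(0.0, damping * sum(w * a for w, a in zip(row, activations))))
            for row in weights]

def motor_output(activations):
    """Only confidently active cells drive motor output."""
    return [a >= MOTOR_THRESHOLD for a in activations]

acts = [0.9, 0.3, 0.0]
weights = [[0.0, 1.0, 0.0],
           [1.0, 0.0, 1.0],
           [0.0, 1.0, 0.0]]
imagined = propagate(acts, weights)   # weak, sub-threshold "imagined" state
```

The point of the damping is Ian's benefit #1: the propagated pattern reverberates downstream but stays below the threshold that would trigger output or strong learning.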
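A minimal sketch of the temporal-pooling proposal Jeff walks through above: each time the E cells are active (by feed-forward input or by prediction), they grow a distal segment onto whatever was active one step earlier, so the pooling reaches one step further back per pass through the sequence. The class and method names are invented for illustration:

```python
# Toy version of the A-B-C-D-E proposal: after a few repeats, an E cell
# carries segments recognizing D, C, B, and A, so it would fire
# continuously through the whole sequence.

class PoolingCell:
    def __init__(self):
        self.segments = []                      # each segment recognizes one earlier pattern

    def learn(self, prev_pattern):
        seg = frozenset(prev_pattern)
        if seg not in self.segments:            # one new segment per novel context
            self.segments.append(seg)

    def predicted_by(self, pattern):
        return any(seg <= frozenset(pattern) for seg in self.segments)

e_cell = PoolingCell()
sequence = ["A", "B", "C", "D", "E"]

for _ in range(4):                              # pooling reaches one step further back per pass
    prev = None
    for pattern in sequence:
        active = pattern == "E" or e_cell.predicted_by({pattern})
        if active and prev is not None:
            e_cell.learn({prev})                # synapse onto the just-active cells
        prev = pattern
```

After four passes the E cell is predictively active at A, B, C, and D, and only at E is it responding to its feed-forward input, matching the behavior described in the email.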
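The "dumb" metabotropic alternative can be modeled as a leaky trace: one input spike depolarizes the cell for many time steps, so a fast-changing input produces a slower-changing response. The decay constant here is invented for illustration:

```python
# Slow-synapse pooling with no learning: a spike tops up the cell's
# depolarization, which then decays over hundreds of milliseconds.

DECAY = 0.9                                        # hypothetical retention per time step

def slow_response(input_spikes, decay=DECAY):
    """Return the cell's depolarization trace for a binary spike train."""
    state, trace = 0.0, []
    for spike in input_spikes:
        state = max(state * decay, float(spike))   # new spike or decayed memory of one
        trace.append(state)
    return trace

trace = slow_response([1, 0, 0, 0, 0, 0])          # one spike, lingering response
```

As the email notes, this pools anything and everything, and it is hard to see how a fixed decay could keep cells active for several seconds.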
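The N-step prediction scheme Jeff describes at the end (buffer the CLA state for N steps, then pair it with the data point that arrives N steps later) could be sketched like this. The dict-based "classifier" is a stand-in invented for illustration; NuPIC's actual classifier is a separate, more sophisticated component:

```python
from collections import deque

N = 3
buffer = deque(maxlen=N)
classifier = {}                                # CLA state -> value seen N steps later

def step(cla_state, actual_value):
    """Feed one (state, value) pair; train on the state buffered N steps ago."""
    if len(buffer) == N:
        classifier[buffer[0]] = actual_value   # buffer[0] is the state from N steps back
    buffer.append(cla_state)

def predict(cla_state):
    """Look up the value expected N steps after seeing this state."""
    return classifier.get(cla_state)

for state, value in zip("abcde", [10, 20, 30, 40, 50]):
    step(state, value)
```

Note that the CLA itself is never run forward hypothetically; the multi-step prediction lives entirely in the external classifier, which is the answer to the snapshot-and-replay question quoted above.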
