"Could you expand a little on what biological problem you're referring to here?
-Mike" Ok, but I suspect it is beyond most people's interest level, I don't want to confuse anyone. But for those that are interested.. The neurons in the CLA can be in a "predictive state". Biologically this is a cell that is depolarized. The neurons in the CLA can be in an "active state". Biologically this is equivalent to firing or generating one or more spikes. These two states are sufficient for learning sequences, but not for temporal pooling. The addition of temporal pooling requires a third state which I don't like because it is a little tricky to make it work with real neurons. When we first implemented the CLA we started with sequence memory and everything worked fine. After a bunch of testing we added temporal pooling. With temporal pooling the cells learn to predict their feed forward activation earlier and earlier. It works like this. First a cell becomes active due to a feed forward input. It then forms synapses that allow it to predict its activity one step in advance. Later it becomes active one step in advance and then forms synapses that allow it to predict its activity two steps in advance, and so on. (The system doesn't require discreet steps but it is easier to think about it that way.) Over repeated training, a cell learns to be active over longer and longer sequences of patterns. This is cool for a number of reasons. A cell will learn to be active for as much time as it can correctly predict its future activity. If the world consists of a few long repeatable sequences then cells will be active over long periods of time. The data determines how much pooling a cell can do. The more pooling that can be done at one level of the hierarchy the easier the job of the next level. It also suggests why we can learn new tasks very quickly (i.e. learn a new sequence) but to master something, to make something second nature, requires many repetitions. 
I mentioned this in On Intelligence when I said that with practice, knowledge gets represented lower and lower in the hierarchy. As a region gets better at temporal pooling, it frees the memory in the next region for more advanced inference.

The problem is that cells that are pooling over time must be active/spiking, not just depolarized as in sequence learning. When cells become active by pooling in advance of feed-forward activation, it messes up the sequence memory. The CLA can't tell the difference between activation due to a real-world feed-forward input and activation due to pooling. What happens is the CLA doesn't wait for real input, and sequences run away forward in time. For pooling to work, the CLA needs to distinguish between cell activation due to feed-forward input and cell activation due to pooling. We need two different states for an active cell.

There is an elegant biological solution to this, but the evidence is equivocal. The solution is: when a cell is activated due to feed-forward input, it generates a short burst of action potentials, three to five. It does this once and then stops. When a cell is activated by pooling, it generates a series of spaced-out spikes. Believe it or not, there are quite a few papers that suggest this could be happening. There is evidence of short bursts prior to a steady firing pattern. The mini-bursts are in the literature, easy to find. I spoke to several scientists, and they report seeing them. Some claim they see them at the beginning of every trace. However, others say they never see the mini-bursts. The best evidence for mini-bursts is in layer 5 cells (yes, the motor ones that also project up the hierarchy). These cells are called "intrinsically bursting" cells to reflect this behavior. For temporal pooling to work, I think we also need to see this mini-bursting behavior in layer 3. Mini-bursts are seen in layer 3, but not by everybody. The evidence is much spottier.
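The runaway problem and its fix can be shown with a toy model. The state names and the step function below are hypothetical, not NuPIC's: the only point is that if sequence memory advances only on genuine feed-forward ("mini-burst") activation, pooled ("tonic") activity can no longer drag sequences forward in time.

```python
# Toy illustration of why pooling needs a second "active" state.
# Hypothetical states: a cell activated by feed-forward input emits a
# mini-burst; a cell activated by pooling emits spaced single spikes.
# Sequence memory advances only on mini-bursts.

ACTIVE_FF = "burst"    # driven by real feed-forward input
ACTIVE_POOL = "tonic"  # driven by pooling (predicted in advance)

def step_sequence_memory(position, activation):
    """Advance the learned sequence only on genuine feed-forward input."""
    if activation == ACTIVE_FF:
        return position + 1
    return position  # tonic/pooled activity does not advance the sequence

pos = 0
for activation in [ACTIVE_FF, ACTIVE_POOL, ACTIVE_POOL, ACTIVE_FF]:
    pos = step_sequence_memory(pos, activation)

print(pos)  # -> 2: only the two feed-forward activations advanced it
```

Without the two-state distinction (i.e. if `step_sequence_memory` advanced on any activation), the same input stream would push the sequence four steps ahead: the runaway described above.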
It is possible that all layer 3 cells exhibit this behavior and scientists are not reporting them. Perhaps there are different classes of layer 3 cells and only some mini-burst. I wish the evidence were more conclusive.

For the mini-bursting hypothesis to be correct, a cell has to behave differently when receiving a mini-burst than when receiving regularly spaced spikes. Here too the evidence is good. The synapses that form on distal dendrite branches (the sequence and pooling memory synapses) are far more effective when they get a quick burst of spikes in a row. A thin dendrite amplifies the effect of multiple spikes because thin dendrites don't leak current quickly and they have low capacitance. Thus a burst of spikes on multiple synapses may be necessary for our dendrite-segment coincidence detector to work. A single spike won't do it. If a cell produces single spikes (not mini-bursts) when activated by a distal dendrite branch, then sequences won't run away. This is what we need; it solves our problem!

Conversely, axons that project up the hierarchy form synapses on proximal dendrites (the SP synapses). Here, because the synapses are close to the big cell body and the dendrites have large diameters, there is large current leakage and high capacitance. It has been shown that the first arriving spike on a proximal synapse has a large effect (depolarization), but subsequent spikes in a mini-burst have a much diminished effect. This is good because we don't want the spatial pooler in the higher region to be overly influenced by the mini-bursts. We want the SP to look at all active axons equally, those that are mini-bursting and those that are single-spiking via pooling. This is another nice validation of the theory.

If you have followed all of this, you see that the mini-burst hypothesis solves the issues of pooling in a hierarchy, and it is supported by a lot of biological evidence. It is a pretty cool explanation for why we see mini-bursts in layer 5 cells.
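The dendrite argument reduces to a leaky integrator. The numbers below (leak rates, unit charge per spike, threshold) are made up for illustration; only the qualitative contrast matters: a thin distal dendrite retains charge between quick spikes, so a mini-burst sums past threshold while a single spike does not, whereas a leaky proximal site barely sums at all.

```python
# Minimal leaky-integrator sketch of burst vs. single-spike efficacy.
# All parameters are illustrative assumptions, not measured values.

def peak_voltage(n_spikes, leak):
    """Deliver n quick spikes; a fraction `leak` of charge drains each step."""
    v, peak = 0.0, 0.0
    for _ in range(n_spikes):
        v = v * (1.0 - leak) + 1.0  # each spike injects one unit of charge
        peak = max(peak, v)
    return peak

THRESHOLD = 2.5
DISTAL_LEAK = 0.1    # thin distal dendrite: little leak, charge accumulates
PROXIMAL_LEAK = 0.9  # near the big soma: heavy leak, spikes barely sum

print(peak_voltage(1, DISTAL_LEAK) >= THRESHOLD)    # False: one spike fails
print(peak_voltage(4, DISTAL_LEAK) >= THRESHOLD)    # True: a mini-burst sums
print(peak_voltage(4, PROXIMAL_LEAK) >= THRESHOLD)  # False: proximal barely sums
```

The first two lines are the distal story (coincidence detection needs a burst); the third is the proximal story (the SP sees bursting and single-spiking axons roughly equally).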
My only worry is that the evidence for mini-bursting in layer 3 cells is spotty. If everyone said all layer 3 cells are intrinsically bursting, like forward-projecting layer 5 cells, I would be much happier. All in all, the theory holds together remarkably well, and I don't have another one, so I am sticking with it for now. Of course, none of this matters for the SW implementation, but I have found over and over again that if you stray from the biology you will get lost.

Jeff

From: nupic [mailto:[email protected]] On Behalf Of Michael Ferrier
Sent: Thursday, August 29, 2013 11:40 AM
To: NuPIC general mailing list.
Subject: Re: [nupic-dev] Inter-layer plumbing

>> There is a biological problem with pooling the way we implemented that I never resolved. So it is a work in progress.

Hi Jeff,

Could you expand a little on what biological problem you're referring to here? Thanks!

-Mike
_____________
Michael Ferrier
Department of Cognitive, Linguistic and Psychological Sciences, Brown University
[email protected]

On Thu, Aug 29, 2013 at 2:29 PM, Jeff Hawkins <[email protected]> wrote:

Here are some thoughts about how to connect CLAs in a hierarchy. Here are some things we know about the brain:

- Layer 3 in the cortex is the primary input layer. (Sometimes input goes to layer 4 and layer 3, but layer 4 projects mostly to layer 3, and layer 4 doesn't always exist. So layer 3 is the primary input layer. It exists everywhere. We will ignore layer 4 for now.)
- I believe the CLA represents a good model of what is happening in layer 3.
- The output (i.e. axons) of layer 3 cells projects up the hierarchy, connecting to the proximal dendrites (SP) of the next region's layer 3.
- This isn't the complete picture. The axons of cells in layer 5 (the ones that project to motor areas) split in two, and one branch also projects up the hierarchy to layer 3 in the next region.
If we aren't trying to incorporate motor behavior, then we can ignore layer 5 and say input goes from layer 3 to layer 3 to layer 3, etc. Or CLA to CLA to CLA, etc. Each cell in layer 3 projects to the next region, so the input to a region is the output of all the cells in the previous region's layer 3. If we consider our default CLA size, there would be 64K input bits to the next level in the hierarchy. Because of the distributed nature of knowledge, it isn't necessary that all cells in layer 3 project to the next region; as long as a good portion do, we should be OK. But assume they all do. 64K is a lot of input bits, but the SP in the receiving region can take any number of bits and map them onto any number of columns. That is one of the nice features of the SP: it can map an input of any dimension and sparsity to any number of columns.

That's it for the "plumbing". Now comes the tricky part. We, and many others, believe that a large part of how we recognize things in different forms is that the brain assumes patterns occurring next to each other in time represent the same thing. This is where the term "temporal pooler" comes from. We want cells to respond to a sequence of patterns that occur over time even though the individual patterns don't have common bits. The classic case is cells in V1 that respond to a line moving across the retina. These cells have learned to fire for a sequence of patterns (a line in different positions as it moves is a sequence). The cell remains active during the sequence. Thus the outputs of a region are changing more slowly than the inputs to a region. This basic idea is assumed to be happening throughout the cortex. Temporal pooling also makes more output bits active at the same time. So instead of just 40 cells active out of 64K, you might have hundreds. The CLA was designed to solve the temporal pooling problem. When we were working on vision problems, the temporal pooler was the key thing we were testing.
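The plumbing above can be sketched end to end. The spatial pooler here is a stand-in (a fixed random projection with top-k column selection), not NuPIC's `SpatialPooler`; the sizes are the ones mentioned above (64K input bits, a few hundred active after pooling, 40 winning columns).

```python
# Sketch of the "plumbing": the active layer-3 cells of the lower
# region become the input bits of the next region's spatial pooler.
# Toy SP, not NuPIC's implementation.

import random

class ToySpatialPooler:
    """Maps an input of any size onto a fixed number of columns."""
    def __init__(self, input_bits, num_columns, potential=128, seed=0):
        rng = random.Random(seed)
        # Each column samples a random subset of the input space.
        self.pools = [rng.sample(range(input_bits), potential)
                      for _ in range(num_columns)]

    def compute(self, active_bits, k=40):
        active = set(active_bits)
        overlaps = [sum(1 for b in pool if b in active)
                    for pool in self.pools]
        # Winner-take-all: the k columns with the most overlap activate.
        return sorted(range(len(overlaps)),
                      key=lambda c: overlaps[c], reverse=True)[:k]

# Lower region's layer-3 output: hundreds of cells active after pooling.
lower_output = random.Random(1).sample(range(65536), 300)

sp = ToySpatialPooler(input_bits=65536, num_columns=2048)
winners = sp.compute(lower_output)
print(len(winners))  # -> 40 columns, regardless of input size and sparsity
```

Whatever the size and sparsity of the lower region's output, the receiving SP always produces a fixed-size sparse set of columns, which is the feature the text relies on.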
We have disabled this feature when using the CLA in a single region because it makes the system slower. The temporal pooler without the "pooling" is still needed for sequence learning. There is a biological problem with pooling the way we implemented it that I never resolved. So it is a work in progress.

Conclusion: to connect two CLAs together in a hierarchy, all the cells in the lower region become the input to the next region. But there are some difficult issues you might need to understand to get good results, depending on the problem.

Jeff

From: nupic [mailto:[email protected]] On Behalf Of Tim Boudreau
Sent: Wednesday, August 28, 2013 4:29 PM
To: NuPIC
Subject: [nupic-dev] Inter-layer plumbing

Is there a general notion of how layers should be wired together, so that one layer becomes input to the next layer? It seems like input into one layer is pretty straightforward - in ascii art:

bit bit bit bit bit bit bit bit
 |   |   |   |   |
 ------proximal dendrite w/ boost factor---> column

But it's less clear:
- If we have the hierarchy input -> layer 1 -> layer 2, what constitutes an input bit to layer 2 - the activation of some combination of columns from layer 1?
- How information about activation in level 2 should reinforce connections in layer 1

Any thoughts?

-Tim

--
http://timboudreau.com
_______________________________________________
nupic mailing list
[email protected]
http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
