I found it a little difficult to follow all the points made by Kevin and Fergal, but I can try to add a few observations that will hopefully add some clarity.
"Sparse" just means that a small percentage of neurons are active at any point in time. As far as I know this is a universal observation, and I am not aware of anyone claiming otherwise anywhere in neocortex. Is any neuroscientist suggesting that activations are not sparse somewhere in neocortex?

A separate issue is what each neuron represents. In HTM and the CLA we adopt the position that each neuron learns to have some "semantic" meaning. This is not a new idea; we adopted it because it makes a lot of theoretical sense and is supported empirically. The neuron representations do not need to be orthogonal. The literature on sparse coding generally assumes that the neurons form an over-complete basis set, meaning the neurons' meanings are not orthogonal. It would be nearly impossible for messy brain tissue to ensure orthogonality, and it isn't even desirable.

If you dropped the sparsity constraint, nothing would work. Dense codes would overlap even when the objects they represent are not similar. It would then become necessary to look at all the bits to detect a pattern; real neurons can't do this, and if some neurons failed you would get bad results. We could make the CLA work with dense codes, but I predict it would become brittle and lose its generalization properties. I believe that sparse activations and semantic meaning are both essential. If this is not clear or you disagree, it would be helpful to know why.

The meaning of any particular neuron can be very difficult to determine. I believe you are saying that both Gallant and Newsome record from neurons where they can't detect what an individual neuron "represents," but if they look at an ensemble of active neurons they can correlate that ensemble to something in the animal's input. This is not inconsistent with the way the CLA defines SDRs. Recall that each neuron learns a spatial/temporal/motor feature of the world.
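The overlap and subsampling properties Jeff describes can be illustrated with a small sketch. The sizes 2048/40 are typical NuPIC region parameters; everything else here is illustrative toy code, not NuPIC itself:

```python
import random

random.seed(0)

N = 2048  # bits in the representation (a typical NuPIC region size)
W = 40    # active bits per SDR (~2% sparsity)

def random_sdr():
    """A random sparse code: W active bits out of N."""
    return set(random.sample(range(N), W))

a, b = random_sdr(), random_sdr()

# Two unrelated sparse codes share almost no bits (expected overlap
# W*W/N, i.e. under one bit), while two unrelated dense codes
# (50% active) share roughly N/4 bits.
dense_a = set(random.sample(range(N), N // 2))
dense_b = set(random.sample(range(N), N // 2))
print(len(a & b), len(dense_a & dense_b))

# A downstream detector sampling only half of a's bits still separates
# a from b by a wide margin -- you don't have to look at all the bits,
# and a few failed neurons wouldn't change the verdict.
subsample = set(random.sample(sorted(a), W // 2))
print(len(subsample & a), len(subsample & b))
```

With dense codes the chance overlap between unrelated patterns is so large that subsampling a few bits tells you almost nothing, which is the brittleness Jeff predicts.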
It can be nearly impossible to detect what a neuron represents by presenting various stimuli to an animal while recording from it, and this is especially true the further up the hierarchy you go. Gallant has spent a lifetime trying to do this for region V4. He usually presents static images to a monkey, ignoring time and motor behavior. I have talked to Gallant about this and he agrees it is a problem, but there isn't much he can do about it: he can only record from a cell for a brief period of time, there is a gigantic space of possible patterns he could expose the animal to, and there is no known way to include motor behavior.

Here is an example from V1. Scientists present static image patches to an animal and determine what a cell responds best to. They label this the "receptive field" of the cell, such as a line at a particular orientation in a particular part of the visual field. The experiment is highly repeatable, so they are pretty sure they know what this cell likes. They then show a movie of natural scenes to the animal. In this movie there are times when the retina is exposed to a pattern that exactly matches the receptive field of the cell, yet the cell only responds to this input occasionally. And when it does respond, the receptive field is much larger than measured before. The true receptive field of this cell cannot be determined with a static image or a moving grating; it requires moving natural images.

Now imagine a cell in V4. Its true receptive field involves not only spatial and temporal patterns but also motor behavior. It is nearly impossible to discover what the cell responds to by presenting static images. It still represents something, but we can't determine what it is.

Now for an advanced CLA topic that I don't think we have described anywhere before. If you make synapses forget more slowly but keep the incrementing rate the same, then a column will learn to respond to more than one spatial pattern.
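This slow-forgetting behavior can be sketched as a toy permanence model. The update rule and all constants below are illustrative only, not NuPIC's actual parameters:

```python
CONNECTED = 0.5  # permanence threshold for a "connected" synapse

def train(inc, dec, block=10, cycles=10):
    """Alternate blocks of pattern A and pattern B; return the column's
    final permanence toward each pattern's input bits."""
    perm = {"A": 0.5, "B": 0.5}
    for _ in range(cycles):
        for active in ("A", "B"):
            for _ in range(block):
                for p in perm:
                    if p == active:
                        perm[p] = min(1.0, perm[p] + inc)  # reinforce
                    else:
                        perm[p] = max(0.0, perm[p] - dec)  # forget
    return perm

# Forgetting much slower than learning: the column stays connected to
# BOTH patterns, so it responds to either A or B.
slow = train(inc=0.1, dec=0.005)

# Forgetting as fast as learning: only the most recent pattern survives.
fast = train(inc=0.1, dec=0.1)

print(slow, fast)
```

When the decrement is small relative to the increment, each pattern's reinforcement outpaces the decay it suffers while the other pattern is present, so both stay above the connection threshold.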
A bit in our SDR will then have two different meanings: sometimes it means A and sometimes it means B. When A happens the column starts to forget B, and when B happens it starts to forget A, but the forgetting doesn't happen fast enough to eliminate either one. We have tested this extensively and it works quite well. It increases the number of unique things your SDRs can represent, but comes with a slight increase in the possibility of making an improper match in the temporal pooler. I have no idea if this phenomenon happens in real brains, but I don't see why it couldn't. If it were happening in parts of the cortex, it would make it even harder to learn the receptive field of a cell. It would also explain why looking at an ensemble of neurons would be unambiguous.

- Jeff

---------- Forwarded message ----------
From: Archie, Kevin <[email protected]>
Date: Wed, Oct 16, 2013 at 8:02 AM
Subject: Re: [nupic-dev] sparse input, really?
To: "NuPIC general mailing list." <[email protected]>

Fergal,

Thanks for the (almost alarmingly) quick and thoughtful reply. I'll have to think about it all some more.

- Kevin

On Oct 16, 2013, at 9:48 AM, Fergal Byrne wrote:

Hi Kevin,

Good question. I'm pretty sure that SDR is a crucial central idea in Jeff's theory, but let's leave that aside for now.

There's another way of looking at your question, inverting it so to speak. Perhaps we have big brains because we need the sparseness in order to process information the way we do. Certainly in NuPIC there seems to be a threshold of 500-600 columns (of the otherwise typical size) needed for the Spatial Pooler and the Temporal Pooler to work really well. Below this size the sparseness is hard to establish, and the TP hasn't enough active connections to operate well. Above this size the capacity is soon so high that it's hard to bang up against it.

The model we use in NuPIC is binary (active or not, connected or not, etc.) and timestep-based.
These are simplifications of real neurons, which have many signalling styles and which operate asynchronously. Given the limitations of measuring living neocortex, it's unlikely that you could capture the true neuron-by-neuron, millisecond-by-millisecond "network traffic," so it's hard to claim that the signalling is either very sparse or very dense.

The observation of neurons carrying "several signals" may be explained by a high rate of change of input, inhibition and intra-region activity, which could cause some overlap in the apparent state of each neuron at any given time. The evolving shape of the signal in this case could be said to encode part of the data. The analogue of this in HTM/CLA is a specific sequence of SDRs, each of which could be regarded as a freeze-frame of the pattern of activity across the region. In the CLA (unlike most other networks), the sequences are just as important as the individual patterns. Looked at in this way, any fast-changing series of SDRs would appear "dense" if measured at intervals significantly longer than the "timestep" of the sequence. So it's possible that these observations do not contradict the HTM/CLA theories, but are an artifact of the method of measurement.

We do know that inhibitory interneurons act to sparsify activation patterns, and we have empirical evidence that (the computational analogue of) this is key to getting NuPIC to work. This is why we believe the representations are sparse. Any argument in favour of non-sparseness should therefore have a neuroscientific and/or a computational basis.

Regarding orthogonality, you could view an SDR as a multi-bit (or fuzzy) "dense(r) orthogonal" representation, if you observe that closely-related inputs give rise to closely-matching outputs.
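This "closely-related inputs give closely-matching outputs" property can be seen in a toy scalar encoder, written in the spirit of NuPIC's encoders but with made-up parameters:

```python
def encode(value, n=400, w=21):
    """Toy scalar encoder: w contiguous active bits whose position
    tracks the value. Nearby values share bits; distant ones don't."""
    start = int(value) % (n - w)
    return set(range(start, start + w))

a, b, c = encode(100), encode(103), encode(250)

# Nearby values overlap heavily; distant values not at all.
print(len(a & b), len(a & c))  # 18 0
```

The overlap count acts like a graded (multi-bit) similarity measure, which is what makes the representation "fuzzy orthogonal" rather than strictly orthogonal.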
If you "wiggle" the inputs one input field at a time and combine the active bits in the resulting SDRs, you can see that each such set of bits is a multi-bit representation of some aspect of the input. Any higher-level columns "seeing" these bits will be detecting a similar value of a similar feature. Unlike in the other networks you mention, the "orthogonality" is learned by the SP process rather than imposed by the structure of the inputs.

Regards,

Fergal Byrne

On Wed, Oct 16, 2013 at 3:04 PM, Archie, Kevin <[email protected]> wrote:
> I've been sitting on this question for a while, and it came to mind again a couple of days ago when I heard Jack Gallant talk about some work by his student Alex Huth. He was showing multiple simultaneous recordings from prefrontal cortex (I think), and each neuron was carrying several signals that (paraphrasing roughly) couldn't be extracted by looking at individual neurons but could be teased out by extracting components from the network activity. (John Maunsell and Bill Newsome also gave talks that similarly showed single neurons firing in response to lots of things, where pulling out the meaning required the context of the network.) The sense I was getting: this is not sparse coding.
>
> In traditional neural network models (Hopfield-ish associative memories, perceptron networks and the like), what you generally need is not really sparseness but orthogonality. Sparseness is one way to get that, but it's a space-time tradeoff: you can often build a sparse representation quickly if you have plenty of space. There are other ways to get orthogonality, and a dense representation would be making a different tradeoff -- and big brains being metabolically expensive, space is a nontrivial constraint.
> A speculation I heard some years ago (and I wish I remembered from whom; Google yields some echoes but no clear origin) is that the hippocampus and entorhinal cx are busy during sleep building more compact orthogonal representations of the day's input for use by higher association areas.
>
> Pretty clearly the sensory periphery uses sparse representations, and similarly for areas with really-motor motor outputs. (Extreme example: V1 certainly uses sparse representations. V1 is really freakin' big.) Probably some sparse representations persist in, say, anterior temporal, parietal, and frontal cx, but I would suspect that compact orthogonal representations become important in higher (and smaller) areas. Of course, my suspicions are not evidence, I'm ten years mostly away from the neurophysiology literature, and data beats my speculations. Is there direct evidence that higher cortical areas traffic exclusively or primarily in sparse representations?
>
> That's the brain theory side. On the more immediately practical side: has anyone tried using compact orthogonal representations with NuPIC? Any success (or failure) stories? I don't even have a guess as to what extent SDRs are necessary versus just customary.
>
> Thanks,
>
> - Kevin
>
> p.s. Apologies for the theoretical bent of this question. Too many years hanging out in universities have left me tending to think too much rather than just getting started.
> _______________________________________________
> nupic mailing list
> [email protected]
> http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org

--
Fergal Byrne
Brenter IT
[email protected]
+353 83 4214179

Formerly of Adnet
[email protected]
http://www.adnet.ie

_______________________________________________
nupic mailing list
[email protected]
http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
