Hi guys, I've recently been thinking a lot about a tentative set of ideas which may be considered a refinement or extension of the current CLA. I feel that I'm on to something, but things are still floating around in my head too much to pin down, so I thought I'd attempt to write them down and share them with you. They're just some ideas, so I'm actually demanding your criticism of anything which contradicts the neuroscience or which you know to be a dead end!
The current CLA implements a subset of the real neocortex's connection structure: mainly the unidirectional feedforward connections, one way of mapping inputs to activation, one type of inhibition functionality, and a restricted type of prediction. The extensions I describe are intended to fill this out for two purposes: first, to add functionality which exists in the brain (and in other neural nets); and second, to provide a plausible neural mechanism explaining how the brain does it.

I'll start with some issues faced by the current theory, which I'm hoping will be addressed, or at least better understood, if we extend the CLA:

1a. The algorithm in the SP does not support reconstruction of inputs from SDRs of activity (or prediction). This (or something equivalent) is important for hierarchy and motor performance. The phenomenon Jeff refers to as "unrolling" a sequence to trigger lower-level sequences requires that you send activation down the hierarchy and reconstruct activations in those layers. The example he uses is his own speech: his memory of ideas triggers sequences of sentences, each sentence triggers a sequence of words, then phonemes, then muscle actions, etc. The root of this problem is that the SP affects only one direction of the connection tuning between input (or lower-region) activity and the resulting column activities. Synapse permanences are improved based on correlations between coincident activity when data is present. There is no reverse pathway which would "cause" the data based on the active columns.

1b. Input encoders are not truly adaptive. In the body, the path from the senses to the cortex is multi-stage and complex. The input arriving at the cortex has been through a number of processes which, in combination, are designed (and learn) to provide the best possible texture for first-stage processing in the cortex.
We know that the stages nearest the cortex on this pathway are in a feedback loop which improves the overall "encoder" in some way; the thalamus is no doubt a very important part of this. In comparison, our presentation to the CLA has minimal work done on it. This severely limits what can be done with the data, as it leaves the CLA a lot of work to do. In particular, the CLA does not feed back any information to the encoder stage about how it feels about the data. (I'm leaving aside the initial setup of encoders, which is more like genetic design and is not improved by online learning.) This issue and the previous one are clearly related, hence the numbering.

2a. Predictions require external engineering to extract them. At the moment, we choose beforehand which fields to predict, and how many steps ahead. We then add a piece of equipment (the classifier) which records which inputs coincide with active cells, and generates a statistical prediction based on that.

2b. The CLA is not really pooling temporally. While the code which generates the predictive cell patterns is called the TP, it executes only part of the pooling which is really done in the neocortex. The real neocortex produces a representation of (some or all of) the actual sequence(s) of SDR patterns. The TP, along with the classifier, is used to extract individual slices of this representation, but the "whole thing" is not available to be presented to a higher region, so a hierarchy cannot easily form a slower-moving representation of the fast-changing data below.

3. We can't reconstruct sequences on command. This is due to a combination of the reconstruction problem (1a) and the temporal pooling problem (2b).

OK, to solve these problems, we need to add a lot of new connections and set them to work.
This is clearly going to add a lot to the memory and CPU cost of the CLA, but we only need to switch it on when we're getting extra value from it, so it's like many of the other settings we already have for configuring models. Here are my proposed extensions:

1. Make the feedforward pathway truly adaptive. The real brain has a complicated way of doing this, no doubt involving return connections at least as far back as the thalamus, and some kind of processing in both the neocortex and the thalamus. We can replicate it with a simple mechanism which combines the current feedforward connections with new functionality: it adapts the encoding (if sensory), tunes the feedforward connections, and supports reconstruction (of sensory data and lower-region activation).

a) Smart Encoding

I'll present this first as it could be added as a standalone option to NuPIC. The current AdaptiveEncoder is quite unsophisticated: it starts with a set min-max, and changes one of those abruptly when it receives an out-of-range scalar. All the intermediate buckets are rescaled instantly, threatening to break the learning of previous patterns. A smart encoder, on the other hand, adapts gradually to the changing statistics of the data, and smoothly migrates intermediate bucket values at minimal loss to previous learning. The algorithm is quite simple, and should not incur large costs in time or memory. In return, I believe this encoder will outperform the old one significantly.

Basically, the encoder is composed of a fixed array of "bits", each initialised with a centroid and a radius. In the old encoder these are derived by subtracting the min from the max and dividing by the number of bits; in this encoder each bit maintains its own. As the data comes in, each bit calculates whether it has been hit by the value of the field it's encoding. The min and max bits will fire on any out-of-range values, clamping the encoding.
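To make the encoding step concrete, here's a minimal sketch of what I have in mind. The class and parameter names (EncoderBit, SmartEncoder, alpha) are my own illustrative inventions, not NuPIC's actual API:

```python
# Hypothetical sketch of the "smart encoder" bits described above.
# Names and parameters are illustrative, not NuPIC's actual API.

class EncoderBit:
    """One encoder bit, owning its own centroid and radius."""
    def __init__(self, centroid, radius):
        self.centroid = centroid
        self.radius = radius
        self.duty_cycle = 0.0  # running average of how often this bit fires

    def hit(self, value, alpha=0.01):
        """Fire if the value falls within this bit's receptive field,
        and update the duty-cycle running average."""
        active = 1 if abs(value - self.centroid) <= self.radius else 0
        self.duty_cycle = (1.0 - alpha) * self.duty_cycle + alpha * active
        return active


class SmartEncoder:
    def __init__(self, n_bits, lo, hi):
        width = (hi - lo) / n_bits
        # Initialise centroids evenly, as the old encoder would.
        self.bits = [EncoderBit(lo + (i + 0.5) * width, width)
                     for i in range(n_bits)]

    def encode(self, value):
        # Clamp out-of-range values so the min/max bits fire on them.
        value = max(self.bits[0].centroid,
                    min(self.bits[-1].centroid, value))
        return [b.hit(value) for b in self.bits]
```

The adaptation step (shifting centroids and radii based on the duty cycles, as described below) would then run on top of this every k steps.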
So far, this replicates a standard (non-adaptive) encoder. The bits also keep a running average of how often they are on, which NuPIC calls a duty cycle. This number tells each bit how to adapt. The simplest case is a max-bit which is getting lots of out-of-range activations: it detects that its duty cycle is high for the encoder (and higher than its neighbour's), so it should shift its centroid upwards and extend its radius. Likewise for the min-bit. Conversely, a max-bit which is not getting fired is sitting too high on the number line, so it should drift its centroid down a little. Intermediate bits will also experience different duty cycle levels. Here, the bits should either squash together (if they're firing too much) or spread out (if too little), according to a kind of pressure-tension balance between the bits. There will be a gradient of duty cycles across the bits, and this can be used to spread the bits out so that their duty cycles approach the global average. The adaptation process can be carried out every k steps (another parameter which can be swarmed), so we can control the cost. The rate of movement of the bits can also be a parameter, depending on the data.

b) Bidirectional and Per-Cell Feedforward Connections

At the moment, the SP goes through the columns and, for each column, looks at the connected bits to see which are on. It counts these up to give an integer activation potential, then picks the highest 2% of the columns and activates them. To learn, these active columns are visited again, and the permanences are adjusted based on coincidence between this activation and the on-bits in the input. We know this is a simplification intended to reflect the fact that the cells in a column share a correlated feedforward response. However, that is only because the pipe of feedforward axons passes up through the column and thus potentially synapses on all the cells in a correlated way.
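For reference, the current SP activation and learning steps just described can be sketched roughly as follows (function names, thresholds, and increments here are assumptions for illustration, not the actual NuPIC code):

```python
# Illustrative sketch of the SP feedforward and learning steps described
# above; names and parameter values are assumptions, not NuPIC's API.
import numpy as np

def sp_activate(input_bits, permanences, connected_thresh=0.2, sparsity=0.02):
    """input_bits: (n_inputs,) 0/1 array; permanences: (n_cols, n_inputs)."""
    connected = (permanences >= connected_thresh).astype(int)
    overlaps = connected @ input_bits                 # integer potential per column
    n_active = max(1, int(len(overlaps) * sparsity))  # the "highest 2%"
    active = np.zeros(len(overlaps), dtype=int)
    active[np.argsort(overlaps)[-n_active:]] = 1      # global inhibition
    return active

def sp_learn(input_bits, permanences, active, inc=0.05, dec=0.01):
    """Revisit the active columns and reinforce coincidences with on-bits."""
    for col in np.flatnonzero(active):
        permanences[col] += np.where(input_bits == 1, inc, -dec)
        np.clip(permanences[col], 0.0, 1.0, out=permanences[col])
    return permanences
```

Note that learning here runs only from input to column, which is exactly the one-directional tuning complained about in 1a; the per-cell extension below is what makes the reverse direction possible.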
The actual synapses are per cell, not per column. If we add these synapses to the cells (as well as the columns), we get some new functionality. Now these synapses are learning the connections between on-bits in the input and this particular cell becoming active. This tells us which bits are important to this cell, which is exactly the information we want back when the cell is predictive. The dendrite's permanence values are a histogram of the bits associated with this cell becoming active.

You can use this to do a much better job of reconstructing the input from either a predictive state or an activation pattern, simply by reversing the direction of the cell-bit connection, as follows. Create a "virtual dendrite" on each input bit (or lower-region cell). For each active (or predictive) cell, copy the synapse from each bit to the bit's virtual dendrite. Now feed the activation (or prediction) values into the virtual dendrites. The result is an "activation pattern" across the lower-region or input bits. You can run inhibition on this to return the highest-probability SDR, or do other things depending on the purpose of the exercise. Note that a lot of this will work better if (at certain stages) you treat activations and connections as more fine-grained than binary; you can always force things to binary when it's of use.

c) Real Temporal Pooling

I'm leaving this one as a cliff-hanger until tomorrow. I have to go see Leinster pound Connacht into the rugby pitch at the RDS! Meanwhile, your thoughts on the above would be appreciated.

Regards

Fergal Byrne

On Sat, Oct 26, 2013 at 1:09 PM, Fergal Byrne <[email protected]> wrote:

> Haha Jeff,
>
> I'd actually had "a couple" (it was Friday night my time, after all), but I tell the same story to everyone who's interested in understanding the Irish, regardless of the time of day or the blood alcohol level.
> I do seriously regard these kinds of self-descriptive jokes as windows on cultures, which can be regarded as shared learning. It's a more plausible explanation of Jung's collective unconscious.
>
> One of your (Americans') shared cultural artefacts is the idea of gun ownership, which is quite incomprehensible to people in Europe. This is because we've been constantly reinforcing the notion of a gun as a threat to us in the hands of a member of the public, something that only officially designated people should be trusted with (even then, the police in Ireland and the UK are almost exclusively unarmed). You guys have a completely different set of associations, which your culture feeds you constantly. This is also tied up with attitudes towards authority, the place of the individual, and several other very high-level concepts.
>
> You could regard this kind of "cultural memory" as the ultimate level in a distributed memory hierarchy. It's a sensorimotor memory as well, as it is learned in part by the culture taking actions which explore the consequences of the learned attitudes and updating the memory based on that exploration. Even within an apparently monolithic culture, there are "subcultures" which have variations in these attitudes. So the culture is itself a hierarchy, sitting on top of and composed of the collective memories of individuals.
>
> The same process is likely behind the development of languages and dialects (hierarchy again), mythologies and religions, political ideologies and so on.
>
> Regards,
>
> Fergal Byrne
>
> On Sat, Oct 26, 2013 at 6:08 AM, Jeff Hawkins <[email protected]> wrote:
>
>> Hah! And how many pints did you have before writing this email?
>>
>> Jeff
>>
>> From: nupic [mailto:[email protected]] On Behalf Of Fergal Byrne
>> Sent: Friday, October 25, 2013 5:11 PM
>> To: NuPIC general mailing list.
>> Subject: Re: [nupic-dev] motor implementation
>>
>> Hi Jeff,
>>
>> This is great as it's so far not contained in any of your talks or (detailed) writings. I've always used the fact that we explore the world in order to learn it as a basis for how we learn it. This is a crucial understanding about the "philosophy of mind" which your theory engenders, and is perhaps misunderestimated (perhaps the greatest contribution to the world given by George W. Bush) by many scholars of the brain.
>>
>> In the same way that Eskimos (in fact don't) have 40 words for snow, Irish people in fact have dozens of words for manipulating truth and reality. That's why we have so many Nobel Prizes for literature. Irish people excel in doing sensorimotor explorations of linguistic reality.
>>
>> My favourite example is the answer to "how much did you have to drink last night?", which leads to the "Irish drink numeral system":
>>
>> 1) Less than 3 pints: you answer "I wasn't out last night"
>> 2) 3-7 pints: you say "a couple"
>> 3) 7 to your limit: you say "a few"
>> 4) Anything significantly beyond is a "skinful" or a "load of pints"
>>
>> This is obviously a rather higher-level version of what Jeff is talking about, but it's the analogue of bobbing your head to get better parallax vision.
>>
>> Regards
>>
>> Fergal Byrne
>>
>> On Fri, Oct 25, 2013 at 9:52 PM, Jeff Hawkins <[email protected]> wrote:
>>
>> In every region of the cortex there are cells in Layer 5 that project someplace else in the brain that is related to motor behavior. (At least in every region people have looked.)
>> In the classic “motor” regions the layer 5 cells project to muscles, or spinal cord; in vision areas they project to the superior colliculus, which controls eye movement, etc. This tells us that most (if not all) regions of the cortex are playing a role in motor behavior. This is one reason why I think we can attack the sensorimotor problem by initially modeling a single region.
>>
>> The axons from the layer 5 motor cells split and send a branch up the cortical hierarchy. So the “motor command” coming from layer 5 is also an input to the next region. When the next region learns patterns and makes predictions, part of its input is the motor commands that the lower region is sending to motor areas.
>>
>> One twist is that the feedforward motor signal is gated in the thalamus, so it doesn’t always go to the next region. Presumably this is controlled as part of attention.
>>
>> Remember that layer 3 is the primary feedforward inference layer; it is the model for the CLA. My guess is that layer 5 and layer 3 are entrained by columns, and therefore layer 5 is learning a sequence similar to layer 3.
>>
>> I believe that the layer 5 cells associatively link to subcortical motor centers. They are just like the cells in the layer 3 CLA: they represent the state of the system, but they learn how to control behavior by association. I can explain this better but it takes more time than I have now.
>>
>> I can walk through some simple examples of how a region sits on top of a body which has sensors and innate motor behavior. The region learns to model the patterns of the body as it interacts with the world through its innate behaviors. Then the region’s layer 5 cells associatively link to control the innate behavior. Now the region is able to control behavior.
>> If the region learns complex patterns that result in desirable outcomes, it can replay those complex patterns (essentially new complex behaviors) to make the desired outcome happen again. This is a form of “reinforcement learning”.
>>
>> Where I am struggling is how to set goals and how the system can adjust its behavior as it plays back learned behaviors.
>>
>> Jeff
>>
>> Here is a paper on how the layer 5 neurons split:
>> http://shermanlab.uchicago.edu/files/rwg&sms%20BRR%202010.pdf
>>
>> From: nupic [mailto:[email protected]] On Behalf Of Chetan Surpur
>> Sent: Friday, October 25, 2013 10:36 AM
>> To: NuPIC general mailing list.
>> Cc: NuPIC general mailing list.
>> Subject: Re: [nupic-dev] motor implementation
>>
>> Hi Jeff,
>>
>> Just as briefly, would you mind describing how region 5 helps accomplish this? Unless you want to save it as a surprise for your talk :)
>>
>> Thanks,
>> Chetan
>>
>> On Fri, Oct 25, 2013 at 10:30 AM, Jeff Hawkins <[email protected]> wrote:
>>
>> Aseem,
>> I will be giving an informal talk on this topic at the next hackathon, but in brief, very brief...
>>
>> The CLA today has no motor component. It is like an ear listening to sounds but with no ability to interact with the world. Most sensory perception is not like that. Most of the changes on our sensors come from our own actions. Imagine standing in a house. If your eyes couldn't move and your body couldn't move, you would not be able to learn what the house is like. You couldn't learn the patterns in the world. Only by moving do you discover the structure of the house. Movement leads to sensory changes. The brain learns sensorimotor patterns: "when I see this and turn left I will see that". The same is true for touch.
>> Even hearing is largely controlled by our own motions. The only thing I am hearing right now is the sound of the keys on my keyboard. My cortex is predicting to hear those sounds. If they changed even slightly I would notice the difference.
>>
>> Motor behavior is how we learn most of the structure of the world.
>> Jeff
>>
>> -----Original Message-----
>> From: nupic [mailto:[email protected]] On Behalf Of Aseem Hegshetye
>> Sent: Friday, October 25, 2013 5:37 AM
>> To: [email protected]
>> Subject: [nupic-dev] motor implementation
>>
>> Hi,
>> Jeff Hawkins said he is working on sensorimotor design.
>> How will implementation of motor layer 5 help in data prediction?
>> Would it be like the CLA signalling anomalies the way the cortex gives motor commands, or are you planning on manipulating some parameters at the user end based on the predictions from given inputs?
>> Thanks,
>> Aseem Hegshetye
>>
>> _______________________________________________
>> nupic mailing list
>> [email protected]
>> http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
>>
>> --
>> Fergal Byrne
>>
>> Brenter IT
>> [email protected] +353 83 4214179
>> Formerly of Adnet [email protected] http://www.adnet.ie
>
> --
> Fergal Byrne
> <http://www.examsupport.ie> Brenter IT
> [email protected] +353 83 4214179
> Formerly of Adnet [email protected]
> http://www.adnet.ie

--
Fergal Byrne
<http://www.examsupport.ie> Brenter IT
[email protected] +353 83 4214179
Formerly of Adnet [email protected] http://www.adnet.ie
