Last week my attention was drawn to this great project at Nupic when I was sent your white paper. I could not agree more with your approach and with the thoughts of Jeff Hawkins. Thank you especially for your positive attitude, Jeff.
In this mail I want to present my idea of how you could implement hierarchy and attention in such a way that the neural machinery becomes a machine that can imagine things, especially temporal things, that is, temporal images. I think one should call these temporal images consciousness, though not self-consciousness. Note that this system will not be able to speak yet either, no matter how big and fast you make it.
The text is cut-and-pasted from earlier work and edited to fit Nupic. Where one reads "us" or "humans" one should read "Nupic's HTM". There is much redundancy in the text. I could write a summary for programmers or the like; just ask. It would be very nice if this got implemented.
(Please do not try to read my book; it is too badly written; better ask me to write on something. If you cannot resist, skip chapters 5, 6, and 7.)
I attached two figures to help explain this text:
1 (img58.gif): A hierarchical neural network split in two halves.
2 (img65.gif): Intra- and inter-cortical projections in our brains. II, III, and IV refer to layers II, III, and IV of our cerebrum. Straight lines are intra-cortical projections. Bent lines are inter-cortical projections.

Take two pattern associators. Imagine them to have an input side at one end and an output side at the other end. Lay them on top of each other, but in opposite directions with regard to each other’s input and output sides. Then reconnect them such that each one tends to mirror, mimic, or ‘imitate’ the other, but in the opposite direction (see the first attached figure). To the input of the first neural network are attached sensory devices like eyes and ears. This neural network I call the deconstructing neural network. The input of the second network is derived from the attention mechanism. This neural network I call the constructing neural network. At the split in this network a process which I call mirroring takes place. It is essential that the neural activation in the two halves of the split neural network flows in opposite directions.

In the split neural network, taken as a whole, I take the sensory end to be the bottom, and the other end, that is the side of the attention mechanism, to be the top. Neural activation flowing from the senses to the attention mechanism can also be referred to as forward projection, while downward activation from the attention mechanism to the senses is often referred to as backward projection. This terminology is common in the medical and neurophysiological literature.

Imagine that the attention mechanism does no more than find a small number of the most active neurons at the top of the deconstructing neural network, and then activate some neurons at the same place in the constructing network, while depressing all other neurons at this highest level.
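A minimal sketch of this attention step in Python may help programmers. It assumes the top level of each network is a plain list of activity values, that "the same place" means the same index, and that "depressing" means setting activity to zero. The name attend and the parameter k are made up for illustration and are not part of Nupic.

```python
def attend(top_activity, k=5):
    """Pick the k most active top-level neurons of the deconstructing
    network and return the activity imposed on the top level of the
    constructing network: winners switched on, all others depressed."""
    # Indices ordered by activity, highest first; ties go to the lower index.
    ranked = sorted(range(len(top_activity)),
                    key=lambda i: (-top_activity[i], i))
    winners = set(ranked[:k])
    # Assumption: an attended neuron is fully on (1.0), a depressed one is 0.0.
    return [1.0 if i in winners else 0.0 for i in range(len(top_activity))]


deconstructing_top = [0.1, 0.9, 0.3, 0.8, 0.05, 0.4]
constructing_top = attend(deconstructing_top, k=2)
print(constructing_top)  # [0.0, 1.0, 0.0, 1.0, 0.0, 0.0]
```

With k=2, only the neurons at positions 1 and 3 remain active at the top of the constructing network; everything else at that level is silenced.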
As a result there are, from moment to moment, only a few neurons significantly more active at the highest neural level of both the constructing and the deconstructing neural network, with many neurons still being activated at the lower neural levels. Since the deconstructing neural network is mirroring the constructing network, it will tend to look the same, and vice versa. But the neural activation in each network flows in the opposite direction. This is the principle. One of the consequences is that we can attend to something through the constructing neural network, while something can attract our attention through the deconstructing neural network.

I cannot say for sure what the best mechanism for mirroring between neural networks is. I can think of a number of possibilities. My guess is that in our own brains the networks selectively and temporarily sensitize or desensitize each other, thereby setting out a preferred route for activation within each network. By sensitizing I mean, in this context, temporarily facilitating the activation of a certain neuron (in Nupic: creating a predictive state). This may, of course, indirectly lead to a more permanent sensitivity of the neuron with regard to other neurons.

One difference from ordinary neural connections is that mirroring connections have no memory. Their inter-neural strength, or weight, or sensitivity, is fixed, except, I guess, in a global sense, for example with regard to emotional arousal. Above all, their weight is relatively small, so that it won’t influence our perception more than necessary. Although the sensitivity between the two neurons in the two halves of the split neural network is itself fixed, their sensitivity with regard to neurons within their own half of the neural network is, of course, not. So, if a neuron in one network is temporarily sensitized by the other network, then this neuron will become an easier route for neural activation within the first network.
As a result, chances are that relatively more activation will flow through this neuron. Its sensitivity toward the neurons leading to this increased activation will thereby also increase. The net result of both perception and learning is mirroring.

One can view mirroring as generating error signals, the error signal being the difference between the actual and the desired mirroring state. This is a biologically possible implementation of error back-propagation. One could also think of a direct influence of the error signal, having it really activate or deactivate the other neural network a little, instead of only facilitating activation. To me this seems a bit counterintuitive, but the effect will probably not be very different, considering that there is always a lot of neural noise in our brains. Moreover, this was (probably) proven to work a long time ago by Stephen Grossberg (Adaptive Resonance Theory).

In our brains the amount of mirroring is possibly flexible and controllable. The more difficulty the attention mechanism has attending to something, the more the deconstructing neural network should mirror the constructing neural network in order to help the attention mechanism gain a hold, with all the complicating consequences regarding the trustworthiness of our perception.

The deconstructing and the constructing neural network are each other’s backward-propagating networks. What is propagated backward is not an error signal; still, the difference between the two networks may lead to an error signal, which in turn leads to the mirroring of the two networks.

Simplifying the working of this neural machinery, we can view the deconstructing neural network as a pattern associator, with the sensory input from our eyes and ears at one end, and the attention mechanism at the other, ‘output’ end. In this view the attention mechanism provides the teaching input for this neural network.
The particularity here is that this teaching input does not come from outside the thinking apparatus, nor is there a homunculus. The attention mechanism causes the ultimate competition between output neurons by letting only one, or a few of them, win; in the case of human vision, about five may win.

In the other direction the constructing neural network has the attention mechanism at its input and the sensory devices at its output. As such the sensory input is the teaching input of the constructing neural network. It is more or less taught to reproduce the perceived outside world within itself.

This attentive neural network is split in such a way that it is not only a pattern associator, but also an auto-associator. It can imagine, and even hallucinate, things, although the latter is of course not what we want. Essential to this design is, among other things, that in the constructing neural network the activation pattern is remembered as if it came into being only through the activation of the attention mechanism, which is, both historically and actually, not true because of the mirroring mechanism. In the deconstructing neural network, the activation pattern is remembered as if it came into being only through activation by the senses, which is not true either.

A consequence of the former is that the network can reproduce an image, without the aid of the deconstructing network, by the attention mechanism somehow ‘willingly’ attending to it. If this happens, then we will, through the deconstructing network, tend to attend to something which looks, sounds, feels, tastes, or smells like something in the outside world or, if it is not there, it will lead to imaginations, or even hallucinations. Just think of ice cream when it’s hot, and you know what I mean. Conversely, a consequence is also that our senses can, through the deconstructing neural network, make us ‘unwillingly’ attend to something, without the aid of the constructing neural network.
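The mirroring by sensitization described earlier can be sketched for a single neural level as follows. This is only one of the possible mechanisms, under loud assumptions: activity is a list of numbers, the mirror error of a neuron is its partner's activity minus its own, and the fixed mirror weight is 0.1. The names MIRROR_WEIGHT, mirror_error, and sensitized_input are made up for illustration.

```python
MIRROR_WEIGHT = 0.1  # assumption: fixed and small, never changed by learning


def mirror_error(own_activity, partner_activity):
    """Per-neuron error signal: desired mirroring state (the partner's
    activity) minus the actual state (the neuron's own activity)."""
    return [p - o for o, p in zip(own_activity, partner_activity)]


def sensitized_input(own_input, own_activity, partner_activity):
    """Facilitate (positive error) or hinder (negative error) each neuron
    a little, without activating it directly."""
    errors = mirror_error(own_activity, partner_activity)
    return [x + MIRROR_WEIGHT * e for x, e in zip(own_input, errors)]


# Neuron 0 is slightly ahead on its own input, but the other network
# is active only at neuron 1.
own_input = [0.50, 0.45]
own_activity = [0.50, 0.45]
partner_activity = [0.0, 1.0]

biased = sensitized_input(own_input, own_activity, partner_activity)
print(biased[1] > biased[0])  # True: neuron 1 is now the easier route
```

Because the mirror weight is small, the bias only tips the balance between nearly equal candidates; a strongly driven neuron would still win on its own input.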
The images I am speaking about in this context are mostly not images of the conscious kind. Personally I like to say that we can experience these images, but then I define “to experience” as “to know that something is absent”. If you know that something is absent, then you know what is missing. As such it is present again, and you can use your attention to become conscious of something.

The easier it is for the attention mechanism to attend, the stronger the mirroring can be, and the better the thing attended to can be remembered. In plain English: the things you know best are remembered best. This could cause you to miss small changes. On the other hand, if there is some kind of feedback mechanism at work that tries to keep constant the difficulty the attention mechanism has in attending to something, then one would, on the contrary, see more peculiarities and differences in the things one knows best.

Let us make the attention mechanism such that it is able to attend to more than one thing at a time. If we take human vision as an example, the attention mechanism should be able to attend to five things at a time. Five mutually exclusive attention mechanisms can together make the neural network attend to five things at a time. The effect will be that it finds the five most active neurons and then excites their mirror neurons in the constructing neural network more than they might already be activated. As such there is a pattern of attention.

This attention pattern is evidently far less complex than the sensory pattern with which it is associated. As such, there is not only a complex sensory pattern at the bottom of the network, plus the distributed representation following from it, but also a far simpler attention pattern. These two are bound to each other, though not in as simple a way as would have been the case with singular attention. Such an attention pattern can be remembered and retrieved consciously and almost instantly.
The attention pattern can be stored and retrieved much faster, and more permanently, than sensory activation patterns. After all, we do not have a truly photographic memory. This remembering leads to us subjectively associating things with each other in a classical sense. I think the remembering of the attention pattern (in other words: the things attended to) also leads to "chunking" this pattern into a single unit that can become part of the next attention pattern. This chunking is probably a natural consequence of learning in the mirroring process.

Human beings can remember the neurologically rather simple attention pattern in a matter of seconds. For any set of visual elements, if we can already recognize each element in it, and if the set contains five elements or fewer, then the set as a whole can be remembered almost instantly. That said, we do have difficulty with more than three elements. In non-human animals the same mechanism holds, but in them this kind of fast learning is, I think, largely dependent on stimuli like food and punishment. We humans can partly ‘act on’ our attention mechanism ourselves through other means not yet described here.

The elements of the attention pattern may refer to prototypical instances of things, but they may also have their own identity, or something in between. This is like the difference in language between nouns and names. They may also entail abstract notions like "in-or-out-ness", or whatever a neural network can distinguish.

Disregarding narrative abilities, which we have through the use of language, our instant long-term memory of concrete events must be largely the consequence of this remembering of the attention pattern. It is not easy to remember something which we know little about. We must first train the deconstructing and the constructing neural network in order to be able to remember something easily, or even to perceive something consciously.
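For programmers, the remembering and chunking of attention patterns could be pictured with the toy data structure below. It is a deliberately crude sketch: a real implementation would learn chunks inside the mirroring networks, whereas here a dictionary stands in for that learning. The class AttentionMemory, its method remember, and the chunk names are all hypothetical.

```python
class AttentionMemory:
    """Toy store for attention patterns of at most five elements."""

    MAX_ELEMENTS = 5  # from the text: about five things attended at a time

    def __init__(self):
        # Maps a pattern (an unordered set of elements) to its chunk name.
        self._chunks = {}

    def remember(self, elements):
        """Store the pattern if it is new and return the single unit
        (chunk) that stands for it from now on."""
        pattern = frozenset(elements)
        if not 1 <= len(pattern) <= self.MAX_ELEMENTS:
            raise ValueError("an attention pattern holds 1 to 5 elements")
        if pattern not in self._chunks:
            self._chunks[pattern] = f"chunk{len(self._chunks)}"
        return self._chunks[pattern]


memory = AttentionMemory()
face = memory.remember({"eye", "nose", "mouth"})   # first pattern
person = memory.remember({face, "hat"})            # a chunk inside a new pattern
print(face, person)  # chunk0 chunk1
```

The point of the sketch is only the last two calls: a remembered pattern becomes one element, so a later pattern of at most five elements can refer to arbitrarily rich content.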
Subjectively the attention mechanism is also an association mechanism. Since the attention mechanism can attend to five things at a time, we may conceive of attentional hierarchies as consisting of five intermingled hierarchies. This has very interesting consequences. It allows new, more or less sensible discoveries to happen within our brains. This, for instance, is a prerequisite for language: the intermingling of attentional activation hierarchies allows a mixture of words (read: a sentence, or part of one) to gain a new meaning which goes beyond each of the words. Think of metaphors.

Where attentional stimulation on the perceptive side of a neural network leads to imagination or hallucination, on the motor side this usually leads immediately to bodily activity.

==Physiology

I attached a drawing of how all this is to be placed in the human cerebrum. If we divide the cortex into the six well-known cortical layers and draw the cortical projections between some of them, then we roughly arrive at something like the attached figure (img65.gif). Only layers II, III, and IV are named here, and of course the hierarchy involves more steps and contains many more branches. I think that each of the 100 or so cortical areas is a distinct piece in a neural level, having its place somewhere in the hierarchical tree of the larger neural network.

In this picture straight lines are projections within a neural level. Bent lines are projections between neural levels. Deconstructing neural activation flows from left to right. Constructing activation flows from right to left. We see that these two activation streams connect to each other at certain intervals. We see that the lessons which the deconstructing and the constructing neural network teach each other must take place somewhere in or between cortical layer II and cortical layer III.
For a complex of reasons I think that attention initially stems from the hippocampus and associated structures, and therefore I guess that the difference between reptiles and mammals is largely that the latter have lower-level neural networks that are capable of learning, namely the cerebral cortex, and the former have not. This physiology makes it likely that emotional arousal and instincts can interfere rather directly with the attention mechanism and gain a lot of control over our seemingly voluntary actions. Except for sexuality and aggression, this is especially noticeable in non-humans. It is in accordance with some famous cases of human brain damage too (I think of “H.M.”), if you take into account that, if you lose much of the attention mechanism at a later age, lower-level neural groups will by then have been trained to take over much of its function. So, in effect you ‘only’ lose your temporal, ‘declarative,’ or ‘epic’ memory with regard to new things, that is, anterograde amnesia.

Next to the attention mechanism mentioned above there are many other mechanisms which deserve the name ‘attention mechanism’ equally well. Take for instance the many mechanisms that make us physically direct our head and eyes toward something. Furthermore, I think that there are some inborn mechanisms which make us mentally adapted to a three-dimensional world at birth. Although we can mentally focus on something with the attention mechanism as described above, we can probably do this in other ways too. For this there is, among others, in the older parts of our brains, a kind of two- or three-dimensional neural focusing mechanism, which helps us see a three-dimensional world [Stephen M. Kosslyn: 1994].

==Connecting the mirroring network to our muscles

Motor movement is so closely integrated with perception that imagining motor action on the sensory side of a neural network greatly helps to coordinate this motor action.
The integration of motor and sensory neural networks also works the other way round. For instance, as a violin player, when I want to read music, it helps me a lot to move my bowing arm as if playing.

The simple, or over-simplified, idea is this. On the perceptive side of a neural network we connect our senses to the deconstructing neural network, while the constructing neural network is not connected to anything, but it is needed for mirroring. On the motor side it is exactly the other way round. Here we connect, so to say, our muscles to the constructing neural network, while the deconstructing neural network is (probably) not connected to anything.

To understand how our cerebrum is connected to our arms and legs it is important to understand that the basic tasks of movement are accomplished by phylogenetically older, ‘lower’ level, ‘reptile’ neural structures. This control starts with simple reflexes in the spinal cord. In the brain stem we next find the medulla oblongata, the pons varolii, and the mid-brain or mesencephalon. The reticular formation, which has an important influence on consciousness and arousal, is also located in this area. Furthermore, most sensory information goes through these structures too.

The medulla consists of many nuclei. They take care of simple antagonist movements, such as moving one joint of an arm up and down. Other nuclei regulate things like our heartbeat, breathing, the dilation of our blood vessels, sneezing, hiccuping, coughing, vomiting, et cetera. One step higher the pons varolii regulates swallowing, chewing, some eyeball movements, taste, facial expression, salivation, and more. Above this the mid-brain regulates the movement of our eyes and head based on visual or auditory information.

Many nuclei in the brain stem which regulate motor actions receive sensory information. At the lowest level this is mainly information which relates directly to the muscles which are regulated, such as muscle tension.
In the mid-brain information from our eyes and ears is taken into account too.

In the middle of our brains we find the thalamus. The thalamus is the principal relay station of our brains. Certainly for sensory impulses it also does a lot of interpretation. All these sensory impulses reach the thalamus through the lower brain structures mentioned earlier. Interpretation and ‘conscious’ recognition of pain and temperature is thought to take place in the thalamus. It probably also contains a kind of focusing mechanism for our vision.

Higher again we find the basal ganglia, also named the cerebral nuclei. They consist mainly of the corpus striatum. The striatum itself is divided into the lentiform nucleus and the caudate nucleus. The lentiform nucleus is again subdivided into the putamen and the globus pallidus. Lateral to the putamen we find the claustrum. All these neural formations are heavily interconnected with each other, with the thalamus, and with the cerebral cortex.

The basal ganglia control large unconscious movements, like swinging the arms while walking, and also walking itself. With many animals one might not immediately notice it if they had no cerebral cortex! In birds the cerebral cortex is hardly developed. Voluntary motor functions in birds stem from their well-developed basal ganglia. Next to this the basal ganglia (probably the pallidus) also regulate muscle tone: if one plans to do something, then the right muscles need to be stimulated a little before actual use.

At the tails of the caudate nuclei we find the amygdaloid nuclei. The hippocampus, the amygdaloid nuclei, and parts of the thalamus and hypothalamus together form the limbic system. The limbic system regulates emotion, survival, and our memory. It is responsible for all kinds of involuntary movement, and for complex emotional behavior such as rage. The amygdaloid nuclei play an important role in aggression and sexuality.
My personal guess is that the hippocampus plays an important role with regard to the attention mechanism.

The cerebellum is connected to each of the three parts of the brain stem mentioned above. It is also rather directly coupled to our cerebrum. The cerebellum takes care of what are sometimes named ‘subconscious’ movements of skeletal muscles. Important are coordination, maintenance of posture, and balance. Important herein is proprioception, that is, our sense of the position of our body parts relative to each other. From this the cerebellum makes decisions on muscle contractions with regard to the desired movement. The result is smooth, coordinated movement. With regard to balancing our body there are direct connections from the balance organ in our inner ear to our cerebellum.

From the above we may conclude that the neural network of the cerebrum, to which I try to confine myself, has, in daily practice, little to do with how our muscles do something with our body. We even see that the phylogenetically older parts of our brain can (not surprisingly, since reptiles have no more) more or less let us live on their own, and that the cerebrum is just one more regulating layer. As a regulating layer, however, it can instruct, and with regard to the cerebellum also teach, lower-level neural structures, i.e. we can train ourselves. Teaching is probably very limited. The cerebellum takes care of precise adjustment of movements. I don’t think our basal ganglia have learning abilities. Maybe it is different in birds? In essence, all that these lower brain structures do is built in from birth. There may be crude forms of adaptation, but nothing which I have to take into account here. What is important to me is that this apparatus can already do so very much automatically.

--
Regards,
Bert Frederiks
<<attachment: img58.gif>>
<<attachment: img65.gif>>
