Chris,

 

- On the order of patterns.

The CLA sequence memory as currently implemented in NuPIC requires that the
order of patterns be consistent.  It can learn a melody but the order of the
notes must be the same each time.  However, saccades and walking through a
building are examples where the order of patterns is not consistent.  When
you look at something your eyes don't follow a set order of fixations.  When
you walk through a building you sometimes go one way and sometimes another.
These unordered patterns would be unintelligible if the cortex didn't also
know what motor command was just executed.

 

The insight I had was that the existing CLA sequence memory mechanism will
build a predictive model of unordered changes if part of the context it has
is a copy of the recently executed motor command.  So if a layer of cells
knows that the eyes just moved right by x degrees it can learn to predict
the next input regardless of the order of fixations.
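
A minimal sketch of this point, under loud assumptions: it uses a plain
first-order lookup table rather than the CLA sequence memory, and the toy
world, feature names, and function names are all hypothetical.  It only
shows that adding a copy of the motor command to the context makes
unordered fixations predictable.

```python
import random
from collections import defaultdict

# Hypothetical toy "image": one distinct feature at each integer position.
WORLD = ["label", "pocket", "collar", "button", "sleeve"]


def explore(steps, seed=0):
    """Generate (feature, motor, next_feature) triples from random saccades."""
    rng = random.Random(seed)
    pos = rng.randrange(len(WORLD))
    triples = []
    for _ in range(steps):
        new_pos = rng.randrange(len(WORLD))
        motor = new_pos - pos  # signed saccade size: the motor command copy
        triples.append((WORLD[pos], motor, WORLD[new_pos]))
        pos = new_pos
    return triples


def learn(triples, use_motor):
    """First-order memory: context -> set of next features observed."""
    memory = defaultdict(set)
    for feature, motor, nxt in triples:
        context = (feature, motor) if use_motor else (feature,)
        memory[context].add(nxt)
    return memory


def max_ambiguity(memory):
    """Largest number of distinct next features seen after any one context."""
    return max(len(nexts) for nexts in memory.values())


triples = explore(steps=2000)
with_motor = learn(triples, use_motor=True)
without_motor = learn(triples, use_motor=False)
print(max_ambiguity(with_motor))     # a single prediction per context
print(max_ambiguity(without_motor))  # several candidates per context
```

With the motor command in the context each (feature, saccade) pair leads to
exactly one next feature, no matter what order the fixations came in.
Without it the same feature is followed by many different features.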

 

In the end there seem to be only two ways to model changes.  One, you
know the motor behavior that resulted in the change (and can therefore model
changes that are not in a high order sequence).  Or, two, the changes are in
order and you can model them as high order sequences.  Hence, layer 4 and
layer 3.

 

- Attention

The difference between looking "at the whole shirt in general" and studying
"specific features" is covert attention.  Imagine you stood in front of a
shirt four feet away.  When looking at the "whole shirt in general" your
eyes will move about and fixate at different locations on the shirt.  One
fixation might be on the label, another on a pocket, etc.  But your
perception will be stable and you will perceive the whole shirt, not just
the label or pocket.  Now imagine you want to read the label so your eyes
fixate on the label and now you perceive the writing on the label.  In both
cases, "whole shirt" and "label", there is a point where the input to the
retinas is identical.  However in one case you perceive the whole shirt and
in the other you perceive the writing on the label.  How does the exact same
input result in different perceptions?  The explanation is that when
attending to the label the cortex is ignoring part of the input.  It is
generally believed this happens in the thalamus which acts like a gate
between the sense organs and the cortex (as well as a gate between cortical
regions).  The thalamus can selectively pass some information forward and
block others.  If the cortex only gets part of the retinal input that is
what it will perceive.

 

None of this is directly related to combining of SP and TP.  But the
combining of SP and TP explains how we can have a layer 4 and layer 3 and
they can work together.  So it is essential to making the whole thing work.

 

Jeff

 

From: nupic [mailto:[email protected]] On Behalf Of Chris
Jernigan
Sent: Wednesday, January 22, 2014 12:23 PM
To: NuPIC general mailing list.
Subject: Re: [nupic-discuss] New Temporal Pooler Proposal

 

Hi Jeff,

 

   This idea of combining the SP and TP is interesting. Sometimes I think
about the visual process of recognizing objects in our environment. Your
example of walking through the rooms of your house in a different order
reminded me of some thoughts I've had. 

 

When you are looking around your office or your room and you see a familiar
object, there is a part of your brain that immediately sees the whole object
without studying it. Your brain knows that object. But say you are unsure
about this object, say a shirt, and want to make sure it's yours. Your eyes
don't just look at the whole shirt in general. You study specific features
of it through a series of saccades. With each saccade your brain verifies
the familiarity of different features; the fabric, the color, the label, the
size, etc. This forms a temporal pattern, albeit a very small one, but your
cortex still recognizes it regardless of the order in which you see each
specific feature. 

 

My question is, is this similar to your idea of combining the temporal and
spatial poolers?

 

-Chris

 

On Jan 22, 2014, at 1:59 PM, Jeff Hawkins <[email protected]> wrote:





Temporal Pooling, or TP, was described in the HTM/CLA whitepaper.  However,
the mechanism I proposed for TP always had problems (both biological and
theoretical).  It was close but we could never get it to work cleanly.  I
now have a new proposal for how temporal pooling works.  It is more elegant
and more powerful.  I have not worked through all the details yet, but
several people asked to hear my current thinking on this so that is what
this email is about. Matt has also put this note on the NuPIC wiki here:
<https://github.com/numenta/nupic/wiki/New-Ideas-About-Temporal-Pooling>

 

When working on the new temporal pooling mechanism I had further insights
that led me to a better understanding of why the cortex has multiple layers
of cells and how they interact.  This is a major extension of HTM theory and
I will also briefly describe that.  The new ideas on temporal pooling and
cortical layers in this email are untested; please consider them as
speculative and unproven.

 

First a little background.  The CLA consists of three components.

 

1) Spatial Pooler

The SP converts a sparse distributed input into a new SDR with a fixed
number of bits and a relatively fixed sparseness.  Each bit output by the SP
corresponds to a column of cells.

2) Sequence memory

The CLA sequence memory learns sequences of SDRs.  It uses the columns to
represent inputs uniquely in different contexts.

3) Temporal Pooler

The TP forms a stable representation over sequences.

 

(Unfortunately we got into the habit of using the term "temporal pooler" for
both the sequence memory and temporal pooling proper.  In this document
temporal pooling will only refer to forming stable representations over a
sequence of patterns.)

 

The basic idea of temporal pooling is that patterns that occur adjacent in time
probably have a common underlying cause and therefore the brain forms a
stable representation for a series of input patterns.  An example is a
spoken word.  If we hear a word several times we learn the sequence of
sounds and then form a stable representation for the word.   The input to
the ears is changing but elsewhere there are cells that are stable
throughout the word.  Another example is when looking at an image of a
familiar face.  Several times a second your eyes fixate on a different part
of the image causing a complete change of input.  Despite this changing
input stream your perception is stable.  Several levels up in the cortical
hierarchy there are cells that are selective for the particular person you
are seeing and these cells stay active even though the input from the eyes
is changing.  (The most well-known of these experiments involve cells that
are selective for images of celebrities, such as Jennifer Aniston.  They
were found while fully conscious humans had their brains exposed prior to
surgery.)  Of course, a single cell cannot learn to recognize an entire
face.  This requires a hierarchy where each level in the hierarchy is
temporal pooling.

 

 

In both these cases cells remain active for multiple distinct feedforward
input patterns.  Cells learn to recognize and respond to different
feedforward patterns when those patterns occur one after another in time.
Temporal pooling is a deduced property; we can be confident it is happening
throughout the cortex.

 

 

New Idea Number 1

Temporal pooling occurs between layers of cells, not just between regions.

In On Intelligence I wrote that temporal pooling occurs between regions in
the cortical hierarchy.  As you ascend the hierarchy from region 1 to region
2, region 2 forms a stable representation of the changing patterns in region
1.  Conversely, when a stable pattern in region 2 projects back to region 1
it invokes sequences of patterns in region 1.

 

                      <image007.png>  <image008.png>

 

In this diagram the dots represent SDRs and the row of dots represents a
sequence of SDRs over time.

 

I now believe that temporal pooling is also occurring between layers of
cells within a region.  The canonical feedforward flow of information in a
cortical region is from layer 4 to layer 3 then to layer 4 of the next
higher region.  Layer 4 also projects to layer 5 and then to layer 6.  I
believe layers 5 and 6 are using the same basic mechanism and I am making
some progress in understanding them.   For now I will restrict my comments
to layers 4 and 3.  Here is what I think is happening in layers 4 and 3.

 

                             <image009.png>

 

Why have two layers of sequence memory, 4 and 3, in a region?  What is the
difference between layer 4 and layer 3?

 

A region of cortex is trying to build a predictive model of the changing
input it receives.  Sensory input changes for two fundamental
reasons.  One is because your body and sensors move, the other is because
objects in the world change on their own.  For example if you are walking
alone in a house then all the changes that occur on your sensors are because
you are moving your eyes, head, and body.  If you stood still and didn't
move your eyes there would be no changes in your sensory input.  Another
example is when you look at a picture: the changes occurring on your
retinas are solely because your eyes are moving several times a second.  The
second reason sensory data can change is because objects in the world are
changing on their own.  For example if there were a dog walking in the house
with you or if it barked it would cause changes on your sensors that were
not caused by your own movement.

 

As a general rule, sub-cortical neurons that generate behavior have split
axons.  One branch generates the behavior and the other is sent to the
cortex.  If the cortex didn't get copies of motor commands you couldn't
function.  Every time you moved your eyes or turned your head it would
appear as if the world was spinning and shifting.

 

I believe layer 4 models the changes in the input due to the body's
behavior.  Layer 3 models the changes in the input that cannot be predicted
by layer 4.  I am able to show that if you take the standard CLA sequence
memory and feed it both sensory data and motor commands (such as a sparse
coding of a saccade's direction and distance) it can learn to predict
changes due to behavior.  For example it can learn to predict what the eyes
will see after a saccade (something known to occur in V1 and elsewhere).  If
layer 4 can successfully predict changes due to the body's own behavior then
the representations in the layer 4 sequence memory will be very sparse (one
cell per column).  If we TP over these changing patterns then the
representation in layer 3 will be stable.  If you were looking at a still
image layer 4 would change with each saccade and layer 3 would be relatively
stable.  The representation in layer 3 would be independent of where you are
fixating on the image.  (Again, to form a fully stable representation of an
image requires a hierarchy of regions.)

 

Any change that a layer cannot predict will be passed on as a change in the
next layer.  Any change that a layer can predict will be pooled and not
result in a change in the next layer.  Put another way, any change that
cannot be predicted will continue up the hierarchy, first layer 4 to layer 3
then region to region.  If you were looking at a walking dog, layer 4 would
provide an input to layer 3 that after temporal pooling would be partially
stable (the image of the dog) and partly not stable because the dog is
moving.  Layer 3 would try to learn the pure high order sequence of walking.
All these examples require a hierarchy, but applied to simple problems you
might not need a hierarchy.
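
A toy sketch of this rule, not the proposed mechanism itself: the class
below is hypothetical, learns only first-order transitions, and represents
each pattern as a single character.  A change the layer cannot predict is
passed up as a new output.  A change it can predict is pooled, so the output
holds still.

```python
class PoolingLayer:
    """Toy layer: passes unpredicted changes up, pools predicted ones."""

    def __init__(self):
        self.transitions = set()  # learned (previous, current) pairs
        self.previous = None      # pattern seen on the last step
        self.output = None        # what the next layer currently receives

    def step(self, pattern):
        predicted = (self.previous, pattern) in self.transitions
        if not predicted:
            # Unpredicted change: learn the transition and pass it up.
            if self.previous is not None:
                self.transitions.add((self.previous, pattern))
            self.output = pattern
        # Predicted change: leave the output as it was (pooling).
        self.previous = pattern
        return self.output


layer = PoolingLayer()
outputs = [layer.step(p) for p in "ABCABCABCX"]
print("".join(outputs))  # ABCAAAAAAX
```

The output changes on every step while the sequence is new, stays fixed once
the repeating A, B, C sequence is predicted, and changes again when the
unexpected X arrives.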

 

Note that in layer 4 the order of patterns does not have to be repeatable.
It does not have to be a high order sequence.  The order and direction of
saccades does not have to follow a set pattern.   Similarly if I was walking
through a house the order in which I do it, turning left or right, can vary.
Layer 4 can handle this because it has a copy of the motor command
generating the change.  Layer 3 on the other hand does not get a copy of a
motor command.  The only way it can model the data is to look for high order
sequences.

 

The concept of modeling changes due to our own behavior can be applied to
touch, audition, and vision.  It is a powerful idea that explains how we
form representations of the world that are independent of our sensor or
body positions and why the world seems stable even though the patterns on
our sensors are rapidly changing.  It explains how you perceive objects
independent of where the object is and how you are currently sensing it.
For example, imagine you reach into your purse to grab a pair of eye
glasses.  The actual sensations on your skin are a short sequence of edges,
corners, and surfaces.  But you perceive the entire glasses.   This is
directly analogous to moving your eyes over an image.

 

I am not going to go through exactly how I believe layer 4 works here.  I
will do that another time.  But it looks like by changing the
contextual information available to the standard CLA sequence memory you
will get the desired result.  Today NuPIC is equivalent to layer 3.

 

This concept of layer 4 and layer 3 modeling different aspects of an input
stream requires temporal pooling between layers, not just between regions.
Layer 3 has to form a stable representation (temporal pooling) of predicted
changes in layer 4.

 

New Idea Number 2

Temporal Pooling can be combined with Spatial Pooling.

You might have noticed in the diagram above I labeled the first operation of
a layer of cells "Spatial/Temporal Pooling".  It took me a long time to
realize that a small change to the spatial pooler will allow it to do
temporal pooling as well.  And we need to do both spatial and temporal
pooling as information moves from layer to layer and region to region.  The
required small change is to have the cells in a column stay active longer
so they learn to recognize multiple input patterns over time.

 

The standard SP does these steps.

- Receive as input a sequence of SDRs.

- Use a competitive process to learn a set of common spatial patterns in the
input sequence.

- Assign a column of cells to be active for each spatial pattern in the set.

- Ensure each cell in a column learns the same feedforward response
properties.
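
The steps above can be sketched roughly as follows.  This is not the NuPIC
spatial pooler: the function names, the permanence threshold, and the
learning rates are assumptions made for illustration, and boosting and
inhibition radius are left out.  Each row of `permanences` stands for one
column, so every cell in that column shares the same feedforward response.

```python
import numpy as np


def spatial_pool(input_bits, permanences, num_active, threshold=0.5):
    """Return the indices of the columns that win the competition."""
    connected = permanences >= threshold  # (columns, inputs) boolean
    overlaps = connected.astype(int) @ input_bits.astype(int)
    # Keep the num_active columns with the highest overlap (fixed sparsity).
    winners = np.argsort(-overlaps, kind="stable")[:num_active]
    return np.sort(winners)


def reinforce(input_bits, permanences, winners, inc=0.05, dec=0.02):
    """Move the winning columns' permanences toward the current input."""
    delta = np.where(input_bits, inc, -dec)
    permanences[winners] = np.clip(permanences[winners] + delta, 0.0, 1.0)


rng = np.random.default_rng(0)
num_inputs, num_columns, num_active = 64, 32, 4
permanences = rng.uniform(0.3, 0.7, size=(num_columns, num_inputs))

# One sparse input pattern with 8 of 64 bits on.
pattern = np.zeros(num_inputs, dtype=bool)
pattern[rng.choice(num_inputs, size=8, replace=False)] = True

for _ in range(10):
    winners = spatial_pool(pattern, permanences, num_active)
    reinforce(pattern, permanences, winners)

print(winners)  # the columns assigned to this spatial pattern
```

Repeated presentations strengthen the winning columns on the active input
bits, so the same columns keep winning for the same pattern.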

 

The trick to adding TP to the SP is the following.  When the input to the SP
was correctly predicted in the previous layer we want the cells in the
column to remain active long enough to learn to respond to the next input
pattern.  If the input to the SP was not predicted in the previous layer
then we don't want cells in the column to remain active longer than the
current input (the existing SP behavior).  Temporal pooling is achieved by
extending the activity of the cells in the column.  But only when the input
to the column was predicted in the previous layer.
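
A minimal sketch of that change, assuming the active columns and a
"was predicted" flag are handed in from elsewhere.  The function name and
the `hold_steps` parameter are hypothetical, and learning is omitted.  Only
the extended activity is shown.

```python
def pool_step(active_columns, input_was_predicted, remaining, hold_steps=3):
    """Advance column activity by one time step.

    remaining maps a column index to the number of time steps (including
    the current one) that the column will stay active.
    """
    # Age activity carried over from earlier steps and drop expired columns.
    remaining = {c: t - 1 for c, t in remaining.items() if t > 1}
    for column in active_columns:
        # Predicted input: stay active longer.  Unpredicted: this step only.
        lifetime = hold_steps if input_was_predicted else 1
        remaining[column] = max(remaining.get(column, 0), lifetime)
    return remaining


# Two predicted inputs followed by two unpredicted ones.
stream = [({1}, True), ({2}, True), ({3}, False), ({4}, False)]
remaining = {}
history = []
for columns, predicted in stream:
    remaining = pool_step(columns, predicted, remaining)
    history.append(sorted(remaining))
print(history)
```

Columns driven by predicted input overlap in time with the columns for the
following inputs, which is what gives them the chance to learn those inputs.
Columns driven by unpredicted input switch off after a single step, as in
the existing SP.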

 

It is easy to do this in software, but how might this happen with real
neurons?  (If you don't care about the biology you might want to skip the
next section.)

 

New Idea Number 3

Temporal pooling uses metabotropic synapses to distinguish predicted vs.
non-predicted inputs.

One of the key requirements of temporal pooling is that we only want to do
it when a sequence is being correctly predicted.  For example, we don't want
to form a stable representation of a sequence of random transitions.

 

The old TP mechanism, the one in the white paper, proposed that a cell would
learn to fire longer and longer in advance by predicting further back in
time.  In this way a cell would stay active as long as the sequence was
predictable. There were several problems with this method that I couldn't
resolve.

 

The new proposal, described above, says that when a column of cells becomes
active due to feedforward input the cells in the column will remain active
for longer than normal if the feedforward input was from cells that
predicted their activity.  If the input cells were not predicted then the
cells in the column should stay active only briefly.

 

There is a biological mechanism for this that fits well, but I have not been
able to verify all the details.  Here is the biological mechanism.  Active
synapses open and close ion channels; this happens rapidly, on the order of a
few milliseconds.  The effect of an active synapse on the destination cell
is short lived.  However many synapses are paired with another type of
receptor called a metabotropic receptor.  If the metabotropic receptor is
activated it will have a long duration effect on the destination cell, from
100s of milliseconds to several seconds.  Metabotropic receptors can provide
the means for keeping a cell active for a second or more, exactly what we
need for temporal pooling.

 

Metabotropic receptors are not always activated.  They require a short burst
of action potentials to be invoked.  As few as two action potentials
10 msec apart are sufficient.  Without that short burst the metabotropic
receptor will remain inactive.  There is a lot of literature on metabotropic
receptors and these properties.   They are common in cortical neurons and
the locations where they are not present also make sense from a theory
point of view.
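
The burst rule can be written down as a small sketch.  The function names
are hypothetical and the two durations are placeholder values picked from
the ranges mentioned above (a few milliseconds versus up to several
seconds).  This is not a biophysical model.

```python
def has_burst(spike_times_ms, max_gap_ms=10.0):
    """True if any two consecutive spikes are at most max_gap_ms apart."""
    times = sorted(spike_times_ms)
    return any(b - a <= max_gap_ms for a, b in zip(times, times[1:]))


def effect_duration_ms(spike_times_ms, fast_ms=5.0, slow_ms=1000.0):
    """Assumed duration of the effect on the destination cell.

    fast_ms stands in for the ordinary short-lived synaptic effect and
    slow_ms for the long metabotropic effect.  Both values are illustrative.
    """
    return slow_ms if has_burst(spike_times_ms) else fast_ms


print(effect_duration_ms([0.0, 8.0]))    # burst: long-lasting effect
print(effect_duration_ms([0.0, 50.0]))   # no burst: brief effect
```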

 

Does a cell that is in a predictive state (depolarized) generate a small
burst of action potentials when it first fires?  That is what is required
for the new TP mechanism to work.  There is some evidence for this.  Layer 5
cells are well known for starting with a short burst of action potentials
under certain conditions.  They are sometimes labeled "intrinsically
bursting neurons" to reflect this.  Some scientists have reported short
bursts in layer 4 and layer 3 cells but not consistently so, unlike layer 5.
We wouldn't expect to see bursts in layer 4 cells in an anesthetized animal
which is not generating behavior.  The majority of experiments where
individual cells are recorded are done with animals that are anesthetized
and unable to move.  Evidence for short bursts of action potentials under
the correct conditions is the biggest missing piece of the new proposed TP
mechanism.

 

The new TP mechanism, the combination of SP and TP, and the ideas for layers
4 and 3 are compelling.  They explain a lot of things.  So I am going to try
hard to find the biological mechanisms that support these ideas.  A lot of
neuroscience details match up well but there are more I need to verify.

 

This year I want to empirically test the layer 4 ideas combined with
temporal pooling.  It should be possible to build a powerful hierarchical
vision system.  If we restrict it to spatial images and use saccades for
training we could build the entire thing with a hierarchy of layer 4-only
regions.  Training such a system might be slow but inference should be fast.
Perhaps the NuPIC community could do this in collaboration with some Grok
engineers.

 

Jeff H

 

_______________________________________________
nupic mailing list
[email protected]
http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org

 
