"Could you expand a little on what biological problem you're referring to here?
-Mike" Ok, but I suspect it is beyond most people's interest level, I don't want to confuse anyone. But for those that are interested.. The neurons in the CLA can be in a "predictive state". Biologically this is a cell that is depolarized. The neurons in the CLA can be in an "active state". Biologically this is equivalent to firing or generating one or more spikes. These two states are sufficient for learning sequences, but not for temporal pooling. The addition of temporal pooling requires a third state which I don't like because it is a little tricky to make it work with real neurons. When we first implemented the CLA we started with sequence memory and everything worked fine. After a bunch of testing we added temporal pooling. With temporal pooling the cells learn to predict their feed forward activation earlier and earlier. It works like this. First a cell becomes active due to a feed forward input. It then forms synapses that allow it to predict its activity one step in advance. Later it becomes active one step in advance and then forms synapses that allow it to predict its activity two steps in advance, and so on. (The system doesn't require discreet steps but it is easier to think about it that way.) Over repeated training, a cell learns to be active over longer and longer sequences of patterns. This is cool for a number of reasons. A cell will learn to be active for as much time as it can correctly predict its future activity. If the world consists of a few long repeatable sequences then cells will be active over long periods of time. The data determines how much pooling a cell can do. The more pooling that can be done at one level of the hierarchy the easier the job of the next level. It also suggests why we can learn new tasks very quickly (i.e. learn a new sequence) but to master something, to make something second nature, requires many repetitions. 
I mentioned this in On Intelligence when I said that with practice, knowledge gets represented lower and lower in the hierarchy. As a region gets better at temporal pooling, it frees the memory in the next region for more advanced inference.

The problem is that cells that are pooling over time must be active/spiking, not just depolarized as in sequence learning. When cells become active by pooling in advance of feed-forward activation, it messes up the sequence memory. The CLA can't tell the difference between activation due to a real-world feed-forward input and activation due to pooling. What happens is the CLA doesn't wait for real input, and sequences run away forward in time. For pooling to work, the CLA needs to distinguish between cell activation due to feed-forward input and cell activation due to pooling. We need two different states for an active cell.

There is an elegant biological solution to this, but the evidence is equivocal. The solution is: when a cell is activated due to feed-forward input, it generates a short burst of action potentials, three to five. It does this once and then stops. When a cell is activated by pooling, it generates a series of spaced-out spikes. Believe it or not, there are quite a few papers that suggest this could be happening. There is evidence of short bursts prior to a steady firing pattern. The mini-bursts are in the literature, easy to find. I spoke to several scientists, and they report seeing them. Some claim they see them at the beginning of every trace. However, others say they never see the mini-bursts. The best evidence for mini-bursts is in layer 5 cells (yes, the motor ones that also project up the hierarchy). These cells are called "intrinsically bursting" cells to reflect this behavior. For temporal pooling to work, I think we also need to see this mini-bursting behavior in layer 3. Mini-bursts are seen in layer 3, but not by everybody. The evidence is much spottier.
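The runaway problem and its fix can be shown with a toy model. The state names and the step function below are hypothetical, not NuPIC's: the only point is that if sequence memory advances only on genuine feed-forward ("mini-burst") activation, pooled ("tonic") activity can no longer drag sequences forward in time.

```python
# Toy illustration of why pooling needs a second "active" state.
# Hypothetical states: a cell activated by feed-forward input emits a
# mini-burst; a cell activated by pooling emits spaced single spikes.
# Sequence memory advances only on mini-bursts.

ACTIVE_FF = "burst"    # driven by real feed-forward input
ACTIVE_POOL = "tonic"  # driven by pooling (predicted in advance)

def step_sequence_memory(position, activation):
    """Advance the learned sequence only on genuine feed-forward input."""
    if activation == ACTIVE_FF:
        return position + 1
    return position  # tonic/pooled activity does not advance the sequence

pos = 0
for activation in [ACTIVE_FF, ACTIVE_POOL, ACTIVE_POOL, ACTIVE_FF]:
    pos = step_sequence_memory(pos, activation)

print(pos)  # -> 2: only the two feed-forward activations advanced it
```

Without the two-state distinction (i.e. if `step_sequence_memory` advanced on any activation), the same input stream would push the sequence four steps ahead: the runaway described above.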
It is possible that all layer 3 cells exhibit this behavior and scientists are not reporting them. Perhaps there are different classes of layer 3 cells and only some mini-burst. I wish the evidence were more conclusive.

For the mini-bursting hypothesis to be correct, a cell has to behave differently when receiving a mini-burst than when receiving regularly spaced spikes. Here too the evidence is good. The synapses that form on distal dendrite branches (the sequence and pooling memory synapses) are far more effective when they get a quick burst of spikes in a row. A thin dendrite amplifies the effect of multiple spikes because thin dendrites don't leak current quickly and they have low capacitance. Thus a burst of spikes on multiple synapses may be necessary for our dendrite-segment coincidence detector to work. A single spike won't do it. If a cell produces single spikes (not mini-bursts) when activated by a distal dendrite branch, then sequences won't run away. This is what we need; it solves our problem!

Conversely, axons that project up the hierarchy form synapses on proximal dendrites (the SP synapses). Here, because the synapses are close to the big cell body and the dendrites have large diameters, there is large current leakage and high capacitance. It has been shown that the first arriving spike on a proximal synapse has a large effect (depolarization), but subsequent spikes in a mini-burst have a much diminished effect. This is good because we don't want the spatial pooler in the higher region to be overly influenced by the mini-bursts. We want the SP to look at all active axons equally, those that are mini-bursting and those that are single-spiking via pooling. This is another nice validation of the theory.

If you have followed all of this, you see that the mini-burst hypothesis solves the issues of pooling in a hierarchy, and it is supported by a lot of biological evidence. It is a pretty cool explanation for why we see mini-bursts in layer 5 cells.
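The dendrite argument reduces to a leaky integrator. The numbers below (leak rates, unit charge per spike, threshold) are made up for illustration; only the qualitative contrast matters: a thin distal dendrite retains charge between quick spikes, so a mini-burst sums past threshold while a single spike does not, whereas a leaky proximal site barely sums at all.

```python
# Minimal leaky-integrator sketch of burst vs. single-spike efficacy.
# All parameters are illustrative assumptions, not measured values.

def peak_voltage(n_spikes, leak):
    """Deliver n quick spikes; a fraction `leak` of charge drains each step."""
    v, peak = 0.0, 0.0
    for _ in range(n_spikes):
        v = v * (1.0 - leak) + 1.0  # each spike injects one unit of charge
        peak = max(peak, v)
    return peak

THRESHOLD = 2.5
DISTAL_LEAK = 0.1    # thin distal dendrite: little leak, charge accumulates
PROXIMAL_LEAK = 0.9  # near the big soma: heavy leak, spikes barely sum

print(peak_voltage(1, DISTAL_LEAK) >= THRESHOLD)    # False: one spike fails
print(peak_voltage(4, DISTAL_LEAK) >= THRESHOLD)    # True: a mini-burst sums
print(peak_voltage(4, PROXIMAL_LEAK) >= THRESHOLD)  # False: proximal barely sums
```

The first two lines are the distal story (coincidence detection needs a burst); the third is the proximal story (the SP sees bursting and single-spiking axons roughly equally).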
My only worry is that the evidence for mini-bursting in layer 3 cells is spotty. If everyone said all layer 3 cells are intrinsically bursting, like forward-projecting layer 5 cells, I would be much happier. All in all, the theory holds together remarkably well, and I don't have another one, so I am sticking with it for now. Of course, none of this matters for the SW implementation, but I have found over and over again that if you stray from the biology you will get lost.

Jeff

From: nupic [mailto:[email protected]] On Behalf Of Michael Ferrier
Sent: Thursday, August 29, 2013 11:40 AM
To: NuPIC general mailing list.
Subject: Re: [nupic-dev] Inter-layer plumbing

>> There is a biological problem with pooling the way we implemented that I never resolved. So it is a work in progress.

Hi Jeff,

Could you expand a little on what biological problem you're referring to here? Thanks!

-Mike
_____________
Michael Ferrier
Department of Cognitive, Linguistic and Psychological Sciences, Brown University
[email protected]

On Thu, Aug 29, 2013 at 2:29 PM, Jeff Hawkins <[email protected]> wrote:

Here are some thoughts about how to connect CLAs in a hierarchy. Here are some things we know about the brain:

- Layer 3 in the cortex is the primary input layer. (Sometimes input goes to layer 4 and layer 3, but layer 4 projects mostly to layer 3, and layer 4 doesn't always exist. So layer 3 is the primary input layer. It exists everywhere. We will ignore layer 4 for now.)
- I believe the CLA represents a good model of what is happening in layer 3.
- The output (i.e. axons) of layer 3 cells projects up the hierarchy, connecting to the proximal dendrites (SP) of the next region's layer 3.
- This isn't the complete picture. The axons of cells in layer 5 (the ones that project to motor areas) split in two, and one branch also projects up the hierarchy to layer 3 in the next region.
If we aren't trying to incorporate motor behavior, then we can ignore layer 5 and say input goes from layer 3 to layer 3 to layer 3, etc. Or CLA to CLA to CLA, etc. Each cell in layer 3 projects to the next region, so the input to a region is the output of all the cells in the previous region's layer 3. If we consider our default CLA size, there would be 64K input bits to the next level in the hierarchy. Because of the distributed nature of knowledge, it isn't necessary that all cells in layer 3 project to the next region; as long as a good portion do, we should be OK. But assume they all do. 64K is a lot of input bits, but the SP in the receiving region can take any number of bits and map them onto any number of columns. That is one of the nice features of the SP: it can map an input of any dimension and sparsity to any number of columns.

That's it for the "plumbing". Now comes the tricky part. We, and many others, believe that a large part of how we recognize things in different forms is that the brain assumes patterns occurring next to each other in time represent the same thing. This is where the term "temporal pooler" comes from. We want cells to respond to a sequence of patterns that occur over time even though the individual patterns don't have common bits. The classic case is cells in V1 that respond to a line moving across the retina. These cells have learned to fire for a sequence of patterns (a line in different positions as it moves is a sequence). The cell remains active during the sequence. Thus the outputs of a region are changing more slowly than the inputs to a region. This basic idea is assumed to be happening throughout the cortex. Temporal pooling also makes more output bits active at the same time. So instead of just 40 cells active out of 64K, you might have hundreds. The CLA was designed to solve the temporal pooling problem. When we were working on vision problems, the temporal pooler was the key thing we were testing.
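The plumbing above can be sketched end to end. The spatial pooler here is a stand-in (a fixed random projection with top-k column selection), not NuPIC's `SpatialPooler`; the sizes are the ones mentioned above (64K input bits, a few hundred active after pooling, 40 winning columns).

```python
# Sketch of the "plumbing": the active layer-3 cells of the lower
# region become the input bits of the next region's spatial pooler.
# Toy SP, not NuPIC's implementation.

import random

class ToySpatialPooler:
    """Maps an input of any size onto a fixed number of columns."""
    def __init__(self, input_bits, num_columns, potential=128, seed=0):
        rng = random.Random(seed)
        # Each column samples a random subset of the input space.
        self.pools = [rng.sample(range(input_bits), potential)
                      for _ in range(num_columns)]

    def compute(self, active_bits, k=40):
        active = set(active_bits)
        overlaps = [sum(1 for b in pool if b in active)
                    for pool in self.pools]
        # Winner-take-all: the k columns with the most overlap activate.
        return sorted(range(len(overlaps)),
                      key=lambda c: overlaps[c], reverse=True)[:k]

# Lower region's layer-3 output: hundreds of cells active after pooling.
lower_output = random.Random(1).sample(range(65536), 300)

sp = ToySpatialPooler(input_bits=65536, num_columns=2048)
winners = sp.compute(lower_output)
print(len(winners))  # -> 40 columns, regardless of input size and sparsity
```

Whatever the size and sparsity of the lower region's output, the receiving SP always produces a fixed-size sparse set of columns, which is the feature the text relies on.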
We have disabled this feature when using the CLA in a single region because it makes the system slower. The temporal pooler without the "pooling" is still needed for sequence learning. There is a biological problem with pooling the way we implemented it that I never resolved. So it is a work in progress.

Conclusion: to connect two CLAs together in a hierarchy, all the cells in the lower region become the input to the next region. But there are some difficult issues you might need to understand to get good results, depending on the problem.

Jeff

From: nupic [mailto:[email protected]] On Behalf Of Tim Boudreau
Sent: Wednesday, August 28, 2013 4:29 PM
To: NuPIC
Subject: [nupic-dev] Inter-layer plumbing

Is there a general notion of how layers should be wired together, so that one layer becomes input to the next layer? It seems like input into one layer is pretty straightforward - in ascii art:

bit bit bit bit bit bit bit bit
 |   |   |   |   |
 ------proximal dendrite w/ boost factor---> column

But it's less clear:
- If we have the hierarchy input -> layer 1 -> layer 2, what constitutes an input bit to layer 2 - the activation of some combination of columns from layer 1?
- How information about activation in level 2 should reinforce connections in layer 1

Any thoughts?

-Tim

--
http://timboudreau.com
_______________________________________________
nupic mailing list
[email protected]
http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
