Ian,

I have nothing against scalar activations.  Here are some more details about
why the CLA doesn't have scalar cell activations today.

 

- I have no doubt that scalar activations exist in the cortex and that they
can help in some situations.  But:

- It has been shown that some pretty significant cognitive tasks occur in
the cortex too quickly for scalar activations to help.  E.g. a human or
monkey can recognize a visual object in just a few hundred milliseconds.
Neurons in a region such as IT (four levels up in the hierarchy) indicate
that the cortex has recognized the object.  If you count the minimum number
of neuron-to-neuron transitions (add up all the delay times) there isn't
enough time for a second spike to have an effect.  Remember, a fast-spiking
cell might produce spikes every 20 msec, and to represent a scalar cell
output you need at least two spikes.  So a realistic cortical model cannot
require scalar cell activations all the time.  (A back-of-envelope version
of this timing argument appears after this list.)

- The value of scalar activations is reduced in distributed representations.
This is the biggest reason for me.  The cortex always uses distributed
representations, and in a distributed representation the contribution of any
individual cell is reduced.  Any individual cell could die and the system
keeps working just fine.  The brain represents subtle differences by
changing which cells are active in an SDR (sparse distributed
representation), not by changing the rate of any one cell.  (A small
illustration of this robustness also appears below.)
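
Here is a back-of-envelope version of the timing argument.  Every number is
an illustrative assumption, not a measurement:

    # Back-of-envelope timing check; all numbers are assumptions chosen
    # for illustration, not measured values.
    stages = 10              # assumed neuron-to-neuron transitions to IT
    recognition_ms = 150     # assumed rapid-recognition time scale
    inter_spike_ms = 20      # a fast-spiking cell's inter-spike interval

    budget_per_stage = recognition_ms / stages   # 15.0 ms per stage
    # A scalar (rate) code needs at least two spikes per stage, i.e.
    # >= 20 ms, which is more than the per-stage budget.
    print(budget_per_stage >= inter_spike_ms)    # False

And a minimal sketch of the SDR robustness point; the sizes (2048 cells, 40
active) are toy values chosen only for the example:

    import random

    cells = range(2048)
    sdr = set(random.sample(cells, 40))    # a sparse distributed pattern

    # Kill a few cells at random; the surviving bits still overlap the
    # original pattern heavily, so downstream matching keeps working.
    dead = set(random.sample(sorted(sdr), 4))
    print(len((sdr - dead) & sdr))         # 36 of 40 bits still match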

 

I don't think it would be too burdensome to add scalar activations to the
CLA, but at the moment I don't see how they would help us make the CLA more
useful or resolve issues.  If we understood how scalar activations would
solve an outstanding CLA issue I wouldn't hesitate to add them.

 

The SP actually uses scalar activations of a sort.  The connections to each
SP bit produce a scalar depolarization of the cell.  Biologically, the cell
that depolarizes the fastest fires first and disables the others.  That is a
pretty well-accepted scheme.
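
A minimal sketch of that overlap-then-inhibit scheme, written as plain
NumPy; the function name and matrix formulation are mine for illustration,
not the actual SP code:

    import numpy as np

    def sp_compete(input_bits, connected, num_active):
        # Scalar stage: each column's depolarization is its overlap with
        # the input, counted through its connected synapses.
        overlaps = connected @ input_bits
        # Inhibition stage: the most-depolarized columns fire and suppress
        # the rest, so the output is binary even though overlap is scalar.
        winners = np.argsort(overlaps)[-num_active:]
        output = np.zeros(len(overlaps), dtype=int)
        output[winners] = 1
        return output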

Jeff

 

From: nupic [mailto:[email protected]] On Behalf Of Ian
Danforth
Sent: Friday, August 30, 2013 2:56 PM
To: NuPIC general mailing list.
Subject: Re: [nupic-dev] Inter-layer plumbing

 

(Disclaimer: Jeff's not a huge fan of this idea, due to its implementation
complexity, but it seems relevant.)

 

Non-binary activation

 

There is a confusing part of the CLA in that we talk about 'firing' but are
not necessarily equating this directly to spikes.

 

For example, a cell in the SP either turns on or not depending on its
overlap, global competition, etc.

 

This doesn't quite mimic the behavior of a cell and its receptive field,
where you see random low-level firing much of the time, then increased
firing when a relevant stimulus enters the field, and decreased firing as it
leaves.

 

I like to simplify much of what is going on as feature detectors reporting a
degree of confidence that the feature has been detected, and this is
naturally a scalar value. 

 

There are a couple of important benefits of scalar activation, however:

 

1. You can propagate a signal at different intensities, such that you can
set off a chain of 'imagined' or predicted states without crossing a
threshold for motor output or strongly reinforcing connections.

 

2. You can allow low-intensity signals to reverberate through the system,
allowing for loops and feedback.

 

In my mind there will be more and more reasons to have different 'activation
states' which could all boil down to a need to have scalar activations.

 

Of course this has significant computational and memory impacts, which is
why (IIRC) it's done as it is today.

 

Ian

 

 

 

On Fri, Aug 30, 2013 at 2:31 PM, Jeff Hawkins <[email protected]> wrote:

The problem is that cells that are pooling over time must be active/spiking,
not just depolarized as in sequence learning.  When cells become active by
pooling in advance of feed-forward activation, it messes up the sequence
memory.  The CLA can't tell the difference between activation caused by a
real-world feed-forward input and activation caused by pooling.  What
happens is the CLA doesn't wait for real input, and sequences run away
forward in time.

 

Are we talking about within one layer, or across layers here?

 

>> Within one layer.  The CLA models just one layer of cells.

 

I don't know that I can step through enough of what happens in my head yet
to see how that happens in a single layer.  But what does look like a hole
is:  Say Cell A is active due to input, and Cell B is predictively
activated, not because the immediate next input will activate it, but
because the input *after that* will;  there is no way to differentiate a
prediction for further in the future than the next step from a wrong
prediction about the next step.  Is that the issue, or am I just off in
outer space?

 

>> Sorry, I couldn't follow your question. 

It might be useful to step back and remind ourselves what problems we are
trying to solve.  Within each region of the cortex we want to do the
following.

 

1) Inference of sequences.

We know the cortex recognizes sequences of patterns.  All audition (e.g.
spoken language, music) is only recognized when played in the correct order.
The same is true for vision and touch; they are temporal inference problems.
(We can recognize some still images, but vision is mostly temporal.)

 

2) Prediction while inferring.

We know that the cortex is constantly predicting what is going to happen
next.  We know this because we recognize when things change.  It appears we
make multiple predictions simultaneously.  The properties of SDRs solve the
multiple prediction problem beautifully.  The cortex needs to detect
anomalies so it can direct attention to unpredicted input.
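
A small illustration of why SDRs handle multiple simultaneous predictions
so well; the sizes (2048 cells, 40 active) are toy values for the example:

    import random

    cells = range(2048)
    pred_a = set(random.sample(cells, 40))   # predicted pattern A
    pred_b = set(random.sample(cells, 40))   # predicted pattern B
    union = pred_a | pred_b                  # hold both predictions at once

    # An input matching A overlaps the union completely; an unrelated
    # pattern overlaps it only by chance.  Several predictions can be
    # carried at once without blurring into each other.
    print(len(pred_a & union))                          # 40
    print(len(set(random.sample(cells, 40)) & union))   # ~1 or 2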

 

3) Temporal pooling.

This is when cells learn to fire continuously during part or all of a
sequence.  We know there are cells that do this for multiple phenomena in
vision and audition.  Temporal pooling is not as widely understood, but many
people have reached the same hypothesis: that temporal pooling plays a large
part in how we build invariant representations.  We don't know if temporal
pooling is occurring everywhere in the cortex, but I work on the assumption
that it does.  Temporal pooling makes the hierarchy work much better too.
It means each level in the hierarchy can learn as much spatial and temporal
context as it can, freeing up the next level to work on more complex
patterns.  If we didn't have temporal pooling, then the output of layer 3
would be changing at the same rate as the input; we need to get to slower,
more stable concepts as we ascend the hierarchy.

 

The CLA is a great model of how a layer of cells does 1 and 2. It only
requires that the active cells are spiking and the predictive cells are
depolarized. I am confident the CLA is close to how real cells do inference
and prediction.

 

Things get tricky when we try to add temporal pooling, to get a layer of
cells to do 1, 2, and 3.  My assumption is that a single layer, layer 3 at a
minimum, has to do inference, prediction, and temporal pooling.  This could
be an incorrect assumption.  Some scientists divide layer 3 into 3a and 3b.
Some scientists designate a separate layer 2 and some don't.  Perhaps these
sub-layers are doing different things, some doing inference and others
pooling.  So when discussing temporal pooling we need to keep in mind that
the attempt to do it all in one layer of cells might be wrong.  It is
speculative.  I think it would be simpler and more elegant if we can show
that a single layer of cells can infer, predict, and do temporal pooling.
That is what I am trying to do, but it might be wrong.

 

So how does the "proposal" for temporal pooling work?  Say we have a
sequence A-B-C-D-E that repeats.  When the E cells first become active they
form synapses to the cells that were just active during D.  So now the E
cells will become predictively active when they see the D cells again.
However, when the E cells became predictively active due to the D cells,
pattern C was just active, so the E cells now form synapses (on a different
segment) with the C cells.  After this the E cells become predictively
active when they see either the C or D cells.  The process can repeat.  At
the end, a particular E cell has a dendrite segment that recognizes D,
another segment that recognizes C, another segment that recognizes B, etc.
So now the E cells will be predictively active when they see A, B, C, or D.
We would see the E cells fire continuously during the A-B-C-D-E sequence,
but only at E is the cell responding to its feed-forward receptive field.
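
Here is a schematic of that learning rule in code.  This is an illustration
of the proposal, not NuPIC code; cells are just named sets and segments are
sets of presynaptic names:

    class PoolingCell:
        def __init__(self):
            self.segments = []   # each segment: a set of presynaptic cells

        def learn(self, prior_active):
            # Grow a segment onto whatever was active one step earlier,
            # unless an identical segment already exists.
            seg = set(prior_active)
            if seg not in self.segments:
                self.segments.append(seg)

        def predicted_by(self, active):
            # Predictively active if any segment matches the active cells.
            return any(seg <= active for seg in self.segments)

    seq = ["A", "B", "C", "D", "E"]
    e_cell = PoolingCell()
    for _ in range(4):           # repeated presentations of the sequence
        for prev, cur in zip(seq, seq[1:]):
            # The E cell fires feed-forward at E, or becomes predictively
            # active when a learned segment matches the current input;
            # either way it then learns onto the pattern active just before.
            if cur == "E" or e_cell.predicted_by({cur}):
                e_cell.learn({prev})

    print(sorted(next(iter(s)) for s in e_cell.segments))  # ['A', 'B', 'C', 'D']

Each pass through the sequence extends the E cell's prediction one step
further back, which is the "process can repeat" step above.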

 

As I said earlier, this all looks good except that it requires us to change
the "predictive state" from being depolarized to steady firing, and the
active state from being steady firing to mini-bursting.  That is a
relatively fine distinction that makes me uncomfortable.  There are other
weirdnesses that I haven't resolved.  For example, this would say the
mini-burst occurs at the end of a sequence, whereas there is more evidence
that mini-bursts occur at the beginning.

 

A TOTALLY DIFFERENT APPROACH

 

One time I was talking to Murray Sherman and he suggested a completely
different approach to temporal pooling.  It is much, much simpler but dumber
(the approach, not Murray; he is smart).

 

Excitatory synapses can be divided into ionotropic and metabotropic.  The
former only involve the flow of ions across the cell membrane; they are
quick to start and quick to stop.  The latter invoke a metabolic pathway
(chemistry, proteins, stuff like that); they are slower.  A metabotropic
synapse will depolarize a cell for up to a third or half of a second for one
incoming spike, whereas the effect of a spike at an ionotropic synapse lasts
just a few milliseconds.  The synapses near the cell body (the inputs to a
region, the SP synapses) are the slow type.  The synapses on the distal
dendrites (our TP synapses) are the fast type.  What this means is that a
fast-changing input to layer 3 will result in a slower-changing response.
This will definitely lead to slower and slower responses as you ascend the
hierarchy.
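
A toy contrast of the two synapse types; the box-car persistence model and
the time constants here are illustrative assumptions, not physiology:

    def response(spike_times, duration_ms, persist_ms):
        # 1 while any spike's effect persists, else 0 (crude box-car model).
        return [int(any(s <= t < s + persist_ms for s in spike_times))
                for t in range(duration_ms)]

    spikes = [0, 100, 200]                        # a fast-changing input
    fast = response(spikes, 300, persist_ms=5)    # ionotropic-like: brief
    slow = response(spikes, 300, persist_ms=150)  # metabotropic-like: long

    print(sum(fast))   # 15  -> three brief blips of depolarization
    print(sum(slow))   # 300 -> depolarized continuously the whole time

The same fast-changing spike train leaves the slow synapses depolarized
continuously, which is the pooling effect.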

 

This approach is dumb because it involves no learning.  It pools anything
and everything that occurs even once in sequence.  It is also hard to see
how we would get cells that stay active for several seconds during
sequences.

 

CONCLUSION

Because the temporal pooling question remains unresolved I prefer to ignore
it for now.  We chose to work on problems of prediction and anomaly
detection in a single region that don't require temporal pooling.  But I
don't want to discourage others from working on it.

 

 

 

 

That actually brings up a question I was wondering about:  When we talk
about making predictions several steps in advance, how is that actually done
- by snapshotting the state, then hypothetically feeding the system's
predictions into itself to get further predictions and then restoring the
state;  or is it simply that you end up with cells which will only be
predictively active if several future iterations of input are likely to
*all* activate them?

 

>> When NuPIC makes predictions multiple steps in advance, we use a separate
classifier, external to the CLA. We store the state of the CLA in a buffer
for N steps and then classify those states when the correct data point
arrives N steps later.
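
A minimal sketch of that buffering scheme.  The classifier here is a
hypothetical stand-in with learn/infer methods, not NuPIC's actual
classifier API, and states are assumed to be hashable (e.g. tuples of
active cell indices):

    from collections import deque

    class CountingClassifier:
        # Toy classifier: counts which values tend to follow each state.
        def __init__(self):
            self.counts = {}

        def learn(self, state, value):
            self.counts.setdefault(state, {}).setdefault(value, 0)
            self.counts[state][value] += 1

        def infer(self, state):
            votes = self.counts.get(state, {})
            return max(votes, key=votes.get) if votes else None

    N = 5                        # predict N steps ahead
    buffer = deque(maxlen=N)     # the last N CLA states
    clf = CountingClassifier()

    def step(cla_state, actual_value):
        # Pair the state from N steps ago with the value arriving now...
        if len(buffer) == N:
            clf.learn(buffer[0], actual_value)
        buffer.append(cla_state)
        # ...then ask what the current state implies N steps ahead.
        return clf.infer(cla_state)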

Jeff


_______________________________________________
nupic mailing list
[email protected]
http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org

 
