Chetan,

Hah!  I wondered if anyone would catch me on this, because in some of my talks
I do use that example "you can tell what word will be at the end of this
____".  See below for an explanation...

                                                                   

Hi Jeff,

 

Thank you for your observations; they were quite thought-provoking. In fact,
I have a question about one of them:

 

It can be difficult or even impossible to turn these predictive cells into a
concrete prediction.  They are actually independent attributes being
predicted and there can be anywhere from zero to hundreds of them. There is
no way you can always turn them into a specific prediction.   You can
observe this in yourself.  When listening to someone talk you usually don't
know what word they will say next, but you know if what they say doesn't
make sense.  Your brain is predicting a set of attributes that should occur
next but usually you can't take that union of attributes and say exactly
what word is most likely.

 

You said that when listening to someone talk, you usually don't know what
word they will say next. It's true that you're not constantly actively
thinking of all the possible next words. But isn't it the case that if you
specifically wanted to, you could produce a list of words that they would
likely say? And even have an intuition for how likely they would say each of
those words?

>>  Yes, you could think and make a list of possible words someone might say
next, but that is a relatively slow cognitive process.  When I refer to
prediction in the CLA, I mean an immediate knowledge of what is going to
happen next.  To take a multiple prediction and turn it into a list, one item
at a time, is outside of what the CLA does, and in the brain it is a slow and
iterative process.  We could probably do this with a multiple prediction
coming from a CLA too.

 

It seems to me that the brain is naturally able to take a
multiple-prediction cell state and produce essentially the output of the CLA
classifier, an ordered list of the most likely specific predictions.

>> sometimes.  What is the likely list of words expected after "It can be
difficult or even impossible to ..."?

 

You can see this in your own thought experiment: you know what's the most
likely specific word that comes at the end of this ______. How is the brain
able to do this, without an external classifier?

>>   When I first did this, I found it surprisingly difficult to come up
with a phrase where everyone would be able to predict a singular word!  I
carefully crafted this sentence to make it work.  I usually say something
like, "because you know English and because you are understanding what I am
saying, you will be able to predict what word is at the end of this."   I
use "English", "saying", and especially "word" to narrow down the possible
next words to just one.

 

Jeff

 

Thanks,

Chetan

 

On Fri, Sep 6, 2013 at 6:32 PM, Jeff Hawkins <[email protected]> wrote:

Tim,

I apologize, I am not understanding the question.  Maybe a few observations
would help and then you can rephrase the question.  (The following
description does not mention temporal pooling.  I hope you weren't asking
about that.)

 

- The CLA learns sequences and recognizes them.  As part of recognizing a
sequence some cells will be predicting they will be active next.  These
cells can be in more than forty columns if the CLA is predicting more than
one possible next input.  It can be difficult or even impossible to turn
these predictive cells into a concrete prediction.  They are actually
independent attributes being predicted and there can be anywhere from zero
to hundreds of them. There is no way you can always turn them into a
specific prediction.   You can observe this in yourself.  When listening to
someone talk you usually don't know what word they will say next, but you
know if what they say doesn't make sense.  Your brain is predicting a set of
attributes that should occur next but usually you can't take that union of
attributes and say exactly what word is most likely.
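The ambiguity of a union of predicted attributes can be seen in a toy sketch. This is illustrative Python, not NuPIC code, and the attribute encodings for the words are made up:

```python
# Toy illustration (not NuPIC code): a prediction is a union of attribute
# bits, and a union often matches more than one specific value.

# Hypothetical attribute encodings for a few words.
words = {
    "difficult":  {1, 4, 9, 12},
    "impossible": {1, 4, 7, 12},
    "easy":       {2, 5, 9, 13},
}

# Suppose the CLA's predictive cells represent the union of the
# attributes of two plausible next words.
predicted_union = words["difficult"] | words["impossible"]

# Every word whose attributes fall inside the union is consistent with
# the prediction -- the union alone cannot single out one word.
consistent = [w for w, attrs in words.items() if attrs <= predicted_union]
print(consistent)  # both "difficult" and "impossible" match
```

The union is a perfectly good prediction in the sense Jeff describes, yet there is no single word it decodes to.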

 

- Again a prediction in the cells of the CLA is a union of attributes and
not a specific value prediction.  This works great for learning and
inferring on noisy and mixed sequences. It was a breakthrough in our
understanding of what a prediction in the brain is.  But it isn't very
useful for making specific predictions that a product like Grok needs to
make.

 

- To address this product need we implemented a separate classifier that is
not biological and doesn't have cells or synapses.  The classifier matches
the state of the CLA with the next input value.  The state of the CLA is the
set of cells that are currently active, which represents all inputs up to
the present, as interpreted by the memory of the CLA.  I don't recall the details
of how the classifier works but I think for each cell we maintain a
histogram of values that occurred next when the cell was active.  We then
combine the histograms of all the currently active cells to get a
probability distribution of predicted values.  This works really well.
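A minimal sketch of such a classifier, assuming the per-cell-histogram detail is roughly right (the class and method names here are mine, not NuPIC's):

```python
from collections import defaultdict

# Sketch of the classifier described above (details assumed): for each
# cell, keep a histogram of the input values that arrived while that
# cell was active, then combine the histograms of the active cells.

class SimpleCLAClassifier:
    def __init__(self):
        # cell id -> {value: count of times value arrived while active}
        self.histograms = defaultdict(lambda: defaultdict(int))

    def learn(self, active_cells, next_value):
        for cell in active_cells:
            self.histograms[cell][next_value] += 1

    def predict(self, active_cells):
        # Sum the histograms of the currently active cells, then
        # normalize into a probability distribution over values.
        totals = defaultdict(int)
        for cell in active_cells:
            for value, count in self.histograms[cell].items():
                totals[value] += count
        n = sum(totals.values())
        return {v: c / n for v, c in totals.items()} if n else {}

clf = SimpleCLAClassifier()
clf.learn({1, 2, 3}, "D")
clf.learn({1, 2, 3}, "D")
clf.learn({1, 2, 4}, "E")
print(clf.predict({1, 2, 3}))  # "D" dominates, "E" gets some weight
```

The key property is that the classifier lives entirely outside the CLA: it only reads the set of active cells and the next input value.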

 

- To make a prediction multiple steps in advance is simple.  We make another
classifier identical to the first but now each cell is storing a histogram
of input values that occur x steps in the future.  (We have to keep a buffer
of x states of the CLA as we wait for the ultimate input value to arrive.)
This works well too.  Ok, forget about the classifier for now.  Back to the
cells in the CLA.
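Before moving on, the N-step buffering just described might be sketched like this (the buffer and pairing logic are assumptions based on the description above, not NuPIC code):

```python
from collections import deque

# Sketch of the multi-step scheme: keep a buffer of the last N CLA
# states; when an input value arrives N steps later, pair it with the
# buffered state so a classifier can learn from the pair.

N = 2                       # predict N steps ahead
state_buffer = deque(maxlen=N)
training_pairs = []         # (old CLA state, value seen N steps later)

def step(active_cells, current_value):
    # The state stored N steps ago gets credited with the value that
    # just arrived; a per-cell histogram classifier would learn from it.
    if len(state_buffer) == N:
        training_pairs.append((state_buffer[0], current_value))
    state_buffer.append(frozenset(active_cells))

# Feed a short sequence of (CLA state, input value) observations.
for state, value in [({1}, "A"), ({2}, "B"), ({3}, "C"), ({4}, "D")]:
    step(state, value)

# Each buffered state is paired with the value that arrived N steps later.
print(training_pairs)
```

Here the state seen at time t is paired with the input at time t+2, which is exactly the "wait for the ultimate input value to arrive" behavior described above.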

 

- If a cell was predicting to be active and it does become active we do some
hebbian learning.  We only adjust the synapses on the dendrite segment(s)
that generated dendritic spikes that put the cell into its predictive state.
We do Hebbian learning on dendrite segments, not the cell as a whole; other
than that it is basic Hebbian learning.  If a synapse on an active dendrite
segment is active we increment its permanence (this synapse was predictive).
If a synapse on an active dendrite segment is inactive we decrement its
permanence (this synapse was not helpful, not predictive).

 

- A cell that is in a predictive state but doesn't become active is not
wrong.  We don't want to penalize it.  Perhaps it was representing a
transition that occurs only 30% of the time.  We don't want to penalize it
70% of the time.
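The two rules above (learn only on the segments whose prediction came true, and leave unconfirmed predictions alone) might look like this as a toy sketch; the parameter values and names are assumptions, not NuPIC's:

```python
# Toy sketch of segment-level Hebbian learning as described above.
# Increment/decrement amounts are made-up values.
PERMANENCE_INC = 0.05
PERMANENCE_DEC = 0.05

def learn_on_segment(segment, active_presyn_cells):
    """Adjust one active dendrite segment after its cell's prediction
    was confirmed.  `segment` maps presynaptic cell -> permanence."""
    for presyn in segment:
        if presyn in active_presyn_cells:
            # This synapse contributed to a correct prediction.
            segment[presyn] = min(1.0, segment[presyn] + PERMANENCE_INC)
        else:
            # This synapse was not helpful this time.
            segment[presyn] = max(0.0, segment[presyn] - PERMANENCE_DEC)

def update_cell(was_predictive, became_active, active_segments, active_presyn):
    # Only a confirmed prediction triggers learning.  A cell that
    # predicted but did not become active is left alone -- it may encode
    # a transition that occurs only some of the time.
    if was_predictive and became_active:
        for seg in active_segments:
            learn_on_segment(seg, active_presyn)

seg = {10: 0.30, 11: 0.30, 12: 0.30}
update_cell(True, True, [seg], {10, 11})
print(seg)  # synapses 10 and 11 strengthened, 12 weakened
```

Note that the no-penalty rule falls out of the `if` guard: an unconfirmed prediction simply never reaches `learn_on_segment`.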

 

- For a while we included a "global decay".  We decremented all synapses a
little bit all the time.  We thought we needed this to weed out old unused
connections.  The results from this were mixed and ultimately we decided we
didn't need it.  Old synapses go away when new ones crowd them out as
explained below.

 

- One consequence of this learning methodology is that the CLA keeps
learning new patterns until all the dendritic segments have been used.  We
fix the number of dendrite segments per cell at I think 128.  Once a cell
has used all of these it will start reusing or "morphing" them.  At this
point new patterns start crowding out old ones.  Our experiments suggest
that the capacity of the CLA is very large.  It can learn many transitions
before it has to start forgetting old ones.  In theory the number of
segments could be much smaller and the system would still work fine.  It
would just have a smaller capacity  for memorizing sequences.
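A toy sketch of the fixed segment pool and crowding-out behavior; the oldest-first reuse policy here is my assumption (the text only says segments get reused), and the pool size is shrunk for illustration:

```python
# Toy sketch of a fixed per-cell segment pool.  The text mentions 128
# segments per cell; a small number is used here for illustration.
MAX_SEGMENTS_PER_CELL = 4

class Cell:
    def __init__(self):
        self.segments = []   # each segment is a set of presynaptic cells

    def learn_pattern(self, presyn_cells):
        if len(self.segments) < MAX_SEGMENTS_PER_CELL:
            # Capacity left: allocate a fresh segment.
            self.segments.append(set(presyn_cells))
        else:
            # Pool exhausted: reuse ("morph") the oldest segment, so the
            # new pattern crowds out an old one.  Oldest-first is an
            # assumed policy.
            self.segments.pop(0)
            self.segments.append(set(presyn_cells))

cell = Cell()
for i in range(6):
    cell.learn_pattern({i})
print(cell.segments)  # only the 4 most recent patterns survive
```

With this structure, capacity grows with the segment count, and forgetting only begins once the pool is full, matching the behavior described above.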

 

- It would be a useful exercise to characterize the sequence capacity of the
CLA and see how it changes with different numbers of dendrite segments per
cell.  Anyone want to take this on?

 

Maybe this answered your question, but if not maybe you can rephrase it in
the context of what I just wrote.

Jeff

 

From: nupic [mailto:[email protected]] On Behalf Of Tim
Boudreau
Sent: Friday, September 06, 2013 6:47 AM


To: NuPIC general mailing list.

Subject: Re: [nupic-dev] Inter-layer plumbing

 

On Fri, Aug 30, 2013 at 8:30 PM, Tim Boudreau <[email protected]> wrote:

On Fri, Aug 30, 2013 at 5:31 PM, Jeff Hawkins <[email protected]> wrote:

>> Sorry, I couldn't follow your question.

 

 

Hi, Jeff, et al.,

 

I don't mean to harp on this, but this question slipped through the cracks,
and it seems like a pretty fundamental one.  Bad question?  Unanswerable?
Something else?

 

In a nutshell:  If a good prediction for several steps in advance is
indistinguishable from a bad prediction for one step in advance, how do you
avoid penalizing synapses which make correct predictions for further in
advance than one step?

 

I understand there's a "classifier" which iterates snapshots and
monday-morning-quarterbacks the synapses, but also that it's a short-term
hack, not the way things are supposed to work.  Here's the more specific
explanation of the question:

 

Let me try to clarify.  It's really an implementation-in-software question,
but one that must be backed by biology.  Here are some facts as I understand
them:

 - There are three states a cell can be in - not-active,
activated-from-input or predictively-activated.

 - An incorrect prediction results in reducing the permanence of the
associated synapse.

 - A predictively activated cell may be making a prediction about several
steps into the future.

 

With one bit of information - predictive or not - it is impossible to tell
the target step of a prediction.  Let's use your ABCDE example.  E
eventually forms connections with C so that E is active when C is active.
But E is making a correct prediction for two steps into the future, even though
it is a wrong prediction for one step into the future.  So the C->E synapse
will be weakened because of the incorrect prediction.

 

Given repeated ABCDE inputs, what it sounds like will happen is that

 - a C->E synapse will form, 

 - when the next C comes up it will predictively-activate

 - when the next input is actually D, the C->E's permanence will decrement
because E did not follow C

 - after a few cycles it will disappear (permanence < threshold = invalid)

 - the next E input will increment it back into validity

 - and so forth, forever, winking into and out of existence

 

This happens because a single bit of information (predictive or
not-predictive) - is insufficient to determine how many steps into the
future a prediction is *for*.  So a prediction about >1 step into the future
is an incorrect prediction about 1 step into the future, but there is no
data structure that I've read about that carries the information "this is a
prediction for two steps from now".  Which seems like a problem - ?

 

3) Temporal pooling.

This is when cells learn to fire continuously during part or all of a
sequence.  

 

It seems like the design for a single layer precludes that - anything which
learns to fire continuously will quickly unlearn it, relearn it, unlearn it,
relearn it.  I could imagine feedback from another layer that can recognize
ABCDE as a whole might reinforce our C->E connection so it does not
disappear.  Is that where this problem gets solved?

 

That actually brings up a question I was wondering about:  When we talk
about making predictions several steps in advance, how is that actually done
- by snapshotting the state, then hypothetically feeding the system's
predictions into itself to get further predictions and then restoring the
state;  or is it simply that you end up with cells which will only be
predictively active if several future iterations of input are likely to
*all* activate them?

 

>> When NuPIC makes predictions multiple steps in advance, we use a separate
classifier, external to the CLA. We store the state of the CLA in a buffer
for N steps and then classify those states when the correct data point
arrives N steps later.

 

So, you develop the ability to say "if this cell is active, it's actually a
prediction about N steps in the future" and just use that against the system
not in learning mode?  Or is the classifier output fed back into the
dendrite segment layout and permanence values of synapses somehow?

 

Thanks,

 

-Tim

 





 

-- 

http://timboudreau.com


_______________________________________________
nupic mailing list
[email protected]
http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org

 

