Tim,

I apologize; I am not understanding the question.  Maybe a few observations 
would help, and then you can rephrase the question.  (The following description 
does not mention temporal pooling.  I hope you weren’t asking about that.)

 

- The CLA learns sequences and recognizes them.  As part of recognizing a 
sequence some cells will be predicting they will be active next.  These cells 
can be in more than forty columns if the CLA is predicting more than one 
possible next input.  It can be difficult or even impossible to turn these 
predictive cells into a concrete prediction.  They are actually independent 
attributes being predicted, and there can be anywhere from zero to hundreds of 
them.  There is no way you can always turn them into a specific prediction.  
You can observe this in yourself.  When listening to someone talk you usually 
don’t know what word they will say next, but you know if what they say doesn’t 
make sense.  Your brain is predicting a set of attributes that should occur 
next but usually you can’t take that union of attributes and say exactly what 
word is most likely.
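
To make that concrete, here is a tiny sketch (the cell indices are made up, and this is not NuPIC code) of why the union of predictive cells can’t be decoded into one specific prediction:

```python
# Suppose the CLA has seen "A,B,C" and "A,B,D".  After input "B", cells
# representing attributes of both possible next inputs enter the
# predictive state.  (Hypothetical cell indices for illustration.)
predicted_if_next_is_C = {3, 17, 42, 99}
predicted_if_next_is_D = {3, 21, 42, 77}

# What the CLA actually exposes is only the union of predictive cells:
predictive_state = predicted_if_next_is_C | predicted_if_next_is_D
print(sorted(predictive_state))  # [3, 17, 21, 42, 77, 99]

# From the union alone there is no way to say whether C or D comes
# next -- individual attributes are predicted, not a single value.
```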

 

- Again a prediction in the cells of the CLA is a union of attributes and not a 
specific value prediction.  This works great for learning and inferring on 
noisy and mixed sequences. It was a breakthrough in our understanding of what a 
prediction in the brain is.  But it isn’t very useful for making specific 
predictions that a product like Grok needs to make.

 

- To address this product need we implemented a separate classifier that is not 
biological and doesn’t have cells or synapses.  The classifier matches the 
state of the CLA with the next input value.  The state of the CLA is the set of 
cells that are currently active, which represents all inputs up to the present, 
as interpreted by the memory of the CLA.  I don’t recall the details of how the 
classifier works but I think for each cell we maintain a histogram of values 
that occurred next when the cell was active.  We then combine the histograms of 
all the currently active cells to get a probability distribution of predicted 
values.  This works really well.
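
As a rough sketch of that idea (this is my reconstruction, not the actual NuPIC classifier code; all names and details are illustrative):

```python
from collections import defaultdict

# Each cell keeps a histogram of the input values that followed
# whenever it was active.  At inference time the histograms of the
# currently active cells are summed and normalized into a probability
# distribution over predicted values.

class HistogramClassifier:
    def __init__(self):
        # cell index -> {value: count of times value followed this cell}
        self.histograms = defaultdict(lambda: defaultdict(int))

    def learn(self, active_cells, next_value):
        for cell in active_cells:
            self.histograms[cell][next_value] += 1

    def infer(self, active_cells):
        # Combine the histograms of the active cells, then normalize.
        combined = defaultdict(int)
        for cell in active_cells:
            for value, count in self.histograms[cell].items():
                combined[value] += count
        total = sum(combined.values())
        return {v: c / total for v, c in combined.items()} if total else {}

clf = HistogramClassifier()
clf.learn({1, 2, 3}, "D")
clf.learn({1, 2, 3}, "D")
clf.learn({1, 4}, "E")
print(clf.infer({1, 2, 3}))
```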

 

- Making a prediction multiple steps in advance is simple.  We make another 
classifier identical to the first, but now each cell is storing a histogram of 
input values that occur x steps in the future.  (We have to keep a buffer of x 
states of the CLA as we wait for the ultimate input value to arrive.)  This 
works well too.  Ok, forget about the classifier for now.  Back to the cells in 
the CLA.
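
The buffering idea can be sketched like this (illustrative names, not NuPIC’s API):

```python
from collections import deque

# To train an x-steps-ahead classifier, keep a buffer of the last x CLA
# states and pair each delayed state with the input value that finally
# arrives x steps later.

def multi_step_pairs(states_and_inputs, x):
    """Yield (state, value_x_steps_later) training pairs."""
    buffer = deque(maxlen=x)
    for state, value in states_and_inputs:
        if len(buffer) == x:
            # The state from x steps ago is now paired with this value.
            yield buffer[0], value
        buffer.append(state)

stream = [("s1", "A"), ("s2", "B"), ("s3", "C"), ("s4", "D")]
print(list(multi_step_pairs(stream, 2)))  # [('s1', 'C'), ('s2', 'D')]
```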

 

- If a cell was predicting it would be active and it does become active, we do 
some Hebbian learning.  We only adjust the synapses on the dendrite segment(s) 
that generated dendritic spikes that put the cell into its predictive state.  
We do Hebbian learning on dendrite segments, not the cell as a whole; other 
than that it is basic Hebbian learning.  If a synapse on an active dendrite 
segment is active we increment its permanence (this synapse was predictive).  
If a synapse on an active dendrite segment is inactive we decrement its 
permanence (this synapse was not helpful, not predictive).
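
That update rule, made runnable (the permanence values and increments are illustrative, not NuPIC’s actual parameters):

```python
# Per-segment Hebbian update: reinforce synapses whose presynaptic
# cells were active, weaken the rest -- on the active segment only.
PERM_INC = 0.10
PERM_DEC = 0.10

def adapt_segment(segment, active_presynaptic_cells):
    """Update synapse permanences on one active dendrite segment.

    segment: {presynaptic_cell: permanence}
    """
    for cell, perm in segment.items():
        if cell in active_presynaptic_cells:
            # This synapse contributed to the correct prediction.
            segment[cell] = min(1.0, perm + PERM_INC)
        else:
            # This synapse was not helpful here.
            segment[cell] = max(0.0, perm - PERM_DEC)

seg = {7: 0.50, 12: 0.50, 30: 0.50}
adapt_segment(seg, active_presynaptic_cells={7, 30})
print(seg)  # 7 and 30 reinforced, 12 weakened
```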

 

- A cell that is in a predictive state but doesn’t become active is not wrong.  
We don’t want to penalize it.  Perhaps it was representing a transition that 
occurs only 30% of the time.  We don’t want to penalize it 70% of the time.

 

- For a while we included a “global decay”.  We decremented all synapses a 
little bit all the time.  We thought we needed this to weed out old unused 
connections.  The results from this were mixed and ultimately we decided we 
didn’t need it.  Old synapses go away when new ones crowd them out as explained 
below.
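
For what it’s worth, the global decay we tried looked roughly like this (rates and threshold are illustrative, not the values we actually used):

```python
# "Global decay": decrement every synapse a little bit on every pass,
# and drop any synapse whose permanence falls below the threshold.
DECAY = 0.001
THRESHOLD = 0.2

def global_decay(segments):
    """Apply a small decrement to all synapses; prune the weak ones."""
    for seg in segments:
        for cell in list(seg):  # copy keys; we may delete during the loop
            seg[cell] -= DECAY
            if seg[cell] < THRESHOLD:
                del seg[cell]

segs = [{1: 0.200, 2: 0.80}]
global_decay(segs)
print(segs)  # synapse 1 fell below the threshold and was pruned
```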

 

- One consequence of this learning methodology is that the CLA keeps learning 
new patterns until all the dendritic segments have been used.  We fix the 
number of dendrite segments per cell at, I think, 128.  Once a cell has used 
all of these it will start reusing or “morphing” them.  At this point new 
patterns start crowding out old ones.  Our experiments suggest that the 
capacity of the CLA is very large.  It can learn many transitions before it 
has to start forgetting old ones.  In theory the number of segments could be 
much smaller and the system would still work fine.  It would just have a 
smaller capacity for memorizing sequences.

 

- It would be a useful exercise to characterize the sequence capacity of the 
CLA and see how it changes with different numbers of dendrite segments per 
cell.  Anyone want to take this on?
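
As a starting point, a back-of-envelope upper bound (the 128 segments per cell comes from above; the column and cell counts are typical NuPIC defaults that I am assuming here, not numbers from this discussion):

```python
# Rough upper bound on distinct transitions, assuming each dendrite
# segment stores roughly one learned transition context.
columns = 2048          # assumed typical region size
cells_per_column = 32   # assumed typical value
segments_per_cell = 128  # per the discussion above

max_transitions = columns * cells_per_column * segments_per_cell
print(max_transitions)  # 8388608
```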

 

Maybe this answered your question, but if not maybe you can rephrase it in the 
context of what I just wrote.

Jeff

 

From: nupic [mailto:[email protected]] On Behalf Of Tim Boudreau
Sent: Friday, September 06, 2013 6:47 AM
To: NuPIC general mailing list.
Subject: Re: [nupic-dev] Inter-layer plumbing

 

On Fri, Aug 30, 2013 at 8:30 PM, Tim Boudreau <[email protected]> wrote:

On Fri, Aug 30, 2013 at 5:31 PM, Jeff Hawkins <[email protected]> wrote:

>> Sorry, I couldn’t follow your question.

 

 

Hi, Jeff, et al.,

 

I don't mean to harp on this, but this question slipped through the cracks, and 
it seems like a pretty fundamental one.  Bad question?  Unanswerable?  
Something else?

 

In a nutshell:  If a good prediction for several steps in advance is 
indistinguishable from a bad prediction for one step in advance, how do you 
avoid penalizing synapses which make correct predictions for further in advance 
than one step?

 

I understand there's a "classifier" which iterates snapshots and 
monday-morning-quarterbacks the synapses, but also that it's a short-term hack, 
not the way things are supposed to work.  Here's the more specific explanation 
of the question:

 

Let me try to clarify.  It's really an implementation-in-software question, but 
one that must be backed by biology.  Here are some facts as I understand them:

 - There are three states a cell can be in - not-active, activated-from-input 
or predictively-activated.

 - An incorrect prediction results in reducing the permanence of the associated 
synapse.

 - A predictively activated cell may be making a prediction about several steps 
into the future.

 

With one bit of information - predictive or not - it is impossible to tell the 
target step of a prediction.  Let's use your ABCDE example.  E eventually forms 
connections with C, so that E becomes predictively active when C is active.  
But E is making a correct prediction for two steps into the future, while 
being a wrong prediction for one step into the future.  So the C->E synapse 
will be weakened because of the incorrect prediction.

 

Given repeated ABCDE inputs, what it sounds like will happen is that

 - a C->E synapse will form, 

 - when the next C comes up it will predictively-activate

 - when the next input is actually D, the C->E's permanence will decrement 
because E did not follow C

 - after a few cycles it will disappear (permanence < threshold = invalid)

 - the next E input will increment it back into validity

 - and so forth, forever, winking into and out of existence
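
Here is a toy simulation of that cycle (all parameters hypothetical; this is just the dynamic I'm describing, not actual NuPIC behavior):

```python
# A C->E synapse sitting just above threshold: decremented on every
# step where E does not follow C, incremented when E finally arrives.
PERM_INC, PERM_DEC, THRESHOLD = 0.10, 0.10, 0.5

perm = 0.55  # just above the validity threshold
history = []
for e_follows in [False, True, False, True, False, True]:
    if e_follows:
        perm = min(1.0, perm + PERM_INC)
    else:
        perm = max(0.0, perm - PERM_DEC)
    history.append(perm >= THRESHOLD)  # is the synapse valid right now?
print(history)  # the synapse winks out of and back into validity
```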

 

This happens because a single bit of information (predictive or not-predictive) 
- is insufficient to determine how many steps into the future a prediction is 
*for*.  So a prediction about >1 step into the future is an incorrect 
prediction about 1 step into the future, but there is no data structure that 
I've read about that carries the information "this is a prediction for two 
steps from now".  Which seems like a problem - ?

 

3) Temporal pooling.

This is when cells learn to fire continuously during part or all of a sequence. 
 

 

It seems like the design for a single layer precludes that - anything which 
learns to fire continuously will quickly unlearn it, relearn it, unlearn it, 
relearn it.  I could imagine feedback from another layer that can recognize 
ABCDE as a whole might reinforce our C->E connection so it does not disappear.  
Is that where this problem gets solved?

 

That actually brings up a question I was wondering about:  When we talk about 
making predictions several steps in advance, how is that actually done - by 
snapshotting the state, then hypothetically feeding the system's predictions 
into itself to get further predictions and then restoring the state;  or is it 
simply that you end up with cells which will only be predictively active if 
several future iterations of input are likely to *all* activate them?

 

>> When NuPIC makes predictions multiple steps in advance, we use a separate 
>> classifier, external to the CLA. We store the state of the CLA in a buffer 
>> for N steps and then classify those states when the correct data point 
>> arrives N steps later.

 

So, you develop the ability to say "if this cell is active, it's actually a 
prediction about N steps in the future" and just use that against the system 
not in learning mode?  Or is the classifier output fed back into the dendrite 
segment layout and permanence values of synapses somehow?

 

Thanks,

 

-Tim

 





 

-- 

http://timboudreau.com

_______________________________________________
nupic mailing list
[email protected]
http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
