Hi Tim, Jeff, and Chetan,

We're returning to the engineering setup of the CLA, as opposed to what happens in the brain. In the CLA we are trying to get one layer of one region to do meaningful learning for a particular task (a single chosen prediction of one value at a fixed number of steps ahead, and/or anomaly detection). The CLA is actually producing many, many predictions, at all distances forward in time, for many values (and virtual values), but we use the classifier as a filter to extract one particular prediction from all those available.
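As a rough sketch of that filtering step, Jeff's description further down this thread amounts to: for each cell, keep a histogram of the input values that arrived some fixed number of steps after the cell was active, then combine the histograms of the currently active cells into one probability distribution. This is my own toy reconstruction, not the actual NuPIC classifier; the class and method names are invented:

```python
from collections import Counter, defaultdict, deque

# Toy reconstruction of the CLA classifier described in this thread --
# not the real NuPIC code; names here are made up for illustration.
class SimpleCLAClassifier:
    def __init__(self, steps=1):
        self.steps = steps                      # how far ahead to predict
        self.histograms = defaultdict(Counter)  # cell -> counts of values
        self.buffer = deque(maxlen=steps)       # recent CLA states

    def learn(self, active_cells, actual_value):
        # Credit the CLA state from `steps` inputs ago with the value
        # that has just arrived.
        if len(self.buffer) == self.steps:
            for cell in self.buffer[0]:
                self.histograms[cell][actual_value] += 1
        self.buffer.append(frozenset(active_cells))

    def infer(self, active_cells):
        # Combine the histograms of all currently active cells into a
        # probability distribution over predicted values.
        combined = Counter()
        for cell in active_cells:
            combined.update(self.histograms[cell])
        total = sum(combined.values())
        return {v: c / total for v, c in combined.items()} if total else {}
```

A second instance built with steps=2 gives the multi-step variant Jeff mentions; the deque plays the part of the "buffer of x states of the CLA" he refers to.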
In the CLA (and neocortex), there is an answer to your question about a cell predicting a value n steps ahead while only being able to "see" one step backwards. Some (or possibly all) of the columns in a given SDR of activation are indicating the other members of the sequence (past and future), and become active when both a) the current member is on the input stream, and b) the previous member's columns were active on the previous input. The SDR thus represents a) the current input values, and b) a set of predictions for the next SDR, each of which has a certain probability of being correct. The predicting columns are recursively predicting their own successors, because if activated they will cause their successor SDRs to predict in turn.

In the neocortex, higher-level regions interpret the current SDRs as predicting future events (because time granularity decreases as you go up), while in the CLA the classifier keeps a tally of what happened n steps after each state to build a probability distribution for a concrete prediction. The classifier can thus be seen as a very specialised artificial higher-level region, one which predicts a single chosen value a single chosen number of steps ahead.

So, in the example of predicting the next word in a sentence, we have a parallel prediction SDR which embodies the union of all plausible next words (and, at higher levels, the set of possible clauses, concepts, etc.), with co-encoded probability information for each word. This is why we don't have to actually remember all the words in a sentence. We remember the sequences hierarchically, and both the sequence of past words and a probability-tagged tree of future word sequences are holographically kept active in the single current SDR, which is updated as each word is heard and replaced with the next SDR in the sequence.
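To make the word example concrete, here is a deliberately non-CLA toy (plain first-order transition counts, no SDRs; all names invented) showing how a single state can hold the union of plausible next words with probability information co-encoded:

```python
from collections import defaultdict

# Toy illustration only -- not the CLA. Each observed "prev -> next"
# transition strengthens a link, so the state after any word holds a
# union of plausible successors with counts attached.
class ToySequenceMemory:
    def __init__(self):
        self.successors = defaultdict(lambda: defaultdict(int))

    def learn(self, sentence):
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            self.successors[prev][nxt] += 1

    def predicted_union(self, word):
        # The union of plausible next words, with probabilities.
        counts = self.successors[word]
        total = sum(counts.values()) or 1
        return {w: c / total for w, c in counts.items()}
```

Calling predicted_union on any of the predicted words yields their successors in turn, which is the (very shallow) analogue of the predicting columns recursively predicting their own successors.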
Regards,

Fergal Byrne

On Sun, Sep 8, 2013 at 2:57 AM, Chetan Surpur <[email protected]> wrote:

> Hi Jeff,
>
> Thank you for your observations, they were quite thought-provoking. In
> fact, I have a question about one of them:
>
>> It can be difficult or even impossible to turn these predictive cells
>> into a concrete prediction. They are actually independent attributes
>> being predicted, and there can be anywhere from zero to hundreds of
>> them. There is no way you can always turn them into a specific
>> prediction. You can observe this in yourself. When listening to
>> someone talk you usually don't know what word they will say next, but
>> you know if what they say doesn't make sense. Your brain is predicting
>> a set of attributes that should occur next, but usually you can't take
>> that union of attributes and say exactly what word is most likely.
>
> You said that when listening to someone talk, you usually don't know
> what word they will say next. It's true that you're not constantly
> actively thinking of all the possible next words. But isn't it the case
> that, if you specifically wanted to, you could produce a list of words
> they would likely say next? And even have an intuition for how likely
> each of those words is?
>
> It seems to me that the brain is naturally able to take a
> multiple-prediction cell state and produce essentially the output of
> the CLA classifier: an ordered list of the most likely specific
> predictions.
>
> You can see this in your own thought experiment: you know what's the
> most likely specific word that comes at the end of this _______. How is
> the brain able to do this without an external classifier?
>
> Thanks,
> Chetan
>
> On Fri, Sep 6, 2013 at 6:32 PM, Jeff Hawkins <[email protected]> wrote:
>
>> Tim,
>>
>> I apologize, I am not understanding the question. Maybe a few
>> observations would help, and then you can rephrase the question. (The
>> following description does not mention temporal pooling.
>> I hope you weren't asking about that.)
>>
>> - The CLA learns sequences and recognizes them. As part of recognizing
>> a sequence, some cells will be predicting they will be active next.
>> These cells can be in more than forty columns if the CLA is predicting
>> more than one possible next input. It can be difficult or even
>> impossible to turn these predictive cells into a concrete prediction.
>> They are actually independent attributes being predicted, and there
>> can be anywhere from zero to hundreds of them. There is no way you can
>> always turn them into a specific prediction. You can observe this in
>> yourself. When listening to someone talk you usually don't know what
>> word they will say next, but you know if what they say doesn't make
>> sense. Your brain is predicting a set of attributes that should occur
>> next, but usually you can't take that union of attributes and say
>> exactly what word is most likely.
>>
>> - Again, a prediction in the cells of the CLA is a union of
>> attributes, not a specific value prediction. This works great for
>> learning and inferring on noisy and mixed sequences. It was a
>> breakthrough in our understanding of what a prediction in the brain
>> is. But it isn't very useful for making the specific predictions a
>> product like Grok needs to make.
>>
>> - To address this product need we implemented a separate classifier
>> that is not biological and doesn't have cells or synapses. The
>> classifier matches the state of the CLA with the next input value. The
>> state of the CLA is the set of cells that are currently active, which
>> represents all inputs up to the present as interpreted by the memory
>> of the CLA. I don't recall the details of how the classifier works,
>> but I think for each cell we maintain a histogram of values that
>> occurred next when the cell was active.
>> We then combine the histograms of all the currently active cells to
>> get a probability distribution of predicted values. This works really
>> well.
>>
>> - Making a prediction multiple steps in advance is simple. We make
>> another classifier identical to the first, but now each cell is
>> storing a histogram of input values that occur x steps in the future.
>> (We have to keep a buffer of x states of the CLA as we wait for the
>> ultimate input value to arrive.) This works well too. OK, forget about
>> the classifier for now. Back to the cells in the CLA.
>>
>> - If a cell was predicting it would be active and it does become
>> active, we do some Hebbian learning. We only adjust the synapses on
>> the dendrite segment(s) that generated the dendritic spikes that put
>> the cell into its predictive state. We do Hebbian learning on dendrite
>> segments, not the cell as a whole; other than that it is basic Hebbian
>> learning. If a synapse on an active dendrite segment is active, we
>> increment its permanence (this synapse was predictive). If a synapse
>> on an active dendrite segment is inactive, we decrement its permanence
>> (this synapse was not helpful, not predictive).
>>
>> - A cell that is in a predictive state but doesn't become active is
>> not wrong. We don't want to penalize it. Perhaps it was representing a
>> transition that occurs only 30% of the time; we don't want to penalize
>> it 70% of the time.
>>
>> - For a while we included a "global decay": we decremented all
>> synapses a little bit, all the time. We thought we needed this to weed
>> out old, unused connections. The results from this were mixed, and
>> ultimately we decided we didn't need it. Old synapses go away when new
>> ones crowd them out, as explained below.
>>
>> - One consequence of this learning methodology is that the CLA keeps
>> learning new patterns until all the dendritic segments have been used.
>> We fix the number of dendrite segments per cell at, I think, 128. Once
>> a cell has used all of these it will start reusing or "morphing" them.
>> At this point new patterns start crowding out old ones. Our
>> experiments suggest that the capacity of the CLA is very large. It can
>> learn many transitions before it has to start forgetting old ones. In
>> theory the number of segments could be much smaller and the system
>> would still work fine. It would just have a smaller capacity for
>> memorizing sequences.
>>
>> - It would be a useful exercise to characterize the sequence capacity
>> of the CLA and see how it changes with different numbers of dendrite
>> segments per cell. Anyone want to take this on?
>>
>> Maybe this answered your question, but if not, maybe you can rephrase
>> it in the context of what I just wrote.
>>
>> Jeff
>>
>> From: nupic [mailto:[email protected]] On Behalf Of Tim Boudreau
>> Sent: Friday, September 06, 2013 6:47 AM
>> To: NuPIC general mailing list
>> Subject: Re: [nupic-dev] Inter-layer plumbing
>>
>> On Fri, Aug 30, 2013 at 8:30 PM, Tim Boudreau <[email protected]> wrote:
>>
>> On Fri, Aug 30, 2013 at 5:31 PM, Jeff Hawkins <[email protected]> wrote:
>>
>> Sorry, I couldn't follow your question.
>>
>> Hi, Jeff, et al.,
>>
>> I don't mean to harp on this, but this question slipped through the
>> cracks, and it seems like a pretty fundamental one. Bad question?
>> Unanswerable?
>> Something else?
>>
>> In a nutshell: if a good prediction for several steps in advance is
>> indistinguishable from a bad prediction for one step in advance, how
>> do you avoid penalizing synapses which make correct predictions
>> further in advance than one step?
>>
>> I understand there's a "classifier" which iterates over snapshots and
>> Monday-morning-quarterbacks the synapses, but also that it's a
>> short-term hack, not the way things are supposed to work. Here's the
>> more specific explanation of the question:
>>
>> Let me try to clarify. It's really an implementation-in-software
>> question, but one that must be backed by biology. Here are some facts
>> as I understand them:
>>
>> - There are three states a cell can be in: not-active,
>> activated-from-input, or predictively-activated.
>>
>> - An incorrect prediction results in reducing the permanence of the
>> associated synapse.
>>
>> - A predictively activated cell may be making a prediction about
>> several steps into the future.
>>
>> With one bit of information - predictive or not - it is impossible to
>> tell the target step of a prediction. Let's use your ABCDE example. E
>> eventually forms connections with C so that E is active when C is
>> active. But E is making a correct prediction for two steps into the
>> future, and a wrong prediction for one step into the future.
>> So the C->E synapse will be weakened because of the incorrect
>> prediction.
>>
>> Given repeated ABCDE inputs, what it sounds like will happen is that:
>>
>> - a C->E synapse will form,
>> - when the next C comes up, E will predictively activate,
>> - when the next input is actually D, the C->E synapse's permanence
>> will decrement because E did not follow C,
>> - after a few cycles it will disappear (permanence < threshold =
>> invalid),
>> - the next E input will increment it back into validity,
>> - and so forth, forever, winking into and out of existence.
>>
>> This happens because a single bit of information (predictive or
>> not-predictive) is insufficient to determine how many steps into the
>> future a prediction is *for*. So a prediction about >1 step into the
>> future is an incorrect prediction about 1 step into the future, but
>> there is no data structure that I've read about that carries the
>> information "this is a prediction for two steps from now". Which seems
>> like a problem - ?
>>
>> 3) Temporal pooling.
>>
>> This is when cells learn to fire continuously during part or all of a
>> sequence.
>>
>> It seems like the design for a single layer precludes that - anything
>> which learns to fire continuously will quickly unlearn it, relearn it,
>> unlearn it, relearn it. I could imagine feedback from another layer
>> that can recognize ABCDE as a whole might reinforce our C->E
>> connection so it does not disappear.
>> Is that where this problem gets solved?
>>
>> That actually brings up a question I was wondering about: when we talk
>> about making predictions several steps in advance, how is that
>> actually done - by snapshotting the state, then hypothetically feeding
>> the system's predictions into itself to get further predictions and
>> then restoring the state; or is it simply that you end up with cells
>> which will only be predictively active if several future iterations of
>> input are likely to *all* activate them?
>>
>> When NuPIC makes predictions multiple steps in advance, we use a
>> separate classifier, external to the CLA. We store the state of the
>> CLA in a buffer for N steps and then classify those states when the
>> correct data point arrives N steps later.
>>
>> So, you develop the ability to say "if this cell is active, it's
>> actually a prediction about N steps in the future" and just use that
>> against the system not in learning mode? Or is the classifier output
>> fed back into the dendrite segment layout and permanence values of
>> synapses somehow?
>>
>> Thanks,
>>
>> -Tim
>>
>> --
>> http://timboudreau.com
>>
>> _______________________________________________
>> nupic mailing list
>> [email protected]
>> http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org

--
Fergal Byrne
ExamSupport/StudyHub
[email protected]
http://www.examsupport.ie
Dublin in Bits
[email protected]
http://www.inbits.com
+353 83 4214179
Formerly of Adnet
[email protected]
http://www.adnet.ie
