Hi Chetan,

At the risk of using either the word "artefact" or "hack," we need to
recognise that we're still working on a single layer of a single region,
one that tries to predict a single value a fixed number of steps ahead.
We haven't figured out universal ways of feeding this guy its data, and
we're avoiding for now the whole question of how the different layers work
and how to connect CLAs in a hierarchy.

So for now, we have encoders, classifiers, and swarming to help us put
together the missing infrastructure around our tiny 1 mm² layer of a
region. I'm really interested in the idea of a neurologically justified
design involving adaptive encoders which simultaneously incorporate
encoding parameters, classification and swarming in a single structure.
This design will likely have an analogue in the inter-layer and
inter-region plumbing.

If that's correct, it'll hopefully give us the initial machinery for
building hierarchies. There are all sorts of computations humans do which
require a hierarchy to even begin to figure out.

Regards,

Fergal Byrne



On Mon, Sep 9, 2013 at 9:07 PM, Chetan Surpur <[email protected]> wrote:

> I see. But unless I'm mistaken, isn't it the case that with the CLA
> classifier, a single continuous variable is discretized into buckets, and
> the probability distribution is computed over these buckets? So in the end,
> the classifier treats continuous variables and category variables in the
> same fashion?
>
> Also, what do you think of reconstruction / top-down feedback built into
> the CLA being a more natural approach than using the classifier?
>
>
> On Mon, Sep 9, 2013 at 1:00 PM, Jeff Hawkins <[email protected]> wrote:
>
>> Got it.  Bear in mind the classifier was implemented primarily to create
>> a probability distribution over a single continuous variable.  This is a
>> bit different than predicting a category like a word.
>>
>> Jeff
>>
>>
>> *From:* nupic [mailto:[email protected]] *On Behalf Of *Chetan
>> Surpur
>> *Sent:* Monday, September 09, 2013 12:46 PM
>>
>> *To:* NuPIC general mailing list.
>> *Subject:* Re: [nupic-dev] Inter-layer plumbing
>>
>> Jeff,
>>
>>
>> Thanks for your response! I agree that there's a difference between
>> "automatic" knowledge of what you expect to see next, versus the slower
>> process of generating specific predictions before seeing what actually
>> comes next. I specifically wanted to explore the latter process, and
>> whether or not we really do need the CLA classifier for it.
>>
>> After more thought, I realized that this process is one of generation
>> rather than prediction. When you read the sentence, "The cat drinks _____",
>> your brain is put into a certain state. From this state, if you were asked
>> to generate the next word(s), you would be more likely to say something
>> like "milk" or "water" than "sidewalk". The words "milk" and "water" seem
>> to come from a higher-level neuronal activation that represents concepts
>> regarding "cats" and "drinking", which your brain then uses to regenerate
>> the lower-level predictions of "milk" or "water". From reading other
>> emails on this mailing list, I believe this used to be called
>> "reconstruction" in the CLA algorithm.
>>
>> What I wanted to point out is that this process of reconstruction seems
>> to me to follow the biology more naturally than the external CLA
>> classifier, even though it's not yet naturally placed in the current CLA
>> implementation. I wanted to ask whether we've settled on the external
>> classifier, or whether it's just a temporary construct in lieu of a more
>> complete implementation of the CLA that has reconstruction and top-down
>> feedback as first-class citizens.
>>
>> Thanks,
>>
>> Chetan
>>
>>
>> On Mon, Sep 9, 2013 at 12:25 PM, Jeff Hawkins <[email protected]>
>> wrote:
>>
>> Chetan,
>>
>> Hah!  I wondered if anyone would catch me on this, because in some of my
>> talks I do use that example “you can tell what word will be at the end of
>> this ____”.  See below for an explanation...
>>
>>
>> Hi Jeff,
>>
>> Thank you for your observations; they were quite thought-provoking. In
>> fact, I have a question about one of them:
>>
>> It can be difficult or even impossible to turn these predictive cells
>> into a concrete prediction.  They are actually independent attributes being
>> predicted and there can be anywhere from zero to hundreds of them. There is
>> no way you can always turn them into a specific prediction.  You can
>> observe this in yourself.  When listening to someone talk you usually don’t
>> know what word they will say next, but you know if what they say doesn’t
>> make sense.  Your brain is predicting a set of attributes that should occur
>> next but usually you can’t take that union of attributes and say exactly
>> what word is most likely.
>>
>> You said that when listening to someone talk, you usually don't know what
>> word they will say next. It's true that you're not constantly actively
>> thinking of all the possible next words. But isn't it the case that, if you
>> specifically wanted to, you could produce a list of words that they would
>> likely say? And even have an intuition for how likely they would be to say
>> each of those words?
>>
>> >> Yes, you could think and make a list of possible words someone might
>> say next, but that is a relatively slow cognitive process.  When I am
>> referring to prediction in the CLA, I am referring to an immediate knowledge
>> of what is going to happen next. Taking a multiple prediction and turning it
>> into a list, one item at a time, is outside of what the CLA does, and in
>> the brain it is a slow and iterative process.  We could probably do this
>> with a multiple prediction coming from a CLA too.
>>
>>
>> It seems to me that the brain is naturally able to take a
>> multiple-prediction cell state and produce essentially the output of the
>> CLA classifier: an ordered list of the most likely specific predictions.
>>
>> >> Sometimes.  What is the likely list of words expected after “It can be
>> difficult or even impossible to …..”?
>>
>>
>> You can see this in your own thought experiment: you know what's the most
>> likely specific word that comes at the end of this ______. How is the brain
>> able to do this, without an external classifier?
>>
>> >> When I first did this, I found it surprisingly difficult to come up
>> with a phrase where everyone would be able to predict a single word!  I
>> carefully crafted this sentence to make it work.  I usually say something
>> like, “because you know English and because you are understanding what I am
>> saying, you will be able to predict what word is at the end of this…”  I
>> use “English”, “saying”, and especially “word” to narrow down the possible
>> next words to just one.
>>
>>
>> Jeff
>>
>> Thanks,
>>
>> Chetan
>>
>> On Fri, Sep 6, 2013 at 6:32 PM, Jeff Hawkins <[email protected]>
>> wrote:
>>
>> Tim,
>>
>> I apologize; I am not understanding the question.  Maybe a few
>> observations will help, and then you can rephrase the question.  (The
>> following description does not mention temporal pooling.  I hope you
>> weren’t asking about that.)
>>
>>
>> - The CLA learns sequences and recognizes them.  As part of recognizing a
>> sequence, some cells will be predicting that they will be active next.  These
>> cells can be in more than forty columns if the CLA is predicting more than
>> one possible next input.  It can be difficult or even impossible to turn
>> these predictive cells into a concrete prediction.  They are actually
>> independent attributes being predicted, and there can be anywhere from zero
>> to hundreds of them. There is no way you can always turn them into a
>> specific prediction.  You can observe this in yourself.  When listening to
>> someone talk you usually don’t know what word they will say next, but you
>> know if what they say doesn’t make sense.  Your brain is predicting a set
>> of attributes that should occur next, but usually you can’t take that union
>> of attributes and say exactly what word is most likely.
>>
>>
>> - Again, a prediction in the cells of the CLA is a union of attributes and
>> not a specific value prediction.  This works great for learning and
>> inferring on noisy and mixed sequences. It was a breakthrough in our
>> understanding of what a prediction in the brain is.  But it isn’t very
>> useful for making the specific predictions that a product like Grok needs
>> to make.
>>
>>
>> - To address this product need, we implemented a separate classifier that
>> is not biological and doesn’t have cells or synapses.  The classifier
>> matches the state of the CLA with the next input value.  The state of the
>> CLA is the set of cells that are currently active, which represents all
>> inputs up to the present as interpreted by the memory of the CLA.  I don’t
>> recall the details of how the classifier works, but I think for each cell
>> we maintain a histogram of values that occurred next when the cell was
>> active.  We then combine the histograms of all the currently active cells
>> to get a probability distribution of predicted values.  This works really
>> well.
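The per-cell histogram scheme Jeff describes above can be sketched roughly as follows. This is a toy illustration under stated assumptions, not NuPIC's actual classifier; the class and method names are invented for the sketch:

```python
from collections import Counter, defaultdict

class HistogramClassifier:
    """Toy sketch: for each cell, keep a histogram of the input values that
    arrived while that cell was active, then combine the histograms of the
    currently active cells into a probability distribution."""

    def __init__(self):
        # cell id -> Counter of values observed while that cell was active
        self.histograms = defaultdict(Counter)

    def learn(self, active_cells, actual_value):
        # Credit the arriving value to every cell in the current CLA state.
        for cell in active_cells:
            self.histograms[cell][actual_value] += 1

    def infer(self, active_cells):
        # Sum the histograms of the active cells and normalise into a
        # probability distribution over predicted values.
        combined = Counter()
        for cell in active_cells:
            combined.update(self.histograms[cell])
        total = sum(combined.values())
        return {v: c / total for v, c in combined.items()} if total else {}

clf = HistogramClassifier()
clf.learn({1, 2}, "A")
clf.learn({2, 3}, "B")
clf.infer({2})   # -> {"A": 0.5, "B": 0.5}: cell 2 saw each value once
```

Summing raw counts and normalising at the end is the simplest way to read "combine the histograms"; the real implementation may weight cells differently.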
>>
>>
>> - Making a prediction multiple steps in advance is simple.  We make
>> another classifier, identical to the first, but now each cell stores a
>> histogram of input values that occur x steps in the future.  (We have to
>> keep a buffer of x states of the CLA as we wait for the ultimate input
>> value to arrive.)  This works well too.  OK, forget about the classifier
>> for now.  Back to the cells in the CLA.
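The buffered x-steps-ahead variant can be sketched the same way (again, the names and structure are assumptions for illustration, not the real implementation): hold the last x CLA states in a queue, and when an input arrives, credit it to the state from x steps ago.

```python
from collections import Counter, defaultdict, deque

class NStepClassifier:
    """Toy sketch of the buffered multi-step classifier: buffer the last
    n_steps CLA states; when a value arrives, it is the n_steps-ahead
    outcome for the oldest buffered state."""

    def __init__(self, n_steps):
        self.n_steps = n_steps
        self.buffer = deque(maxlen=n_steps)     # last n_steps sets of active cells
        self.histograms = defaultdict(Counter)  # cell -> histogram of future values

    def step(self, active_cells, actual_value):
        if len(self.buffer) == self.n_steps:
            # The oldest buffered state "predicted" this value n_steps ago.
            for cell in self.buffer[0]:
                self.histograms[cell][actual_value] += 1
        self.buffer.append(frozenset(active_cells))

clf = NStepClassifier(n_steps=2)
for state, value in [({1}, "a"), ({2}, "b"), ({3}, "c"), ({4}, "d")]:
    clf.step(state, value)
# cell 1, active two steps before "c" arrived, now associates with "c"
```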
>>
>>
>> - If a cell was predicting that it would be active and it does become
>> active, we do some Hebbian learning.  We only adjust the synapses on the
>> dendrite segment(s) that generated the dendritic spikes that put the cell
>> into its predictive state.  We do Hebbian learning on dendrite segments,
>> not the cell as a whole; other than that it is basic Hebbian learning.  If
>> a synapse on an active dendrite segment is active, we increment its
>> permanence (this synapse was predictive).  If a synapse on an active
>> dendrite segment is inactive, we decrement its permanence (this synapse
>> was not helpful, not predictive).
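That per-segment Hebbian rule, in a minimal sketch. The function name, parameter values, and dict-based segment representation are all assumptions for illustration; only the increment/decrement logic follows the description above:

```python
def adapt_segment(segment, active_cells, increment=0.1, decrement=0.1):
    """Apply the Hebbian rule to one dendrite segment.  `segment` maps
    presynaptic cell -> permanence.  Call this only for segments that
    correctly put their cell into a predictive state."""
    for presynaptic_cell, permanence in segment.items():
        if presynaptic_cell in active_cells:
            # Active synapse on an active segment was predictive: strengthen.
            segment[presynaptic_cell] = min(1.0, permanence + increment)
        else:
            # Silent synapse on an active segment was not helpful: weaken.
            segment[presynaptic_cell] = max(0.0, permanence - decrement)

seg = {10: 0.50, 11: 0.50}
adapt_segment(seg, active_cells={10})
# seg[10] rises toward 0.60; seg[11] falls toward 0.40
```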
>>
>>
>> - A cell that is in a predictive state but doesn’t become active is not
>> wrong.  We don’t want to penalize it.  Perhaps it was representing a
>> transition that occurs only 30% of the time; we don’t want to penalize it
>> 70% of the time.
>>
>>
>> - For a while we included a “global decay”: we decremented all synapses
>> a little bit all the time.  We thought we needed this to weed out old,
>> unused connections.  The results from this were mixed, and ultimately we
>> decided we didn’t need it.  Old synapses go away when new ones crowd them
>> out, as explained below.
>>
>>
>> - One consequence of this learning methodology is that the CLA keeps
>> learning new patterns until all the dendrite segments have been used.  We
>> fix the number of dendrite segments per cell at, I think, 128.  Once a cell
>> has used all of these it will start reusing or “morphing” them.  At this
>> point new patterns start crowding out old ones.  Our experiments suggest
>> that the capacity of the CLA is very large.  It can learn many transitions
>> before it has to start forgetting old ones.  In theory the number of
>> segments could be much smaller and the system would still work fine.  It
>> would just have a smaller capacity for memorizing sequences.
>>
>>
>> - It would be a useful exercise to characterize the sequence capacity of
>> the CLA and see how it changes with different numbers of dendrite segments
>> per cell.  Anyone want to take this on?
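As a starting point for that exercise, here is a deliberately crude toy model of the crowding-out behaviour described above. Everything here is an assumption made for illustration: the class name, the oldest-first eviction policy (the real CLA "morphs" segments rather than tracking recency), and the tiny segment count:

```python
class ToyCell:
    """Toy model: a cell stores at most max_segments distinct contexts;
    once full, learning a new context crowds out the oldest one."""

    def __init__(self, max_segments=128):
        self.max_segments = max_segments
        self.segments = []  # learned contexts, oldest first

    def learn(self, context):
        if context in self.segments:
            self.segments.remove(context)   # refresh: move to most recent
        elif len(self.segments) == self.max_segments:
            self.segments.pop(0)            # full: crowd out the oldest pattern
        self.segments.append(context)

    def recognises(self, context):
        return context in self.segments

cell = ToyCell(max_segments=4)
for context in ["ab", "bc", "cd", "de", "ef"]:  # one more than capacity
    cell.learn(context)
# "ab" has been crowded out; the four most recent contexts survive
```

Sweeping max_segments and counting how many distinct transitions survive a training run would give a first, very rough, capacity curve.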
>>
>>
>> Maybe this answered your question, but if not, maybe you can rephrase it
>> in the context of what I just wrote.
>>
>> Jeff
>>
>>
>> *From:* nupic [mailto:[email protected]] *On Behalf Of *Tim
>> Boudreau
>> *Sent:* Friday, September 06, 2013 6:47 AM
>>
>> *To:* NuPIC general mailing list.
>>
>> *Subject:* Re: [nupic-dev] Inter-layer plumbing
>>
>> On Fri, Aug 30, 2013 at 8:30 PM, Tim Boudreau <[email protected]>
>> wrote:
>>
>> On Fri, Aug 30, 2013 at 5:31 PM, Jeff Hawkins <[email protected]>
>> wrote:
>>
>> >> Sorry, I couldn’t follow your question.
>>
>>
>> Hi, Jeff, et al.,
>>
>> I don't mean to harp on this, but this question slipped through the
>> cracks, and it seems like a pretty fundamental one.  Bad question?
>> Unanswerable?  Something else?
>>
>> In a nutshell: if a good prediction for several steps in advance is
>> indistinguishable from a bad prediction for one step in advance, how do
>> you avoid penalizing synapses which make correct predictions for more
>> than one step in advance?
>>
>>
>> I understand there's a "classifier" which iterates over snapshots and
>> Monday-morning-quarterbacks the synapses, but also that it's a short-term
>> hack, not the way things are supposed to work.  Here's the more specific
>> explanation of the question:
>>
>>
>> Let me try to clarify.  It's really an implementation-in-software
>> question, but one that must be backed by biology.  Here are some facts as I
>> understand them:
>>
>>  - There are three states a cell can be in: not-active,
>> activated-from-input, or predictively-activated.
>>
>>  - An incorrect prediction results in reducing the permanence of the
>> associated synapse.
>>
>>  - A predictively activated cell may be making a prediction about several
>> steps into the future.
>>
>>
>> With one bit of information - predictive or not - it is impossible to
>> tell the target step of a prediction.  Let's use your ABCDE example.  E
>> eventually forms connections with C so that E is active when C is active.
>> But E is making a correct prediction for two steps into the future, and a
>> wrong prediction for one step into the future.  So the C->E synapse will
>> be weakened because of the incorrect prediction.
>>
>>
>> Given repeated ABCDE inputs, what it sounds like will happen is that:
>>
>>  - a C->E synapse will form,
>>
>>  - when the next C comes up, E will predictively activate,
>>
>>  - when the next input is actually D, the C->E synapse's permanence will
>> decrement because E did not follow C,
>>
>>  - after a few cycles it will disappear (permanence < threshold = invalid),
>>
>>  - the next E input will increment it back into validity,
>>
>>  - and so forth, forever, winking into and out of existence.
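The cycle above can be made concrete with a toy simulation of the C->E permanence. The starting value, increment, decrement, and function name are illustrative assumptions, not CLA parameters; the point is only that one penalty and one reward per pass leave the synapse with no net progress:

```python
def simulate_c_to_e(cycles, start=0.3, increment=0.1, decrement=0.1):
    """On each ABCDE pass the C->E synapse is penalised once (D, not E,
    follows C) and rewarded once (when E arrives), so its permanence ends
    every cycle roughly where it started."""
    permanence = start
    history = []
    for _ in range(cycles):
        permanence = max(0.0, permanence - decrement)  # D follows C: counted wrong
        permanence = min(1.0, permanence + increment)  # E arrives: counted right
        history.append(round(permanence, 2))
    return history

simulate_c_to_e(3)   # permanence ends each cycle where it started
```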
>>
>>
>> This happens because a single bit of information (predictive or
>> not-predictive) is insufficient to determine how many steps into the
>> future a prediction is *for*.  So a prediction about more than one step
>> into the future is an incorrect prediction about one step into the future,
>> but there is no data structure that I've read about that carries the
>> information "this is a prediction for two steps from now".  Which seems
>> like a problem - ?
>>
>>
>> 3) Temporal pooling.
>>
>> This is when cells learn to fire continuously during part or all of a
>> sequence.
>>
>>
>> It seems like the design for a single layer precludes that - anything
>> which learns to fire continuously will quickly unlearn it, relearn it,
>> unlearn it, relearn it.  I could imagine that feedback from another layer
>> that can recognize ABCDE as a whole might reinforce our C->E connection so
>> it does not disappear.  Is that where this problem gets solved?
>>
>>
>> That actually brings up a question I was wondering about: when we talk
>> about making predictions several steps in advance, how is that actually
>> done - by snapshotting the state, then hypothetically feeding the system's
>> predictions into itself to get further predictions and then restoring the
>> state; or is it simply that you end up with cells which will only be
>> predictively active if several future iterations of input are likely to
>> *all* activate them?
>>
>>
>> >> When NuPIC makes predictions multiple steps in advance, we use a
>> separate classifier, external to the CLA. We store the state of the CLA in
>> a buffer for N steps and then classify those states when the correct data
>> point arrives N steps later.
>>
>>
>> So, you develop the ability to say "if this cell is active, it's actually
>> a prediction about N steps in the future" and just use that against a
>> system that's not in learning mode?  Or is the classifier output fed back
>> into the dendrite segment layout and permanence values of synapses somehow?
>>
>>
>> Thanks,
>>
>> -Tim
>>
>> --
>> http://timboudreau.com
>>
>>
>> _______________________________________________
>> nupic mailing list
>> [email protected]
>> http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
>>
>>
>>
>>
>>
>
>
>


-- 

Fergal Byrne

ExamSupport/StudyHub [email protected] http://www.examsupport.ie
Dublin in Bits [email protected] http://www.inbits.com +353 83 4214179
Formerly of Adnet [email protected] http://www.adnet.ie