Hi guys,

I've recently been thinking a lot about a tentative set of ideas which may
be considered a refinement or extension of the current CLA. I feel that I'm
on to something, but things are still floating around in my head too much
to pin down, so I thought I'd attempt to write them down and share them
with you. They're just some ideas, so I'm actively inviting your criticism
of anything which contradicts the neuroscience or which you know to be a
dead end!

The current CLA involves a subset of the real neocortex connection
structure, mainly involving the unidirectional feedforward connections, one
way of mapping inputs to activation, one type of inhibition functionality,
and a restricted type of prediction. The extensions I describe are intended
to fill this out for two purposes: first, to add functionality which exists
in the brain (and in other neural nets); and second, to provide a plausible
neural mechanism to explain how the brain does it.

I'll start with some issues faced by the current theory, which I'm hoping
will be addressed or at least be better understood if we extend the CLA:

1a. The algorithm in the SP does not support reconstruction of inputs from
SDRs of activity (or prediction).

This (or something equivalent) is important for hierarchy and motor
performance. The phenomenon that Jeff refers to as "unrolling" a sequence
to trigger lower-level sequences requires that you send activation down the
hierarchy and reconstruct activations in those layers. The example he uses
is his speech - his memory of ideas triggers sequences of sentences, each
triggers a sequence of words, then phonemes, then muscle actions, etc.

The reason for this problem is that the SP affects only one part of
connection tuning between input (or lower region) activity and the
resulting column activities. Synapse permanences are improved based on good
correlations between positive activities when data is present. There is no
reverse pathway which would "cause" the data based on the active columns.

1b. Input encoders are not truly adaptive.

In the body, the path from senses to cortex is multi-stage and complex. The
input appearing at the cortex has been through a number of processes, which
in combination are designed (and learn) to provide the best possible
texture for first-stage processing in the cortex. We know that the stages
nearest the cortex on this pathway are in a feedback loop which improves
the overall "encoder" in some way. The thalamus is no doubt a very
important part of this.

In comparison, our presentation to the CLA has minimal work done on it.
This severely limits what can be done with the data, as it gives the CLA a
lot of work to do. In particular, the CLA is not feeding back any
information to the encoder stage about how it feels about the data.

I'm leaving aside the setup of encoders, which is more like genetic design
and is not improved by online learning.

This and the previous issue are clearly related (hence the numbering).

2a. Predictions require external engineering to extract them.

At the moment, we choose beforehand what fields to predict, and how many
steps ahead. We then add a piece of equipment (the classifier) which
records what inputs coincide with active cells, and generates a statistical
prediction based on that.
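For concreteness, here's a toy sketch of that kind of coincidence-counting
classifier. The class and method names are my own placeholders, not the
actual NuPIC CLAClassifier API:

```python
from collections import defaultdict

class CoincidenceClassifier:
    """Toy sketch: count which input buckets coincide with each active
    cell, then predict by summing those counts over the active cells."""

    def __init__(self):
        # counts[cell][bucket] = how often `bucket` was the input
        # while `cell` was active
        self.counts = defaultdict(lambda: defaultdict(int))

    def learn(self, active_cells, bucket):
        # record the coincidence of each active cell with this input
        for cell in active_cells:
            self.counts[cell][bucket] += 1

    def predict(self, active_cells):
        # sum the per-cell histograms and return the most likely bucket
        votes = defaultdict(int)
        for cell in active_cells:
            for bucket, n in self.counts[cell].items():
                votes[bucket] += n
        return max(votes, key=votes.get) if votes else None
```

The point is that this statistical machinery sits entirely outside the
CLA proper, which is exactly the problem: the region itself holds no
reverse mapping from cells back to inputs.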

2b. The CLA is not really pooling temporally.

While the code which generates the predictive cell patterns is called the
TP, it's only executing part of the pooling which is really done in the
neocortex. The real neocortex is producing a representation of (some or all
of) the actual sequence(s) of SDR patterns. The TP, along with the
classifier, is used to extract individual slices of this representation,
but the "whole thing" is not available to be presented to a higher region,
and so any hierarchy cannot easily form a slower-moving representation of
the fast-changing data below.

3. We can't reconstruct sequences on command.

This is due to a combination of the reconstruction problem and the temporal
pooling problem.

OK, to solve these problems, we need to add a lot of new connections and
set them to work. This is clearly going to add a lot to the memory and CPU
cost of the CLA, but we only need to switch it on if we're getting extra
value from it, so it's like many of the other settings we already have for
configuring the models.

Here are my proposed extensions:

1. Make the feedforward pathway truly adaptive.

The real brain has a complicated way to do this, no doubt involving return
connections at least as far as the thalamus, and also some kind of
processing both in the neocortex and in the thalamus. We can replicate this
with a simple mechanism which combines the current feedforward connections
with new functionality, which adapts the encoding (if sensory), tunes the
feedforward connection and also supports reconstruction (of sensory data
and lower-region activation).

a) Smart Encoding

I'll present this first as it could be added as a standalone option to
NuPIC.

The current AdaptiveEncoder is quite unsophisticated. It starts with a set
min-max range, and changes one of those bounds abruptly when it receives an
out-of-range scalar. All the intermediate buckets are rescaled instantly,
thus threatening to break the learning of previous patterns.

A smart encoder, on the other hand, adapts gradually to the changing
statistics of the data, and smoothly migrates intermediate bucket values at
minimal loss to the previous learning. The algorithm is quite simple, and
should not lead to large costs in time or memory. In return, I believe this
encoder will outperform the old one significantly.

Basically, an encoder is composed of a fixed array of "bits" which are each
initialised with a centroid and a radius. In the old encoder these are
derived by subtracting the min from the max and dividing by the number of
bits, but in this encoder each bit maintains its own.

As the data comes in, each bit calculates whether it has been hit by the
value of the field it's encoding. The min and max bits will fire on any
out-of-range values, clamping the encoding. So far, this replicates a
standard (non-adaptive) encoder.

The bits also keep a running average of how often they are on, which NuPIC
calls a duty cycle counter. This number tells each bit how to adapt.

The simplest case is for a max-bit which is getting lots of out-of-range
activations. It detects that its duty cycle count is high for the encoder
(and higher than its neighbour) so it should shift its centroid upwards and
extend its radius. Likewise for the min-bit.

Conversely, a max-bit which is not getting fired is sitting too high on the
number line, so it should drift its centroid down a little.

Intermediate bits will also experience different duty cycle levels. Here,
the bits should either squash together (if they're firing too much) or
spread out (if too little) according to a kind of pressure-tension balance
between the bits. There will be a gradient of duty-cycle counts across the
bits, and this can be used to spread the bits out so that their duty cycles
approach the global average.

The adaptation process can be carried out every k steps (another parameter
which can be swarmed), so we can control the cost. The rate of movement of
the bits can also be a parameter, depending on the data.

b) Bidirectional and Per-Cell Feedforward Connections

At the moment, the SP goes through the columns, and for each column looks
at the connected bits to see which are on. It counts these up to give an
integer activation potential. It then picks the highest 2% of these and
activates those columns.

To learn, these active columns are again visited, and the permanences
adjusted based on coincidence between this activation and the on-bits in
the input.
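In Python terms, the SP step just described looks roughly like this. The
parameter names are illustrative, not the exact NuPIC API:

```python
import numpy as np

def sp_step(input_bits, permanences, connected_perm=0.2,
            sparsity=0.02, perm_inc=0.05, perm_dec=0.05):
    """Rough sketch of the SP activation + learning step.

    input_bits:  (n_inputs,) 0/1 vector
    permanences: (n_columns, n_inputs) synapse permanences in [0, 1],
                 modified in place by learning
    """
    n_columns = permanences.shape[0]
    connected = (permanences >= connected_perm).astype(int)
    # integer activation potential: count connected on-bits per column
    overlap = connected @ input_bits
    # global inhibition: activate the top 2% (by default) of columns
    n_active = max(1, int(sparsity * n_columns))
    winners = np.argsort(overlap)[-n_active:]
    active = np.zeros(n_columns, dtype=bool)
    active[winners] = True
    # learning: active columns strengthen synapses to on-bits and
    # weaken synapses to off-bits
    delta = np.where(input_bits.astype(bool), perm_inc, -perm_dec)
    permanences[active] = np.clip(permanences[active] + delta, 0.0, 1.0)
    return active
```

Notice the asymmetry: permanences flow only from input to column, and
nothing here lets you run the mapping in reverse.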

We know this is a simplification which is intended to reflect the fact that
the cells in a column share correlated feedforward response. However, this
is only because the pipe of feedforward axons passes up through the column
and thus potentially synapses on all the cells in a correlated way. The
actual synapses are per cell and not per column.

If we add these synapses to the cells (as well as the columns), we get some
new functionality. Now, these synapses are learning the connections between
on-bits in the input and this particular cell becoming active. This
connection now tells us which bits are important to this cell, which is the
information we want to get back when the cell is predictive. The dendrite
permanence values are a histogram of the bits which are associated with
this cell becoming active.

You can use this to perform a much better job of reconstructing the input
from either a predictive state or an activation pattern. This can be
achieved simply by reversing the direction of the cell-bit connection as
follows:

1. Create a "virtual dendrite" on each input bit (or lower-region cell).
2. For each active (or predictive) cell, copy the synapse from each bit to
the bit's virtual dendrite.
3. Feed the activation (or prediction) values into the virtual dendrites.

The result will be an "activation pattern" across the lower region or input
bits. You can do inhibition on this to return the highest probability SDR,
or do other things depending on the purpose of the exercise.

Note that a lot of this will work better if (at certain stages) you treat
activations and connections as more fine-grained than binary. You can
always force things to binary when it's of use.
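In matrix terms, the virtual dendrites are just the transpose of the
per-cell permanence matrix. A rough sketch (which assumes the per-cell
feedforward synapses proposed above, which NuPIC doesn't have today):

```python
import numpy as np

def reconstruct(cell_values, permanences, top_k=None):
    """Sketch of reconstruction via "virtual dendrites". Reversing the
    cell-to-bit connections is just a transpose: each input bit's
    virtual dendrite collects the permanences of the cells synapsing
    on it, weighted by those cells' activity (or prediction strength).

    cell_values: (n_cells,) activity per cell; binary or graded
    permanences: (n_cells, n_bits) per-cell feedforward permanences
    """
    bit_activation = permanences.T @ cell_values
    if top_k is None:
        # graded activation pattern over the input bits
        return bit_activation
    # optional inhibition: keep only the top_k most strongly driven bits
    sdr = np.zeros(bit_activation.shape, dtype=bool)
    sdr[np.argsort(bit_activation)[-top_k:]] = True
    return sdr
```

Passing graded cell_values is exactly the "more fine-grained than binary"
treatment above; forcing to an SDR is just the top_k inhibition step.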

c) Real Temporal Pooling

I'm leaving this one as a cliff-hanger until tomorrow. I have to go see
Leinster pound Connaught into the rugby pitch at the RDS!

Meanwhile, your thoughts on the above would be appreciated.

Regards

Fergal Byrne


On Sat, Oct 26, 2013 at 1:09 PM, Fergal Byrne
<[email protected]> wrote:

> Haha Jeff,
>
> I'd actually had "a couple" (it was Friday night my time, after all), but
> I tell the same story to everyone who's interested in understanding the
> Irish, regardless of the time of day or the blood alcohol level. I do
> seriously regard these kinds of self-descriptive jokes as windows on
> cultures, which can be regarded as shared learning. It's a more plausible
> explanation of Jung's collective unconscious.
>
> One of your (Americans') shared cultural artefacts is the idea of gun
> ownership, which is quite incomprehensible to people in Europe. This is
> because we've been constantly reinforcing the notion of a gun as a threat
> to us in the hands of a member of the public, something that only
> officially designated people should be trusted with (even then, the police
> in Ireland and the UK are almost exclusively unarmed).  You guys have a
> completely different set of associations, which your culture feeds you
> constantly. This is also tied up with the attitudes towards authority, the
> place of the individual, and several other very high level concepts.
>
> You could regard this kind of "cultural memory" as the ultimate level in a
> distributed memory hierarchy. It's a sensorimotor memory as well, as it is
> learned in part by the culture taking actions which explore the
> consequences of the learned attitudes and updating the memory based on that
> exploration. Even within an apparently monolithic culture, there are
> "subcultures" which have variations in these attitudes. So the culture is
> itself a hierarchy, sitting on top of and composed of the collective
> memories of individuals.
>
> The same process is likely behind the development of languages and
> dialects (hierarchy again), mythologies and religions, political ideologies
> and so on.
>
> Regards,
>
> Fergal Byrne
>
>
>
> On Sat, Oct 26, 2013 at 6:08 AM, Jeff Hawkins <[email protected]> wrote:
>
>> Hah! And how many pints did you have before writing this email?
>>
>> Jeff
>>
>> *From:* nupic [mailto:[email protected]] *On Behalf Of *Fergal
>> Byrne
>> *Sent:* Friday, October 25, 2013 5:11 PM
>> *To:* NuPIC general mailing list.
>> *Subject:* Re: [nupic-dev] motor implementation
>>
>> Hi Jeff,
>>
>> This is great as it's so far not contained in any of your talks or
>> (detailed) writings. I've always used the fact that we explore the
>> world in order to learn it as a basis for how we learn it. This is a
>> crucial understanding about the "philosophy of mind" which your theory
>> engenders, and is perhaps "misunderestimated" (perhaps the greatest
>> contribution to the world given by George W. Bush) by many scholars of
>> the brain.
>>
>> In the same way that Eskimos (in fact don't) have 40 words for snow,
>> Irish people in fact have dozens of words for manipulating truth and
>> reality. That's why we have so many Nobel prizes for literature. Irish
>> people excel in doing sensorimotor explorations of linguistic reality.
>>
>> My favourite example is the answer to "how much did you have to drink
>> last night?", which leads to the "Irish drink numeral system":
>>
>> 1) Less than 3 pints: you answer "I wasn't out last night"
>> 2) 3-7 pints: you say "a couple"
>> 3) 7 to your limit: you say "a few"
>> 4) Anything significantly beyond is a "skinful" or a "load of pints"
>>
>> This is obviously a rather higher-level version of what Jeff is talking
>> about, but it's the analogue of bobbing your head to get better parallax
>> vision.
>>
>> Regards
>>
>> Fergal Byrne
>>
>> On Fri, Oct 25, 2013 at 9:52 PM, Jeff Hawkins <[email protected]>
>> wrote:
>>
>> In every region of the cortex there are cells in Layer 5 that project
>> someplace else in the brain that is related to motor behavior.  (At least
>> in every region people have looked.)  In the classic “motor” regions the
>> layer 5 cells project to muscles, or spinal cord; in vision areas they
>> project to the superior colliculus which controls eye movement, etc.  This
>> tells us that most (if not all) regions of the cortex are playing a role in
>> motor behavior.  This is one reason why I think we can attack the
>> sensorimotor problem by initially modeling a single region.
>>
>> The axons from the layer 5 motor cells split and send a branch up the
>> cortical hierarchy.  So the “motor command” coming from layer 5 is also an
>> input to the next region.  When the next region learns patterns and makes
>> predictions, part of its input is the motor commands that the lower region
>> is sending to motor areas.
>>
>> One twist is the feedforward motor signal is gated in the thalamus.  So
>> it doesn’t always go to the next region.  Presumably this is controlled as
>> part of attention.
>>
>> Remember that layer 3 is the primary feedforward inference layer; it is
>> the model for the CLA.  My guess is that layer 5 and layer 3 are entrained
>> by columns and therefore layer 5 is learning a sequence similar to layer 3.
>>
>> I believe that the layer 5 cells associatively link to subcortical motor
>> centers.  They are just like the cells in the layer 3 CLA: they represent
>> the state of the system, but they learn how to control behavior by
>> association.  I can explain this better but it takes more time than I have
>> now.
>>
>> I can walk through some simple examples of how a region sits on top of a
>> body which has sensors and innate motor behavior.  The region learns to
>> model the patterns of the body as it interacts with the world through its
>> innate behaviors.  Then the region’s layer 5 cells associatively link to
>> control the innate behavior.  Now the region is able to control behavior.
>> If the region learns complex patterns that result in desirable outcomes it
>> can replay those complex patterns (essentially new complex behaviors) to
>> make the desired outcome happen again.  This is a form of “reinforcement
>> learning”.
>>
>> Where I am struggling is how to set goals and how the system can adjust
>> its behavior as it plays back learned behaviors.
>>
>> Jeff
>>
>> Here is a paper on how the layer 5 neurons split.
>> http://shermanlab.uchicago.edu/files/rwg&sms%20BRR%202010.pdf
>>
>> *From:* nupic [mailto:[email protected]] *On Behalf Of *Chetan
>> Surpur
>> *Sent:* Friday, October 25, 2013 10:36 AM
>> *To:* NuPIC general mailing list.
>> *Cc:* NuPIC general mailing list.
>> *Subject:* Re: [nupic-dev] motor implementation
>>
>> Hi Jeff,
>>
>> Just as briefly, would you mind describing how layer 5 helps
>> accomplish this? Unless you want to save it as a surprise for your talk :)
>>
>> Thanks,
>>
>> Chetan
>>
>> On Fri, Oct 25, 2013 at 10:30 AM, Jeff Hawkins <[email protected]>
>> wrote:
>>
>> Aseem,
>> I will be giving an informal talk on this topic at the next hackathon,
>> but
>> in brief, very brief...
>>
>> The CLA today has no motor component. It is like an ear listening to
>> sounds
>> but with no ability to interact with the world. Most sensory perception
>> is
>> not like that. Most of the changes on our sensors come from our own
>> actions. Imagine standing in a house. If your eyes couldn't move and your
>> body couldn't move you would not be able to learn what the house is like.
>> You couldn't learn the patterns in the world. Only by moving do you
>> discover the structure of the house. Movement leads to sensory changes.
>> The brain learns sensorimotor patterns. "when I see this and turn left I
>> will see that". The same is true for touch. Even hearing is largely
>> controlled by our own motions. The only thing I am hearing right now is
>> the
>> sounds of the keys on my keyboard. My cortex is predicting to hear those
>> sounds. If they changed even slightly I would notice the difference.
>>
>> Motor behavior is how we learn most of the structure of the world.
>> Jeff
>>
>> -----Original Message-----
>> From: nupic 
>> [mailto:[email protected]<[email protected]>]
>> On Behalf Of Aseem
>> Hegshetye
>> Sent: Friday, October 25, 2013 5:37 AM
>> To: [email protected]
>> Subject: [nupic-dev] motor implementation
>>
>> Hi,
>> Jeff Hawkins said he is working on sensorimotor design.
>> How will implementation of motor layer 5 help in data prediction?
>> Would it be like the CLA signalling anomalies the way the cortex gives
>> motor commands, or are you planning on manipulating some parameters at
>> the user end based on the predictions from given inputs?
>> thanks
>> Aseem Hegshetye
>>
>> _______________________________________________
>> nupic mailing list
>> [email protected]
>> http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
>>
>>
>>
>>
>>
>>
>>
>> --
>>
>> Fergal Byrne
>>
>> Brenter IT
>> [email protected] +353 83 4214179
>> Formerly of Adnet [email protected] http://www.adnet.ie
>>
>>
>>
>
>
> --
>
> Fergal Byrne
>
> Brenter IT <http://www.examsupport.ie>
> [email protected] +353 83 4214179
> Formerly of Adnet [email protected] http://www.adnet.ie
>



-- 

Fergal Byrne

Brenter IT <http://www.examsupport.ie>
[email protected] +353 83 4214179
Formerly of Adnet [email protected] http://www.adnet.ie
