I found it a little difficult to follow all the points made by Kevin and
Fergal but I can try to add a few observations that hopefully add clarity.

Sparse just means that a small percentage of neurons are active at any point
in time.  As far as I know this is a universal observation and I am not
aware of anyone claiming otherwise anywhere in neocortex.  Is any
neuroscientist suggesting that activations are not sparse anywhere in
neocortex?

A separate issue is what each neuron represents.  In HTM and the CLA we
adopt the position that each neuron learns to have some "semantic" meaning.
This is not a new idea; we adopted it because it makes a lot of theoretical
sense and is supported empirically.

The neuron representations do not need to be orthogonal.  The literature on
sparse coding generally assumes that the neurons form an over-complete basis
set, meaning the neurons' meanings are not orthogonal.  It would be nearly
impossible for messy brain tissue to ensure orthogonality and it isn't even
desirable.

If you dropped the sparsity constraint then nothing would work.  Dense codes
would overlap even when the objects they represent are not similar.  It then
becomes necessary to look at all the bits to detect a pattern.  Real neurons
can't do this and if some neurons failed you would get bad results.  We
could make the CLA work with dense codes but I predict it would become
brittle and lose its generalization properties.
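
As a rough illustration of the overlap argument, here is a minimal Python
sketch that compares how much two unrelated random codes overlap when they
are sparse versus dense.  The code size (2048 bits), the activity levels and
the helper names are assumptions chosen for illustration; they are not taken
from the CLA implementation.

```python
import random

def random_code(n_bits, n_active, rng):
    """Return a random binary code as the set of its active bit indices."""
    return set(rng.sample(range(n_bits), n_active))

def mean_overlap_fraction(n_bits, n_active, trials, rng):
    """Average fraction of a code's active bits shared with an unrelated code."""
    total = 0.0
    for _ in range(trials):
        a = random_code(n_bits, n_active, rng)
        b = random_code(n_bits, n_active, rng)
        total += len(a & b) / n_active
    return total / trials

rng = random.Random(0)
N_BITS = 2048  # illustrative size, an assumption

sparse = mean_overlap_fraction(N_BITS, 40, 200, rng)    # about 2% of bits active
dense = mean_overlap_fraction(N_BITS, 1024, 200, rng)   # 50% of bits active
print(f"sparse: {sparse:.3f}  dense: {dense:.3f}")
```

Unrelated sparse codes share only a small fraction of their active bits, so
a match on a subset of bits is informative.  Unrelated dense codes share
roughly half of their active bits by chance alone.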

I believe that sparse activations and semantic meaning are both essential.
If this is not clear or you disagree then it would be helpful to know why
you feel this way.

The meaning of any particular neuron can be very difficult to determine.  I
believe you are saying that both Gallant and Newsome record from neurons
where they can't detect what an individual neuron "represents" but if they
look at an ensemble of active neurons they can correlate that ensemble to
something in the animal's input.  This is not inconsistent with the way the
CLA defines SDRs.  Recall that each neuron learns a spatial/temporal/motor
feature of the world.  It can be nearly impossible to detect what a neuron
represents by presenting various stimuli to an animal while recording from a
neuron.  This is especially true the further up the hierarchy you go.
Gallant has spent a lifetime trying to do this for region V4.  He usually
presents static images to a monkey, ignoring time and motor behavior.  I have
talked to Gallant about this and he agrees that it is a problem but there
isn't much he can do about it.  He can only record from a cell for a brief
period of time and there is a gigantic space of possible patterns he can
expose the animal to.  Plus there is no known way to include motor behavior.

Here is an example from V1.  Scientists present static image patches to an
animal and determine what a cell responds best to.  They label this the
"receptive field" of the cell, such as a line at a particular orientation in
a particular part of the visual field.  The experiment is highly repeatable
so they are pretty sure they know what this cell likes.  They now show a
movie of natural scenes to the animal.  In this movie there are times when
the retina is exposed to a pattern that exactly matches the receptive field
of the cell.  However, the cell only responds to this input occasionally.
And when it does respond, the size of the receptive field is much larger
than measured before.  The true receptive field of this cell cannot be
determined by a static image or a moving gradient.  The true receptive field
requires moving natural images.  Now imagine a cell in V4.  Its true
receptive field involves not only spatial and temporal patterns but also
motor behavior.  It is nearly impossible to discover what the cell responds
to by presenting static images.  It still represents something but we can't
determine what it is.

Now for an advanced CLA topic that I don't think we have described anywhere
before.  If you make synapses forget more slowly but keep the incrementing
rate the same, then a column will learn to respond to more than one spatial
pattern.  A bit in our SDR will have two different meanings.  Sometimes it
means A and sometimes it means B.  When A happens it starts to forget B and
when B happens it starts to forget A but the forgetting doesn't happen fast
enough to eliminate A or B.  We have tested this extensively and it works
quite well.  It increases the number of unique things your SDRs can
represent, but comes with a slight increase in the possibility of making an
improper match in the temporal pooler.  I have no idea if this phenomenon
happens in real brains but I don't see why it couldn't.  If this were
happening in parts of the cortex then it would make it even harder to learn
the receptive field of a cell.  It would also explain why looking at an
ensemble of neurons would still be unambiguous.

- Jeff

---------- Forwarded message ----------
From: Archie, Kevin <[email protected]>
Date: Wed, Oct 16, 2013 at 8:02 AM
Subject: Re: [nupic-dev] sparse input, really?
To: "NuPIC general mailing list." <[email protected]>


Fergal,

Thanks for the (almost alarmingly) quick and thoughtful reply. I'll have to
think about it all some more.
 
  - Kevin

On Oct 16, 2013, at 9:48 AM, Fergal Byrne wrote:

Hi Kevin,

Good question. I'm pretty sure that SDR is a crucial central idea in Jeff's
theory, but let's leave that aside for now.

There's another way of looking at your question, inverting it so to speak.
Perhaps we have big brains because we need the sparseness in order to
process information the way we do. Certainly in NuPIC there seems to be a
threshold of 500-600 columns (of the otherwise typical size) needed for the
Spatial Pooler and the Temporal Pooler to work really well. Below this size
the sparseness is hard to establish, and the TP doesn't have enough active
connections to operate well. Above this size the capacity is soon so high
that it's hard to bang up against it.
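
One rough way to see how quickly capacity grows with region size is to count
the distinct activation patterns available at a fixed sparsity. The sketch
below is only a combinatorial proxy: the 2% sparsity and the function name
`distinct_sdrs` are assumptions for illustration, not NuPIC parameters or
API, and it says nothing about where the practical threshold lies.

```python
import math

def distinct_sdrs(n_columns, sparsity=0.02):
    """Number of ways to choose the active columns at the given sparsity."""
    n_active = max(1, round(n_columns * sparsity))
    return math.comb(n_columns, n_active)

for n in (64, 128, 512, 2048):
    print(n, f"{distinct_sdrs(n):.3e}")
```

At small sizes only a handful of columns can be active, so the count of
distinct patterns is modest; it grows combinatorially as columns are added.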

The model we use in NuPIC is binary (active or not, connected or not, etc),
and timestep-based. These are simplifications of real neurons, which have
many signalling styles and which operate asynchronously.
Given the limitations of measurement of living neocortex, it's unlikely that
you could capture the true neuron-by-neuron, millisecond-by-millisecond
"network traffic" so it's hard to claim that the signalling is either very
sparse or very dense.

The observation of neurons carrying "several signals" may be explained by a
high rate of change of input, inhibition and intra-region activity, which
could cause some overlap in the apparent state of each neuron at any given
time. The evolving shape of the signal in this case could be said to encode
part of the data. The analogue of this in HTM/CLA is a specific sequence of
SDRs, each of which could be regarded as a freeze-frame of the pattern of
activity across the region. In the CLA (unlike most other networks), the
sequences are just as important as the individual patterns.

Looked at in this way, any fast-changing series of SDRs would appear
"dense" if measured at intervals significantly longer than the "timestep" of
the sequence. So, it's possible that these observations are not in
contradiction to the HTM/CLA theories, but are the result of the method of
measurement.
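
The measurement effect can be sketched by taking the union of a sequence of
sparse patterns over a window longer than one timestep. This is a toy
illustration only: it assumes independent random SDRs (real sequences would
be correlated), and the region size, activity level and names are
hypothetical.

```python
import random

rng = random.Random(1)
N_COLUMNS = 2048   # illustrative region size, an assumption
N_ACTIVE = 40      # about 2% of columns active per timestep

def sdr():
    """One random sparse pattern as a set of active column indices."""
    return set(rng.sample(range(N_COLUMNS), N_ACTIVE))

sequence = [sdr() for _ in range(50)]

def apparent_density(seq, window):
    """Fraction of columns seen active at least once in the first `window` steps."""
    seen = set().union(*seq[:window])
    return len(seen) / N_COLUMNS

for w in (1, 10, 50):
    print(w, round(apparent_density(sequence, w), 3))
```

Each individual pattern is about 2% active, but a measurement that
integrates over many timesteps sees a much larger fraction of the columns
fire at some point in the window.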

We do know that inhibitory interneurons act to sparsify activation patterns,
and we have empirical evidence that (the computational analogue of) this is
key to getting NuPIC to work. This is why we believe the representations are
sparse. Any argument in favour of non-sparseness should therefore have a
neuroscientific basis, a computational basis, or both.

Regarding orthogonality, you could view an SDR as being a multi-bit (or
fuzzy) "dense(r) orthogonal" representation, if you observe that
closely-related inputs give rise to closely-matching outputs. If you
"wiggle" the inputs one input field at a time, and combine the active bits
in the SDRs, you can see that each such set of bits represents a multi-bit
representation of some aspect of the input. Any higher-level columns
"seeing" these bits will be detecting a similar value of a similar feature.
Unlike with the other networks you mention, the "orthogonality" is going to
be learned by the SP process, rather than imposed by the structure of the
inputs.

Regards,

Fergal Byrne




On Wed, Oct 16, 2013 at 3:04 PM, Archie, Kevin <[email protected]>
wrote:
>
> I've been sitting on this question for a while, and it came to mind again
> a couple of days ago when I heard Jack Gallant talk about some work by his
> student Alex Huth. He was showing multiple simultaneous recordings from
> prefrontal cortex (I think) and each neuron was carrying several signals,
> that (paraphrasing roughly) couldn't be extracted by looking at individual
> neurons but could be teased out by extracting components from the network
> activity. (John Maunsell and Bill Newsome also gave talks that similarly
> showed single neurons firing in response to lots of things, and pulling out
> the meaning required the context of the network.) The sense I was getting:
> this is not sparse coding.
>
> In traditional neural network models (Hopfield-ish associative memories,
> perceptron networks and the like), generally what you need is not really
> sparseness but orthogonality. Sparseness is one way to get that, but it's a
> space-time tradeoff: you can often build a sparse representation quickly if
> you have plenty of space. There are other ways to get orthogonality, and a
> dense representation would be making a different tradeoff -- and big brains
> being metabolically expensive, space is a nontrivial constraint. A
> speculation I heard some years ago (and I wish I remembered from whom; Google
> yields some echoes but no clear origin) is that the hippocampus and
> entorhinal cx are busy during sleep building more compact orthogonal
> representations of the day's input for use by higher association areas.
>
> Pretty clearly the sensory periphery uses sparse representations, and
> similarly for areas with really-motor motor outputs. (Extreme example: V1
> certainly uses sparse representation. V1 is really freakin' big.) Probably
> some sparse representations persist in, say, anterior temporal, parietal,
> and frontal cx, but I would suspect that compact orthogonal representations
> would be important in higher (and smaller) areas. Of course, my suspicions
> are not evidence, I'm ten years mostly away from the neurophysiology
> literature, and data beats my speculations. Is there direct evidence that
> higher cortical areas traffic exclusively or primarily in sparse
> representations?
>
> That's the brain theory side. On the more immediately practical side: has
> anyone tried using compact orthogonal representations with NuPIC? Any
> success (or failure) stories? I don't even have a guess to what extent SDR
> is necessary versus just customary.
>
> Thanks,
>
>   - Kevin
>
> p.s. Apologies for the theoretical bent of this question. Too many years
> hanging out in universities have left me tending to think too much rather
> than just getting started.
>
> _______________________________________________
> nupic mailing list
> [email protected]
> http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org




--

Fergal Byrne

Brenter IT
[email protected] +353 83 4214179 Formerly of Adnet
[email protected] http://www.adnet.ie