Dear Scott san,
Thank you for your explanation. It is now clear to me that:
- CLA has two ways to reverse a predicted SDR back to the original values:
reconstruction and the classifier
- Reconstruction is obsolete, since the classifier outperforms it.
I would love to see the contents of the wiki pages.
My current idea for the reverse logic, which I will play with, is to label
the distal dendrite segments with the input values of the corresponding
columns (I'm aware that distal segments belong to cells, not columns).
When we want to reverse a predicted SDR, we count how many cells claim
each input value. The most popular value becomes the prediction. The
process is like "sort -k<label> | uniq -c | sort -g".
So I'm guessing that the above logic may be the same as either
reconstruction or the classifier. :-)
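A minimal sketch of this voting idea in Python (the names here are hypothetical, just to illustrate; this is not NuPIC code):

```python
from collections import Counter

def reverse_predicted_sdr(predicted_cells, cell_labels):
    """Majority vote: each predicted cell "claims" the input value its
    segments were labeled with; the most common value wins.
    cell_labels maps a cell id to the input value it was labeled with."""
    votes = Counter(cell_labels[c] for c in predicted_cells if c in cell_labels)
    if not votes:
        return None
    # The shell-pipeline equivalent: sort | uniq -c | sort -g, take the top line
    value, _count = votes.most_common(1)[0]
    return value

# Three cells were labeled with 7.0, one with 3.5 -> predict 7.0
labels = {0: 7.0, 1: 7.0, 2: 3.5, 3: 7.0}
print(reverse_predicted_sdr([0, 1, 2, 3], labels))  # -> 7.0
```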
While playing with SP, I noticed that, as learning proceeds, the number
of connected synapses of a column decreases. It is as if the set of
connected synapses gets leaner, and a column can capture its feature
(i.e. its input pattern) more efficiently with fewer connected synapses.
So now it makes sense to me that the optimization explained in another
thread works great, by which CLA moves all connected synapses to the
front of the potential synapse list. Though we have 256 synapses per
segment, we only need to process the connected synapses in most cases,
AND the number of connected synapses will decrease as learning proceeds.
That is really neat.
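A sketch of that front-packing optimization as I understand it (hypothetical names and a made-up threshold, not the actual NuPIC data structures):

```python
def move_connected_to_front(synapses, connected_perm=0.2):
    """Keep connected synapses (permanence >= threshold) at the front of
    the potential synapse list, so the overlap computation only has to
    scan the connected prefix. Each synapse is an (input_bit, permanence)
    pair."""
    connected = [s for s in synapses if s[1] >= connected_perm]
    unconnected = [s for s in synapses if s[1] < connected_perm]
    return connected + unconnected, len(connected)

potential = [(0, 0.10), (1, 0.30), (2, 0.05), (3, 0.25)]
packed, n_connected = move_connected_to_front(potential)
print(packed[:n_connected])  # -> [(1, 0.3), (3, 0.25)]
```

As learning prunes permanences, `n_connected` shrinks and the scanned prefix gets shorter.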
Then, once the statistics of the input data change, will the number of
connected synapses drop below the connectedPerm threshold as the column
gets too lean, like a zombie segment? Then boosting will kick in and
nourish the dead column so it revives as an active column, which is the
reclamation of unused columns.
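As I understand it, the multiplicative boosting from the white paper can be sketched like this (a simplified, hypothetical form; the real rule and parameter names may differ):

```python
def boosted_overlap(raw_overlap, active_duty_cycle, min_duty_cycle,
                    max_boost=10.0):
    """Multiplicative boosting: a column that has been active less often
    than the minimum duty cycle has its overlap multiplied by a factor
    greater than 1, so a starved ("zombie") column can win the inhibition
    again and start relearning."""
    if active_duty_cycle >= min_duty_cycle:
        boost = 1.0
    else:
        # Linearly interpolate from max_boost (never active) down to 1.0
        boost = max_boost - (max_boost - 1.0) * (active_duty_cycle / min_duty_cycle)
    return raw_overlap * boost

print(boosted_overlap(2.0, 0.00, 0.01))  # starved column -> 20.0
print(boosted_overlap(2.0, 0.05, 0.01))  # healthy column -> 2.0
```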
This is my current understanding; maybe I have misunderstood the
strategy behind CLA.
Please correct me on any mistakes, small or large. :)
Best Regards,
Hideaki Suzuki.
2013/7/16 Scott Purdy <[email protected]>
>
>
>
> On Thu, Jul 4, 2013 at 11:08 PM, Hideaki Suzuki <[email protected]> wrote:
>
>> Dear Jeff san and Scott san,
>>
>> Thank you very much for your kind reply with detailed answers. :)
>> I'm sorry that I couldn't respond sooner.
>>
>>
>> I understood. It is interesting to me that CLA can reverse-map from an
>> SDR to the original values even after all input values are concatenated for
>> the region and the fanout area spans the entire input region.
>>
>
> Are you referring to converting predicted cells back to predicted values?
> If so, we call that process
> reconstruction <https://github.com/numenta/nupic/wiki/Reconstruction> (I
> realize there isn't much info at the link but wanted to point out where the
> info will be once it is public) and do not use it currently. Reconstruction
> basically takes the predicted columns and chooses the input bits they are
> connected to and the encoders each have a method for determining the value
> given the chosen bits. This description might be slightly off but that is
> the general idea with reconstruction.
>
> However, we currently don't do that. Instead, we have the CLA
> classifier <https://github.com/numenta/nupic/wiki/CLA-Classifier> that sits on
> top of the TP and converts the predicted cells to a predicted
> value. This isn't a biologically-inspired component; rather, it is a
> practical method for turning predicted cells into a predicted value and it
> works better than reconstruction for our problems.
>
>>
>> Besides, another thought: since the entire input area is connected
>> to each column in Grok, I guess the inhibition radius would also become the
>> entire region, which simplifies the inhibition logic a little bit (we can
>> ignore the concept of neighbors). Am I right?
>>
>
> Yes this is correct to my knowledge.
>
>>
>> For the memory usage, thank you for the information. Scott
>> san's explanation sounds reasonable to me.
>> Only 1024 columns (i.e. a very small 32x32 region) with 5% active ones
>> can represent C(1024, 51) different patterns, which is 609
>> septenvigintillion (an 87-digit number) according to Wolfram|Alpha. So,
>> most memory must be used for connections and the mappings between
>> input values and SDRs. I'm thinking of evaluating how well it can compress
>> temporal and spatial pattern data into an HTM region, after I feel
>> comfortable with my CLA gadget.
>>
>> Best Regards,
>> Hideaki Suzuki.
>>
>> 2013/7/2 Jeff Hawkins <[email protected]>
>>
>>> *I will try to answer some more. My answers are preceded by >>*
>>>
>>>
>>> - A perceptron can relate various inputs to various outputs.
>>>
>>> People have used a perceptron to convert electrical signals along
>>> the arm muscles into input for motors,
>>>
>>> so those who have lost an arm can move artificial arms by imagining
>>> moving their lost arms (you may know).
>>>
>>>
>>> CLA looks similar to some extent: learning the electrical signal
>>> input with the output for the motors, and predicting.
>>>
>>> Do you have any detailed comparisons between CLA and the perceptron?
>>>
>>> Is it good to combine them? e.g. can a perceptron be a good
>>> classifier to convert an SDR to the original value?
>>>
>>> >> Perceptrons are a very old and simple form of neural network. You
>>> don’t hear that term much these days. There are other more modern neural
>>> networks, including the currently popular “deep learning networks”. Almost
>>> all neural networks are spatial pattern classifiers. That means they have
>>> no ability to recognize time-based patterns. The CLA learns both spatial
>>> and temporal patterns, and therefore it is hard to compare the CLA to other
>>> neural networks unless you ignore time. Other differences are that the CLA
>>> is an online learning algorithm, the CLA can handle many different types of
>>> inputs, and the CLA learns unsupervised. Other neural networks may have
>>> some of these attributes but most don’t. The CLA is also a biological
>>> theory, whereas most artificial neural networks are not.
>>>
>>>
>>> - With boosting, we can have different SDRs for the same input, after
>>> feeding the data again and again.
>>>
>>> If true, we may have multiple SP SDRs for a single input pattern.
>>> Is my understanding okay?
>>>
>>> If true, is this related to Gestaltzerfall? Possible?
>>>
>>> Yes, the same input could have different representations, not just due
>>> to boosting but also because the active input bit connections for each
>>> column can change over time.
>>>
>>> - Do you have a specific reason that boosting is done by
>>> multiplication in the white paper, rather than addition?
>>>
>>> >> It might not be important.
>>>
>>> - If a column happens to drop all bottom-up synapses, can it become
>>> active again, without boosting?
>>>
>>> Yes it can. That is the whole point of boosting :) And when it does
>>> become active again, it will start to form new connections.
>>>
>>> - Should we have the fanout areas on the input space overlap? If yes,
>>> by how much?
>>>
>>> If we have two columns next to each other, and the two corresponding
>>> fanout areas connected to them
>>>
>>> overlap by 50% of the fanout radius vertically and
>>> horizontally, then one fanout area overlaps with
>>>
>>> the other eight fanout areas connected to the surrounding columns in 2D.
>>> This means one input bit can affect
>>>
>>> nine columns. If we have more overlap in the fanout areas, one input
>>> bit can affect more columns.
>>>
>>> At the extreme, any input pattern will generate the same intensity for
>>> all columns.
>>>
>>>
>>> Is my understanding okay? Probably I'm missing something.
>>>
>>> >> Typically the fan-out and fan-in areas overlap. How much is
>>> dependent on the statistics of the data. In Grok all columns get input
>>> from the entire input area, but in a vision application you wouldn’t do
>>> this. We use the concept of “potential synapses”, which are the cells that
>>> can potentially connect. Normally the potential synapses are a subset of
>>> all cells within a radius, say 50%. So even if two columns receive input
>>> from the entire input space, their 50% subsets of cells within that space
>>> are different. So you never have the exact same input to two columns.
>>>
>>> For the time series problems we do right now, we basically concatenate
>>> the encodings from each field. Since there isn't a logical fanout and
>>> overlapping pattern for the potential input bit connections for each
>>> column, we instead randomly select 50% (I think) of the input bits for each
>>> column as the possible connected synapses.
>>>
>>> Our scalar encoders use overlapping sets of bits to represent each
>>> bucket (range of the input space), which captures the semantics of scalars
>>> (values closer together will have more overlap).
>>>
>>>
>>> - Do you have any data or analysis of memory usage?
>>>
>>> i.e. how much memory CLA uses to learn what it learns.
>>>
>>> We have done some pretty thorough analysis of this. Perhaps we can put
>>> some more in-depth information in the wiki, but as a quick estimate you can
>>> expect an untrained model to be pretty small, and it will grow to several
>>> MB in memory as it gets more saturated. I am not sure what the max size
>>> is. Also, if you predict 3 different steps, the classifier will be 3x
>>> larger, so that can have a pretty big impact on memory usage.
>>>
>>
>> _______________________________________________
>> nupic mailing list
>> [email protected]
>> http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
>>
>>
>