Hi Doug,

Thanks for the pointer. I like Kanerva’s work a lot. It is foundational stuff, and I would like our work to build on that theory. I believe that particular section demonstrates a different type of capacity, i.e. how many patterns you can store simultaneously in an SDR. That is more related to this discussion on capacity:
http://lists.numenta.org/pipermail/nupic_lists.numenta.org/2013-November/001823.html

To my knowledge, I don’t think Kanerva really considered the notion of semantic similarity in his work. This is very important for CLAs. Also, he tends to use Hamming distance, which is incorrect; I believe overlap distance is the right metric to use for CLAs. I don’t know his work that well, though. Are my statements true?

—Subutai

On Sat, Nov 23, 2013 at 8:44 AM, Doug King <[email protected]> wrote:

> A lot of the theoretical work on capacity has been done by Kanerva and can
> be referenced here:
> http://www.rni.org/kanerva/sdmchapter-text.pdf
> See the section - 3.3.7. Memory Capacity.
>
> On Sat, Nov 23, 2013 at 6:32 AM, Fergal Byrne <[email protected]> wrote:
>
>> Hi Marek,
>>
>> We had someone looking at this recently; they made a good stab at
>> modelling this kind of capacity question, but I thought there would be a
>> better way to do it.
>>
>> We should use the standard 2K columns for reference. There are 2.4 * 10^84
>> possible 2% activation patterns with a 2K region (this does not include
>> the cells per column, which for 32 gives 5.5 * 10^144 patterns!).
>>
>> As we discussed before, we should think in terms of confidence when
>> evaluating capacity. In other words, if we are checking to see whether a
>> pattern is represented, what is the probability that it appears by chance
>> rather than because it was actually stored? To calculate this, assume the
>> pattern is not stored, and calculate how often it will appear as a result
>> of the combination of bits from other stored patterns. This will give us
>> an indication of the number of patterns which need to be stored to reach
>> a certain probability of seeing a false pattern.
>>
>> Let's assume we want 95% confidence that a pattern is real. This means
>> that 5% of the time the pattern is created by chance (i.e. the other
>> patterns happen to produce all the bits in our pattern).
>> How many patterns need to be stored (i.e. turning on all their bits) for
>> our bits to be on 5% of the time?
>>
>> We assume that each bit is equally likely to be on (this is probably
>> wrong in practice, but we'll need to make the assumption). Then 1/2048 of
>> the patterns will include any given bit. In other words, 2047/2048 of the
>> time you store a pattern, the bit stays off (and is on only if our
>> pattern is the cause).
>>
>> So, let's say we start off with no patterns stored, and we add patterns
>> at random until there is a 5% chance of a false match with a given
>> pattern. That number of patterns represents the storage capacity limit at
>> 95% confidence.
>>
>> At the beginning there are no patterns, so the probability is 0. For each
>> pattern we add, there is an additional 1/2048 chance that a given bit has
>> been switched on by now. So, after (0.05 / (1/2048)) = 102.4 additions,
>> there is a 5% chance that the bit is on. Assuming independence of bits
>> (again, a big assumption), we'd need 39 further sets of such trials
>> before we had a 5% chance of all 40 bits being on. This is a total of
>> 4096 patterns which would need to be stored before our pattern has a 5%
>> chance of showing up falsely.
>>
>> To generalise: we have N columns and a fraction n of them are turned on.
>> We want confidence p that our pattern is really present. The number of
>> patterns you have to add before a given pattern appears by chance is
>> then:
>>
>> (1-p) * N * (n * N) = n * N^2 * (1-p)
>>
>> The capacity of the SDR is thus quadratic in the number of columns, and
>> is proportional to both the activation density n and the error tolerance
>> (1-p).
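A quick sanity check of the numbers above, sketched in Python (the variable names and the `cap` helper are mine; `math.comb` gives the exact binomial count for the 2.4 * 10^84 figure, and the formula reproduces the hand-computed 40 * 102.4 ≈ 4096 at 95% confidence):

```python
import math

# ~2% of a 2K region: 40 of 2048 columns active
n_cols, n_active = 2048, 40

# number of distinct column-activation patterns (the 2.4 * 10^84 above)
col_patterns = math.comb(n_cols, n_active)

# Fergal's capacity estimate: patterns stored before a given pattern
# shows up by chance with probability (1 - p)
def cap(n_cols, p, n):
    return n * n_cols ** 2 * (1 - p)

print(col_patterns)           # on the order of 10^84
print(cap(2048, 0.95, 0.02))  # close to the hand count of 4096
```

The small gap between 4096 and the formula's value comes from rounding 2% of 2048 to 40 bits in the hand count.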
>>
>> Here's a plot of the capacity at 90%, 95% and 99% confidence levels
>> versus region size:
>>
>> [image: Inline image 1]
>>
>> Here's the Mathematica code which generated that (N is a reserved symbol
>> in Mathematica, so the column count is named cols):
>>
>> cap[cols_, p_, n_] := n * cols^2 * (1 - p)
>>
>> Plot[Evaluate[Table[cap[cols, p, .02], {p, {0.9, 0.95, 0.99}}]],
>>  {cols, 512, 2048},
>>  Filling -> Axis,
>>  PlotLegend -> {"90%", "95%", "99%"}, LegendPosition -> {1, -0.0},
>>  LegendShadow -> None, LegendBorder -> None,
>>  AxesLabel -> {"Columns", "Patterns"}, GridLines -> Automatic]
>>
>> Regards,
>>
>> Fergal Byrne
>>
>> On Fri, Nov 22, 2013 at 2:55 PM, Marek Otahal <[email protected]> wrote:
>>
>>> Guys,
>>>
>>> I want to run some benchmarks on the CLA, one of which covers what I
>>> called (information) capacity: the number of patterns a spatial pooler
>>> (SP) (with a fixed number of columns, and probably a fixed number of
>>> training rounds) can distinguish.
>>>
>>> So assume I have an SP with 1000 columns and 2% sparsity (= 20 cols ON
>>> at all times) and an encoder big enough to express a large range of
>>> patterns (say a scalar encoder for 0...1,000,000,000).
>>>
>>> The top cap is (1000 choose 20), which is some crazy number around
>>> 3.4 * 10^41. All these SDRs will be sparse, but not distributed
>>> (right??), because a change in one bit already gives another pattern.
>>>
>>> So my question is: what is the "usable" capacity where all outputs are
>>> still sparse (they all are) and distributed (= robust to noise)? Is
>>> there a percentage of bits (say, 20% of the bits noisy while the
>>> pattern is still recognized) that is still considered
>>> distributed/robust?
>>>
>>> Or is it the other way around, and the SP tries to maximize this
>>> robustness for the given number of patterns it is presented? If I feed
>>> it a huge number of patterns, will I pay the obvious price of reducing
>>> the border between two patterns?
>>>
>>> Either way, is there a reasonable way to measure what I defined as
>>> capacity?
>>>
>>> I was thinking like:
>>>
>>> for _ in range(10):
>>>     for p in patterns_to_present:
>>>         sp.input(p)
>>>
>>> sp.disableLearning()
>>> for p in patterns_to_present:
>>>     # what should the percentage be? see above
>>>     p_mod = randomize_some_percentage_of_pattern(p, percentage)
>>>     if sp.input(p) == sp.input(p_mod):
>>>         pass  # ok, it's the same: pattern learned
>>>
>>> Thanks for your replies,
>>> Mark
>>>
>>> --
>>> Marek Otahal :o)
>>>
>>> _______________________________________________
>>> nupic mailing list
>>> [email protected]
>>> http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
>>
>> --
>>
>> Fergal Byrne, Brenter IT
>>
>> <http://www.examsupport.ie>http://inbits.com - Better Living through
>> Thoughtful Technology
>>
>> e:[email protected] t:+353 83 4214179
>> Formerly of Adnet [email protected] http://www.adnet.ie
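Fergal's thought experiment can also be simulated directly, without a spatial pooler: store random SDRs as a union of active bits and measure how often an unrelated SDR is fully covered by that union. This is an editorial sketch, not NuPIC API; the region size, bit counts and the `false_match_prob` helper are illustrative, scaled down to a 256-column region so it runs quickly:

```python
import random

def false_match_prob(n_cols, n_active, n_stored, trials=500, seed=42):
    """Estimate how often a never-stored SDR looks 'present' because
    the union of the stored patterns happens to cover all its bits."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # union of bits switched on by the stored random patterns
        on_bits = set()
        for _ in range(n_stored):
            on_bits.update(rng.sample(range(n_cols), n_active))
        # a probe pattern that was never stored
        probe = set(rng.sample(range(n_cols), n_active))
        if probe <= on_bits:  # every probe bit explained by other patterns
            hits += 1
    return hits / trials

# storing more patterns raises the chance of a false match
few = false_match_prob(256, 5, n_stored=20)
many = false_match_prob(256, 5, n_stored=200)
```

With this toy sizing the false-match rate jumps from near zero at 20 stored patterns to most of the time at 200, the kind of capacity cliff the confidence argument above predicts. A trained SP should behave differently, since learned bits are not uniformly likely, which is exactly the assumption Fergal flags.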
