Subutai, I think you are right. Kanerva uses Hamming distance in his calculations, and if the Hamming distance is zero the SDR behaves as deterministic RAM.
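To make that distinction concrete, here's a quick Python sketch (a toy example of my own, not CLA code) contrasting Hamming distance with overlap on sparse binary patterns:

```python
# Toy illustration: Hamming distance vs. overlap for sparse binary patterns.
# Patterns are represented as sets of active-bit indices.
import random

N, W = 2048, 40                     # columns, active bits (~2% sparsity)
random.seed(42)

a = set(random.sample(range(N), W))                 # pattern A
b = set(random.sample(range(N), W))                 # unrelated random pattern
c = set(list(a)[:30]) | set(
    random.sample(sorted(set(range(N)) - a), 10))   # shares 30 of A's 40 bits

def hamming(x, y):
    return len(x ^ y)   # number of positions where the patterns differ

def overlap(x, y):
    return len(x & y)   # number of bits active in both patterns

print("A vs B:", hamming(a, b), overlap(a, b))   # unrelated: large Hamming, near-zero overlap
print("A vs C:", hamming(a, c), overlap(a, c))   # similar: Hamming 20, overlap 30
```

For sparse patterns the overlap directly counts the shared active bits, which is why it feels like the more natural similarity measure for these representations.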
As for semantic similarity, he does not explicitly discuss semantic mapping onto the SDR or how to measure similar semantic maps, but I think the information one would need to do that is implicit in the calculations he makes for measuring capacity. I say 'I think' because my higher-level maths are lacking.

Can you analyze SDR capacity without talking about semantic mapping? Would we need another set of analyses and formulas for measuring semantic capacity, built on Kanerva's SDR work? Perhaps for our CLA-specific way of doing semantic storage and retrieval we need our own definitions and formulas to measure capacity and retrieval success for sparse and semantic representations.

I'm sure the academics will be interested in seeing that, but as a practical matter it would be useful now to be able to calculate capacity before setting up a model or deciding how to size hardware.

This will also be important when discussing the value proposition for creating cheap silicon chips that have some faulty bits or connections: the CLA would degrade gracefully but still function. If you can eliminate the costs associated with making perfect silicon (clean fabs, high tolerances, etc.) you could get chip prices way down. For a cheap 'Furby'-like toy you might be OK with many faulty bits and a $1 piece of silicon. The question then is: how do you calculate capacity and retrieval success when you introduce some faulty bits?

On Sat, Nov 23, 2013 at 5:26 PM, Subutai Ahmad <[email protected]> wrote:

> Hi Doug,
>
> Thanks for the pointer. I like Kanerva's work a lot. It is foundational stuff and I would like our work to build on that theory. I believe that particular section demonstrates a different type of capacity, i.e. how many patterns you can store simultaneously in an SDR.
> That is more related to this discussion on capacity:
>
> http://lists.numenta.org/pipermail/nupic_lists.numenta.org/2013-November/001823.html
>
> To my knowledge, I don't think Kanerva really considered the notion of semantic similarity in his work. This is very important for CLAs. Also, he tends to use Hamming distance, which is incorrect; I believe overlap distance is the right metric to use for CLAs. I don't know his work that well, though. Are my statements true?
>
> —Subutai
>
> On Sat, Nov 23, 2013 at 8:44 AM, Doug King <[email protected]> wrote:
>
>> A lot of the theoretical work on capacity has been done by Kanerva and can be referenced here: http://www.rni.org/kanerva/sdmchapter-text.pdf
>> See section 3.3.7, Memory Capacity.
>>
>> On Sat, Nov 23, 2013 at 6:32 AM, Fergal Byrne <[email protected]> wrote:
>>
>>> Hi Marek,
>>>
>>> We had someone looking at this recently; they made a good stab at modelling this kind of capacity question, but I thought there would be a better way to do it.
>>>
>>> We should use the standard 2K columns for reference. There are 2.4 * 10^84 possible 2% activation patterns in a 2K region (and that does not include the cells per column; with 32 cells per column there are 5.5 * 10^144 patterns!).
>>>
>>> As we discussed before, we should think in terms of confidence when evaluating capacity. In other words, when we check whether a pattern is represented, what is the probability that it appears by chance rather than because it was actually stored? To calculate this, assume the pattern is not stored, and calculate how often it will appear as a result of the combined bits of the other stored patterns. This gives an indication of how many patterns need to be stored before there is a given probability of seeing a false pattern.
>>>
>>> Let's assume we want 95% confidence that a pattern is real.
>>> This means that 5% of the time the pattern is created by chance (i.e. the other patterns happen to produce all the bits in our pattern). How many patterns need to be there (turning on all their bits) before our bits are on 5% of the time?
>>>
>>> We assume that each bit is equally likely to be on (this is probably wrong in practice, but we'll need to make the assumption). Then 1/2048 of the patterns will include any given bit. In other words, 2047/2048 of the time you store a pattern, the bit stays off (and is on only if our pattern is the cause).
>>>
>>> So, let's say we start off with no patterns stored, and we add patterns at random until there is a 5% chance of a false match with a given pattern. This number of patterns represents the storage capacity limit at 95% confidence.
>>>
>>> At the beginning there are no patterns, so the probability is 0. For each pattern we add, there is an additional 1/2048 chance that a given bit has been switched on. So after 0.05 / (1/2048) = 102.4 additions there is a 5% chance that the bit is on. Assuming independence of bits (again a big assumption), we'd need 39 further sets of such trials before we had a 5% chance of all the bits being on. This is a total of 4096 patterns which would need to appear before our pattern has a 5% chance of showing up.
>>>
>>> To generalise: we have N columns and a fraction n of them are turned on, and we want confidence p that our pattern is really present. The number of patterns you have to add to make a given pattern appear by chance is then:
>>>
>>> (1-p) * N * (n * N) = n * N^2 * (1-p)
>>>
>>> The capacity of the SDR is thus quadratic in the number of columns, and proportional to both the active fraction n and the error tolerance (1-p).
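Fergal's arithmetic can be double-checked with a few lines of Python (my own sketch, using the same symbols as his post):

```python
# Fergal's capacity formula and his worked example, checked numerically.
def cap(N, p, n):
    """Number of stored patterns at which a given unstored pattern
    appears by chance with probability (1 - p)."""
    return n * N**2 * (1 - p)

# Step-by-step version of his 2048-column, 2%-sparsity, 95% example:
additions_per_bit = (1 - 0.95) / (1.0 / 2048)   # ~102.4 additions -> one bit on 5% of the time
active_bits = 0.02 * 2048                       # ~40.96 bits (he rounds to 40, giving 4096)

print(additions_per_bit)                 # ~102.4
print(additions_per_bit * active_bits)   # ~4194.3
print(cap(2048, 0.95, 0.02))             # same value from the closed form
```

So the closed form reproduces the step-by-step count; the small gap between 4194 and his 4096 comes only from rounding the active-bit count to 40.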
>>> Here's a plot of the capacity at 90%, 95% and 99% confidence levels versus region size:
>>>
>>> [image: Inline image 1]
>>>
>>> Here's the Mathematica code which generated it (the column count is named nCols, since N is a protected symbol in Mathematica):
>>>
>>> cap[nCols_, p_, n_] := n * nCols^2 * (1 - p)
>>>
>>> Plot[Evaluate[Table[cap[nCols, p, .02], {p, {0.9, 0.95, 0.99}}]], {nCols, 512, 2048},
>>>  Filling -> Axis,
>>>  PlotLegend -> {"90%", "95%", "99%"}, LegendPosition -> {1, 0},
>>>  LegendShadow -> None, LegendBorder -> None,
>>>  AxesLabel -> {"Columns", "Patterns"}, GridLines -> Automatic]
>>>
>>> Regards,
>>>
>>> Fergal Byrne
>>>
>>> On Fri, Nov 22, 2013 at 2:55 PM, Marek Otahal <[email protected]> wrote:
>>>
>>>> Guys,
>>>>
>>>> I want to run some benchmarks on the CLA, one of which measures what I have called (information) capacity.
>>>>
>>>> This is the number of patterns a spatial pooler (SP) (with a fixed number of columns, and probably a fixed number of training rounds) can distinguish.
>>>>
>>>> So assume I have an SP with 1000 columns and 2% sparsity (= 20 columns ON at all times) and an encoder big enough to express a large range of patterns (say a scalar encoder for 0...1,000,000,000).
>>>>
>>>> The top cap is (1000 choose 20), which is some crazy number around 3.4 * 10^41. All these SDRs will be sparse, but not distributed (right??), because a change in one bit already gives another pattern.
>>>>
>>>> So my question is: what is the "usable" capacity where all outputs are still sparse (they all are) and distributed (= robust to noise)? Is there a percentage of noisy bits (say 20%) at which the pattern is still recognized and still considered distributed/robust?
>>>>
>>>> Or is it the other way around, and the SP tries to maximize this robustness for the given number of patterns it is presented with? If I feed it a huge number of patterns, do I pay the obvious price of reducing the border between two patterns?
>>>> Either way, is there a reasonable way to measure what I defined as capacity?
>>>>
>>>> I was thinking of something like:
>>>>
>>>> for rep in range(10):
>>>>     for p in patterns_to_present:
>>>>         sp.input(p)
>>>>
>>>> sp.disableLearning()
>>>> for p in patterns_to_present:
>>>>     # what should the percentage be? see above
>>>>     p_mod = randomize_some_percentage_of_pattern(p, percentage)
>>>>     if sp.input(p) == sp.input(p_mod):
>>>>         pass  # ok, same output: pattern learned
>>>>
>>>> Thanks for your replies,
>>>> Marek
>>>>
>>>> --
>>>> Marek Otahal :o)
>>>>
>>>> _______________________________________________
>>>> nupic mailing list
>>>> [email protected]
>>>> http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
>>>
>>> --
>>>
>>> Fergal Byrne, Brenter IT
>>>
>>> http://inbits.com - Better Living through Thoughtful Technology
>>>
>>> e: [email protected] t: +353 83 4214179
>>> Formerly of Adnet [email protected] http://www.adnet.ie
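P.S. Marek's loop needs a real spatial pooler to run. As a self-contained approximation, here is a toy Python sketch where the SP is replaced by a stand-in (a fixed random projection with top-W winner-take-all, no learning) and strict output equality is relaxed to output overlap. Every name here is my own invention, not the NuPIC API:

```python
# Toy version of Marek's noise-robustness measurement. The spatial pooler
# is a stand-in: a fixed random projection followed by top-W winner-take-all.
# Not NuPIC code; it only illustrates the shape of the measurement.
import random

random.seed(0)
IN_BITS, COLS, W = 400, 1000, 20    # input size; Marek's 1000 columns, 2% sparsity
PAT_BITS = 8                        # active bits per input pattern
weights = [[random.random() for _ in range(IN_BITS)] for _ in range(COLS)]

def sp_output(pattern):
    """Stand-in SP: the W columns with the largest summed input weights."""
    ranked = sorted(range(COLS), key=lambda c: sum(weights[c][i] for i in pattern))
    return set(ranked[-W:])

def add_noise(pattern, fraction):
    """Replace a fraction of a pattern's active bits with random other bits."""
    k = int(len(pattern) * fraction)
    spare = [i for i in range(IN_BITS) if i not in pattern]
    return pattern[k:] + random.sample(spare, k)

patterns = [random.sample(range(IN_BITS), PAT_BITS) for _ in range(30)]
outputs = [sp_output(p) for p in patterns]
noisy = [sp_output(add_noise(p, 0.25)) for p in patterns]

same = sum(len(o & m) for o, m in zip(outputs, noisy)) / len(patterns)
diff = sum(len(outputs[i] & outputs[i + 1]) for i in range(len(patterns) - 1)) / (len(patterns) - 1)
print("mean output overlap, 25% input noise:   ", same)
print("mean output overlap, unrelated patterns:", diff)
```

Even this untrained stand-in keeps the overlap for noisy versions of a stored input well above the overlap between unrelated inputs, which is the "distributed = robust to noise" property Marek wants to quantify; swapping in a real, trained SP would let you run his strict-equality version.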
