On Fri, Nov 22, 2013 at 2:53 PM, Marek Otahal <[email protected]> wrote:

> Hi Scott, thank you for the response!
>
>
> On Fri, Nov 22, 2013 at 11:00 PM, Scott Purdy <[email protected]> wrote:
>
>>
>> On Fri, Nov 22, 2013 at 6:55 AM, Marek Otahal <[email protected]> wrote:
>>
>>> Guys,
>>>
>>> ...
>>> The top cap is (100 choose 20), which is some crazy number, about 5*10^20.
>>> All these SDRs will be sparse, but not distributed (right??) because a
>>> change in one bit will already be another pattern.
>>>
>>
>> The number of possible unique SP outputs is (1000 choose 20), or ~10^41.
>>
>
> Yes, I missed one zero there.
>
>
>> These all have 2% sparsity. Changing one input bit doesn't necessarily
>> result in a different SP output though. There could be many more input bit
>> patterns than combinations of 20 SP columns. For instance, 1000 input bits
>> have 10^300 possible patterns. And regardless of that, the semantic
>> information learned by the SP is distributed across the 1000 columns so it
>> would still be distributed.
>>
>
> I wasn't clear there - I was thinking top-down, so a 1-bit change to the
> SP's output would represent a different learned input pattern. This raises
> a question: is the "robustness" feature of SDRs related to the input bits
> (I mess with some of the input bits and still expect to get the same SP
> output), or to the output ON bits? That is, with (1000 choose 20) possible
> representations, even if I flip 3-5 of the output bits, there's still a
> good chance the result is closest to my original input and not to some
> other input? And is "robust" == "distributed"? Or does distributed mean
> that 2^1000 states are represented by (only) (1000 choose 20) states?
>

Distributed means the semantics are distributed across the bits. This means
it is tolerant to noise and that you can subsample the bits and still have
a very similar semantic representation.
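
As a toy illustration of that (plain Python with made-up random SDRs
standing in for SP output, using the 1000-column / 20-active-bit numbers
from this thread, not the NuPIC API):

import random

random.seed(42)
N_COLS, N_ACTIVE = 1000, 20   # 2% sparsity, as discussed above

sdr_a = set(random.sample(range(N_COLS), N_ACTIVE))   # a "learned" SDR
sdr_b = set(random.sample(range(N_COLS), N_ACTIVE))   # an unrelated SDR

# a higher level subsamples half of sdr_a's active bits
subsample = set(random.sample(sorted(sdr_a), N_ACTIVE // 2))

print(len(subsample & sdr_a))   # 10 -> every subsampled bit matches sdr_a
print(len(subsample & sdr_b))   # almost certainly 0 -> no false match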

>
>
>>
>>> So my question is, what is the "usable" capacity where all outputs are
>>> still sparse (they all are) and distributed (= robust to noise)? Is there
>>> a percentage of bits (say, 20% of bits noisy while the pattern is still
>>> recognized) that is still considered distributed/robust?
>>>
>>
>> This is still a valid question for real world datasets but is completely
>> dependent on the particular dataset. For instance, regardless of the SP
>> parameters, the dataset may have 10000 input bits but only ~50 of them
>> change regularly. The tolerance to noise at this point is limited by the
>> dataset.
>>
>
> Nice point, I hadn't considered that. And what if all of the bits carry
> information? That is what I believe happens at the higher levels of the
> regions (?) - the useless data is cropped out. Do we have some data (from
> biology?) showing there has to be at least, say, 5% (=R) of robustness in
> the bits of the output SDR (e.g. because of errors at the synapses, etc.)?
> So, for example, input_A causes SDR_A; even if I turn 5 of the 20 ON bits
> off, input_A should still be the most likely match.
>
> This would lower the max number of patterns, because instead of (1000
> choose 20) I'd effectively require (1000 choose 25).
>

This is a good question. I believe it is data-specific though. Your
proposed test from the previous email would be a good way to benchmark it.
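
A rough sketch of how such a benchmark could look (plain Python; random
SDRs stand in for trained SP output here, and the names and sizes are
illustrative, not the NuPIC API):

import random

random.seed(0)
N_COLS, N_ACTIVE = 1000, 20   # figures from this thread

# stand-ins for learned SP outputs; in a real test these would be the SP's
# outputs for each training pattern, recorded with learning disabled
stored = [set(random.sample(range(N_COLS), N_ACTIVE)) for _ in range(200)]

def corrupt(sdr, k):
    """Turn k ON bits off and k random OFF bits on."""
    off = [i for i in range(N_COLS) if i not in sdr]
    return (sdr - set(random.sample(sorted(sdr), k))) | set(random.sample(off, k))

def nearest(sdr):
    """Stored SDR with the largest overlap with sdr."""
    return max(stored, key=lambda s: len(s & sdr))

for k in range(1, N_ACTIVE + 1):
    hits = sum(nearest(corrupt(s, k)) == s for s in stored)
    print(k, hits / len(stored))   # fraction still matched; where it drops is the "R" above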

>
>
>>
>>>
>>> Or is it the other way around, and the SP tries to maximize this
>>> robustness for the given number of patterns it is presented with? If I
>>> feed it a huge number of patterns, will I pay the obvious price of
>>> reducing the border between two patterns?
>>>
>>
>> I think the answer to the first question is yes but to the second no. The
>> SP attempts to maximize the distance between the column input bits relative
>> to the actual data (rather than the entire input space). But feeding many
>> patterns in doesn't necessarily have an impact on this. If the input data
>> are not random, then I would expect that the more data is fed into the SP,
>> the more the columns will converge to the optimal representations.
>>
>
> This is true, I was (falsely) assuming random input. But in real
> use-cases, the SP will find "patterns" in the input patterns, so even for
> a higher number of inputs we may actually see a drop in entropy, as the
> SP will find some rule that separates the inputs.
>
>
>>> Either way, is there a reasonable way to measure what I defined as
>>> capacity?
>>>
>>> I was thinking like:
>>>
>>> for _ in range(10):  # 10 repetitions
>>>     for p in patterns_to_present:
>>>         sp.input(p)
>>>
>>> sp.disableLearning()
>>> for p in patterns_to_present:
>>>     # what should the percentage be? see above
>>>     p_mod = randomize_some_percentage_of_pattern(p, percentage)
>>>     if sp.input(p) == sp.input(p_mod):
>>>         pass  # ok, it's the same, pattern learned
>>>
>>
>> This seems like a good methodology for determining how tolerant the model
>> is to noise for this particular dataset. The amount of data fed in before
>> disabling learning will have a large impact on the noise tolerance (but
>> with diminishing returns).
>>
>
>
> I think your answers helped me clear this up, so a short summary... Does
> robustness to noise inversely correlate with the total number of input
> patterns I'm able to distinguish? (1/(noise tolerance) ~ #patterns) From
> what has been said, I think this is not necessarily the case for
> real-world datasets.
>

For some reason I'm having trouble wrapping my head around that, but I
believe it all makes sense. I think you are saying that if your input space was
varied enough that you saturate the SP, then the number of patterns you can
represent is inversely proportional to the noise tolerance. The caveat to
this is that even if you end up with a different SP representation because
of a small amount of noise, the representation will still be very
semantically similar to the previous. In fact, you most likely only have
one or two columns that are different so the higher levels will see a very
similar pattern.
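
To put rough numbers on that caveat: with 1000 columns and 20 active, two
unrelated random representations share on average only 20 * (20/1000) = 0.4
active columns, while a representation that differs in one or two columns
still shares 18-19 of 20 (90-95% overlap), so the higher level sees almost
the same pattern.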

>
>
> PS:
> Is there a (lower-bound) limit on the number of columns in an SP? Would a
> 20-column SP work? That way I could achieve the (20 choose 3)
> representations and reach the state of an "info-full" SP.
>

The theory relies on large numbers. Subutai's CLA quiz covers it very
thoroughly. In your 20 choose 3 example, you lose fault
tolerance/subsampling at higher levels, the ability to represent many
different patterns (only 1140), and the ability to represent many
simultaneous patterns.
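
A quick back-of-the-envelope comparison of the two cases (plain Python,
just the combinatorics; the subsample sizes are arbitrary picks for
illustration):

from math import comb

print(comb(20, 3))      # 1140 distinct representations for the 20-column SP
print(comb(1000, 20))   # ~3.4e41 for a 1000-column SP with 20 active

# probability that a random representation happens to contain a given
# subsample (2 of 3 bits vs. 10 of 20 bits) -- a rough proxy for false
# matches when a higher level subsamples
print(comb(18, 1) / comb(20, 3))        # ~0.016
print(comb(990, 10) / comb(1000, 20))   # ~7e-19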

>
>
> Regards, Mark
>
>
>>>
>>> Thanks for your replies,
>>> Mark
>>>
>>>
>>> --
>>> Marek Otahal :o)
>>>
>>>
>>>
>>
>>
>>
>
>
> --
> Marek Otahal :o)
>
>
>
_______________________________________________
nupic mailing list
[email protected]
http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
