Hi Rik,

It's great that this is being examined. Developing these kinds of metrics
will be really useful, as capacity is an important question when discussing
NuPIC with new people and for new applications.

As Subutai said, in the CLA all SDRs are matched based on a subset of the
bits. This reflects the fact that higher-level cells are exposed only to
samples of the region's bits. As Jeff explains in his talks, one of the
major properties of SDRs is the high confidence you can have that a sample
of bits matches what you're looking for.

I believe a good metric for "accuracy" is to use a confidence or likelihood
measure rather than counting overlaps to establish "false positives". Thus,
you can be "99% confident" that an SDR represents a particular input SDR if
99% of random samples of a certain size match. This will have a different
(but possibly related) numerical character as you increase the number of
SDRs in the union. I'll have a think about this and come back with
something in this thread (I must dig out Mathematica, it's great for things
like this).
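To make that concrete, here's a rough Python sketch of the kind of measure I
mean (the function name, sample size and trial count are just placeholder
choices of mine): estimate confidence as the fraction of random bit-samples
of a given size that are fully present in the stored representation.

```python
import random

def sample_match_confidence(candidate_bits, stored_bits, sample_size=10,
                            trials=1000, rng=None):
    """Fraction of random `sample_size`-bit samples of the candidate SDR
    whose bits are all present in the stored SDR (or union of SDRs)."""
    rng = rng or random.Random(42)
    candidate = sorted(candidate_bits)
    stored = set(stored_bits)
    hits = sum(
        all(bit in stored for bit in rng.sample(candidate, sample_size))
        for _ in range(trials)
    )
    return hits / trials

# A candidate fully contained in the stored representation always matches:
print(sample_match_confidence(range(40), range(2048)))  # 1.0
```

So "99% confident" would mean this fraction comes out at 0.99 for samples of
the chosen size.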

In a lot of real-world applications, the output of the TP may indeed be
quite large, corresponding to a number of predicted next inputs. However,
it's relatively rare that the SDRs for the predicted inputs will overlap,
and if they do then this is telling you something important.

For example (using letters to represent input values), let's say we've seen
the sequence ABC and we run the TP. The TP might produce a prediction which
looks like a union of D, M and X, say. Now most of the time D, M and X will
share very few bits, because in the past when one of these (let's say D)
was seen, the cells which were predicting D in a column would have
inhibited any cells predicting M and X, and would have strengthened their
connections to C.

If, on the other hand, you do get a lot of overlap between predicted
values, this would indicate that the TP was genuinely unsure which sequence
it was in, and perhaps that the various sequences had been poorly learned.
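As a toy illustration of the D/M/X picture above (purely illustrative random
SDRs, not real TP output):

```python
import random

rng = random.Random(0)
n, w = 2048, 40  # SDR length and on-bit count, as in the 40/2048 example

# Three predicted next inputs, D, M and X, as random SDRs. Random 40/2048
# SDRs share very few bits, so their union keeps each one recoverable.
d, m, x = (frozenset(rng.sample(range(n), w)) for _ in range(3))
prediction = d | m | x

print(len(d & m), len(d & x), len(m & x))  # pairwise overlaps: near zero
print(len(d & prediction))                 # 40: D matches the union fully

# A novel input overlaps the union only by chance, so it stands out:
novel = frozenset(rng.sample(range(n), w))
print(len(novel & prediction))             # small (expected around 2 of 40)
```

A match on nearly all 40 bits means the input was one of the predictions; an
overlap of only a few bits means it wasn't.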

As everyone is saying, well done for kicking off the discussion, especially
with a candidate mathematical description of how you might estimate these
numbers. We need to be confident about the ability of the CLA to handle
large amounts of data and learn lots of useful sequences, so efforts to
quantify capacity are very important.

Regards,

Fergal Byrne




On Mon, Nov 4, 2013 at 10:02 PM, Jeff Hawkins <[email protected]> wrote:

> Rik,
> This is great!!  Nicely done.
>
> I am not sure I understand your question:
> Although the union property is related to what happens going from the TP at
> one level to the SP at the next level, that is not what motivated looking
> for the union property of SDRs.
>
> In a single region, a single CLA, the TP is making a prediction of what is
> going to happen next.  The union property allows the CLA to make multiple
> simultaneous predictions.  The goal is to check to see if the next input
> matches one of the predictions.  The capacity to distinguish these
> different
> predictions is a measure of how many things we can anticipate at the same
> time and still be confident we will notice when something unexpected happens.
> This is an important property of an animal, to notice that something is
> unexpected.  As Subutai pointed out, it is significant that with SDRs we
> can
> make multiple simultaneous predictions using the same fixed set of cells.
> The brain doesn't build a dynamic list of predictions, plus it is
> computationally efficient to do a single comparison which checks for all
> the
> predictions at once.
>
> Sometimes we know precisely what is going to happen next and other times
> there are many possibilities.  Having a system that accommodates both is
> essential.
>
> In the CLA a mis-prediction happens at the columnar level.  This is also
> useful and unique.  It means we know what part of the input is wrong and
> what part is right.  The "wrong part" could be topologically related to a
> sensory array (e.g.  this part of visual field is wrong) or it could be
> more
> semantically related higher in the hierarchy (e.g. the verb action
> doesn't match vs. the verb tense doesn't match).
>
> When we have a column mis-match all the cells in the column become active.
> In the real brain we know that mis-predictions lead to significantly more
> activity in the area of mis-prediction.  We also know that excess activity
> will open up the gated pathway through the thalamus.  Thus a mis-prediction
> forces attention on the part of the input that was mis-predicted.
> Jeff
>
> -----Original Message-----
> From: nupic [mailto:[email protected]] On Behalf Of Rik
> Sent: Monday, November 04, 2013 5:55 AM
> To: [email protected]
> Subject: [nupic-dev] SDR capacity
>
> Hi,
>
> Yesterday in the CLA presentation at the hackathon Subutai was asking "how
> many 40-out-of-2048 SDRs can you store in an SDR?" where "storing"
> means or'ing them all up (as in Boolean "or") and then comparing the
> combined SDR to a sample SDR. (The combined SDR may actually not be so
> sparse anymore but I stick to calling all bit vectors "SDR" here.) All "1"
> bits of the sample SDR should only be present in the result if the sample
> was part of the set in the first place, otherwise it'd be a false positive.
> A matching random SDR would almost certainly be a false positive as any
> realistic set of input SDRs is a vanishingly small subset of all possible
> SDRs.
>
> Since false positives are inevitable and their probability increases the
> more SDRs you "or" together, the question should really ask for a capacity
> given a desired probability of false positives, or rather a desired
> accuracy, which is the complement (1 - x) of the false-positive probability
> (there being no false negatives). So how does the accuracy relate to the
> capacity? And what's the
> capacity for the above 40/2048 assuming a desired accuracy of, say, 99%?
>
> The attached Mathematica notebook "sdr_capacity.nb" seeks to work this out.
> First the accuracy is determined as a function of length, density and
> capacity and then solved for capacity.
>
> Starting with the example 40-out-of-2048, the density is 40/2048 = .0195
> meaning 1.95% of bits are "1" and the density of 0's (called "density0") is
> 1 - density = .9804. Because two bits or'ed together are only 0 if both
> bits
> are 0, the density0 of two SDRs or'ed together is the product of the
> original density0's. Or for the case of, say, 110 SDRs of the same density0
> or'ed together: density0^110 = .1142. The density (of 1's) is then 1 -
> density0 = .8858 (= 1814 bits for a length of 2048).
>
> To determine the probability that a random SDR has all its 1-bits present
> in the combined SDR we take its first 1-bit position and check for a
> presence of 1 in the combined SDR. With a density of .8858 the probability
> is .8858. If true we can then picture taking that 1-bit out of the combined
> SDR, leaving an empty spot (not a 0-bit). The remaining SDR, shortened by
> one bit, still has roughly the same density; this is an approximation that
> holds for low densities in the original and sample SDRs. So for all 40
> bits the
> probability is again the product of all 40 probabilities = .8858^40 =
> .0078.
>
> (In reality the density goes down as you remove 1's so this approximation
> overestimates the probability of false positives a little bit and
> underestimates the accuracy. The exact calculation is left as an exercise
> for the reader.)
>
> The accuracy then is the complement (1 - x) of the probability of a false
> positive = .9922 or 99.22%. In general, in Mathematica notation:
>
> PoolerAccuracy[length_, density_, capacity_] :=
>   N[1 - (1 - (1 - density)^capacity)^(length*density)]
>
> Plotted for capacities from 0 to 1000, the attached graph "accuracy.png"
> shows the expected sigmoid function with a sharp drop off in accuracy
> starting from around 120.
>
> Mathematica then solves for capacity and for the values of length = 2048,
> density 40/2048 and accuracy .99 yields a capacity of 112. The attached
> plots "capacity_vs_density.png" and "capacity_vs_length.png"
> show that decreasing density or increasing length increase capacity.
>
> To test my own math I perform a little experiment in the attached
> Mathematica notebook "sdr_experiment.nb": I combine 100 SDRs using Boolean
> "or" and then test 10,000 sample SDRs against the combined SDR.
> This yields an accuracy of 9892 out of 10,000 = 99%, as expected.
>
> All correct? And how does this relate to CLA? The temporal pooler feeds an
> or'ed SDR of the current input and all predictions into the spatial pooler
> with its learned one SDR per each column. This doesn't test for exact
> matches, it's more like a certain percentage of bits has to match and that
> percentage depends on match rates of the neighbors. This requires some
> other
> model of "capacity" than the above. Correct?
>
> Cheers
>
> Rik
>
>
> _______________________________________________
> nupic mailing list
> [email protected]
> http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
>
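P.S. For anyone who wants to check the closed-form accuracy above without
Mathematica, here's a rough Python equivalent (my own sketch, not Rik's
notebook; the 110-SDR union size, seed and trial count are arbitrary choices
of mine), with a small Monte Carlo check alongside:

```python
import random

def pooler_accuracy(length, density, capacity):
    """Approximate probability that a random SDR is NOT a false positive
    against a union ("or") of `capacity` SDRs of the given density."""
    union_density = 1 - (1 - density) ** capacity
    return 1 - union_density ** (length * density)

n, w = 2048, 40
print(pooler_accuracy(n, w / n, 110))  # analytic accuracy, ~0.992
print(pooler_accuracy(n, w / n, 112))  # ~0.990, the 99% capacity point

# Monte Carlo check: union 110 random 40/2048 SDRs, then count how many of
# 10,000 fresh random SDRs have all 40 bits present (false positives).
rng = random.Random(1)
stored = [frozenset(rng.sample(range(n), w)) for _ in range(110)]
union = frozenset().union(*stored)
trials = 10_000
false_positives = sum(
    frozenset(rng.sample(range(n), w)) <= union for _ in range(trials)
)
print(1 - false_positives / trials)  # empirical accuracy, close to analytic
```

The empirical figure comes out slightly above the analytic one, consistent
with Rik's note that the approximation underestimates the accuracy.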



-- 

Fergal Byrne

Brenter IT <http://www.examsupport.ie>
[email protected] +353 83 4214179
Formerly of Adnet [email protected] http://www.adnet.ie