Hi Ramesh,

There's a lot more going on than just that. The reason is that the CLA is 
learning the whole sequence of patterns (well, the sequence of sequences of 
patterns) presented to it since "birth". What is learned is not the bits in 
each input record directly, but rather the SDRs associated (via the SP) with 
each configuration of bits.

So, assuming there are 128 bits in the input record, and 21 of these are on at 
any one time (on average), there are Binomial[128,21] "possible" input records, 
which is about 6x10^23. However, this is not the real input space at all, 
because we require that the input data be encoded in such a way that there is 
significant semantic overlap between the bit patterns. In the case of the 
default scalar encoder, we will have only about 107 possible bit patterns (a 
21-bit window starting at bit 1, 2, etc.). The number can be even smaller (the 
music learning demo from the hackathon had only 17 or so "note" values!).
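For concreteness, both counts are easy to check in Python (a quick sketch, not NuPIC code):

```python
from math import comb

# All ways to choose 21 active bits from 128: the "possible" records
total = comb(128, 21)
print(f"{total:.2e}")  # on the order of 6e23

# A sliding 21-bit window over 128 bits has n - w + 1 = 108 positions,
# essentially the ~107 figure above
n, w = 128, 21
print(n - w + 1)
```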

So you see that the real input space is usually extremely small in dense terms 
(107 values would fit in fewer than 7 traditional bits), but the encoding for 
the SP stretches it out to create semantic redundancy, robustness to noise, and 
room for dendritic subsampling.
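To illustrate the stretching, here's a toy sliding-window encoder (a simplification of NuPIC's scalar encoder; the function name and parameter defaults here are mine, purely for illustration):

```python
def encode(value, n=128, w=21, vmin=0, vmax=107):
    """Map a scalar to a set of w consecutive on-bits out of n.
    A toy stand-in for the default scalar encoder."""
    start = round((value - vmin) / (vmax - vmin) * (n - w))
    return {start + i for i in range(w)}

a, b = encode(50), encode(51)
print(len(a & b))  # neighbouring values share 20 of their 21 bits
```

Nearby scalar values produce heavily overlapping bit patterns, which is exactly the semantic-overlap property the SP relies on.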

The idea is that if there are only 107 real input patterns, then matching even 
5 or 10 bits makes you almost certain to be looking at the input you were 
expecting (or a near neighbour). The columns which get the highest number of 
on-bits are almost certain to be exact matches.

This is true almost regardless of the "size" of the inputs, because the 
combinatorial mathematics puts a number of Avogadro's order (6x10^23 for every 
128 bits) in the denominator, while the semantic overlap requirement ensures 
that the numerator (the number of genuine data input patterns) is vastly 
smaller than this.
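You can put a number on those odds with the hypergeometric distribution (a sketch; the 10-bit threshold is just the figure mentioned above):

```python
from math import comb

n, w = 128, 21
# Probability that a *random* 21-of-128 pattern shares at least 10 bits
# with one fixed pattern (hypergeometric tail). It is small enough that
# a strong overlap almost certainly indicates a genuine match.
p = sum(comb(w, k) * comb(n - w, w - k) for k in range(10, w + 1)) / comb(n, w)
print(f"{p:.1e}")  # roughly 2e-4
```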

Further, for each input pattern, the column activation pattern (e.g. 40 chosen 
from 2048) produces one pattern (or a very small number of nearly identical 
patterns) out of 2.3x10^84 possible patterns. Again, there will be significant 
semantic overlap between the SDRs you actually see as you vary the input. The 
reinforcement of the feedforward dendrites will drive the SP to learn an ever 
more stable mapping from inputs to these learned patterns of activation.
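That 2.3x10^84 figure is just Binomial[2048,40], which Python can confirm directly:

```python
from math import comb

# 40 active columns chosen from 2048 possible columns
patterns = comb(2048, 40)
print(f"{patterns:.2e}")  # ~2.37e84
```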

Finally, with 20 or 30 cells in each column, each learning to predict its 
activation based on previous inputs (currently activated cells in other 
columns) in a learned sequence, the space of "possible" learned representations 
becomes truly enormous.
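As a rough illustration (assuming 32 cells per column, a common NuPIC default, and 40 active columns): the same column-level SDR can appear in 32^40 distinct cell-level states, each one encoding a different sequence context:

```python
# Each of the 40 active columns picks one of its 32 cells to represent
# the current sequence context, so a single column pattern fans out into
# 32**40 possible cell-level representations.
cells_per_column = 32
active_columns = 40
contexts = cells_per_column ** active_columns
print(f"{contexts:.2e}")  # 32**40 == 2**200, about 1.6e60
```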

All this massive capacity is of course never used, because the structure in the 
world (even a virtual world) gives rise to only a tiny fraction of the possible 
sequences of sequences of input records in a finite time. The CLA uses all this 
spare capacity to add redundancy and robustness in the face of noise, spurious 
data, missing sequence elements, and so on.

You only start to get the above effects - stable SDR activation patterns, 
sequence learning, etc - for typical inputs when you have something like 500+ 
columns (for a typical 128-bit scalar encoding) and 2048 is more than enough 
for every job encountered to date (at least nobody at Numenta has replied to 
say this is not true...).

Regards,

Fergal Byrne

