Hi Ramesh,

There's a lot more going on than just that, the reason being that the CLA is learning the whole sequence of patterns (well, the sequence of sequences of patterns) presented to it since "birth". The bits in each input record are not being learned directly, but rather the SDRs associated (via the SP) with each configuration of bits.
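To make that SP mapping concrete, here's a toy sketch (nothing like the real Spatial Pooler - no boosting, no learning, and the pool size, column count and k are made-up numbers): each column scores its overlap with the input's on-bits, and the top k columns become the active SDR.

```python
import random

def sp_activate(input_bits, columns, k=4):
    # Score each column by how many of the input's on-bits fall in
    # its potential pool, then activate the k best-matching columns.
    overlaps = [(len(pool & input_bits), c) for c, pool in enumerate(columns)]
    overlaps.sort(reverse=True)
    return sorted(c for _, c in overlaps[:k])

random.seed(42)
n_bits, n_cols = 128, 64
# Each toy column samples a random subset of the input space
# (a stand-in for its pool of feedforward synapses).
columns = [set(random.sample(range(n_bits), 16)) for _ in range(n_cols)]
# A 21-on-bit input record, as in the scalar encoder example below.
record = set(range(30, 51))
active = sp_activate(record, columns, k=4)
print(active)
```

Two similar input records (overlapping windows of on-bits) will share pool overlaps, so they tend to activate overlapping column sets - that's the semantic overlap doing its job.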
So, assuming there are 128 bits in the input record, and 21 of these are on at any one time (on average), there are Binomial[128,21] "possible" input records, which is about 6x10^23. However, this is not the real input space at all, because we require that the input data is encoded in such a way that there is significant semantic overlap in the bit patterns. In the case of the default scalar encoder, we will have only 108 possible bit patterns (a 21-bit window starting at bit 1, 2, and so on, up to bit 108). The number can be even smaller (the music learning demo from the hackathon had only 17 or so "note" values!). So you see that the input size is usually extremely small in dense bits (108 values is less than 7 traditional bits), but the encoding for the SP stretches it out to create semantic redundancy, robustness to noise, and dendritic subsampling.

The idea is that if there are only 108 real input patterns, then if you match even 5 or 10 bits you are almost certain to be looking at the input you were expecting (or a near neighbour). The columns which get the highest number of on-bits will almost certainly be exact matches. This is true almost regardless of the "size" of the inputs, because the combinatorial mathematics creates odds with Avogadro's number (6x10^23 for every 128 bits) in the denominator, and the semantic overlap requirement ensures that the numerator (the number of genuine data input patterns) is vastly smaller than this.

Further, for each input pattern, the column activation pattern (e.g. 40 columns chosen from 2048) produces one pattern (or a very small number of nearly identical patterns) out of 2.3x10^84 possibilities. Again, there will be significant semantic overlap between the SDRs you actually see as you vary the input. The reinforcement of the feedforward dendrites will drive the SP to learn an ever more stable mapping from inputs to these learned patterns of activation.
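If you want to check these numbers yourself, they're one-liners in Python (the 128/21 and 2048/40 figures are just the ones from this discussion):

```python
import math

n_input, w = 128, 21          # input record: 128 bits, 21 on
n_cols, n_active = 2048, 40   # SP output: 40 active columns of 2048

# All "possible" 21-of-128 records: roughly Avogadro's number.
possible_records = math.comb(n_input, w)

# But a scalar encoder only emits sliding 21-bit windows, so the
# genuine input space is tiny by comparison.
encoder_patterns = n_input - w + 1

# Column activation patterns the SP can choose from.
possible_sdrs = math.comb(n_cols, n_active)

print(f"{possible_records:.1e} possible records, "
      f"{encoder_patterns} real encoder outputs, "
      f"{possible_sdrs:.1e} possible column SDRs")
```

So the denominator really is ~6x10^23 while the numerator is 108, which is why a handful of matched bits is already near-conclusive evidence.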
Finally, with 20 or 30 cells in each column, each learning to predict its own activation from previous inputs (the currently active cells in other columns) in a learned sequence, the space of "possible" learned representations becomes truly enormous. All this massive capacity is of course never used, because the structure in the world (even a virtual world) gives rise to only a tiny fraction of the possible sequences of sequences of input records in a finite time. The CLA uses the spare capacity to add redundancy and robustness in the face of noise, spurious data, missing sequence elements, and so on. You only start to get the above effects - stable SDR activation patterns, sequence learning, etc. - for typical inputs when you have something like 500+ columns (for a typical 128-bit scalar encoding), and 2048 is more than enough for every job encountered to date (at least, nobody at Numenta has replied to say this is not true...).

Regards,
Fergal Byrne
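For a rough feel of "truly enormous" (back-of-the-envelope only - I'm taking 32 cells per column, a common default, rather than the 20-30 mentioned above): each of the 40 active columns can mark its sequence context with any one of its cells, so the cell-level code multiplies in a factor of 32^40 on top of the column-level count.

```python
import math

n_cols, n_active, cells_per_col = 2048, 40, 32

# Column-level SDRs the SP can produce.
column_patterns = math.comb(n_cols, n_active)

# For each column pattern, every active column can represent its
# part of the sequence context with any one of its cells.
contexts_per_pattern = cells_per_col ** n_active   # 32^40

total = column_patterns * contexts_per_pattern
digits = len(str(total)) - 1   # order of magnitude
print(f"~10^{digits} distinct cell-level representations")
```

That's on the order of 10^144 representations - which is why only a vanishing fraction is ever used, and the rest is available as redundancy.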
_______________________________________________
nupic mailing list
[email protected]
http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
