To expand on this a little, I think it would be good to add something I've noticed and have been thinking about related to encoding and semantically rich SDRs.
When looking at your example, if you want to encode words that are animals, then (I believe this is true for all encodings) it will be necessary to know some things about the objects being referenced. Breaking the higher-level concept of "animal" into constituent components or attributes will allow your encoder to represent the words as a string of, say, 1s and 0s, and give semantic overlap between the representations of different words (animals in this case; I'm leaving out plants for simplicity). So for example, you might choose to include:

- number of legs: 0-6
- number of wings: 0-4
- number of fins: 0-10
- primary color: 0-360°
- secondary color: 0-360°
- has hair: 0-1
- has scales: 0-1
- has feathers: 0-1
- overall size: 0-100%
- etc.

Using the encodings described in other threads, you will end up with representations that have overlap and sparseness. I believe encoding this way is a workaround (hack) for not using hierarchy, as these attributes would otherwise be generated by a lower-level region and fed to the current region for comparison, recognition, and prediction. If you're going to feed the CLA words, then you will have to abstract the attributes you want to include and generate an SDR. This means your encoder will provide the CLA with a data stream that has semantic meaning embedded, which will provide overlap and sparseness.

If we include plants, then there would need to be attributes included for animals such as bees, like:

- food source: flowers, carrion (yellow jackets)
- habitat: near flowers?

That way there is some crossover between plants and animals, and the CLA can find patterns between them.

However, maybe I'm reading your intent all wrong, and you'd just like to feed the CLA word associations and have it build a model around free association? If so, that sounds interesting, and I'm sorry if I'm off track and not helping with your question. If that's the idea, then I'm not sure of the best way to represent each word.
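To make the idea concrete, here's a minimal sketch of an attribute-based word encoder along the lines above. The attribute list, ranges, and encoder parameters (`n_bits`, `w`) are illustrative choices of mine, not NuPIC's actual API; each scalar attribute is encoded as a run of contiguous 1 bits, so nearby values share bits and the concatenated encodings of similar animals overlap.

```python
# Sketch of the attribute-based word encoder described above.
# Attribute ranges and parameters are illustrative, not NuPIC's API.

def encode_scalar(value, min_val, max_val, n_bits=20, w=5):
    """Encode a scalar as n_bits with a run of w contiguous 1s.
    Nearby values share bits, giving semantic overlap."""
    start = int((value - min_val) / (max_val - min_val) * (n_bits - w))
    return [1 if start <= i < start + w else 0 for i in range(n_bits)]

# (name, min, max) -- a subset of the attributes listed above
ATTRIBUTES = [
    ("legs", 0, 6), ("wings", 0, 4), ("fins", 0, 10),
    ("has_hair", 0, 1), ("has_feathers", 0, 1), ("size_pct", 0, 100),
]

def encode_word(attrs):
    """Concatenate the per-attribute encodings into one bit string."""
    bits = []
    for name, lo, hi in ATTRIBUTES:
        bits += encode_scalar(attrs[name], lo, hi)
    return bits

dog   = encode_word({"legs": 4, "wings": 0, "fins": 0,
                     "has_hair": 1, "has_feathers": 0, "size_pct": 30})
horse = encode_word({"legs": 4, "wings": 0, "fins": 0,
                     "has_hair": 1, "has_feathers": 0, "size_pct": 80})
bee   = encode_word({"legs": 6, "wings": 4, "fins": 0,
                     "has_hair": 0, "has_feathers": 0, "size_pct": 1})

def overlap(a, b):
    """Count shared on-bits between two encodings."""
    return sum(x & y for x, y in zip(a, b))

# Semantically similar animals share more bits:
print(overlap(dog, horse), overlap(dog, bee))
```

A real encoder would need more care (fixed overall sparsity, handling of categorical attributes), but this shows how attribute decomposition gives the semantic overlap discussed above: dog and horse share far more bits than dog and bee.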
The problem you're looking at feels like a high-level issue, and with only one region, the encoder will be asked to do a lot to give the CLA something it can use. Without doing something like I've described above, I'm not sure how else the encoder could convert words into SDRs.

Patrick

On Sep 13, 2013, at 12:30 PM, Chetan Surpur wrote:

> I'll attempt to answer this question with the best of my understanding.
> Someone more knowledgeable, please feel free to correct me where I'm wrong!
>
> I'll make a couple of assumptions in order to come up with a concrete answer.
>
> 1. Since you didn't specify how the sensory region that can accept arbitrary
> strings works, let's say it works like this:
>
> It splits up the string by semicolons and encodes each word in the resulting
> list as a category. The category encoder treats each unique input as
> independent, and assigns a random dense encoding for it. This means that
> 'flower' would look something like 1101, 'bee' would look something like
> 1010, and 'horse' would look something like 0110, so the string
> 'flower;bee;horse' would be encoded as 1101 1010 0110.
>
> 2. When you insert the word 'dog', you're actually inserting the word
> '___;___;dog', such that the category encoding for the word 'dog' shows up
> at the right side of the input to the spatial pooler.
>
> 3. Let's assume that the spatial pooler's columns are connected to a small
> locality of the input, so a subset of the columns would be connected to each
> of the three words in the input.
>
> If this were the case, then the spatial pooler would learn to produce an SDR
> with 'on' values representing each word, since each word can be considered a
> spatial coincidence of the exact configuration of bits outputted by the
> sensory region. Further, each SDR would sparsely represent the words in the
> input, in order, due to the second assumption of spatial pooler column
> locality.
>
> Now we can finally consider your exact question.
> TOR = boolean.OR(T1, T2) would now be the union of the SDRs of
> 'flower;bee;horse' and 'bee;flower;dog', and since '___;___;dog' would
> overlap on 'dog' with 'bee;flower;dog', X = boolean.AND(TOR, C) would
> meaningfully overlap with TOR on the right third of the SDRs.
>
> Keep in mind that you wouldn't get full overlap, since 'dog' is only a third
> of the input. Also, you probably wouldn't get exact overlap on the 'dog'
> third, because of the stochastic nature of the CLA. All I'm saying is that
> with enough training data it would eventually learn to recognize words in
> particular positions in the input, so the 'dog' overlap would eventually be
> detected.
>
> On Thu, Sep 12, 2013 at 12:37 PM, Stewart Mackenzie <[email protected]>
> wrote:
> Hi all,
>
> A very intelligent chap said in a 3-part video on the Numenta YouTube
> channel that a Python dictionary goes into a sensory region. This region
> outputs a variable-length SDR which is fed into the spatial pooler. The
> spatial pooler then spits out a 2048-bit binary vector which has been made
> sparse.
>
> Say the sensory region can accept arbitrary strings.
>
> Say I insert 'flower;bee;horse' into the sensory region and record the SP's
> output, and call it T1. Then I insert 'bee;flower;dog' and record the SP's
> output, called T2.
>
> Now I insert only the word 'dog' into the sensory region and record the
> SP's output, calling it C.
>
> If I do TOR = boolean.OR(T1, T2) and then X = boolean.AND(TOR, C), will C
> and X be identical?
>
> Thank you.
>
> Stewart
> --
> Sent from my Android device with K-9 Mail. Please excuse my brevity.
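Stewart's question can be simulated with a toy model, under Chetan's assumptions that the SP has learned a distinct sparse code for each (word, slot) pair. Fixed random SDRs stand in for trained SP output here; this is not NuPIC's actual SpatialPooler, and the names (`sp_output`, `codes`) are my own.

```python
# Toy illustration of the TOR/AND question, assuming (as in the reply)
# that the SP learns slot-wise word representations.
import random

random.seed(42)
N, ACTIVE = 2048, 40  # 2048-bit output vector, ~2% sparsity

def random_sdr():
    """A fixed random sparse code, stored as a set of on-bit indices."""
    return set(random.sample(range(N), ACTIVE))

# One code per (word, position) pair, as if the SP had learned slot-wise codes.
codes = {(w, p): random_sdr()
         for p in range(3)
         for w in ["flower", "bee", "horse", "dog", "___"]}

def sp_output(words):
    """Union of the per-slot codes for a semicolon-joined input string."""
    out = set()
    for pos, word in enumerate(words.split(";")):
        out |= codes[(word, pos)]
    return out

T1 = sp_output("flower;bee;horse")
T2 = sp_output("bee;flower;dog")
C = sp_output("___;___;dog")

TOR = T1 | T2  # boolean.OR(T1, T2)
X = TOR & C    # boolean.AND(TOR, C)

# X contains the 'dog'-in-slot-3 bits (shared by T2 and C), but not the
# '___' filler bits that appear only in C, so X != C in general.
print(len(X), len(C), X == C)
```

Consistent with Chetan's answer: X overlaps C meaningfully on the 'dog' third of the representation, but C and X are not identical, because C also carries bits for the blank slots that TOR never saw.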
> _______________________________________________
> nupic mailing list
> [email protected]
> http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
