To expand on this a little, I think it would be good to add something I've noticed and have been thinking about related to encoding and semantically rich SDRs.
When looking at your example, if you want to encode words that are animals, then (I believe this is true for all encodings) it will be necessary to know some things about the objects being referenced. Breaking the higher-level concept of "animal" into constituent components or attributes will allow your encoder to represent the words as a string of, say, 1s and 0s, and give semantic overlap between the representations of different words (animals in this case; I'm leaving out plants for simplicity). So for example, you might choose to include:

- number of legs: 0-6
- number of wings: 0-4
- number of fins: 0-10
- primary color: 0-360°
- secondary color: 0-360°
- has hair: 0-1
- has scales: 0-1
- has feathers: 0-1
- overall size: 0-100%
- etc.

Using the encodings described in other threads, you will end up with representations that have overlap and sparseness. I believe encoding this way is a workaround (hack) for not using hierarchy, as these attributes would otherwise be generated by a lower-level region and fed to the current region for comparison, recognition, and prediction. If you're going to feed the CLA words, then you will have to abstract the attributes you want to include and generate an SDR. This means your encoder will provide the CLA with a data stream that has semantic meaning embedded, which will provide overlap and sparseness.

If we include plants, then there would need to be attributes included for animals such as bees, like:

- food source: flowers, carrion (yellow jackets)
- habitat: near flowers?

That way there is some crossover between plants and animals, and the CLA can find patterns between them.

However, maybe I'm reading your intent all wrong, and you'd just like to feed the CLA word associations and have it build a model around free association? If so, that sounds interesting, and I'm sorry if I'm off track and not helping with your question. If that's the idea, then I'm not sure of the best way to represent each word.
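To make the idea concrete, here's a minimal sketch of an attribute-based word encoder along the lines above. The attribute list, ranges, and encoder parameters (`n_bits`, `w`) are illustrative choices of mine, not NuPIC's actual API; each scalar attribute is encoded as a run of contiguous 1 bits, so nearby values share bits and the concatenated encodings of similar animals overlap.

```python
# Sketch of the attribute-based word encoder described above.
# Attribute ranges and parameters are illustrative, not NuPIC's API.

def encode_scalar(value, min_val, max_val, n_bits=20, w=5):
    """Encode a scalar as n_bits with a run of w contiguous 1s.
    Nearby values share bits, giving semantic overlap."""
    start = int((value - min_val) / (max_val - min_val) * (n_bits - w))
    return [1 if start <= i < start + w else 0 for i in range(n_bits)]

# (name, min, max) -- a subset of the attributes listed above
ATTRIBUTES = [
    ("legs", 0, 6), ("wings", 0, 4), ("fins", 0, 10),
    ("has_hair", 0, 1), ("has_feathers", 0, 1), ("size_pct", 0, 100),
]

def encode_word(attrs):
    """Concatenate the per-attribute encodings into one bit string."""
    bits = []
    for name, lo, hi in ATTRIBUTES:
        bits += encode_scalar(attrs[name], lo, hi)
    return bits

dog   = encode_word({"legs": 4, "wings": 0, "fins": 0,
                     "has_hair": 1, "has_feathers": 0, "size_pct": 30})
horse = encode_word({"legs": 4, "wings": 0, "fins": 0,
                     "has_hair": 1, "has_feathers": 0, "size_pct": 80})
bee   = encode_word({"legs": 6, "wings": 4, "fins": 0,
                     "has_hair": 0, "has_feathers": 0, "size_pct": 1})

def overlap(a, b):
    """Count shared on-bits between two encodings."""
    return sum(x & y for x, y in zip(a, b))

# Semantically similar animals share more bits:
print(overlap(dog, horse), overlap(dog, bee))
```

A real encoder would need more care (fixed overall sparsity, handling of categorical attributes), but this shows how attribute decomposition gives the semantic overlap discussed above: dog and horse share far more bits than dog and bee.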
The problem you're looking at feels like a high-level issue, and with only one region, the encoder will be asked to do a lot to give the CLA something it can use. Without doing something like I've described above, I'm not sure how else the encoder could convert words into SDRs.

Patrick

On Sep 13, 2013, at 12:30 PM, Chetan Surpur wrote:

> I'll attempt to answer this question with the best of my understanding.
> Someone more knowledgeable, please feel free to correct me where I'm wrong!
>
> I'll make a couple of assumptions in order to come up with a concrete answer.
>
> 1. Since you didn't specify how the sensory region that can accept arbitrary
> strings works, let's say it works like this:
>
> It splits up the string by semicolons and encodes each word in the resulting
> list as a category. The category encoder treats each unique input as
> independent, and assigns a random dense encoding for it. This means that
> 'flower' would look something like 1101, 'bee' would look something like
> 1010, and 'horse' would look something like 0110, so the string
> 'flower;bee;horse' would be encoded as 1101 1010 0110.
>
> 2. When you insert the word 'dog', you're actually inserting the word
> '___;___;dog', such that the category encoding for the word 'dog' shows up
> at the right side of the input to the spatial pooler.
>
> 3. Let's assume that the spatial pooler's columns are connected to a small
> locality of the input, so a subset of the columns would be connected to each
> of the three words in the input.
>
> If this were the case, then the spatial pooler would learn to produce an SDR
> with 'on' values representing each word, since each word can be considered a
> spatial coincidence of the exact configuration of bits outputted by the
> sensory region. Further, each SDR would sparsely represent the words in the
> input, in order, due to the second assumption of spatial pooler column
> locality.
>
> Now we can finally consider your exact question.
> TOR = boolean.OR(T1, T2) would now be the union of the SDRs of
> 'flower;bee;horse' and 'bee;flower;dog', and since '___;___;dog' would
> overlap on 'dog' with 'bee;flower;dog', X = boolean.AND(TOR, C) would
> meaningfully overlap with TOR on the right third of the SDRs.
>
> Keep in mind that you wouldn't get full overlap, since 'dog' is only a third
> of the input. Also, you probably wouldn't get exact overlap on the 'dog'
> third, because of the stochastic nature of the CLA. All I'm saying is that
> with enough training data it would eventually learn to recognize words in
> particular positions in the input, so the 'dog' overlap would eventually be
> detected.
>
> On Thu, Sep 12, 2013 at 12:37 PM, Stewart Mackenzie <[email protected]>
> wrote:
> Hi all,
>
> A very intelligent chap said in a 3-part video on the Numenta YouTube
> channel that a Python dictionary goes into a sensory region. This region
> outputs a variable-length SDR which is fed into the spatial pooler. The
> spatial pooler then spits out a 2048-bit binary vector which has been made
> sparse.
>
> Say the sensory region can accept arbitrary strings.
>
> Say I insert 'flower;bee;horse' into the sensory region and record the SP's
> output, and call it T1. Then I insert 'bee;flower;dog' and record the SP's
> output, called T2.
>
> Now I insert only the word 'dog' into the sensory region and record the
> SP's output, calling it C.
>
> If I do TOR = boolean.OR(T1, T2) and then X = boolean.AND(TOR, C), will C
> and X be identical?
>
> Thank you.
>
> Stewart
> --
> Sent from my Android device with K-9 Mail. Please excuse my brevity.
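Stewart's question can be simulated with a toy model, under Chetan's assumptions that the SP has learned a distinct sparse code for each (word, slot) pair. Fixed random SDRs stand in for trained SP output here; this is not NuPIC's actual SpatialPooler, and the names (`sp_output`, `codes`) are my own.

```python
# Toy illustration of the TOR/AND question, assuming (as in the reply)
# that the SP learns slot-wise word representations.
import random

random.seed(42)
N, ACTIVE = 2048, 40  # 2048-bit output vector, ~2% sparsity

def random_sdr():
    """A fixed random sparse code, stored as a set of on-bit indices."""
    return set(random.sample(range(N), ACTIVE))

# One code per (word, position) pair, as if the SP had learned slot-wise codes.
codes = {(w, p): random_sdr()
         for p in range(3)
         for w in ["flower", "bee", "horse", "dog", "___"]}

def sp_output(words):
    """Union of the per-slot codes for a semicolon-joined input string."""
    out = set()
    for pos, word in enumerate(words.split(";")):
        out |= codes[(word, pos)]
    return out

T1 = sp_output("flower;bee;horse")
T2 = sp_output("bee;flower;dog")
C = sp_output("___;___;dog")

TOR = T1 | T2  # boolean.OR(T1, T2)
X = TOR & C    # boolean.AND(TOR, C)

# X contains the 'dog'-in-slot-3 bits (shared by T2 and C), but not the
# '___' filler bits that appear only in C, so X != C in general.
print(len(X), len(C), X == C)
```

Consistent with Chetan's answer: X overlaps C meaningfully on the 'dog' third of the representation, but C and X are not identical, because C also carries bits for the blank slots that TOR never saw.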
> _______________________________________________
> nupic mailing list
> [email protected]
> http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
