Hi Jeff,

The lads mentioned this in the Sprint Meeting, but more or less said to wait until we'd heard from you.
I disagree that this is a vastly complex problem, and I think there is a way to avoid the problems you raise.

If you consider the idea that each bit represents a centroid of its range with a radius, then the case where your range must be extended is just a matter of adjusting the centroids and radii of each bit, so that they shift gradually to accommodate the new range of data. You can do this based on the statistics of the data: allow the centroids to spread out quite slowly when new out-of-range data are encountered, with an algorithm which gradually shifts the meaning of all the bits and enlarges the radii as new min or max values are encountered.

So, if you get a couple of new values larger than the max, you let those values "burn out" the top encodings, but you also increment the centroid values of all bits a little. If values keep appearing above the max, this gives rise to a gradually enlarging spread for the encoder, while all bits slowly change their semantic meaning for the SP. The patterns and sequences already learned by the region would migrate gradually along with the slowly evolving range, but would retain all their learned understanding of the data space. There would not be any sudden shift in the semantics of any input bit, or any column.

If you treat an outlier as such, in other words give it a statistical weight reflecting how often it occurs in the data, then you should be able to gradually adjust the mapping from input scalar value to encoder bits.

Regards,

Fergal Byrne

On Fri, Oct 25, 2013 at 10:58 PM, Jeff Hawkins <[email protected]> wrote:

> We have had an issue with our encoders for some time. I recently came
> up with a solution that I want to share. It would make a good smallish
> project, something that could be done during the hack-a-thon, for example.
>
> *The Problem*
>
> Our current encoder needs to know in advance the max and min value it will
> represent. We usually look through any historical data we have to find the
> max and min and add some margin for safety. A problem occurs if the range of
> actual values is greater than we anticipated. It isn’t uncommon for
> numbers to grow over time. If we just change the encoder to represent a
> larger range, it will mess up all the previous learning in the CLA.
>
> It is analogous to how our cochlea evolved to represent 20 Hz to 20 kHz in
> humans. If we needed to start hearing patterns above 20 kHz, our cochlea
> wouldn’t cut it. If we replaced it with a new cochlea that had an extended
> range, then all of our previous auditory learning would be lost.
>
> We toyed with the idea of slowly modifying the encoder so all learning
> wouldn’t be lost at once, but this has problems. We didn’t have a good
> solution for this problem.
>
> *The Proposed Solution*
>
> Let’s say our encoder produces a 500-bit output of which 20 bits are
> active at once. Recall that each bit represents a small span of the number
> line (we refer to this as a “bucket”). Adjacent bits in the output
> represent overlapping buckets, so any input number overlaps twenty buckets.
> As the input value increases, one bit will turn to zero and another will
> become one.
>
> Now imagine the input value approaches and then exceeds the max value of
> the encoder. We have no more bits to encode the new high value, no more
> buckets. Today we represent any value over the max the same as the max.
> This isn’t good.
>
> The solution is to continue creating new buckets beyond the max value and
> assign them to one of the existing 500 bits *at random*. As soon as we
> do this, encoder bits will start representing two different ranges.
> They will be assigned to two different buckets: the original one and the
> new one that is above the max value. It is important that the new extended
> buckets are assigned to existing bits randomly.
>
> Imagine we have encoded a new value that is above our max. It is
> represented by 20 new buckets that have been randomly assigned to twenty
> bits. The original bucket ranges for the 20 bits representing the new high
> value are not overlapping, but the new bucket ranges are overlapping.
> Therefore the spatial pooler will not get confused by bits having two or
> more ranges. I am at a bit of a loss for words to describe exactly why
> this is so; hopefully you can see why this works. If not, I can try to
> describe it further.
>
> The cleanest way to implement this might be to throw away the idea that
> the first 500 buckets are assigned to adjacent bits. Instead, just start
> assigning buckets to random bits and keep going as far as you need to.
> This will eliminate edge issues.
>
> If you are interested in tackling this project, Scott at Numenta has
> volunteered to provide assistance. He can point you to the correct code
> and help in other ways.
>
> Jeff
>
> _______________________________________________
> nupic mailing list
> [email protected]
> http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org

--
Fergal Byrne <http://www.examsupport.ie>
Brenter IT
[email protected]
+353 83 4214179

Formerly of Adnet
[email protected]
http://www.adnet.ie
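[Editor's note: the gradual-centroid adaptation Fergal describes above can be sketched in Python as below. This is a minimal illustration of the idea, not NuPIC code; the class name, the drift `rate`, and the stretch rule are all assumptions.]

```python
# Sketch of the adaptive-centroid idea: each bit owns a centroid and a
# radius, and out-of-range values nudge every centroid outward a little,
# instead of rescaling the encoder all at once. Illustrative only.

class AdaptiveScalarEncoder:
    def __init__(self, n_bits=500, w=20, lo=0.0, hi=100.0, rate=0.01):
        self.n_bits = n_bits
        self.w = w                        # active bits per encoding
        step = (hi - lo) / (n_bits - 1)
        self.centroids = [lo + i * step for i in range(n_bits)]
        self.radius = step * w / 2.0      # half-width each bit responds to
        self.rate = rate                  # how fast centroids drift (assumed)

    def _adapt(self, value):
        """Stretch centroids/radii slightly toward an out-of-range value."""
        lo, hi = self.centroids[0], self.centroids[-1]
        if lo <= value <= hi:
            return                        # in range: no change
        overshoot = (value - hi) if value > hi else (lo - value)
        # Spread every centroid away from the midpoint by a small fraction
        # of the overshoot, so the spacing grows uniformly and no bit's
        # meaning jumps suddenly.
        stretch = 1.0 + self.rate * overshoot / max(hi - lo, 1e-9)
        mid = (lo + hi) / 2.0
        self.centroids = [mid + (c - mid) * stretch for c in self.centroids]
        self.radius *= stretch            # radii enlarge along with the range

    def encode(self, value):
        self._adapt(value)
        # Activate the w bits whose centroids are closest to the value.
        order = sorted(range(self.n_bits),
                       key=lambda i: abs(self.centroids[i] - value))
        active = set(order[:self.w])
        return [1 if i in active else 0 for i in range(self.n_bits)]

# Usage: repeated values above the max slowly widen the encoder's range.
enc = AdaptiveScalarEncoder(n_bits=50, w=5, lo=0.0, hi=10.0)
bits = enc.encode(5.0)        # in-range value: no adaptation
for _ in range(200):
    enc.encode(12.0)          # out-of-range values stretch the centroids
```

Because each exposure only stretches the centroids by a small fraction, the SP sees a slowly drifting input semantics rather than a discontinuity, which is the property Fergal's argument relies on.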
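[Editor's note: Jeff's random-bucket proposal, in its "cleanest" form where every bucket, including the first ones, is assigned to a random bit, can be sketched as below. The class name and parameters are assumptions, and the collision handling is simplified relative to anything in NuPIC.]

```python
import random

class RandomBucketEncoder:
    """Buckets are created on demand as values grow, and each new bucket
    is assigned to one of the existing bits at random, so the encoder has
    no max value. Illustrative sketch only."""

    def __init__(self, n_bits=500, w=20, resolution=1.0, seed=42):
        self.n_bits = n_bits
        self.w = w                      # buckets (and active bits) per value
        self.resolution = resolution    # width of one bucket on the number line
        self.rng = random.Random(seed)
        self.bucket_to_bit = {}         # lazily grown: bucket index -> bit

    def _bit_for(self, bucket, taken):
        if bucket not in self.bucket_to_bit:
            # Assign a random bit, re-drawing on collision so the w bits
            # representing a single value stay distinct.
            bit = self.rng.randrange(self.n_bits)
            while bit in taken:
                bit = self.rng.randrange(self.n_bits)
            self.bucket_to_bit[bucket] = bit
        return self.bucket_to_bit[bucket]

    def encode(self, value):
        first = int(value // self.resolution)   # lowest overlapping bucket
        active = set()
        for b in range(first, first + self.w):  # w overlapping buckets
            active.add(self._bit_for(b, active))
        out = [0] * self.n_bits
        for bit in active:
            out[bit] = 1
        return out

# Usage: nearby values share most of their buckets, hence most of their bits,
# and values far beyond any previous max simply create fresh buckets.
enc2 = RandomBucketEncoder(n_bits=100, w=10, resolution=1.0)
a = enc2.encode(0.0)   # creates buckets 0..9
b = enc2.encode(1.0)   # re-uses buckets 1..9, creates bucket 10
```

Note how this realizes Jeff's key point: adjacent input values still share most of their active bits (their bucket windows overlap), but the bits themselves carry no spatial ordering, so extending the range never disturbs existing assignments.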
