We have had an issue with our encoders for some time. I recently came up with a solution that I want to share. It would make a good smallish project, something that could be done during the hack-a-thon for example.
The Problem

Our current encoder needs to know in advance the maximum and minimum values it will represent. We usually look through any historical data we have to find the max and min, then add some margin for safety. A problem occurs if the range of actual values is greater than we anticipated, and it isn't uncommon for numbers to grow over time. If we simply change the encoder to represent a larger range, it will mess up all the previous learning in the CLA.

It is analogous to how our cochlea evolved to represent 20 Hz to 20 kHz for humans. If we needed to start hearing patterns above 20 kHz, our cochlea wouldn't cut it. If we replaced it with a new cochlea that had an extended range, then all of our previous auditory learning would be lost. We toyed with the idea of slowly modifying the encoder so that all learning wouldn't be lost at once, but this has problems of its own. We didn't have a good solution for this problem.

The Proposed Solution

Let's say our encoder produces a 500-bit output, of which 20 bits are active at once. Recall that each bit represents a small span of the number line (we refer to this as a "bucket"). Adjacent bits in the output represent overlapping buckets, so any input number overlaps twenty buckets. As the input value increases, one bit turns to zero and another becomes one.

Now imagine the input value approaches and then exceeds the max value of the encoder. We have no more bits to encode the new high value, no more buckets. Today we represent any value over the max the same as the max. This isn't good.

The solution is to continue creating new buckets beyond the max value and assign each of them to one of the existing 500 bits at random. As soon as we do this, some encoder bits will start representing two different ranges: they will be assigned to two different buckets, the original one and a new one above the max value. It is important that the new extended buckets are assigned to existing bits randomly.

Imagine we have encoded a new value that is above our max.
It is represented by 20 new buckets that have been randomly assigned to twenty bits. The original bucket ranges for the 20 bits representing the new high value are not overlapping, but the new bucket ranges are overlapping. Therefore the spatial pooler will not get confused by bits having two or more ranges. I am at a bit of a loss for words to describe exactly why this is so; the intuition is that a set of 20 bits chosen at random out of 500 is almost certainly unique, so the new value gets a distinct pattern with only chance-level overlap with the representation of any unrelated value. Hopefully you can see why this works. If not, I can try to describe it further.

The cleanest way to implement this might be to throw away the idea that the first 500 buckets are assigned to adjacent bits. Instead, just start assigning buckets to random bits and keep going as far as you need to. This eliminates edge issues.

If you are interested in tackling this project, Scott at Numenta has volunteered to provide assistance. He can point you to the correct code and help in other ways.

Jeff
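To make the scheme concrete, here is a minimal Python sketch of an encoder along these lines. All names and parameters here are my own illustration, not the actual NuPIC code Scott would point you to: there are no fixed min/max values, every bucket is mapped lazily to 20 of the 500 bits, and consecutive buckets differ by exactly one bit so that nearby values still overlap heavily.

```python
import random

class RandomBucketEncoder:
    """Illustrative sketch (hypothetical names, not the NuPIC implementation).

    Buckets on the number line are created on demand and mapped to
    `active_bits` of the `n_bits` output bits. Consecutive buckets share
    all but one bit, preserving the overlap property of the original
    encoder, while the bit choices themselves are random, so the range
    can grow without bound."""

    def __init__(self, resolution, n_bits=500, active_bits=20, seed=42):
        self.resolution = resolution          # width of one bucket
        self.n_bits = n_bits
        self.active_bits = active_bits
        self.rng = random.Random(seed)
        self.bucket_map = {}                  # bucket index -> list of bit positions

    def _bits_for(self, index):
        if index in self.bucket_map:
            return self.bucket_map[index]
        if not self.bucket_map:
            # First bucket ever seen: pick 20 bits completely at random.
            bits = self.rng.sample(range(self.n_bits), self.active_bits)
        else:
            # Walk outward from the nearest existing bucket, replacing
            # one randomly chosen bit per step so adjacent buckets
            # overlap in all but one position.
            nearest = min(self.bucket_map, key=lambda i: abs(i - index))
            step = 1 if index > nearest else -1
            bits = list(self.bucket_map[nearest])
            for i in range(nearest + step, index + step, step):
                pos = self.rng.randrange(self.active_bits)
                candidates = [b for b in range(self.n_bits) if b not in bits]
                bits[pos] = self.rng.choice(candidates)
                self.bucket_map[i] = list(bits)
        self.bucket_map[index] = list(bits)
        return self.bucket_map[index]

    def encode(self, value):
        """Return a dense 0/1 list of length n_bits for `value`."""
        index = int(round(value / self.resolution))
        out = [0] * self.n_bits
        for b in self._bits_for(index):
            out[b] = 1
        return out
```

With this, `encode(0.0)` and `encode(1.0)` (at resolution 1.0) share 19 of their 20 active bits, while a far-away value shares bits only by chance. For reference, an encoder built on this idea later appeared in NuPIC as the RandomDistributedScalarEncoder.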
_______________________________________________
nupic mailing list
[email protected]
http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
