We have had an issue with our encoders for some time.  I recently came up with 
a solution that I want to share.  It would make a good smallish project, 
something that could be done during the hack-a-thon for example.

The Problem
Our current encoder needs to know in advance the max and min values it will 
represent.  We usually look through any historical data we have to find the max 
and min and add some margin for safety.  A problem occurs if the range of 
actual values turns out to be greater than we anticipated.  It isn't uncommon 
for numbers to grow over time.  If we just change the encoder to represent a 
larger range, it will mess up all the previous learning in the CLA.

It is analogous to how our cochlea evolved to represent roughly 20 Hz to 
20 kHz for humans.  If we needed to start hearing patterns above 20 kHz, our 
cochlea wouldn't cut it.  If we replaced it with a new cochlea that had an 
extended range, then all of our previous auditory learning would be lost.

We toyed with the idea of slowly modifying the encoder so all learning wouldn't 
be lost at once, but this has problems.  We didn't have a good solution for 
this problem.

The Proposed Solution
Let's say our encoder produces a 500-bit output of which 20 bits are active at 
once.  Recall that each bit represents a small span of the number line (we 
refer to this as a "bucket").  Adjacent bits in the output represent 
overlapping buckets.  Any input number overlaps 20 buckets.  As the input 
value increases, one bit will turn off and another will turn on.
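To make the starting point concrete, here is a minimal sketch of this kind of scalar encoder.  The function name and parameters are my own illustration, not NuPIC's actual code; it just shows 20 adjacent active bits sliding along the number line, with values over the max clamped:

```python
import numpy as np

def encode_scalar(value, min_val, max_val, n=500, w=20):
    """Illustrative classic scalar encoder: a contiguous block of w
    active bits out of n, positioned by where value falls in
    [min_val, max_val].  (A sketch, not NuPIC's implementation.)"""
    value = max(min_val, min(max_val, value))  # clamp to the known range
    # Each starting position of the block is one "bucket".
    num_buckets = n - w + 1
    bucket = int((value - min_val) / (max_val - min_val) * (num_buckets - 1))
    output = np.zeros(n, dtype=int)
    output[bucket:bucket + w] = 1  # w adjacent bits; neighbors overlap
    return output
```

Note that nearby values share most of their active bits, and that any value above max_val encodes identically to max_val, which is exactly the problem described below.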

Now imagine the input value approaches and then exceeds the max value of the 
encoder.  We have no more bits to encode the new high value, no more buckets.  
Today we represent any value over the max the same as the max.  This isn't good.

The solution is to continue creating new buckets beyond the max value and 
assign them to one of the existing 500 bits at random.  As soon as we do this, 
encoder bits will start representing two different ranges.  They will be 
assigned to two different buckets, the original one and the new one that is 
above the max value.  It is important that the new extended buckets are 
assigned to existing bits randomly.

Imagine we have encoded a new value that is above our max.  It is represented 
by 20 new buckets that have been randomly assigned to 20 bits.  The original 
bucket ranges for those 20 bits do not overlap each other, but the new bucket 
ranges do.  Because the assignments are random, the chance that these 20 bits 
happen to match the pattern of some existing value is vanishingly small, so 
the spatial pooler will not get confused by bits having two or more ranges.  
I am at a bit of a loss for words to describe exactly why this is so; 
hopefully you can see why it works.  If not, I can try to describe it further.

The cleanest way to implement this might be to throw away the idea that the 
first 500 buckets are assigned to adjacent bits.  Instead just start assigning 
buckets to random bits and keep going as far as you need to.  This will 
eliminate edge issues.
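One possible sketch of that cleaner variant, again with names and details of my own invention rather than NuPIC's: derive each bucket's bit deterministically from a seed and the bucket index, so no lookup table and no predefined min or max are needed at all:

```python
import random

N_BITS = 500  # total bits in the encoder output
W = 20        # active bits per encoded value
SEED = 21     # arbitrary; fixes the bucket-to-bit mapping

def bit_for_bucket(bucket):
    """Deterministically map ANY bucket index (even negative ones) to
    a pseudo-random bit.  A sketch of the 'random from the start'
    idea; a real implementation would differ in details."""
    return random.Random(SEED * 1_000_003 + bucket).randrange(N_BITS)

def active_bits(first_bucket):
    """Active bits for a value spanning W consecutive buckets."""
    return {bit_for_bucket(b) for b in range(first_bucket, first_bucket + W)}
```

Adjacent values still share most of their active bits (they share W - 1 buckets), distant values share almost none, and the number line extends indefinitely in both directions with no edge cases.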

If you are interested in tackling this project, Scott at Numenta has 
volunteered to provide assistance.  He can point you to the correct code and 
help in other ways.

Jeff
_______________________________________________
nupic mailing list
[email protected]
http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org