Thanks for the details on your implementation Fergal! Just to be clear, it is possible to create a constant memory and constant time solution (assuming fixed w). The one that I came up with does not use all nCw combinations of active bits though. Instead it uses (n/2)Cw * (n/2)Cw.
I am hoping someone can find a solution with the same time/memory bounds but with a higher entropy solution. IE one that will have random collisions less frequently. On Tue, Feb 18, 2014 at 1:01 AM, Fergal Byrne <[email protected]>wrote: > Hi Scott, Chetan, > > Great thought experiment, Scott. > > Someone looking at my Clojure code [1] (a Clojure expert, not one on > NuPIC) had some issues with my using a mutable data structure (ie memory) > so much. I looked at ways of eliminating it, but there's no simple way to > do it unless you know that you'll never go backwards on the number line. > This also means that the encodings are dependent on the order in which you > present the data to the encoder. > > For example, let's say the first value encoded is 0 (without loss of > generality). If you have to encode 1 next, it will get the second code, 2 > will get the next one, and so on. But if -1 is provided after 1, it'll get > the next code (or at least some distortion of it) and thus 2 will be > encoded differently. > > This means that every encoder must remember its buckets in order to give > back the same encoding for previously computed values, or else remember the > entire sequence of values and rerun their computations each time (which may > cost more memory if many values per bucket must be stored). > > I've added a test/demo for this to my document. > > Update: If you decide on 0 as a centre, you can precalculate bands of > buckets out to your first data value (and repeat this for each new one), > which ensures the encoding is always the same: > > e.g given 22 as the first datum, generate buckets for -10...10, -20...20, > -30...30 and return encoding(22). > > You could choose a different centre if you know more about your data. I've > detailed this idea in the doc. > > [1] http://fergalbyrne.github.io/rdse.html > > Regards > > Fergal Byrne > > > > > > On Tue, Feb 18, 2014 at 6:06 AM, Chetan Surpur <[email protected]> wrote: > >> For this problem, this looks useful: >> http://en.wikipedia.org/wiki/Linear_feedback_shift_register >> >> >> On Mon, Feb 17, 2014 at 6:01 PM, Chetan Surpur <[email protected]>wrote: >> >>> A very simple approach would be to trade speed for memory. Instead of >>> storing a map between buckets and SDRs, we can go through the bucket >>> generation process every time we want to find the SDR for a bucket. From >>> what I understand, this bucket generation process is linear in speed with >>> the number of buckets you want to generate. So the linear memory >>> requirement would be translated into a linear speed requirement. >>> >>> In a nutshell, walk through the number line, generating buckets until >>> you hit the target bucket you want a representation for, *every time* you >>> want to get a representation, and don't store anything. You'll need to use >>> the same seed for the random number generator though, to get consistent >>> results. >>> >>> The advantage of this is that it's a simple modification to what is >>> already implemented. On the other hand, it's slightly slower when >>> outputting SDRs for previously-seen buckets. >>> >>> >>> On Mon, Feb 17, 2014 at 5:15 PM, Scott Purdy <[email protected]> wrote: >>> >>>> Hi all, I thought some of you might enjoy trying to come up with a >>>> solution for this problem. If you watch Chetan's presentation about the >>>> random distributed scalar encoder (RDSE), you will see that we are keeping >>>> a mapping between all buckets computed so far and the bits that represent >>>> them. This was Subutai's implementation of Jeff's general idea for the >>>> encoder. This design has a memory usage for the encoder that increases >>>> linearly with the number of buckets that it has to represent. >>>> >>>> When originally discussing the design, I was trying to find a way to >>>> statically compute the mapping so that you don't have to store anything. >>>> But it has to have the property that buckets i and j have w-(j-i) >>>> overlapping bits if j-i<w and also that a given index is never assigned >>>> multiple times to the same bucket. I came up with a solution but it would >>>> likely have more random collisions than Subutai's linear-memory solution >>>> because it was limited in the number of possibly combinations of bits the >>>> buckets could have. Curious if someone can come up with something better! >>>> >>>> And be sure to watch Chetan's presentation on the RDSE that Subutai >>>> designed and implemented for background. >>>> >>>> *Note: the current implementation is fine for all practical scenarios >>>> so this is just a fun exercise for those interested* >>>> >>>> _______________________________________________ >>>> nupic mailing list >>>> [email protected] >>>> http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org >>>> >>>> >>> >> >> _______________________________________________ >> nupic mailing list >> [email protected] >> http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org >> >> > > > -- > > Fergal Byrne, Brenter IT > > <http://www.examsupport.ie>http://inbits.com - Better Living through > Thoughtful Technology > > e:[email protected] t:+353 83 4214179 > Formerly of Adnet [email protected] http://www.adnet.ie > > _______________________________________________ > nupic mailing list > [email protected] > http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org > >
_______________________________________________ nupic mailing list [email protected] http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
