Hi Fergal,

That's a nice property (not having to save buckets). You're right that once you fix the random seed everything else is deterministic. The NuPIC implementation does require saving the buckets whenever you save the network, and loading them again whenever you load it.
--Subutai

On Tue, Feb 18, 2014 at 12:20 PM, Fergal Byrne <[email protected]> wrote:

> Yes, but then you'd have to save all your buckets between runs of the encoder, and reload them each time you use it. The SP depends on those encodings never changing, and for example if you decided to aggregate values, or choose every nth one, you'd break the constancy of the encoding.
>
> My workaround means that every encoding is identical no matter how you give it data - it reconstructs all needed buckets the first time you give it data, and continues to do so for every new one. The random number generator provides everything you need to rebuild the encoding, so no saving or reloading is necessary.
>
> Regards
>
> Fergal Byrne
>
> On Tue, Feb 18, 2014 at 8:13 PM, Chetan Surpur <[email protected]> wrote:
>
>> As long as the encoder maintains the mappings for the currently-existing buckets, why must the new encodings be independent of the order of presentation of the data?
>>
>> On Tue, Feb 18, 2014 at 12:10 PM, Fergal Byrne <[email protected]> wrote:
>>
>>> Hi Chetan,
>>>
>>> No, but the encodings should always be independent of the order of presentation of the data, so it's a bug if they're not. My code includes a workaround which builds buckets out in both directions from a predefined centre value until it encompasses each new value. This guarantees the same encoding regardless of which values come in when. You could easily add a version of this to your encoder; it's a small overhead for ensuring identical encodings.
>>>
>>> This is the C4 idea in action - argue with patches... Just shows you how useful an executable document can be when you're experimenting!
>>>
>>> Regards,
>>>
>>> Fergal
>>>
>>> On Tue, Feb 18, 2014 at 6:55 PM, Chetan Surpur <[email protected]> wrote:
>>>
>>>> Oh, my mistake, I misunderstood the question.
>>>> I thought Fergal was asking if the data has to be presented in a certain order to get correct results (results that have the desired overlap properties).
>>>>
>>>> So yes, order dependence exists in the currently implemented encoder, but it shouldn't affect correctness.
>>>>
>>>> On Feb 18, 2014 10:51 AM, "Scott Purdy" <[email protected]> wrote:
>>>>
>>>>> Fergal, I believe the implementation in NuPIC is dependent on the order of data. Why do you ask? The constant-memory designs I have brought up here do not exist in NuPIC.
>>>>>
>>>>> Chetan, you are right that it extends separately in each direction, but I believe that the randomness is shared, so the order of the data would affect it. It wouldn't be difficult to change that, though. But it also doesn't really solve any problems.
>>>>>
>>>>> On Tue, Feb 18, 2014 at 10:39 AM, Chetan Surpur <[email protected]> wrote:
>>>>>
>>>>>> From reading the code, it looks to me that the generation of buckets happens on the left and right boundaries of the currently-existing buckets, and extends the boundaries to create buckets as necessary. Thus, it shouldn't matter what order the data is presented in. Subutai can correct me if I'm mistaken.
>>>>>>
>>>>>> On Tue, Feb 18, 2014 at 10:35 AM, Fergal Byrne <[email protected]> wrote:
>>>>>>
>>>>>>> Cheers Scott.
>>>>>>>
>>>>>>> Have you checked the NuPIC implementation for dependence on the order of data presented? My Python isn't up to that ;{
>>>>>>>
>>>>>>> Regards
>>>>>>>
>>>>>>> Fergal Byrne
>>>>>>>
>>>>>>> On Tue, Feb 18, 2014 at 5:37 PM, Scott Purdy <[email protected]> wrote:
>>>>>>>
>>>>>>>> Oh and Chetan's proposal is good but it still has memory and time constraints that are linear with the number of buckets (but it doesn't have to keep the memory in use in between invocations).
>>>>>>>>
>>>>>>>> On Tue, Feb 18, 2014 at 9:36 AM, Scott Purdy <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Thanks for the details on your implementation Fergal!
>>>>>>>>>
>>>>>>>>> Just to be clear, it is possible to create a constant memory and constant time solution (assuming fixed w). The one that I came up with does not use all nCw combinations of active bits though. Instead it uses (n/2)Cw * (n/2)Cw.
>>>>>>>>>
>>>>>>>>> I am hoping someone can find a solution with the same time/memory bounds but with higher entropy, i.e. one that will have random collisions less frequently.
>>>>>>>>>
>>>>>>>>> On Tue, Feb 18, 2014 at 1:01 AM, Fergal Byrne <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Scott, Chetan,
>>>>>>>>>>
>>>>>>>>>> Great thought experiment, Scott.
>>>>>>>>>>
>>>>>>>>>> Someone looking at my Clojure code [1] (a Clojure expert, not one on NuPIC) had some issues with my using a mutable data structure (i.e. memory) so much. I looked at ways of eliminating it, but there's no simple way to do it unless you know that you'll never go backwards on the number line. This also means that the encodings are dependent on the order in which you present the data to the encoder.
>>>>>>>>>>
>>>>>>>>>> For example, let's say the first value encoded is 0 (without loss of generality). If you have to encode 1 next, it will get the second code, 2 will get the next one, and so on. But if -1 is provided after 1, it'll get the next code (or at least some distortion of it) and thus 2 will be encoded differently.
>>>>>>>>>>
>>>>>>>>>> This means that every encoder must remember its buckets in order to give back the same encoding for previously computed values, or else remember the entire sequence of values and rerun their computations each time (which may cost more memory if many values per bucket must be stored).
>>>>>>>>>>
>>>>>>>>>> I've added a test/demo for this to my document.
>>>>>>>>>>
>>>>>>>>>> Update: If you decide on 0 as a centre, you can precalculate bands of buckets out to your first data value (and repeat this for each new one), which ensures the encoding is always the same:
>>>>>>>>>>
>>>>>>>>>> e.g. given 22 as the first datum, generate buckets for -10...10, -20...20, -30...30 and return encoding(22).
>>>>>>>>>>
>>>>>>>>>> You could choose a different centre if you know more about your data. I've detailed this idea in the doc.
>>>>>>>>>>
>>>>>>>>>> [1] http://fergalbyrne.github.io/rdse.html
>>>>>>>>>>
>>>>>>>>>> Regards
>>>>>>>>>>
>>>>>>>>>> Fergal Byrne
>>>>>>>>>>
>>>>>>>>>> On Tue, Feb 18, 2014 at 6:06 AM, Chetan Surpur <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> For this problem, this looks useful: http://en.wikipedia.org/wiki/Linear_feedback_shift_register
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Feb 17, 2014 at 6:01 PM, Chetan Surpur <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> A very simple approach would be to trade speed for memory. Instead of storing a map between buckets and SDRs, we can go through the bucket generation process every time we want to find the SDR for a bucket. From what I understand, this bucket generation process is linear in speed with the number of buckets you want to generate.
>>>>>>>>>>>> So the linear memory requirement would be translated into a linear speed requirement.
>>>>>>>>>>>>
>>>>>>>>>>>> In a nutshell, walk through the number line, generating buckets until you hit the target bucket you want a representation for, *every time* you want to get a representation, and don't store anything. You'll need to use the same seed for the random number generator though, to get consistent results.
>>>>>>>>>>>>
>>>>>>>>>>>> The advantage of this is that it's a simple modification to what is already implemented. On the other hand, it's slightly slower when outputting SDRs for previously-seen buckets.
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Feb 17, 2014 at 5:15 PM, Scott Purdy <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi all, I thought some of you might enjoy trying to come up with a solution for this problem. If you watch Chetan's presentation about the random distributed scalar encoder (RDSE), you will see that we are keeping a mapping between all buckets computed so far and the bits that represent them. This was Subutai's implementation of Jeff's general idea for the encoder. This design has a memory usage for the encoder that increases linearly with the number of buckets that it has to represent.
>>>>>>>>>>>>>
>>>>>>>>>>>>> When originally discussing the design, I was trying to find a way to statically compute the mapping so that you don't have to store anything.
>>>>>>>>>>>>> But it has to have the property that buckets i and j have w-(j-i) overlapping bits if j-i<w and also that a given index is never assigned multiple times to the same bucket. I came up with a solution but it would likely have more random collisions than Subutai's linear-memory solution because it was limited in the number of possible combinations of bits the buckets could have. Curious if someone can come up with something better!
>>>>>>>>>>>>>
>>>>>>>>>>>>> And be sure to watch Chetan's presentation on the RDSE that Subutai designed and implemented for background.
>>>>>>>>>>>>>
>>>>>>>>>>>>> *Note: the current implementation is fine for all practical scenarios so this is just a fun exercise for those interested*
>>>>>>>>>>>>>
>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>> nupic mailing list
>>>>>>>>>>>>> [email protected]
>>>>>>>>>>>>> http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>>
>>>>>>>>>> Fergal Byrne, Brenter IT
>>>>>>>>>>
>>>>>>>>>> <http://www.examsupport.ie>http://inbits.com - Better Living through Thoughtful Technology
>>>>>>>>>>
>>>>>>>>>> e:[email protected] t:+353 83 4214179
>>>>>>>>>> Formerly of Adnet [email protected] http://www.adnet.ie
_______________________________________________
nupic mailing list
[email protected]
http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
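
For anyone who wants to experiment with the centre-out idea Fergal describes in the thread above, here is a minimal Python sketch. It is not the NuPIC encoder and not Fergal's Clojure code. The class name `CentredEncoder`, its parameters, and the choice to grow one bucket per side at a time (rather than in bands of 10) are all assumptions made for illustration. The point it tries to show is that if buckets are always grown symmetrically from a fixed centre with a seeded random number generator, the encoding depends only on the seed and not on the order in which values arrive, while neighbouring buckets i and j still share w-(j-i) bits when j-i<w.

```python
import random


class CentredEncoder:
    """Toy scalar encoder (hypothetical, for illustration only).

    Buckets are grown symmetrically outwards from a fixed centre, so the
    random draws always happen in the same order for a given seed.
    """

    def __init__(self, n=400, w=21, resolution=1.0, centre=0.0, seed=42):
        assert w >= 2 and n > 2 * w
        self.n, self.w = n, w
        self.resolution, self.centre = resolution, centre
        self.rng = random.Random(seed)
        # bits[k] is the bit index stored at number-line position lo + k.
        # Bucket b is represented by positions b .. b + w - 1.
        self.bits = self.rng.sample(range(n), w)
        self.lo = 0

    def _new_bit(self, neighbours):
        # Draw a bit index that does not already appear among the neighbours.
        while True:
            bit = self.rng.randrange(self.n)
            if bit not in neighbours:
                return bit

    def _grow_ring(self):
        # Add one position on the right, then one on the left, always in
        # that order. Keeping each new bit distinct from its 2w - 2 nearest
        # neighbours makes every window of 2w - 1 positions duplicate-free,
        # which gives the exact w - (j - i) overlap for nearby buckets.
        guard = 2 * self.w - 2
        self.bits.append(self._new_bit(self.bits[-guard:]))
        self.bits.insert(0, self._new_bit(self.bits[:guard]))
        self.lo -= 1

    def encode(self, value):
        bucket = int(round((value - self.centre) / self.resolution))
        # Grow until positions bucket .. bucket + w - 1 all exist.
        while not (self.lo <= bucket
                   and bucket + self.w <= self.lo + len(self.bits)):
            self._grow_ring()
        start = bucket - self.lo
        return frozenset(self.bits[start:start + self.w])


if __name__ == "__main__":
    a = CentredEncoder(seed=7)
    b = CentredEncoder(seed=7)
    for v in [0, 1, -1, 2, 22]:
        a.encode(v)
    # b sees 22 first, as in Fergal's example, yet ends up with the same codes.
    assert b.encode(22) == a.encode(22)
    assert b.encode(2) == a.encode(2)
    # Buckets 3 and 8 are 5 apart, so they share w - 5 bits.
    assert len(a.encode(3) & a.encode(8)) == a.w - 5
```

This sketch still stores the generated buckets in memory between calls. The difference from an order-dependent encoder is only that the stored state can be thrown away and rebuilt from the seed at any time, which is the property discussed at the top of the thread.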
