Hi Fergal,

That's a nice property (not having to save buckets). You're right that once
you fix the random seed everything else is deterministic.  The NuPIC
implementation does require loading/saving buckets each time to load/save
the network.

--Subutai


On Tue, Feb 18, 2014 at 12:20 PM, Fergal Byrne
<[email protected]> wrote:

> Yes, but then you'd have to save all your buckets between runs of the
> encoder, and reload them each time you use it. The SP is depending on those
> encodings never changing, and for example if you decided to aggregate
> values, or choose every nth one, you'd break the constancy of the encoding.
>
> My workaround means that every encoding is identical no matter how you
> give it data - it reconstructs all needed buckets the first time you give
> it data, and continues to do so for every new one. The random number
> generator provides everything you need to rebuild the encoding, so no
> saving or reloading is necessary.
>
> Regards
>
> Fergal Byrne
>
>
> On Tue, Feb 18, 2014 at 8:13 PM, Chetan Surpur <[email protected]> wrote:
>
>> As long as the encoder maintains the mappings for the currently-existing
>> buckets, why must the new encodings be independent of the order of
>> presentation of the data?
>>
>>
>> On Tue, Feb 18, 2014 at 12:10 PM, Fergal Byrne <
>> [email protected]> wrote:
>>
>>> Hi Chetan,
>>>
>>> No, but the encodings should always be independent of the order of
>>> presentation of the data, so it's a bug if they're not.  My code includes a
>>> workaround which builds buckets out in both directions from a predefined
>>> centre value until it encompasses each new value. This guarantees the same
>>> encoding regardless of which values come in when. You could easily add a
>>> version of this to your encoder, it's a small overhead for ensuring
>>> identical encodings.
>>>
>>> This is the C4 idea in action - argue with patches... Just shows you how
>>> useful an executable document can be when you're experimenting!
>>>
>>> Regards,
>>>
>>> Fergal
>>>
>>>
>>> On Tue, Feb 18, 2014 at 6:55 PM, Chetan Surpur <[email protected]> wrote:
>>>
>>>> Oh, my mistake, I misunderstood the question. I thought Fergal was
>>>> asking if the data has to be presented in a certain order to get correct
>>>> results (results that have the desired overlap properties).
>>>>
>>>> So yes, order dependence exists in the currently implemented encoder,
>>>> but it shouldn't affect correctness.
>>>> On Feb 18, 2014 10:51 AM, "Scott Purdy" <[email protected]> wrote:
>>>>
>>>>> Fergal, I believe the implementation in NuPIC is dependent on the
>>>>> order of data. Why do you ask? The constant-memory designs I have
>>>>> brought up here do not exist in NuPIC.
>>>>>
>>>>> Chetan, you are right that it extends separately in each direction but
>>>>> I believe that the randomness is shared so the order of the data would
>>>>> affect it. It wouldn't be difficult to change that though. But it also
>>>>> doesn't really solve any problems.
>>>>>
>>>>>
>>>>> On Tue, Feb 18, 2014 at 10:39 AM, Chetan Surpur <[email protected]> wrote:
>>>>>
>>>>>> From reading the code, it looks to me that the generation of buckets
>>>>>> happens on the left and right boundaries of the currently-existing
>>>>>> buckets, and extends the boundaries to create buckets as necessary.
>>>>>> Thus, it shouldn't matter what order the data is presented. Subutai
>>>>>> can correct me if I'm mistaken.
>>>>>>
>>>>>>
>>>>>> On Tue, Feb 18, 2014 at 10:35 AM, Fergal Byrne <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Cheers Scott.
>>>>>>>
>>>>>>> Have you checked the NuPIC implementation for dependence on the
>>>>>>> order of data presented? My Python isn't up to that ;{
>>>>>>>
>>>>>>> Regards
>>>>>>>
>>>>>>> Fergal Byrne
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Feb 18, 2014 at 5:37 PM, Scott Purdy <[email protected]> wrote:
>>>>>>>
>>>>>>>> Oh and Chetan's proposal is good but it still has memory and time
>>>>>>>> constraints that are linear with the number of buckets (but it doesn't
>>>>>>>> have to keep the memory in use in between invocations).
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Feb 18, 2014 at 9:36 AM, Scott Purdy <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Thanks for the details on your implementation Fergal!
>>>>>>>>>
>>>>>>>>> Just to be clear, it is possible to create a constant memory and
>>>>>>>>> constant time solution (assuming fixed w). The one that I came up
>>>>>>>>> with does not use all nCw combinations of active bits though. Instead
>>>>>>>>> it uses (n/2)Cw * (n/2)Cw.
>>>>>>>>>
>>>>>>>>> I am hoping someone can find a solution with the same time/memory
>>>>>>>>> bounds but with higher entropy, i.e. one that will have random
>>>>>>>>> collisions less frequently.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Feb 18, 2014 at 1:01 AM, Fergal Byrne <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Scott, Chetan,
>>>>>>>>>>
>>>>>>>>>> Great thought experiment, Scott.
>>>>>>>>>>
>>>>>>>>>> Someone looking at my Clojure code [1] (a Clojure expert, not one
>>>>>>>>>> on NuPIC) had some issues with my using a mutable data structure (i.e.
>>>>>>>>>> memory) so much. I looked at ways of eliminating it, but there's no
>>>>>>>>>> simple way to do it unless you know that you'll never go backwards on
>>>>>>>>>> the number line. This also means that the encodings are dependent on
>>>>>>>>>> the order in which you present the data to the encoder.
>>>>>>>>>>
>>>>>>>>>> For example, let's say the first value encoded is 0 (without loss
>>>>>>>>>> of generality). If you have to encode 1 next, it will get the second
>>>>>>>>>> code, 2 will get the next one, and so on. But if -1 is provided after
>>>>>>>>>> 1, it'll get the next code (or at least some distortion of it) and
>>>>>>>>>> thus 2 will be encoded differently.
>>>>>>>>>>
>>>>>>>>>> This means that every encoder must remember its buckets in order
>>>>>>>>>> to give back the same encoding for previously computed values, or
>>>>>>>>>> else remember the entire sequence of values and rerun their
>>>>>>>>>> computations each time (which may cost more memory if many values
>>>>>>>>>> per bucket must be stored).
>>>>>>>>>>
>>>>>>>>>> I've added a test/demo for this to my document.
>>>>>>>>>>
>>>>>>>>>> Update: If you decide on 0 as a centre, you can precalculate
>>>>>>>>>> bands of buckets out to your first data value (and repeat this for
>>>>>>>>>> each new one), which ensures the encoding is always the same:
>>>>>>>>>>
>>>>>>>>>> e.g. given 22 as the first datum, generate buckets for -10...10,
>>>>>>>>>> -20...20, -30...30 and return encoding(22).
>>>>>>>>>>
>>>>>>>>>> You could choose a different centre if you know more about your
>>>>>>>>>> data. I've detailed this idea in the doc.
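The band idea above (grow buckets outward from a fixed centre until the new
value is covered) might look roughly like this in Python. This is only a
sketch under assumptions, not Fergal's Clojure code and not the NuPIC
encoder: the class name CentredEncoder, the parameters n, w, seed and band,
and the simple "sliding window of bits" bucket generator are all made up for
illustration. Indices are bucket offsets from the centre, so mapping a raw
value to an index (by some resolution) is left out.

```python
import random


class CentredEncoder:
    """Hypothetical sketch: grows buckets outward from a centre in whole bands."""

    def __init__(self, n=400, w=21, seed=42, band=10):
        if w < 2 or n < 2 * w or band < 1:
            raise ValueError("need w >= 2, n >= 2 * w and band >= 1")
        self.n, self.w, self.band = n, w, band
        self.rng = random.Random(seed)
        # The w bits of the centre bucket (index 0). Bucket k is always the
        # w consecutive entries of self.bits starting at k + self.radius.
        self.bits = self.rng.sample(range(n), w)
        self.radius = 0  # buckets -radius..radius exist so far

    def _fresh_bit(self, neighbours):
        # Redraw until the bit differs from its nearest 2w-2 neighbours, so
        # buckets i and j (j - i < w) share exactly w - (j - i) bits.
        bit = self.rng.randrange(self.n)
        while bit in neighbours:
            bit = self.rng.randrange(self.n)
        return bit

    def _grow_band(self):
        guard = 2 * self.w - 2
        for _ in range(self.band):  # always extend to the right first...
            self.bits.append(self._fresh_bit(self.bits[-guard:]))
        for _ in range(self.band):  # ...then to the left, in a fixed order
            self.bits.insert(0, self._fresh_bit(self.bits[:guard]))
        self.radius += self.band

    def encode(self, index):
        while abs(index) > self.radius:
            self._grow_band()
        start = index + self.radius
        return frozenset(self.bits[start:start + self.w])


first, second = CentredEncoder(), CentredEncoder()
a = {v: first.encode(v) for v in (22, -5, 3)}
b = {v: second.encode(v) for v in (3, -5, 22)}
print(a == b)  # same encodings whichever order the values arrive in
```

Because the random number generator is only ever consumed band by band, in
the same right-then-left order, the sequence of draws does not depend on
which values arrive first. With band=10, asking for 22 first builds the bands
out to -30...30, matching the example above.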
>>>>>>>>>>
>>>>>>>>>> [1] http://fergalbyrne.github.io/rdse.html
>>>>>>>>>>
>>>>>>>>>> Regards
>>>>>>>>>>
>>>>>>>>>> Fergal Byrne
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Feb 18, 2014 at 6:06 AM, Chetan Surpur <
>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> For this problem, this looks useful:
>>>>>>>>>>> http://en.wikipedia.org/wiki/Linear_feedback_shift_register
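For readers who haven't met one, here is a small Python sketch of a 16-bit
Fibonacci LFSR with taps at bits 16, 14, 13 and 11. The thread doesn't say
how an LFSR would be wired into the encoder, so this only illustrates the
relevant property: the whole generator state is a single 16-bit integer, so
the bit stream can be replayed from its starting value rather than stored.
The function names and the start value 0xACE1 are illustrative choices.

```python
def lfsr16_step(state):
    """Advance a 16-bit Fibonacci LFSR (x^16 + x^14 + x^13 + x^11 + 1) one step."""
    # XOR the tapped bits to form the feedback bit, then shift it in at the top.
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)


def lfsr16_period(start=0xACE1):
    """Count the steps until the register returns to its starting state."""
    if not 0 < start < (1 << 16):
        raise ValueError("start must be a non-zero 16-bit value")
    state, steps = lfsr16_step(start), 1
    while state != start:
        state = lfsr16_step(state)
        steps += 1
    return steps


print(hex(lfsr16_step(0xACE1)))
print(lfsr16_period())
```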
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Feb 17, 2014 at 6:01 PM, Chetan Surpur <
>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> A very simple approach would be to trade speed for memory.
>>>>>>>>>>>> Instead of storing a map between buckets and SDRs, we can go
>>>>>>>>>>>> through the bucket generation process every time we want to find
>>>>>>>>>>>> the SDR for a bucket. From what I understand, this bucket generation
>>>>>>>>>>>> process is linear in time with the number of buckets you want to
>>>>>>>>>>>> generate. So the linear memory requirement would be translated into
>>>>>>>>>>>> a linear time requirement.
>>>>>>>>>>>>
>>>>>>>>>>>> In a nutshell, walk through the number line, generating buckets
>>>>>>>>>>>> until you hit the target bucket you want a representation for,
>>>>>>>>>>>> *every time* you want to get a representation, and don't store
>>>>>>>>>>>> anything. You'll need to use the same seed for the random number
>>>>>>>>>>>> generator though, to get consistent results.
>>>>>>>>>>>>
>>>>>>>>>>>> The advantage of this is that it's a simple modification to
>>>>>>>>>>>> what is already implemented. On the other hand, it's slightly
>>>>>>>>>>>> slower when outputting SDRs for previously-seen buckets.
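A rough Python sketch of the "regenerate every time, store nothing" idea
described above. It is a simplification under assumptions rather than the
NuPIC code: the function name regenerate_bucket and the parameters n, w and
seed are hypothetical, the bucket generator is a plain sliding window of
random bits, and only non-negative bucket indices are handled.

```python
import random
from collections import deque


def regenerate_bucket(index, n=400, w=21, seed=42):
    """Recompute the active bits of bucket `index` from scratch; nothing is cached."""
    if index < 0:
        raise ValueError("this sketch only walks in the positive direction")
    if w < 2 or n < 2 * w:
        raise ValueError("need w >= 2 and n >= 2 * w")
    rng = random.Random(seed)  # same seed, same walk, same result
    # Bucket 0 is w distinct random bits. `recent` keeps at most the last
    # 2w-2 bits drawn, which is all the walk ever has to remember.
    recent = deque(rng.sample(range(n), w), maxlen=2 * w - 2)
    for _ in range(index):
        bit = rng.randrange(n)
        while bit in recent:  # redraw so nearby buckets overlap exactly
            bit = rng.randrange(n)
        recent.append(bit)  # once full, the oldest remembered bit drops off
    return frozenset(list(recent)[-w:])  # the newest w bits form the bucket


a = regenerate_bucket(3)
b = regenerate_bucket(7)
print(len(a & b))  # buckets 4 apart share w - 4 bits
```

Each call costs time proportional to the bucket index but only O(w) memory,
which is the trade described above. Re-seeding the generator on every call is
what makes repeated calls agree.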
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Feb 17, 2014 at 5:15 PM, Scott Purdy <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi all, I thought some of you might enjoy trying to come up
>>>>>>>>>>>>> with a solution for this problem. If you watch Chetan's
>>>>>>>>>>>>> presentation about the random distributed scalar encoder (RDSE),
>>>>>>>>>>>>> you will see that we are keeping a mapping between all buckets
>>>>>>>>>>>>> computed so far and the bits that represent them. This was
>>>>>>>>>>>>> Subutai's implementation of Jeff's general idea for the encoder.
>>>>>>>>>>>>> This design has a memory usage for the encoder that increases
>>>>>>>>>>>>> linearly with the number of buckets that it has to represent.
>>>>>>>>>>>>>
>>>>>>>>>>>>> When originally discussing the design, I was trying to find a
>>>>>>>>>>>>> way to statically compute the mapping so that you don't have to
>>>>>>>>>>>>> store anything. But it has to have the property that buckets i and
>>>>>>>>>>>>> j have w-(j-i) overlapping bits if j-i<w and also that a given
>>>>>>>>>>>>> index is never assigned multiple times to the same bucket. I came
>>>>>>>>>>>>> up with a solution but it would likely have more random collisions
>>>>>>>>>>>>> than Subutai's linear-memory solution because it was limited in the
>>>>>>>>>>>>> number of possible combinations of bits the buckets could have.
>>>>>>>>>>>>> Curious if someone can come up with something better!
>>>>>>>>>>>>>
>>>>>>>>>>>>> And be sure to watch Chetan's presentation on the RDSE that
>>>>>>>>>>>>> Subutai designed and implemented for background.
>>>>>>>>>>>>>
>>>>>>>>>>>>> *Note: the current implementation is fine for all practical
>>>>>>>>>>>>> scenarios so this is just a fun exercise for those interested*
>>>>>>>>>>>>>
>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>> nupic mailing list
>>>>>>>>>>>>> [email protected]
>>>>>>>>>>>>>
>>>>>>>>>>>>> http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>>
>>>>>>>>>> Fergal Byrne, Brenter IT
>>>>>>>>>>
>>>>>>>>>> <http://www.examsupport.ie>http://inbits.com - Better Living
>>>>>>>>>> through Thoughtful Technology
>>>>>>>>>>
>>>>>>>>>> e:[email protected] t:+353 83 4214179
>>>>>>>>>> Formerly of Adnet [email protected] http://www.adnet.ie
