Re: [nupic-discuss] Fun challenge: constant memory RDSE

Fergal Byrne Tue, 18 Feb 2014 12:11:34 -0800

Hi Chetan,

No, but the encodings should always be independent of the order of
presentation of the data, so it's a bug if they're not.  My code includes a
workaround which builds buckets out in both directions from a predefined
centre value until it encompasses each new value. This guarantees the same
encoding regardless of which values come in when. You could easily add a
version of this to your encoder; it's a small overhead for ensuring
identical encodings.
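
A minimal Python sketch of the centre-out idea, not the Clojure code from the document. The class name `CentredEncoder`, its parameters, and the one-bit sliding-window rule for deriving a neighbouring bucket are all assumptions made for illustration. It grows one ring at a time (right, then left), so the random draws depend only on how far from the centre the data has reached, never on arrival order.

```python
import random


class CentredEncoder:
    """Toy RDSE-like encoder that grows buckets symmetrically around a centre."""

    def __init__(self, n=400, w=21, resolution=1.0, centre=0.0, seed=42):
        self.n, self.w = n, w
        self.resolution, self.centre = resolution, centre
        self.rng = random.Random(seed)
        # Bucket 0 holds the centre value: w distinct bits out of n.
        self.buckets = {0: self.rng.sample(range(n), w)}
        self.radius = 0

    def _fresh_bit(self, neighbour):
        # Draw until we find a bit the neighbouring bucket does not use,
        # so adjacent buckets share exactly w - 1 bits.
        while True:
            bit = self.rng.randrange(self.n)
            if bit not in neighbour:
                return bit

    def _grow_to(self, radius):
        # Always extend ring by ring, right then left, whatever value
        # triggered the growth. This fixes the order of RNG draws.
        while self.radius < radius:
            r = self.radius + 1
            right, left = self.buckets[r - 1], self.buckets[-(r - 1)]
            self.buckets[r] = right[1:] + [self._fresh_bit(right)]
            self.buckets[-r] = [self._fresh_bit(left)] + left[:-1]
            self.radius = r

    def encode(self, value):
        index = round((value - self.centre) / self.resolution)
        self._grow_to(abs(index))
        return frozenset(self.buckets[index])
```

Two encoders built with the same seed should then agree on every value, whether they see 22, -5, 3 or 3, 22, -5. The cost is that buckets on the unused side of the centre are generated as well.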


This is the C4 idea in action - argue with patches... Just shows you how
useful an executable document can be when you're experimenting!

Regards,

Fergal


On Tue, Feb 18, 2014 at 6:55 PM, Chetan Surpur <[email protected]> wrote:

> Oh, my mistake, I misunderstood the question. I thought Fergal was asking
> if the data has to be presented in a certain order to get correct results
> (results that have the desired overlap properties).
>
> So yes, order dependence exists in the currently implemented encoder, but
> it shouldn't affect correctness.
> On Feb 18, 2014 10:51 AM, "Scott Purdy" <[email protected]> wrote:
>
>> Fergal, I believe the implementation in NuPIC is dependent on the order
>> of data. Why do you ask? The constant-memory design I have brought up here
>> does not exist in NuPIC.
>>
>> Chetan, you are right that it extends separately in each direction but I
>> believe that the randomness is shared so the order of the data would affect
>> it. It wouldn't be difficult to change that though. But it also doesn't
>> really solve any problems.
>>
>>
>> On Tue, Feb 18, 2014 at 10:39 AM, Chetan Surpur <[email protected]> wrote:
>>
>>> From reading the code, it looks to me that the generation of buckets
>>> happens on the left and right boundaries of the currently-existing buckets,
>>> and extends the boundaries to create buckets as necessary. Thus, it
>>> shouldn't matter in what order the data is presented. Subutai can correct me
>>> if I'm mistaken.
>>>
>>>
>>> On Tue, Feb 18, 2014 at 10:35 AM, Fergal Byrne <[email protected]> wrote:
>>>
>>>> Cheers Scott.
>>>>
>>>> Have you checked the NuPIC implementation for dependence on the order
>>>> of data presented? My python isn't up to that ;{
>>>>
>>>> Regards
>>>>
>>>> Fergal Byrne
>>>>
>>>>
>>>> On Tue, Feb 18, 2014 at 5:37 PM, Scott Purdy <[email protected]> wrote:
>>>>
>>>>> Oh and Chetan's proposal is good but it still has memory and time
>>>>> constraints that are linear with the number of buckets (but it doesn't
>>>>> have to keep the memory in use in between invocations).
>>>>>
>>>>>
>>>>> On Tue, Feb 18, 2014 at 9:36 AM, Scott Purdy <[email protected]> wrote:
>>>>>
>>>>>> Thanks for the details on your implementation Fergal!
>>>>>>
>>>>>> Just to be clear, it is possible to create a constant memory and
>>>>>> constant time solution (assuming fixed w). The one that I came up with
>>>>>> does not use all nCw combinations of active bits though. Instead it uses
>>>>>> (n/2)Cw * (n/2)Cw.
>>>>>>
>>>>>> I am hoping someone can find a solution with the same time/memory
>>>>>> bounds but with higher entropy, i.e. one that will have random
>>>>>> collisions less frequently.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Feb 18, 2014 at 1:01 AM, Fergal Byrne <[email protected]> wrote:
>>>>>>
>>>>>>> Hi Scott, Chetan,
>>>>>>>
>>>>>>> Great thought experiment, Scott.
>>>>>>>
>>>>>>> Someone looking at my Clojure code [1] (a Clojure expert, not one on
>>>>>>> NuPIC) had some issues with my using a mutable data structure (i.e.
>>>>>>> memory) so much. I looked at ways of eliminating it, but there's no
>>>>>>> simple way to do it unless you know that you'll never go backwards on
>>>>>>> the number line. This also means that the encodings are dependent on
>>>>>>> the order in which you present the data to the encoder.
>>>>>>>
>>>>>>> For example, let's say the first value encoded is 0 (without loss of
>>>>>>> generality). If you have to encode 1 next, it will get the second code,
>>>>>>> 2 will get the next one, and so on. But if -1 is provided after 1, it'll
>>>>>>> get the next code (or at least some distortion of it) and thus 2 will be
>>>>>>> encoded differently.
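
The effect described above can be reproduced with a toy encoder in Python. This is only a sketch under stated assumptions: `encode_on_demand` is a hypothetical helper, buckets are extended lazily in whichever direction the next value requires, and a single random generator is shared by both directions. It is not the NuPIC or Clojure implementation.

```python
import random


def encode_on_demand(values, n=400, w=21, seed=42):
    """Encode integer values in arrival order, extending buckets lazily."""
    rng = random.Random(seed)          # one generator shared by both directions
    buckets = {0: rng.sample(range(n), w)}
    lo = hi = 0

    def fresh_bit(neighbour):
        # Draw until we get a bit the neighbouring bucket does not use.
        while True:
            bit = rng.randrange(n)
            if bit not in neighbour:
                return bit

    for v in values:
        while hi < v:                  # extend to the right
            buckets[hi + 1] = buckets[hi][1:] + [fresh_bit(buckets[hi])]
            hi += 1
        while lo > v:                  # extend to the left
            buckets[lo - 1] = [fresh_bit(buckets[lo])] + buckets[lo][:-1]
            lo -= 1
    return {v: frozenset(buckets[v]) for v in values}


forwards = encode_on_demand([0, 1, 2, 3, 4, 5])
detour = encode_on_demand([0, 1, -1, 2, 3, 4, 5])
print(forwards[1] == detour[1])                             # same draws up to here
print(any(forwards[v] != detour[v] for v in (2, 3, 4, 5)))  # draws diverge after -1
```

Values 0 and 1 are encoded identically in both runs because the same random draws precede them. Once -1 consumes draws in the second run, the codes for 2 onwards should almost certainly differ.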
>>>>>>>
>>>>>>> This means that every encoder must remember its buckets in order to
>>>>>>> give back the same encoding for previously computed values, or else
>>>>>>> remember the entire sequence of values and rerun their computations each
>>>>>>> time (which may cost more memory if many values per bucket must be
>>>>>>> stored).
>>>>>>>
>>>>>>> I've added a test/demo for this to my document.
>>>>>>>
>>>>>>> Update: If you decide on 0 as a centre, you can precalculate bands
>>>>>>> of buckets out to your first data value (and repeat this for each new
>>>>>>> one), which ensures the encoding is always the same:
>>>>>>>
>>>>>>> e.g. given 22 as the first datum, generate buckets for -10...10,
>>>>>>> -20...20, -30...30 and return encoding(22).
>>>>>>>
>>>>>>> You could choose a different centre if you know more about your
>>>>>>> data. I've detailed this idea in the doc.
>>>>>>>
>>>>>>> [1] http://fergalbyrne.github.io/rdse.html
>>>>>>>
>>>>>>> Regards
>>>>>>>
>>>>>>> Fergal Byrne
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Feb 18, 2014 at 6:06 AM, Chetan Surpur <[email protected]> wrote:
>>>>>>>
>>>>>>>> For this problem, this looks useful:
>>>>>>>> http://en.wikipedia.org/wiki/Linear_feedback_shift_register
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Feb 17, 2014 at 6:01 PM, Chetan Surpur <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> A very simple approach would be to trade speed for memory. Instead
>>>>>>>>> of storing a map between buckets and SDRs, we can go through the
>>>>>>>>> bucket generation process every time we want to find the SDR for a
>>>>>>>>> bucket. From what I understand, this bucket generation process is
>>>>>>>>> linear in time with the number of buckets you want to generate. So the
>>>>>>>>> linear memory requirement would be translated into a linear time
>>>>>>>>> requirement.
>>>>>>>>>
>>>>>>>>> In a nutshell, walk through the number line, generating buckets
>>>>>>>>> until you hit the target bucket you want a representation for, *every
>>>>>>>>> time* you want to get a representation, and don't store anything.
>>>>>>>>> You'll need to use the same seed for the random number generator
>>>>>>>>> though, to get consistent results.
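
The walk described above might look like the following in Python. This is a sketch, not a patch against NuPIC: `sdr_for_bucket` and its parameters are hypothetical, and seeding a separate generator per direction is an extra assumption added so that left and right walks cannot interfere. Only the current window of w bits is held in memory, while the time per call grows linearly with the distance from bucket 0.

```python
import random


def sdr_for_bucket(target, n=400, w=21, seed=42):
    """Recompute the SDR for bucket `target` from scratch; nothing is cached."""
    # Bucket 0 is always drawn first from the base seed.
    window = random.Random(seed).sample(range(n), w)
    # Assumption: one derived generator per direction, so walking left never
    # disturbs the draws used when walking right.
    direction = "right" if target > 0 else "left"
    rng = random.Random(f"{seed}:{direction}")
    for _ in range(abs(target)):
        bit = rng.randrange(n)
        while bit in window:              # new bit must not already be active
            bit = rng.randrange(n)
        if target > 0:
            window = window[1:] + [bit]   # slide right: drop first, append new
        else:
            window = [bit] + window[:-1]  # slide left: drop last, prepend new
    return frozenset(window)
```

Calling `sdr_for_bucket(5)` twice gives the same set both times, and buckets 5 and 6 share w - 1 bits, because the walk to 6 passes through exactly the state that produced 5.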
>>>>>>>>>
>>>>>>>>> The advantage of this is that it's a simple modification to what
>>>>>>>>> is already implemented. On the other hand, it's slightly slower when
>>>>>>>>> outputting SDRs for previously-seen buckets.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Feb 17, 2014 at 5:15 PM, Scott Purdy <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi all, I thought some of you might enjoy trying to come up with
>>>>>>>>>> a solution for this problem. If you watch Chetan's presentation
>>>>>>>>>> about the random distributed scalar encoder (RDSE), you will see that
>>>>>>>>>> we are keeping a mapping between all buckets computed so far and the
>>>>>>>>>> bits that represent them. This was Subutai's implementation of Jeff's
>>>>>>>>>> general idea for the encoder. This design has a memory usage for the
>>>>>>>>>> encoder that increases linearly with the number of buckets that it has
>>>>>>>>>> to represent.
>>>>>>>>>>
>>>>>>>>>> When originally discussing the design, I was trying to find a way
>>>>>>>>>> to statically compute the mapping so that you don't have to store
>>>>>>>>>> anything. But it has to have the property that buckets i and j have
>>>>>>>>>> w-(j-i) overlapping bits if j-i<w and also that a given index is never
>>>>>>>>>> assigned multiple times to the same bucket. I came up with a solution
>>>>>>>>>> but it would likely have more random collisions than Subutai's
>>>>>>>>>> linear-memory solution because it was limited in the number of possible
>>>>>>>>>> combinations of bits the buckets could have. Curious if someone can
>>>>>>>>>> come up with something better!
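
The two constraints above can be written down as a check in Python. As a sketch only: `satisfies_constraints` and `window_bucket` are hypothetical names, and the static mapping tested here (a contiguous window of bits modulo n) is a deliberately naive baseline, not the solution mentioned in the thread. It needs no stored state and meets both constraints when n >= 2w - 1, but it has only n distinct codes, which illustrates why a limited set of bit combinations means more collisions.

```python
def window_bucket(i, n=400, w=21):
    """Statically computed bucket: w consecutive bit indices starting at i mod n."""
    return [(i + k) % n for k in range(w)]


def satisfies_constraints(bucket_fn, indices, w):
    """Check both properties over every pair of bucket indices supplied."""
    for i in indices:
        bits_i = bucket_fn(i)
        # No bit index may be assigned twice to the same bucket.
        if len(set(bits_i)) != w:
            return False
        for j in indices:
            d = j - i
            # Buckets closer than w apart must share exactly w - (j - i) bits.
            if 0 < d < w and len(set(bits_i) & set(bucket_fn(j))) != w - d:
                return False
    return True


print(satisfies_constraints(window_bucket, range(-50, 50), 21))
print(window_bucket(0) == window_bucket(400))  # codes repeat every n buckets
```

The same checker can be pointed at any other candidate mapping by passing a different `bucket_fn`.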
>>>>>>>>>>
>>>>>>>>>> And be sure to watch Chetan's presentation on the RDSE that
>>>>>>>>>> Subutai designed and implemented for background.
>>>>>>>>>>
>>>>>>>>>> *Note: the current implementation is fine for all practical
>>>>>>>>>> scenarios so this is just a fun exercise for those interested*
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> nupic mailing list
>>>>>>>>>> [email protected]
>>>>>>>>>> http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>> Fergal Byrne, Brenter IT
>>>>>>>
>>>>>>> <http://www.examsupport.ie>http://inbits.com - Better Living
>>>>>>> through Thoughtful Technology
>>>>>>>
>>>>>>> e:[email protected] t:+353 83 4214179
>>>>>>> Formerly of Adnet [email protected] http://www.adnet.ie
>>>>>>>


-- 

Fergal Byrne, Brenter IT

<http://www.examsupport.ie>http://inbits.com - Better Living through
Thoughtful Technology

e:[email protected] t:+353 83 4214179
Formerly of Adnet [email protected] http://www.adnet.ie
