Re: [nupic-discuss] Fun challenge: constant memory RDSE

Chetan Surpur Tue, 18 Feb 2014 10:41:01 -0800

>From reading the code, it looks to me that the generation of buckets
happens on the left and right boundaries of the currently-existing buckets,
and extends the boundaries to create buckets as necessary. Thus, it
shouldn't matter what order the data is presented. Subutai can correct me
if I'm mistaken.



On Tue, Feb 18, 2014 at 10:35 AM, Fergal Byrne
<[email protected]>wrote:

> Cheers Scott.
>
> Have you checked the NuPIC implementation for dependence on the order of
> data presented? My python isn't up to that ;{
>
> Regards
>
> Fergal Byrne
>
>
> On Tue, Feb 18, 2014 at 5:37 PM, Scott Purdy <[email protected]> wrote:
>
>> Oh and Chetan's proposal is good but it still has memory and time
>> constraints that are linear with the number of buckets (but it doesn't have
>> to keep the memory in use in between invocations).
>>
>>
>> On Tue, Feb 18, 2014 at 9:36 AM, Scott Purdy <[email protected]> wrote:
>>
>>> Thanks for the details on your implementation Fergal!
>>>
>>> Just to be clear, it is possible to create a constant memory and
>>> constant time solution (assuming fixed w). The one that I came up with does
>>> not use all nCw combinations of active bits though. Instead it uses (n/2)Cw
>>> * (n/2)Cw.
>>>
>>> I am hoping someone can find a solution with the same time/memory bounds
>>> but with a higher entropy solution. IE one that will have random collisions
>>> less frequently.
>>>
>>>
>>>
>>> On Tue, Feb 18, 2014 at 1:01 AM, Fergal Byrne <
>>> [email protected]> wrote:
>>>
>>>> Hi Scott, Chetan,
>>>>
>>>> Great thought experiment, Scott.
>>>>
>>>> Someone looking at my Clojure code [1] (a Clojure expert, not one on
>>>> NuPIC) had some issues with my using a mutable data structure (ie memory)
>>>> so much. I looked at ways of eliminating it, but there's no simple way to
>>>> do it unless you know that you'll never go backwards on the number line.
>>>> This also means that the encodings are dependent on the order in which you
>>>> present the data to the encoder.
>>>>
>>>> For example, let's say the first value encoded is 0 (without loss of
>>>> generality). If you have to encode 1 next, it will get the second code, 2
>>>> will get the next one, and so on. But if -1 is provided after 1, it'll get
>>>> the next code (or at least some distortion of it) and thus 2 will be
>>>> encoded differently.
>>>>
>>>> This means that every encoder must remember its buckets in order to
>>>> give back the same encoding for previously computed values, or else
>>>> remember the entire sequence of values and rerun their computations each
>>>> time (which may cost more memory if many values per bucket must be stored).
>>>>
>>>> I've added a test/demo for this to my document.
>>>>
>>>> Update: If you decide on 0 as a centre, you can precalculate bands of
>>>> buckets out to your first data value (and repeat this for each new one),
>>>> which ensures the encoding is always the same:
>>>>
>>>> e.g given 22 as the first datum, generate buckets for -10...10,
>>>> -20...20, -30...30 and return encoding(22).
>>>>
>>>> You could choose a different centre if you know more about your data.
>>>> I've detailed this idea in the doc.
>>>>
>>>> [1] http://fergalbyrne.github.io/rdse.html
>>>>
>>>> Regards
>>>>
>>>> Fergal Byrne
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Tue, Feb 18, 2014 at 6:06 AM, Chetan Surpur <[email protected]>wrote:
>>>>
>>>>> For this problem, this looks useful:
>>>>> http://en.wikipedia.org/wiki/Linear_feedback_shift_register
>>>>>
>>>>>
>>>>> On Mon, Feb 17, 2014 at 6:01 PM, Chetan Surpur <[email protected]>wrote:
>>>>>
>>>>>> A very simple approach would be to trade speed for memory. Instead of
>>>>>> storing a map between buckets and SDRs, we can go through the bucket
>>>>>> generation process every time we want to find the SDR for a bucket. From
>>>>>> what I understand, this bucket generation process is linear in speed with
>>>>>> the number of buckets you want to generate. So the linear memory
>>>>>> requirement would be translated into a linear speed requirement.
>>>>>>
>>>>>> In a nutshell, walk through the number line, generating buckets until
>>>>>> you hit the target bucket you want a representation for, *every time* you
>>>>>> want to get a representation, and don't store anything. You'll need to 
>>>>>> use
>>>>>> the same seed for the random number generator though, to get consistent
>>>>>> results.
>>>>>>
>>>>>> The advantage of this is that it's a simple modification to what is
>>>>>> already implemented. On the other hand, it's slightly slower when
>>>>>> outputting SDRs for previously-seen buckets.
>>>>>>
>>>>>>
>>>>>> On Mon, Feb 17, 2014 at 5:15 PM, Scott Purdy <[email protected]>wrote:
>>>>>>
>>>>>>> Hi all, I thought some of you might enjoy trying to come up with a
>>>>>>> solution for this problem. If you watch Chetan's presentation about the
>>>>>>> random distributed scalar encoder (RDSE), you will see that we are 
>>>>>>> keeping
>>>>>>> a mapping between all buckets computed so far and the bits that 
>>>>>>> represent
>>>>>>> them. This was Subutai's implementation of Jeff's general idea for the
>>>>>>> encoder. This design has a memory usage for the encoder that increases
>>>>>>> linearly with the number of buckets that it has to represent.
>>>>>>>
>>>>>>> When originally discussing the design, I was trying to find a way to
>>>>>>> statically compute the mapping so that you don't have to store anything.
>>>>>>> But it has to have the property that buckets i and j have w-(j-i)
>>>>>>> overlapping bits if j-i<w and also that a given index is never assigned
>>>>>>> multiple times to the same bucket. I came up with a solution but it 
>>>>>>> would
>>>>>>> likely have more random collisions than Subutai's linear-memory solution
>>>>>>> because it was limited in the number of possibly combinations of bits 
>>>>>>> the
>>>>>>> buckets could have. Curious if someone can come up with something 
>>>>>>> better!
>>>>>>>
>>>>>>> And be sure to watch Chetan's presentation on the RDSE that Subutai
>>>>>>> designed and implemented for background.
>>>>>>>
>>>>>>> *Note: the current implementation is fine for all practical
>>>>>>> scenarios so this is just a fun exercise for those interested*
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> nupic mailing list
>>>>>>> [email protected]
>>>>>>> http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> nupic mailing list
>>>>> [email protected]
>>>>> http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> Fergal Byrne, Brenter IT
>>>>
>>>> <http://www.examsupport.ie>http://inbits.com - Better Living through
>>>> Thoughtful Technology
>>>>
>>>> e:[email protected] t:+353 83 4214179
>>>> Formerly of Adnet [email protected] http://www.adnet.ie
>>>>
>>>> _______________________________________________
>>>> nupic mailing list
>>>> [email protected]
>>>> http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
>>>>
>>>>
>>>
>>
>> _______________________________________________
>> nupic mailing list
>> [email protected]
>> http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
>>
>>
>
>
> --
>
> Fergal Byrne, Brenter IT
>
> <http://www.examsupport.ie>http://inbits.com - Better Living through
> Thoughtful Technology
>
> e:[email protected] t:+353 83 4214179
> Formerly of Adnet [email protected] http://www.adnet.ie
>
> _______________________________________________
> nupic mailing list
> [email protected]
> http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
>
>

_______________________________________________
nupic mailing list
[email protected]
http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org

Re: [nupic-discuss] Fun challenge: constant memory RDSE

Reply via email to