Right, I see. So for each new letter, I'd sample from the model according
to the probability distribution of the inference at the current step. This
should give me "fairer" generation than simple 1-best.
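
That per-step sampling could be sketched like this (a minimal sketch; the
`predictions` dict and the `sample_letter` helper are hypothetical
illustrations, not part of the OPF API):

```python
import random

def sample_letter(predictions, rng=random):
    """Pick the next letter by sampling in proportion to the model's
    predicted probabilities for the current step, instead of always
    taking the single most probable letter (1-best).

    `predictions` is a hypothetical mapping like {'d': 0.33, 't': 0.27, ...}
    taken from the inference of the current step."""
    letters = list(predictions.keys())
    weights = [predictions[l] for l in letters]
    total = sum(weights)  # normalize in case the probabilities don't sum to 1
    # Draw a point in [0, total) and walk the cumulative weights.
    r = rng.uniform(0, total)
    acc = 0.0
    for letter, w in zip(letters, weights):
        acc += w
        if r < acc:
            return letter
    return letters[-1]  # guard against float rounding at the top end
```

With Fergal's example distribution (d: 0.33, t: 0.27, e: 0.2, f: 0.2),
roughly a third of the draws should come back as "d" over many calls.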

Btw, I hope Scott, Subutai or somebody else will weigh in; I'd expect the OPF
model to have such functionality already, so as not to reinvent the wheel.
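
And for the randomization of undecidable states raised below, a minimal
sketch of the proposed behaviour (the function name, the `debug` parameter
and the dict format are illustrative assumptions, not the actual OPF
interface):

```python
import random

def best_prediction(probabilities, debug=False, rng=random):
    """Return the most probable state; when several states tie for the
    highest probability, pick one of them at random unless debug=True,
    which keeps a fixed, reproducible ordering.

    Hypothetical sketch of the proposed fix for
    result.inferences['prediction']; all names are illustrative."""
    best = max(probabilities.values())
    ties = [s for s, p in probabilities.items() if p == best]
    if debug:
        return sorted(ties)[0]  # deterministic choice, convenient for debugging
    return rng.choice(ties)    # random choice among equally probable states
```

The idea is that the default random tie-break would stop the generator
from always emitting the same sentence after a reset, while debug=True
preserves the old fixed ordering for reproducible runs.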

Thanks a lot!


On Sun, Nov 17, 2013 at 5:11 PM, Fergal Byrne
<[email protected]>wrote:

> Hi Marek,
>
> No, I meant do it for every letter. To solve the problem of how to start a
> new sentence, gather statistics on the starting letters from your corpus,
> and use the same idea to select a starting letter.
>
>
> On Sun, Nov 17, 2013 at 3:58 PM, Marek Otahal <[email protected]>wrote:
>
>> Hi Fergal,
>>
>> thanks for your advice. If I understand you correctly, you mean to apply
>> this to the "new sentence starts after reset" case? Because otherwise the
>> flow is driven by the memory of the CLA.
>>
>> I will do that if I don't find a better solution. (Actually, I think
>> your SP-enhanced approach will cut it!)
>> What I'd prefer is some "educated guess", so that the sequences make
>> sense as they follow one another.
>>
>> The problem here is that the sequence of states is always "<RESET>, ??",
>> so ?? is simply the statistics of the most common first letters.
>>
>> Do you have that student's results, so we can compare them with the CLA?
>>
>> Btw, I'd like to hear your take on the randomization of undecidable
>> states.. :)
>>
>> regards, Mark
>>
>>
>> On Sun, Nov 17, 2013 at 4:47 PM, Fergal Byrne <
>> [email protected]> wrote:
>>
>>> Hi Marek,
>>>
>>> This is great. One suggestion is to steal from one of Geoff Hinton's
>>> students, who did exactly the same letter-by-letter prediction. What he did
>>> was to take the predictions, let's say:
>>>
>>> d: 0.33
>>> t: 0.27
>>> e: 0.2
>>> f: 0.2
>>>
>>> And use a random generator to decide which of these to give it next, in
>>> proportion to their probabilities. So 1/3 of the time you give it a d etc.
>>>
>>>
>>>
>>>
>>> On Sun, Nov 17, 2013 at 3:05 PM, Marek Otahal <[email protected]>wrote:
>>>
>>>> Here's illustrative output from running on a "xAAA. xBBB" dataset.
>>>>
>>>> ====== Repeat #100 =======
>>>>
>>>> [991]    x ==> BBB|x    (0.50 | 0.50 | 0.50 | 1.00 | 1.00)
>>>> <<<<<learning correctly
>>>> [992]    A ==> AA|xB    (0.88 | 0.78 | 0.78 | 0.78 | 1.00)
>>>> [993]    A ==> A|xBB    (0.92 | 0.81 | 0.81 | 0.89 | 1.00)
>>>> [994]    A ==> |xBBB    (0.80 | 0.80 | 0.80 | 0.88 | 1.00)
>>>> [995]    | ==> xBBB|    (1.00 | 0.92 | 0.92 | 0.92 | 1.00)
>>>> DEBUG:  Result of PyRegion::executeCommand : 'None'
>>>> reset
>>>> [996]    x ==> AAA|x    (0.50 | 0.50 | 0.50 | 1.00 | 1.00)
>>>> <<<<<<learning correctly
>>>> [997]    B ==> BB|xA    (0.94 | 0.89 | 0.89 | 0.89 | 1.00)
>>>> [998]    B ==> B|xAA    (0.91 | 0.85 | 0.85 | 0.94 | 1.00)
>>>> [999]    B ==> |xAAA    (0.85 | 0.85 | 0.85 | 0.94 | 1.00)
>>>> [1000]   | ==> xAAA|    (1.00 | 0.91 | 0.91 | 0.91 | 1.00)
>>>> DEBUG:  Result of PyRegion::executeCommand : 'None'
>>>> reset
>>>> ==========================================
>>>> Welcome young adventurer, let me tell you a story!
>>>> Enter story start (QUIT to go to work): x
>>>> x x B B B   <<<< the interpretation is always the same!!
>>>>
>>>>
>>>> x B B B
>>>>
>>>> Enter story start (QUIT to go to work): x
>>>> x x B B B
>>>>
>>>>
>>>> x B B B
>>>>
>>>> Enter story start (QUIT to go to work): x
>>>> x x B B B
>>>>
>>>>
>>>>
>>>>
>>>> On Sun, Nov 17, 2013 at 4:01 PM, Marek Otahal <[email protected]>wrote:
>>>>
>>>>> I've added an "interactive" feature to Chetan's Linguist
>>>>> https://github.com/chetan51/linguist - a story-teller mode.
>>>>>
>>>>> It will (more or less) memorize the given text and then let you type a
>>>>> starting phrase (e.g. "So he ") and follow up on its own to complete
>>>>> the sentence(s).
>>>>>
>>>>> ---------------------------------
>>>>> Yet there's a problem.
>>>>>
>>>>> I'll describe the project briefly: it uses the TP to learn texts as
>>>>> sequence(s) of letters.
>>>>>
>>>>> At first it memorized the whole text as one long sequence. This worked
>>>>> for smaller datasets, but for bigger ones the accuracy dropped quickly.
>>>>>
>>>>> So I decided to simplify: split the text into separate sequences and
>>>>> reset the sequence memory of the temporal pooler at the end of each
>>>>> sentence. This greatly improved the prediction probabilities, as the
>>>>> sequences are much shorter (avg sentence length ~30 chars vs dataset
>>>>> length of hundreds to thousands of chars).
>>>>>
>>>>> The problem is that after the first end of sequence there's no "flow"
>>>>> (I know, I called reset(), what could I expect ;) ), so the state with
>>>>> the highest statistical probability is selected (always the same!).
>>>>>
>>>>> example dataset: "
>>>>> How are you?
>>>>> I'm fine.
>>>>> I'm tired.
>>>>> Yayyyyy!"
>>>>>
>>>>> So when you start with "Ho"... it'll correctly follow with "w are
>>>>> you?", then "I'm fine." "I'm fine." "I'm fine."... forever.
>>>>>
>>>>> The "I'm fine" is fine :) as from a new state it's the most probable
>>>>> choice (2 out of 4). But it doesn't look good.
>>>>>
>>>>> I've come up with 2 solutions:
>>>>>  # Idea 1:
>>>>>  after a sequence reset in generation mode, randomly generate the first
>>>>> char manually, feed it to the TP and let it follow on...
>>>>>  should work: OK; principle: so-so.
>>>>>
>>>>>  # Idea 2:
>>>>>  even though I trained with a reset (= new unknown state) after each
>>>>> sentence end, can I now somehow keep the flow spanning over multiple
>>>>> sentences?
>>>>>
>>>>>
>>>>> Last but not least, the bug!
>>>>> The bug is in the (CLA) model's result.inferences['prediction'].
>>>>> By definition, this field should return the most probable state from
>>>>> the inference. But what if there are two or more equally probable top
>>>>> states? I believe we should choose randomly among them.
>>>>>
>>>>> While the fixed order is convenient for debugging, the random order
>>>>> seems more natural. I believe it would also (kind of) fix my problem
>>>>> with the repetitive "I'm fine" above.
>>>>>
>>>>> Proposed solution, if you agree: we'll add an init() parameter
>>>>> debug=False which will keep the fixed ordering if needed, and by
>>>>> default choose randomly among equally probable states.
>>>>>
>>>>> Thanks for reading :)
>>>>> mark
>>>>> --
>>>>> Marek Otahal :o)
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Marek Otahal :o)
>>>>
>>>> _______________________________________________
>>>> nupic mailing list
>>>> [email protected]
>>>> http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
>>>>
>>>>
>>>
>>>
>>> --
>>>
>>> Fergal Byrne, Brenter IT
>>>
>>> <http://www.examsupport.ie>http://inbits.com - Better Living through
>>> Thoughtful Technology
>>>
>>> e:[email protected] t:+353 83 4214179
>>> Formerly of Adnet [email protected] http://www.adnet.ie
>>>
>>>
>>>
>>
>>
>> --
>> Marek Otahal :o)
>>
>>
>>
>
>
> --
>
> Fergal Byrne, Brenter IT
>
> <http://www.examsupport.ie>http://inbits.com - Better Living through
> Thoughtful Technology
>
> e:[email protected] t:+353 83 4214179
> Formerly of Adnet [email protected] http://www.adnet.ie
>
>
>


-- 
Marek Otahal :o)
