Yes, this is what I would suggest as well. I believe the OPF CLA model already
offers multiple possible inferences with a probability on each one.
Another thing you could try, to make it work even better: instead of
resetting the sequence as soon as you see a sentence-terminating symbol, set a
timer to reset after seeing n more characters. Then your model will learn
what comes after each type of sentence terminator, and since it will include
context about the characters near the end of the sentence, the next-sentence
prediction won't be the same every time.
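A minimal sketch of that delayed reset, assuming a simple feed loop (the feed()/reset() calls here are hypothetical stand-ins for however your loop actually drives the model):

```python
# Sketch of the delayed-reset idea. Instead of resetting at '.', '?' or
# '!', arm a countdown and reset N characters later, so the model sees
# the sentence boundary in context.

SENTENCE_TERMINATORS = set(".?!")
RESET_DELAY = 5  # reset N characters after the terminator

def feed_text(model, text, reset_delay=RESET_DELAY):
    countdown = None
    for ch in text:
        model.feed(ch)                  # hypothetical: present one letter
        if ch in SENTENCE_TERMINATORS:
            countdown = reset_delay     # arm (or re-arm) the delayed reset
        elif countdown is not None:
            countdown -= 1
            if countdown == 0:
                model.reset()           # hypothetical: clear sequence state
                countdown = None
```

With reset_delay characters of post-terminator context in the sequence, the prediction for the start of the next sentence is conditioned on what came before it, not just on `<RESET>`.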
On Sun, Nov 17, 2013 at 8:22 AM, Marek Otahal <[email protected]>
wrote:
> Right, I see. So for each new letter, I'd sample from the model according
> to the probability distribution of the inference for the current step. This
> should give me a more "fair" generation than simple 1-best.
> Btw, I hope Scott, Subutai or somebody else will weigh in; I'd expect the
> OPF model to have such functionality already, so we don't reinvent the wheel.
> Thanks a lot!
> On Sun, Nov 17, 2013 at 5:11 PM, Fergal Byrne
> <[email protected]>wrote:
>> Hi Marek,
>>
>> No, I meant do it for every letter. To solve the problem of how to start a
>> new sentence, gather statistics on the starting letters from your corpus,
>> and use the same idea to select a starting letter.
>>
>>
>> On Sun, Nov 17, 2013 at 3:58 PM, Marek Otahal <[email protected]>wrote:
>>
>>> Hi Fergal,
>>>
>>> thanks for your advice. If I understand you, you mean to apply this to
>>> the "new sentence starts after reset" case? Because otherwise the flow is
>>> driven by the memory of the CLA.
>>>
>>> I will do that if I don't find a better solution. (<= actually I think
>>> your SP-enhanced approach will cut it!)
>>> What I'd prefer is some "educated guess", so that consecutive sequences
>>> would make sense as they follow each other.
>>>
>>> The problem here is that the sequence of states is always "<RESET>, ??",
>>> so ?? is simply the statistics of the most common first letters.
>>>
>>> Do you have that student's results, so we can compare them with the CLA?
>>>
>>> Btw, I'd like to hear your take on the randomization of undecidable
>>> states.. :)
>>>
>>> regards, Mark
>>>
>>>
>>> On Sun, Nov 17, 2013 at 4:47 PM, Fergal Byrne <
>>> [email protected]> wrote:
>>>
>>>> Hi Marek,
>>>>
>>>> This is great. One suggestion is to steal from one of Geoff Hinton's
>>>> students, who did exactly the same letter-by-letter prediction. What he did
>>>> was to take the predictions, let's say:
>>>>
>>>> d: 0.33
>>>> t: 0.27
>>>> e: 0.2
>>>> f: 0.2
>>>>
>>>> And use a random number generator to decide which of these to feed it
>>>> next, in proportion to their probabilities. So 1/3 of the time you give
>>>> it a d, etc.
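A minimal sketch of that weighted pick in Python, assuming the predictions arrive as a letter-to-probability dict (the names here are illustrative, not the OPF API):

```python
import random

def sample_letter(predictions, rng=random):
    """Pick a letter at random, in proportion to its predicted probability.

    predictions: dict mapping letter -> probability,
    e.g. {'d': 0.33, 't': 0.27, 'e': 0.2, 'f': 0.2}
    """
    letters = list(predictions)
    weights = [predictions[ch] for ch in letters]
    # random.choices weights the draw; weights need not sum to exactly 1.0
    return rng.choices(letters, weights=weights, k=1)[0]
```

Feeding the sampled letter back in as the next input (instead of always the argmax) is what keeps the generated text from collapsing into the same sentence every time.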
>>>>
>>>>
>>>>
>>>>
>>>> On Sun, Nov 17, 2013 at 3:05 PM, Marek Otahal <[email protected]>wrote:
>>>>
>>>>> Here's illustrative output from running on an "xAAA. xBBB" dataset.
>>>>>
>>>>> ====== Repeat #100 =======
>>>>>
>>>>> [991] x ==> BBB|x (0.50 | 0.50 | 0.50 | 1.00 | 1.00)
>>>>> <<<<<learning correctly
>>>>> [992] A ==> AA|xB (0.88 | 0.78 | 0.78 | 0.78 | 1.00)
>>>>> [993] A ==> A|xBB (0.92 | 0.81 | 0.81 | 0.89 | 1.00)
>>>>> [994] A ==> |xBBB (0.80 | 0.80 | 0.80 | 0.88 | 1.00)
>>>>> [995] | ==> xBBB| (1.00 | 0.92 | 0.92 | 0.92 | 1.00)
>>>>> DEBUG: Result of PyRegion::executeCommand : 'None'
>>>>> reset
>>>>> [996] x ==> AAA|x (0.50 | 0.50 | 0.50 | 1.00 | 1.00)
>>>>> <<<<<<learning correctly
>>>>> [997] B ==> BB|xA (0.94 | 0.89 | 0.89 | 0.89 | 1.00)
>>>>> [998] B ==> B|xAA (0.91 | 0.85 | 0.85 | 0.94 | 1.00)
>>>>> [999] B ==> |xAAA (0.85 | 0.85 | 0.85 | 0.94 | 1.00)
>>>>> [1000] | ==> xAAA| (1.00 | 0.91 | 0.91 | 0.91 | 1.00)
>>>>> DEBUG: Result of PyRegion::executeCommand : 'None'
>>>>> reset
>>>>> ==========================================
>>>>> Welcome young adventurer, let me tell you a story!
>>>>> Enter story start (QUIT to go to work): x
>>>>> x x B B B <<<< the interpretation is always the same!!
>>>>>
>>>>>
>>>>> x B B B
>>>>>
>>>>> Enter story start (QUIT to go to work): x
>>>>> x x B B B
>>>>>
>>>>>
>>>>> x B B B
>>>>>
>>>>> Enter story start (QUIT to go to work): x
>>>>> x x B B B
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Sun, Nov 17, 2013 at 4:01 PM, Marek Otahal <[email protected]>wrote:
>>>>>
>>>>>> I've added an "interactive" feature to Chetan's Linguist
>>>>>> https://github.com/chetan51/linguist - a story-teller mode.
>>>>>>
>>>>>> It will (more or less) memorize the given text and then let you type
>>>>>> starting words (e.g. "So he ") and follow up on its own to complete
>>>>>> the sentence(s).
>>>>>>
>>>>>> ---------------------------------
>>>>>> Yet there's a problem.
>>>>>>
>>>>>> I'll describe the project briefly: it uses the TP to learn texts as
>>>>>> sequence(s) of letters.
>>>>>>
>>>>>> First it memorized the whole text as one long sequence. This worked
>>>>>> for smaller datasets, but for bigger ones the accuracy dropped quickly.
>>>>>>
>>>>>> I decided to simplify: split the text into separate sequences and
>>>>>> reset the sequence memory of the temporal pooler at the end of each
>>>>>> sentence. This greatly improved prediction probabilities, as sequences
>>>>>> are much shorter (avg sentence length ~30 chars vs dataset length in
>>>>>> the hundreds to thousands of chars).
>>>>>>
>>>>>> The problem is, after the first end of sequence there's no "flow" (I
>>>>>> know, I've called reset(), what could I expect ;) ), so the state with
>>>>>> the highest statistical probability is selected (always the same!).
>>>>>>
>>>>>> example dataset: "
>>>>>> How are you?
>>>>>> I'm fine.
>>>>>> I'm tired.
>>>>>> Yayyyyy!"
>>>>>>
>>>>>> So when you start with "Ho"... it'll correctly follow... "w are you?"
>>>>>> "I'm fine" "I'm fine" "I'm fine"... forever.
>>>>>>
>>>>>> The "I'm fine" is fine :) as from a fresh state it's the most probable
>>>>>> choice (2 out of 4). But it doesn't look good.
>>>>>>
>>>>>> I've come up with 2 solutions:
>>>>>> # Idea 1:
>>>>>> after the sequence reset in generation mode, randomly generate the
>>>>>> first char manually, feed it to the TP and let it follow...
>>>>>> Should work: OK; principle: so-so.
>>>>>>
>>>>>> # Idea 2:
>>>>>> even though I trained with a reset (= new, unknown state) after each
>>>>>> sentence end, can I now somehow keep the flow spanning over multiple
>>>>>> sentences?
>>>>>>
>>>>>>
>>>>>> Last but not least, the bug!
>>>>>> The bug is in the (CLA) model's result.inferences['prediction'].
>>>>>> By definition, this field should return the most probable state from
>>>>>> the inference. But what if there are two or more equally probable
>>>>>> states? I believe we should pick one at random.
>>>>>>
>>>>>> While the fixed order is convenient for debugging, a random order
>>>>>> seems more natural. I believe it would also (kind of) fix my problem
>>>>>> with the repetitive "I'm fine" above.
>>>>>>
>>>>>> Proposed solution, if you agree: we'll add an init() parameter
>>>>>> debug=False, which will keep the fixed ordering if needed and, by
>>>>>> default, pick randomly among equally probable states.
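A sketch of that random tie-break with the proposed debug switch (an illustrative helper, not the actual OPF code path):

```python
import random

def best_prediction(probabilities, debug=False, rng=random):
    """Return the most probable state; break ties randomly unless debug.

    probabilities: dict mapping state -> probability. With debug=True the
    tie-break is deterministic (first in sorted order), mirroring the idea
    of a debug flag that keeps the fixed ordering.
    """
    best = max(probabilities.values())
    candidates = sorted(s for s, p in probabilities.items() if p == best)
    if debug:
        return candidates[0]  # fixed ordering for reproducible debugging
    return rng.choice(candidates)
```

With two states tied at the top (e.g. the two "I'm fine" sentences out of four), this returns each of them about half the time instead of always the same one.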
>>>>>>
>>>>>> Thanks for reading :)
>>>>>> mark
>>>>>> --
>>>>>> Marek Otahal :o)
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Marek Otahal :o)
>>>>>
>>>>> _______________________________________________
>>>>> nupic mailing list
>>>>> [email protected]
>>>>> http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> Fergal Byrne, Brenter IT
>>>>
>>>> http://www.examsupport.ie
>>>> http://inbits.com - Better Living through Thoughtful Technology
>>>>
>>>> e:[email protected] t:+353 83 4214179
>>>> Formerly of Adnet [email protected] http://www.adnet.ie
>>>>
>>>>
>>>
>>>
>>> --
>>> Marek Otahal :o)
>>>
>>>
>>
>>
>> --
>>
>> Fergal Byrne, Brenter IT
>>
>> http://www.examsupport.ie
>> http://inbits.com - Better Living through Thoughtful Technology
>>
>> e:[email protected] t:+353 83 4214179
>> Formerly of Adnet [email protected] http://www.adnet.ie
>>
>>
> --
> Marek Otahal :o)
_______________________________________________
nupic mailing list
[email protected]
http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org