Hi Marek,

No, I meant do it for every letter. To solve the problem of how to start a new sentence, gather statistics on the starting letters from your corpus, and use the same idea to select a starting letter.
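Something like this, say (a rough, untested sketch; it assumes you already have the corpus split into a list of sentences):

    import random

    def make_start_sampler(sentences):
        # The multiset of first letters already encodes the statistics:
        # drawing uniformly from it picks each letter in proportion to
        # how often it starts a sentence in the corpus.
        first_letters = [s[0] for s in sentences if s]
        return lambda: random.choice(first_letters)

    # e.g.:
    # sample_start = make_start_sampler(["How are you?", "I'm fine.", "I'm tired."])
    # first_char = sample_start()  # 'I' comes up twice as often as 'H'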
On Sun, Nov 17, 2013 at 3:58 PM, Marek Otahal <[email protected]> wrote:

> Hi Fergal,
>
> Thanks for your advice. If I understand you correctly, you mean to apply this to the "new sentence starts after reset" case? Because otherwise the flow is driven by the memory of the CLA.
>
> I will do that if I don't find a better solution. (<= actually I think your SP-enhanced version will cut it!) What I'd prefer would be some "educated guess", so the sequences make sense as they follow one another.
>
> The problem here is that the sequence of states is always "<RESET>, ??", so ?? is simply the statistics of the most common first letters.
>
> Do you have this student's results, so we can compare the CLA against them?
>
> Btw, I'd like to hear your take on the randomization of undecidable states. :)
>
> regards, Mark
>
> On Sun, Nov 17, 2013 at 4:47 PM, Fergal Byrne <[email protected]> wrote:
>
>> Hi Marek,
>>
>> This is great. One suggestion is to steal from one of Geoff Hinton's students, who did exactly the same letter-by-letter prediction. What he did was to take the predictions, let's say:
>>
>> d: 0.33
>> t: 0.27
>> e: 0.2
>> f: 0.2
>>
>> And use a random generator to decide which of these to give it next, in proportion to their probabilities. So 1/3 of the time you give it a d, etc.
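>> In code it's just a weighted pick, something like this (an untested sketch; I'm assuming the predictions arrive as plain letter -> probability pairs in a dict):
>>
>>     import random
>>
>>     def pick_weighted(predictions):
>>         # predictions: e.g. {'d': 0.33, 't': 0.27, 'e': 0.2, 'f': 0.2}
>>         r = random.uniform(0.0, sum(predictions.values()))
>>         acc = 0.0
>>         for letter, p in predictions.items():
>>             acc += p
>>             if r <= acc:
>>                 return letter
>>         return letter  # guard against floating-point rounding
>>
>> With the numbers above, 'd' comes back about a third of the time, 't' a bit more than a quarter, and so on.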
>> On Sun, Nov 17, 2013 at 3:05 PM, Marek Otahal <[email protected]> wrote:
>>
>>> Here's illustrative output from running on an "xAAA. xBBB" dataset:
>>>
>>> ====== Repeat #100 =======
>>>
>>> [991] x ==> BBB|x (0.50 | 0.50 | 0.50 | 1.00 | 1.00)   <<<<< learning correctly
>>> [992] A ==> AA|xB (0.88 | 0.78 | 0.78 | 0.78 | 1.00)
>>> [993] A ==> A|xBB (0.92 | 0.81 | 0.81 | 0.89 | 1.00)
>>> [994] A ==> |xBBB (0.80 | 0.80 | 0.80 | 0.88 | 1.00)
>>> [995] | ==> xBBB| (1.00 | 0.92 | 0.92 | 0.92 | 1.00)
>>> DEBUG: Result of PyRegion::executeCommand : 'None'
>>> reset
>>> [996] x ==> AAA|x (0.50 | 0.50 | 0.50 | 1.00 | 1.00)   <<<<< learning correctly
>>> [997] B ==> BB|xA (0.94 | 0.89 | 0.89 | 0.89 | 1.00)
>>> [998] B ==> B|xAA (0.91 | 0.85 | 0.85 | 0.94 | 1.00)
>>> [999] B ==> |xAAA (0.85 | 0.85 | 0.85 | 0.94 | 1.00)
>>> [1000] | ==> xAAA| (1.00 | 0.91 | 0.91 | 0.91 | 1.00)
>>> DEBUG: Result of PyRegion::executeCommand : 'None'
>>> reset
>>> ==========================================
>>> Welcome young adventurer, let me tell you a story!
>>> Enter story start (QUIT to go to work): x
>>> x x B B B      <<<< the interpretation is always the same!!
>>> x B B B
>>>
>>> Enter story start (QUIT to go to work): x
>>> x x B B B
>>> x B B B
>>>
>>> Enter story start (QUIT to go to work): x
>>> x x B B B
>>>
>>> On Sun, Nov 17, 2013 at 4:01 PM, Marek Otahal <[email protected]> wrote:
>>>
>>>> I've added an "interactive" feature to Chetan's Linguist
>>>> https://github.com/chetan51/linguist - a story-teller mode.
>>>>
>>>> It will (more or less) memorize the given text and then let you type starting words (e.g. "So he ") and follow up on its own to complete the sentence(s).
>>>>
>>>> ---------------------------------
>>>> Yet there's a problem.
>>>>
>>>> To describe the project briefly: it uses the TP to learn texts as sequence(s) of letters.
>>>>
>>>> At first it memorized the whole text as one long sequence. This worked for smaller datasets, but for bigger ones the accuracy dropped quickly.
>>>>
>>>> So I decided to simplify: split the text into separate sequences and reset the sequence memory of the temporal pooler at the end of each sentence. This greatly improved the prediction probabilities, since the sequences are much shorter (average sentence length around 30 chars vs. a dataset length of hundreds to thousands of chars).
>>>>
>>>> The problem is that after the first end of sequence there's no "flow" (I know, I've called a reset(), what could I expect ;) ), so the state with the highest statistical probability is selected (always the same one!).
>>>>
>>>> Example dataset:
>>>> "How are you?
>>>> I'm fine.
>>>> I'm tired.
>>>> Yayyyyy!"
>>>>
>>>> So when you start with "Ho", it'll correctly follow with "w are you?", then "I'm fine", "I'm fine", "I'm fine"... forever.
>>>>
>>>> The "I'm fine" is fine :) since from a fresh state it's the most probable choice (2 out of 4). But it doesn't look good.
>>>>
>>>> I've come up with 2 solutions:
>>>>
>>>> # Idea 1:
>>>> After a sequence reset in generation mode, randomly generate the first char manually, feed it to the TP and let it follow from there. Does it work: OK; the principle: so-so.
>>>>
>>>> # Idea 2:
>>>> Even though I trained with a reset (= new unknown state) after each sentence end, can I now somehow keep the flow spanning more sentences?
>>>>
>>>> Last but not least, the bug!
>>>> The bug is in the (CLA) model's result.inferences['prediction']. By definition, this field should return the most probable state from the inference. But what if there are two or more equally probable states? I believe we should go random.
>>>>
>>>> While the fixed order is convenient for debugging, the random order seems more natural. I believe it would also (kind of) fix my problem with the repetitive "I'm fine" above.
>>>>
>>>> Proposed solution, if you agree: we'll add an init() parameter debug=False, which keeps the fixed ordering when needed and by default picks randomly among equally probable states.
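>>>> Roughly what I have in mind (just a sketch; the function and argument names here are placeholders, not the real OPF API):
>>>>
>>>>     import random
>>>>
>>>>     def most_probable_state(probabilities, debug=False):
>>>>         # probabilities: dict of candidate state -> probability
>>>>         best = max(probabilities.values())
>>>>         # All states tied for the highest probability.
>>>>         candidates = [s for s, p in probabilities.items() if p == best]
>>>>         if debug:
>>>>             # Fixed ordering, for reproducible debugging runs.
>>>>             return sorted(candidates)[0]
>>>>         # Default: break ties randomly, so generation doesn't keep
>>>>         # picking the same state (the endless "I'm fine") forever.
>>>>         return random.choice(candidates)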
>>>> Thanks for reading :)
>>>> mark
>>>>
>>>> --
>>>> Marek Otahal :o)

--

Fergal Byrne, Brenter IT

http://inbits.com - Better Living through Thoughtful Technology

e:[email protected] t:+353 83 4214179
Formerly of Adnet [email protected] http://www.adnet.ie
