Hi Marek,

No, I meant do it for every letter. To solve the problem of how to start a new sentence, gather statistics on the starting letters from your corpus, and use the same idea to select a starting letter.
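Something like this, say (a rough, untested sketch; it assumes you already have the corpus split into a list of sentences):

    import random

    def make_start_sampler(sentences):
        # The multiset of first letters already encodes the statistics:
        # drawing uniformly from it picks each letter in proportion to
        # how often it starts a sentence in the corpus.
        first_letters = [s[0] for s in sentences if s]
        return lambda: random.choice(first_letters)

    # e.g.:
    # sample_start = make_start_sampler(["How are you?", "I'm fine.", "I'm tired."])
    # first_char = sample_start()  # 'I' comes up twice as often as 'H'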
On Sun, Nov 17, 2013 at 3:58 PM, Marek Otahal <[email protected]> wrote:

> Hi Fergal,
>
> Thanks for your advice. If I understand you correctly, you mean to apply this to the "new sentence starts after reset" case? Because otherwise the flow is driven by the memory of the CLA.
>
> I will do that if I don't find a better solution. (<= actually I think your SP-enhanced version will cut it!) What I'd prefer would be some "educated guess", so the sequences make sense as they follow one another.
>
> The problem here is that the sequence of states is always "<RESET>, ??", so ?? is simply the statistics of the most common first letters.
>
> Do you have this student's results, so we can compare the CLA against them?
>
> Btw, I'd like to hear your take on the randomization of undecidable states. :)
>
> regards, Mark
>
> On Sun, Nov 17, 2013 at 4:47 PM, Fergal Byrne <[email protected]> wrote:
>
>> Hi Marek,
>>
>> This is great. One suggestion is to steal from one of Geoff Hinton's students, who did exactly the same letter-by-letter prediction. What he did was to take the predictions, let's say:
>>
>> d: 0.33
>> t: 0.27
>> e: 0.2
>> f: 0.2
>>
>> And use a random generator to decide which of these to give it next, in proportion to their probabilities. So 1/3 of the time you give it a d, etc.
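>> In code it's just a weighted pick, something like this (an untested sketch; I'm assuming the predictions arrive as plain letter -> probability pairs in a dict):
>>
>>     import random
>>
>>     def pick_weighted(predictions):
>>         # predictions: e.g. {'d': 0.33, 't': 0.27, 'e': 0.2, 'f': 0.2}
>>         r = random.uniform(0.0, sum(predictions.values()))
>>         acc = 0.0
>>         for letter, p in predictions.items():
>>             acc += p
>>             if r <= acc:
>>                 return letter
>>         return letter  # guard against floating-point rounding
>>
>> With the numbers above, 'd' comes back about a third of the time, 't' a bit more than a quarter, and so on.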
>> On Sun, Nov 17, 2013 at 3:05 PM, Marek Otahal <[email protected]> wrote:
>>
>>> Here's illustrative output from running on an "xAAA. xBBB" dataset:
>>>
>>> ====== Repeat #100 =======
>>>
>>> [991] x ==> BBB|x (0.50 | 0.50 | 0.50 | 1.00 | 1.00)   <<<<< learning correctly
>>> [992] A ==> AA|xB (0.88 | 0.78 | 0.78 | 0.78 | 1.00)
>>> [993] A ==> A|xBB (0.92 | 0.81 | 0.81 | 0.89 | 1.00)
>>> [994] A ==> |xBBB (0.80 | 0.80 | 0.80 | 0.88 | 1.00)
>>> [995] | ==> xBBB| (1.00 | 0.92 | 0.92 | 0.92 | 1.00)
>>> DEBUG: Result of PyRegion::executeCommand : 'None'
>>> reset
>>> [996] x ==> AAA|x (0.50 | 0.50 | 0.50 | 1.00 | 1.00)   <<<<< learning correctly
>>> [997] B ==> BB|xA (0.94 | 0.89 | 0.89 | 0.89 | 1.00)
>>> [998] B ==> B|xAA (0.91 | 0.85 | 0.85 | 0.94 | 1.00)
>>> [999] B ==> |xAAA (0.85 | 0.85 | 0.85 | 0.94 | 1.00)
>>> [1000] | ==> xAAA| (1.00 | 0.91 | 0.91 | 0.91 | 1.00)
>>> DEBUG: Result of PyRegion::executeCommand : 'None'
>>> reset
>>> ==========================================
>>> Welcome young adventurer, let me tell you a story!
>>> Enter story start (QUIT to go to work): x
>>> x x B B B      <<<< the interpretation is always the same!!
>>> x B B B
>>>
>>> Enter story start (QUIT to go to work): x
>>> x x B B B
>>> x B B B
>>>
>>> Enter story start (QUIT to go to work): x
>>> x x B B B
>>>
>>> On Sun, Nov 17, 2013 at 4:01 PM, Marek Otahal <[email protected]> wrote:
>>>
>>>> I've added an "interactive" feature to Chetan's Linguist
>>>> https://github.com/chetan51/linguist - a story-teller mode.
>>>>
>>>> It will (more or less) memorize the given text and then let you type starting words (e.g. "So he ") and follow up on its own to complete the sentence(s).
>>>>
>>>> ---------------------------------
>>>> Yet there's a problem.
>>>>
>>>> To describe the project briefly: it uses the TP to learn texts as sequence(s) of letters.
>>>>
>>>> At first it memorized the whole text as one long sequence. This worked for smaller datasets, but for bigger ones the accuracy dropped quickly.
>>>>
>>>> So I decided to simplify: split the text into separate sequences and reset the sequence memory of the temporal pooler at the end of each sentence. This greatly improved the prediction probabilities, since the sequences are much shorter (average sentence length around 30 chars vs. a dataset length of hundreds to thousands of chars).
>>>>
>>>> The problem is that after the first end of sequence there's no "flow" (I know, I've called a reset(), what could I expect ;) ), so the state with the highest statistical probability is selected (always the same one!).
>>>>
>>>> Example dataset:
>>>> "How are you?
>>>> I'm fine.
>>>> I'm tired.
>>>> Yayyyyy!"
>>>>
>>>> So when you start with "Ho", it'll correctly follow with "w are you?", then "I'm fine", "I'm fine", "I'm fine"... forever.
>>>>
>>>> The "I'm fine" is fine :) since from a fresh state it's the most probable choice (2 out of 4). But it doesn't look good.
>>>>
>>>> I've come up with 2 solutions:
>>>>
>>>> # Idea 1:
>>>> After a sequence reset in generation mode, randomly generate the first char manually, feed it to the TP and let it follow from there. Does it work: OK; the principle: so-so.
>>>>
>>>> # Idea 2:
>>>> Even though I trained with a reset (= new unknown state) after each sentence end, can I now somehow keep the flow spanning more sentences?
>>>>
>>>> Last but not least, the bug!
>>>> The bug is in the (CLA) model's result.inferences['prediction']. By definition, this field should return the most probable state from the inference. But what if there are two or more equally probable states? I believe we should go random.
>>>>
>>>> While the fixed order is convenient for debugging, the random order seems more natural. I believe it would also (kind of) fix my problem with the repetitive "I'm fine" above.
>>>>
>>>> Proposed solution, if you agree: we'll add an init() parameter debug=False, which keeps the fixed ordering when needed and by default picks randomly among equally probable states.
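>>>> Roughly what I have in mind (just a sketch; the function and argument names here are placeholders, not the real OPF API):
>>>>
>>>>     import random
>>>>
>>>>     def most_probable_state(probabilities, debug=False):
>>>>         # probabilities: dict of candidate state -> probability
>>>>         best = max(probabilities.values())
>>>>         # All states tied for the highest probability.
>>>>         candidates = [s for s, p in probabilities.items() if p == best]
>>>>         if debug:
>>>>             # Fixed ordering, for reproducible debugging runs.
>>>>             return sorted(candidates)[0]
>>>>         # Default: break ties randomly, so generation doesn't keep
>>>>         # picking the same state (the endless "I'm fine") forever.
>>>>         return random.choice(candidates)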
>>>> Thanks for reading :)
>>>> mark
>>>>
>>>> --
>>>> Marek Otahal :o)

--

Fergal Byrne, Brenter IT

http://inbits.com - Better Living through Thoughtful Technology

e:[email protected] t:+353 83 4214179
Formerly of Adnet [email protected] http://www.adnet.ie
