Right, I see. So for each new letter, I'd sample from the model according to the probability distribution of the inference for the current step. This should give me a "fairer" generation than always taking the single best prediction (1-best).
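To make sure I understand the idea, here is a minimal sketch of it. It assumes the predictions for the current step are available as a plain dict of letter -> probability. The names sample_letter and start_letter_distribution are made up for illustration and are not part of the OPF or Linguist; how to get such a dict out of the model is not shown here.

```python
import random
from collections import Counter


def sample_letter(probabilities, rng=random):
    """Pick one letter at random, in proportion to its probability.

    `probabilities` is a hypothetical dict such as {"d": 0.33, "t": 0.27};
    the values do not have to sum exactly to 1.
    """
    if not probabilities:
        raise ValueError("no predictions to sample from")
    total = float(sum(probabilities.values()))
    threshold = rng.random() * total
    cumulative = 0.0
    letter = None
    # Walk the cumulative distribution until the random threshold is passed.
    for letter, p in sorted(probabilities.items()):
        cumulative += p
        if threshold < cumulative:
            return letter
    # Floating-point rounding can leave the threshold just above the last
    # cumulative sum; fall back to the last letter in that case.
    return letter


def start_letter_distribution(sentences):
    """Relative frequency of the first character over the given sentences."""
    counts = Counter(s[0] for s in sentences if s)
    total = float(sum(counts.values()))
    return {letter: n / total for letter, n in counts.items()}


if __name__ == "__main__":
    # The example predictions from Fergal's mail.
    predictions = {"d": 0.33, "t": 0.27, "e": 0.2, "f": 0.2}
    print(Counter(sample_letter(predictions) for _ in range(1000)))

    # After a reset, choose the first letter from corpus statistics.
    corpus = ["How are you?", "I'm fine.", "I'm tired.", "Yayyyyy!"]
    print(sample_letter(start_letter_distribution(corpus)))
```

With this, a tie between equally probable letters is also resolved at random, since both get the same share of the cumulative range.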
Btw, I hope Scott, Subutai or someone else will weigh in. I'd expect the OPF model to have such functionality already, so I don't want to reinvent the wheel. Thanks a lot!

On Sun, Nov 17, 2013 at 5:11 PM, Fergal Byrne <[email protected]> wrote:

> Hi Marek,
>
> No, I meant do it for every letter. To solve the problem of how to start a
> new sentence, gather statistics on the starting letters from your corpus,
> and use the same idea to select a starting letter.
>
>
> On Sun, Nov 17, 2013 at 3:58 PM, Marek Otahal <[email protected]> wrote:
>
>> Hi Fergal,
>>
>> thanks for your advice. If I understand you, you mean to apply this to
>> the "new sentence starts after reset" case? Because otherwise the flow is
>> driven by the memory of the CLA.
>>
>> I will do that, if I don't find a better solution. (<= actually I think
>> your SP-enhanced will cut it! )
>> The thing I'd prefer would be some "educated guess", so the sequences
>> could make sense as they follow.
>>
>> Problem here is, the sequence of states is always "<RESET>, ??", so ?? is
>> simply statistics of the most common first-letters.
>>
>> Do you have this student's results, so we can compare CLA?
>>
>> Btw, I'd like to hear your take on the randomization of undecidable
>> states.. :)
>>
>> regards, Mark
>>
>>
>> On Sun, Nov 17, 2013 at 4:47 PM, Fergal Byrne <
>> [email protected]> wrote:
>>
>>> Hi Marek,
>>>
>>> This is great. One suggestion is to steal from one of Geoff Hinton's
>>> students, who did exactly the same letter-by-letter prediction. What he did
>>> was to take the predictions, let's say:
>>>
>>> d: 0.33
>>> t: 0.27
>>> e: 0.2
>>> f: 0.2
>>>
>>> And use a random generator to decide which of these to give it next, in
>>> proportion to their probabilities. So 1/3 of the time you give it a d etc.
>>>
>>>
>>> On Sun, Nov 17, 2013 at 3:05 PM, Marek Otahal <[email protected]> wrote:
>>>
>>>> Here's illustrative output on running a "xAAA. xBBB" dataset.
>>>>
>>>> ====== Repeat #100 =======
>>>>
>>>> [991] x ==> BBB|x (0.50 | 0.50 | 0.50 | 1.00 | 1.00)
>>>> <<<<<learning correctly
>>>> [992] A ==> AA|xB (0.88 | 0.78 | 0.78 | 0.78 | 1.00)
>>>> [993] A ==> A|xBB (0.92 | 0.81 | 0.81 | 0.89 | 1.00)
>>>> [994] A ==> |xBBB (0.80 | 0.80 | 0.80 | 0.88 | 1.00)
>>>> [995] | ==> xBBB| (1.00 | 0.92 | 0.92 | 0.92 | 1.00)
>>>> DEBUG: Result of PyRegion::executeCommand : 'None'
>>>> reset
>>>> [996] x ==> AAA|x (0.50 | 0.50 | 0.50 | 1.00 | 1.00)
>>>> <<<<<<learning correctly
>>>> [997] B ==> BB|xA (0.94 | 0.89 | 0.89 | 0.89 | 1.00)
>>>> [998] B ==> B|xAA (0.91 | 0.85 | 0.85 | 0.94 | 1.00)
>>>> [999] B ==> |xAAA (0.85 | 0.85 | 0.85 | 0.94 | 1.00)
>>>> [1000] | ==> xAAA| (1.00 | 0.91 | 0.91 | 0.91 | 1.00)
>>>> DEBUG: Result of PyRegion::executeCommand : 'None'
>>>> reset
>>>> ==========================================
>>>> Welcome young adventurer, let me tell you a story!
>>>> Enter story start (QUIT to go to work): x
>>>> x x B B B <<<<interpretation is always same!!
>>>>
>>>>
>>>> x B B B
>>>>
>>>> Enter story start (QUIT to go to work): x
>>>> x x B B B
>>>>
>>>>
>>>> x B B B
>>>>
>>>> Enter story start (QUIT to go to work): x
>>>> x x B B B
>>>>
>>>>
>>>> On Sun, Nov 17, 2013 at 4:01 PM, Marek Otahal <[email protected]>wrote:
>>>>
>>>>> I've added an "interactive" feature to Chetan's Linguist
>>>>> https://github.com/chetan51/linguist - a story teller mode.
>>>>>
>>>>> It will (more or less) memorize the given text and then let you type
>>>>> starting words (ie "So he ") and follow up on its own to complete the
>>>>> sentence(s).
>>>>>
>>>>> ---------------------------------
>>>>> Yet there's a problem.
>>>>>
>>>>> I'll describe the project briefly, it uses TP to learn texts as a
>>>>> sequence(s) of letters.
>>>>>
>>>>> First it used to memorize whole text as one long sequence, this worked
>>>>> for smaller datasets, but for bigger, the accuracy went down quickly.
>>>>>
>>>>> I decided to simplify and split the text into separate sequences and
>>>>> reset the sequence memory of the temporal pooler at the end of each
>>>>> sentence. This greatly improved prediction probabilities as sequences are
>>>>> much shorter (avg sentence length (+-30 chars) vs dataset length (hundreds -
>>>>> thousands of chars)).
>>>>>
>>>>> The problem is, after the first end of sequence, there's no "flow" (I
>>>>> know, I've called a reset(), what could I expect ;) ), so a state with
>>>>> highest statistical probability is selected (always the same!)
>>>>>
>>>>> example dataset: "
>>>>> How are you?
>>>>> I'm fine.
>>>>> I'm tired.
>>>>> Yayyyyy!"
>>>>>
>>>>> So when you start "Ho"..it'll correctly follow.."w are you?" "I'm
>>>>> fine" "I'm fine" "I'm fine"...forever.
>>>>>
>>>>> The "I'm fine" is fine :) as from a new state it's the most probable
>>>>> choice (2 out of 4). But it doesn't look good.
>>>>>
>>>>> I've come up with 2 solutions:
>>>>> # Idea1:
>>>>> after seq reset in the generation mode, randomly generate the first
>>>>> char manually, feed it to TP and let it follow...
>>>>> should work: OK, principle: so-so.
>>>>>
>>>>> # Idea2:
>>>>> even though I trained with a reset (=new unknown state) after each
>>>>> sentence end, can I now somehow keep the flow spanning over more
>>>>> sentences?
>>>>>
>>>>>
>>>>> Last but not least, the bug!
>>>>> The bug is in (CLA)model's result.inferences['prediction']
>>>>> By definition, this field should return the most probable state from
>>>>> the inference. But what if there are two+ most probable states? I believe
>>>>> we should go random.
>>>>>
>>>>> While for debugging the fixed order is convenient, the random order
>>>>> seems natural. I believe it would fix my problem with repetitive "I'm fine"
>>>>> above too. (kind of)
>>>>>
>>>>> Proposed solution, if you agree: we'll add an init() parameter debug
>>>>> (default False). When set, it keeps the fixed ordering; by default,
>>>>> ties between equally probable states are broken at random.
>>>>>
>>>>> Thanks for reading :)
>>>>> mark
>>>>> --
>>>>> Marek Otahal :o)
>>>>
>>>> _______________________________________________
>>>> nupic mailing list
>>>> [email protected]
>>>> http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
>>>
>>> --
>>>
>>> Fergal Byrne, Brenter IT
>>>
>>> <http://www.examsupport.ie>http://inbits.com - Better Living through
>>> Thoughtful Technology
>>>
>>> e:[email protected] t:+353 83 4214179
>>> Formerly of Adnet [email protected] http://www.adnet.ie

--
Marek Otahal :o)
