Here's illustrative output from a run on a "xAAA. xBBB" dataset.

====== Repeat #100 =======
[991] x ==> BBB|x (0.50 | 0.50 | 0.50 | 1.00 | 1.00)   <<<<<learning correctly
[992] A ==> AA|xB (0.88 | 0.78 | 0.78 | 0.78 | 1.00)
[993] A ==> A|xBB (0.92 | 0.81 | 0.81 | 0.89 | 1.00)
[994] A ==> |xBBB (0.80 | 0.80 | 0.80 | 0.88 | 1.00)
[995] | ==> xBBB| (1.00 | 0.92 | 0.92 | 0.92 | 1.00)
DEBUG: Result of PyRegion::executeCommand : 'None'
reset
[996] x ==> AAA|x (0.50 | 0.50 | 0.50 | 1.00 | 1.00)   <<<<<<learning correctly
[997] B ==> BB|xA (0.94 | 0.89 | 0.89 | 0.89 | 1.00)
[998] B ==> B|xAA (0.91 | 0.85 | 0.85 | 0.94 | 1.00)
[999] B ==> |xAAA (0.85 | 0.85 | 0.85 | 0.94 | 1.00)
[1000] | ==> xAAA| (1.00 | 0.91 | 0.91 | 0.91 | 1.00)
DEBUG: Result of PyRegion::executeCommand : 'None'
reset
==========================================
Welcome young adventurer, let me tell you a story!
Enter story start (QUIT to go to work): x x x B B B   <<<<interpretation is always the same!!
x B B B
Enter story start (QUIT to go to work): x x x B B B
x B B B
Enter story start (QUIT to go to work): x x x B B B


On Sun, Nov 17, 2013 at 4:01 PM, Marek Otahal <[email protected]> wrote:
> I've added an "interactive" feature to Chetan's Linguist
> https://github.com/chetan51/linguist - a story teller mode.
>
> It will (more or less) memorize the given text and then let you type
> starting words (e.g. "So he ") and follow up on its own to complete the
> sentence(s).
>
> ---------------------------------
> Yet there's a problem.
>
> I'll describe the project briefly: it uses the TP to learn texts as
> sequence(s) of letters.
>
> At first it memorized the whole text as one long sequence. This worked for
> smaller datasets, but for bigger ones the accuracy went down quickly.
>
> I decided to simplify: split the text into separate sequences and reset
> the sequence memory of the temporal pooler at the end of each sentence.
> This greatly improved prediction probabilities, as the sequences are much
> shorter (avg sentence length (+-30 chars) vs dataset length (hundreds to
> thousands of chars)).
>
> The problem is, after the first end of sequence, there's no "flow" (I
> know, I've called a reset(), what could I expect ;) ), so the state with
> the highest statistical probability is selected (always the same!)
>
> example dataset: "
> How are you?
> I'm fine.
> I'm tired.
> Yayyyyy!"
>
> So when you start "Ho".. it'll correctly follow.. "w are you?" "I'm fine"
> "I'm fine" "I'm fine"... forever.
>
> The "I'm fine" is fine :) as from a new state it's the most probable
> choice (2 out of 4). But it doesn't look good.
>
> I've come up with 2 solutions:
> # Idea1:
> after a seq reset in generation mode, randomly generate the first char
> manually, feed it to the TP and let it follow...
> should work: OK, principle: so-so.
>
> # Idea2:
> even though I trained with a reset (= new unknown state) after each
> sentence end, can I now somehow keep the flow spanning multiple sentences?
>
>
> Last but not least, the bug!
> The bug is in (CLA)model's result.inferences['prediction']
> By definition, this field should return the most probable state from the
> inference. But what if there are two or more equally probable top states?
> I believe we should go random.
>
> While the fixed order is convenient for debugging, the random order seems
> natural. I believe it would fix my problem with the repetitive "I'm fine"
> above too (kind of).
>
> Proposed solution, if you agree: we'll add an init() parameter debug
> (default False) which keeps the fixed ordering when set, and by default
> we pick randomly among equally probable states.
>
> Thanks for reading :)
> mark
> --
> Marek Otahal :o)

--
Marek Otahal :o)
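
P.S. A minimal sketch of the tie-breaking proposed in the quoted message. This is not NuPIC code: pick_prediction, the likelihoods dict, and the rng and tol parameters are hypothetical names made up for illustration. Only the debug flag comes from the proposal itself.

```python
import random


def pick_prediction(likelihoods, debug=False, rng=random, tol=1e-9):
    """Pick the most probable candidate from {candidate: probability}.

    Hypothetical helper, not part of NuPIC. When several candidates share
    the top probability (within tol), choose one at random; with debug=True
    always return the first tied candidate in sorted order, so runs are
    reproducible.
    """
    if not likelihoods:
        raise ValueError("no candidates to choose from")
    best = max(likelihoods.values())
    # Collect every candidate whose probability ties with the maximum.
    tied = sorted(c for c, p in likelihoods.items() if best - p <= tol)
    if debug:
        return tied[0]
    return rng.choice(tied)


if __name__ == "__main__":
    # After a reset, both "I'm" sentences are equally likely continuations.
    after_reset = {"I'm fine.": 0.25, "I'm tired.": 0.25, "Yayyyyy!": 0.1}
    print(pick_prediction(after_reset, debug=True))
    print(pick_prediction(after_reset))
```

With debug=True the result is always the same tied candidate, which is the behaviour that produces the repeated "I'm fine" today. With the default, either tied sentence can come out.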
