I've added an "interactive" feature to Chetan's Linguist (https://github.com/chetan51/linguist): a storyteller mode.
It will (more or less) memorize the given text and then let you type some starting words (e.g. "So he ") and continue on its own to complete the sentence(s).

---------------------------------

Yet there's a problem. I'll describe the project briefly: it uses the TP (temporal pooler) to learn texts as sequences of letters. At first it memorized the whole text as one long sequence. This worked for smaller datasets, but for bigger ones the accuracy went down quickly. So I decided to simplify: split the text into separate sequences and reset the TP's sequence memory at the end of each sentence. This greatly improved the prediction probabilities, as the sequences are much shorter (average sentence length of roughly 30 chars vs. a dataset length of hundreds to thousands of chars).

The problem is that after the first end of sequence there's no "flow" (I know, I've called a reset(), what could I expect ;) ), so the state with the highest statistical probability is selected (always the same one!).

Example dataset: "How are you? I'm fine. I'm tired. Yayyyyy!"

So when you start with "Ho", it will correctly follow with "w are you?" and then "I'm fine" "I'm fine" "I'm fine"... forever. The "I'm fine" is fine :) since from a new state it's the most probable choice (2 out of 4). But it doesn't look good.

I've come up with 2 solutions:

# Idea1: after a sequence reset in generation mode, randomly generate the first char manually, feed it to the TP and let it follow... Should work: OK; principle: so-so.

# Idea2: even though I trained with a reset (= new unknown state) after each sentence end, can I now somehow keep the flow spanning over more sentences?

Last but not least, the bug! The bug is in the (CLA) model's result.inferences['prediction']. By definition, this field should return the most probable state from the inference. But what if there are two or more equally probable states? I believe we should pick randomly. While the fixed order is convenient for debugging, the random order seems more natural. I believe it would fix my problem with the repetitive "I'm fine" above too.

(Kind of) proposed solution: if you agree, we'll add an init() parameter debug=False, which keeps the fixed ordering when enabled and by default picks randomly among equally probable states.

Thanks for reading :)
mark

--
Marek Otahal :o)
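
P.S. To make the proposal and Idea1 concrete, here is a minimal sketch in plain Python. It does not touch NuPIC at all: the names pick_prediction and random_first_char, the likelihoods dict (candidate state -> probability) and the rng argument are all made up for illustration. Only the debug=False default comes from the proposal above.

```python
import random


def pick_prediction(likelihoods, debug=False, rng=random):
    """Return the most probable state, breaking ties randomly unless debug is set.

    `likelihoods` is a hypothetical dict: candidate state -> probability.
    """
    best = max(likelihoods.values())
    # Every state sharing the top probability, in a stable (sorted) order.
    tied = sorted(s for s, p in likelihoods.items() if p == best)
    if debug:
        return tied[0]       # fixed ordering, convenient for debugging
    return rng.choice(tied)  # random pick among equally probable states


def random_first_char(sentences, rng=random):
    """Idea1: draw a first char in proportion to how often it starts a sentence."""
    return rng.choice([s[0] for s in sentences if s])


if __name__ == "__main__":
    # After "I'm " the example dataset continues with "f" or "t" equally often.
    print(pick_prediction({"f": 0.5, "t": 0.5}))
    print(random_first_char(["How are you?", "I'm fine.", "I'm tired.", "Yayyyyy!"]))
```

If the TP's probabilities roughly follow the counts in the example dataset, the tie-break only matters once "I'm " has been generated, where "f" and "t" are tied. The first char after a reset would still be "I" (2 of 4), which is the case random_first_char is meant to cover.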
_______________________________________________ nupic mailing list [email protected] http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
