[nupic-dev] [project][BUG?] "Story teller" on Nupic, and randomizing selection of states with same probabilities in TP

Marek Otahal Sun, 17 Nov 2013 07:03:05 -0800

I've added an "interactive" feature to Chetan's Linguist
https://github.com/chetan51/linguist - a story teller mode.


It will (more or less) memorize the given text and then let you type
starting words (e.g. "So he ") and follow up on its own to complete the
sentence(s).

---------------------------------
Yet there's a problem.

I'll describe the project briefly: it uses the TP to learn text as
sequence(s) of letters.

At first it memorized the whole text as one long sequence. This worked for
smaller datasets, but for bigger ones the accuracy went down quickly.

I decided to simplify: split the text into separate sequences and reset the
sequence memory of the temporal pooler at the end of each sentence. This
greatly improved prediction probabilities, as the sequences are much shorter
(avg sentence length of about 30 chars vs dataset length of hundreds to
thousands of chars).
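
A minimal sketch of the splitting step, in plain Python and independent of
NuPIC. The names split_into_sequences and training_steps are hypothetical
(they are not taken from Linguist), and splitting on '.', '!' and '?' is an
assumption about what counts as a sentence end. Each character comes with a
flag saying whether the sequence memory should be reset after it.

```python
import re


def split_into_sequences(text):
    """Split text into sentences, keeping the closing punctuation.

    Assumption: a sentence ends at a run of '.', '!' or '?'.
    Trailing text without closing punctuation is kept as a last sentence.
    """
    parts = re.findall(r"[^.!?]+[.!?]*", text)
    return [p.strip() for p in parts if p.strip()]


def training_steps(text):
    """Yield (char, reset_after) pairs for feeding a sequence learner.

    reset_after is True only on the last character of each sentence,
    which is where the caller would reset the sequence memory.
    """
    for sentence in split_into_sequences(text):
        last = len(sentence) - 1
        for i, ch in enumerate(sentence):
            yield ch, i == last
```

In a training loop you would feed each char to the model and call the reset
whenever reset_after is True.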

The problem is, after the first end of sequence, there's no "flow" (I know,
I've called a reset(), what could I expect ;) ), so the state with the highest
statistical probability is selected (always the same one!)

example dataset: "
How are you?
I'm fine.
I'm tired.
Yayyyyy!"

So when you start with "Ho", it'll correctly follow with "w are you?", then
"I'm fine" "I'm fine" "I'm fine"... forever.

The "I'm fine" is fine :) as from a new state it's the most probable choice
(2 out of 4). But it doesn't look good.

I've come up with 2 solutions:
 # Idea 1:
 after a sequence reset in generation mode, randomly generate the first char
manually, feed it to the TP and let it follow...
 should work: OK, principle: so-so.

 # Idea 2:
 even though I trained with a reset (= new unknown state) after each
sentence end, can I now somehow keep the flow spanning multiple sentences?

Last but not least, the bug!
The bug is in (CLA)model's result.inferences['prediction']
By definition, this field should return the most probable state from the
inference. But what if two or more states are tied for the highest
probability? I believe we should pick one of them at random.

While for debugging the fixed order is convenient, the random order seems
natural. I believe it would fix my problem with the repetitive "I'm fine"
above too. (kind of)

Proposed solution: if you agree, we'll add an init() parameter debug=False
which will keep the fixed ordering when set to True, and by default pick
randomly among equally probable states.
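
To make the proposal concrete, here is a small sketch of the tie-breaking
rule. It is not NuPIC's code: pick_prediction, the likelihoods dict and the
tol parameter are hypothetical, and only the debug flag mirrors the
parameter proposed above.

```python
import random


def pick_prediction(likelihoods, debug=False, rng=random, tol=1e-9):
    """Return the most probable state, breaking ties at random.

    likelihoods: dict mapping candidate state -> probability (assumed shape).
    debug=True keeps a fixed order (first of the sorted tied states).
    tol: probabilities within tol of the maximum count as tied.
    """
    if not likelihoods:
        raise ValueError("no candidate states")
    best = max(likelihoods.values())
    tied = sorted(s for s, p in likelihoods.items() if best - p <= tol)
    if debug or len(tied) == 1:
        return tied[0]
    return rng.choice(tied)
```

With {"f": 0.5, "t": 0.5} the default returns "f" or "t" at random, while
debug=True always returns "f".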

Thanks for reading :)
mark
-- 
Marek Otahal :o)

