Here's illustrative output from a run on an "xAAA. xBBB" dataset.

====== Repeat #100 =======

[991]    x ==> BBB|x    (0.50 | 0.50 | 0.50 | 1.00 | 1.00)
<<<<<learning correctly
[992]    A ==> AA|xB    (0.88 | 0.78 | 0.78 | 0.78 | 1.00)
[993]    A ==> A|xBB    (0.92 | 0.81 | 0.81 | 0.89 | 1.00)
[994]    A ==> |xBBB    (0.80 | 0.80 | 0.80 | 0.88 | 1.00)
[995]    | ==> xBBB|    (1.00 | 0.92 | 0.92 | 0.92 | 1.00)
DEBUG:  Result of PyRegion::executeCommand : 'None'
reset
[996]    x ==> AAA|x    (0.50 | 0.50 | 0.50 | 1.00 | 1.00)
<<<<<<learning correctly
[997]    B ==> BB|xA    (0.94 | 0.89 | 0.89 | 0.89 | 1.00)
[998]    B ==> B|xAA    (0.91 | 0.85 | 0.85 | 0.94 | 1.00)
[999]    B ==> |xAAA    (0.85 | 0.85 | 0.85 | 0.94 | 1.00)
[1000]   | ==> xAAA|    (1.00 | 0.91 | 0.91 | 0.91 | 1.00)
DEBUG:  Result of PyRegion::executeCommand : 'None'
reset
==========================================
Welcome young adventurer, let me tell you a story!
Enter story start (QUIT to go to work): x
x x B B B   <<<<interpretation is always the same!!


x B B B

Enter story start (QUIT to go to work): x
x x B B B


x B B B

Enter story start (QUIT to go to work): x
x x B B B




On Sun, Nov 17, 2013 at 4:01 PM, Marek Otahal <[email protected]> wrote:

> I've added an "interactive" feature to Chetan's Linguist
> https://github.com/chetan51/linguist - a story teller mode.
>
> It will (more or less) memorize the given text and then let you type
> starting words (e.g. "So he ") and continue on its own to complete the
> sentence(s).
>
> ---------------------------------
> Yet there's a problem.
>
> I'll describe the project briefly: it uses the TP (temporal pooler) to
> learn texts as sequences of letters.
>
> At first it memorized the whole text as one long sequence. This worked for
> smaller datasets, but for bigger ones the accuracy went down quickly.
>
> I decided to simplify: split the text into separate sequences and reset
> the sequence memory of the temporal pooler at the end of each sentence.
> This greatly improved prediction probabilities, as the sequences are much
> shorter (average sentence length of about 30 chars vs. dataset length of
> hundreds to thousands of chars).
>
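A minimal sketch of the per-sentence training loop described above. The names feed and reset are hypothetical callbacks standing in for the model's per-character input and its sequence reset; they are not the actual Linguist or NuPIC API, and the sentence-splitting rule is an assumption.

```python
import re


def split_into_sequences(text):
    """Split text into sentence-sized sequences, keeping end punctuation."""
    # Assumed rule: a sentence ends at '.', '!' or '?' followed by whitespace.
    parts = re.split(r'(?<=[.!?])\s+', text.strip())
    return [p for p in parts if p]


def train(text, feed, reset):
    """Feed each sentence one character at a time, resetting after each.

    `feed` and `reset` are hypothetical callbacks, not a real library API.
    """
    for sentence in split_into_sequences(text):
        for char in sentence:
            feed(char)
        # Forget the sequence context at the sentence boundary.
        reset()
```

With the "xAAA. xBBB" dataset this yields two sequences, "xAAA." and "xBBB", each followed by one reset.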
> The problem is that after the first end of sequence there's no "flow" (I
> know, I called a reset(), what could I expect ;) ), so the state with the
> highest statistical probability is selected (always the same one!).
>
> example dataset: "
> How are you?
> I'm fine.
> I'm tired.
> Yayyyyy!"
>
> So when you start with "Ho", it'll correctly follow with "w are you?",
> then "I'm fine" "I'm fine" "I'm fine"... forever.
>
> The "I'm fine" is fine :) as from a new state it's the most probable
> choice (2 out of 4). But it doesn't look good.
>
> I've come up with 2 solutions:
>  # Idea1:
>  after a sequence reset in generation mode, randomly generate the first
> char manually, feed it to the TP and let it follow...
>  should work: OK, principle: so-so.
>
>  # Idea2:
>  even though I trained with a reset (=new unknown state) after each
> sentence end, can I now somehow keep the flow spanning over more sentences?
>
>
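A rough sketch of Idea1, under one assumption the mail does not state: the first character is drawn in proportion to how often it starts a training sentence, rather than uniformly. first_char_sampler is a hypothetical helper, not part of Linguist.

```python
import random
from collections import Counter


def first_char_sampler(sentences, rng=None):
    """Return a function that samples a sentence-opening character.

    Assumption: characters are weighted by how often they start a
    training sentence. `rng` lets a caller pass a seeded random.Random.
    """
    rng = rng or random.Random()
    counts = Counter(s[0] for s in sentences if s)
    chars = sorted(counts)
    weights = [counts[c] for c in chars]

    def sample():
        return rng.choices(chars, weights=weights, k=1)[0]

    return sample
```

In generation mode the sampled character would be fed to the TP right after each reset, and the model would continue from there. For the example dataset the candidates are "H", "I" and "Y", with "I" twice as likely as the other two.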
> Last but not least, the bug!
> The bug is in the (CLA)model's result.inferences['prediction'].
> By definition, this field should return the most probable state from the
> inference. But what if there are two or more equally probable states? I
> believe we should pick one at random.
>
> While the fixed order is convenient for debugging, a random order seems
> more natural. I believe it would also fix my problem with the repetitive
> "I'm fine" above (kind of).
>
> Proposed solution, if you agree: we'll add an init() parameter debug=False;
> enabling it keeps the fixed ordering if needed, and by default ties between
> equally probable states are broken randomly.
>
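A small sketch of the proposed tie-breaking, kept separate from the model. pick_prediction and its likelihoods argument (a dict mapping candidate state to probability) are hypothetical; this is not how result.inferences['prediction'] is actually computed, and the tolerance used to decide what counts as a tie is an assumption.

```python
import random


def pick_prediction(likelihoods, debug=False, rng=random, tol=1e-9):
    """Pick the most probable candidate, breaking ties at random.

    likelihoods: dict of candidate -> probability (hypothetical format).
    debug=True keeps a fixed, sorted order among tied candidates.
    """
    if not likelihoods:
        raise ValueError("no candidates to choose from")
    best = max(likelihoods.values())
    # Candidates within `tol` of the maximum are treated as tied.
    tied = sorted(k for k, p in likelihoods.items() if best - p <= tol)
    if debug:
        return tied[0]
    return rng.choice(tied)
```

In the example dataset, "I'm " is followed by "f" and "t" equally often, so with debug=False both "I'm fine." and "I'm tired." would show up instead of only the first one.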
> Thanks for reading :)
> mark
> --
> Marek Otahal :o)
>



-- 
Marek Otahal :o)
