To gauge what's going on, have you tried using metrics (from the OPF docs: https://github.com/numenta/nupic/wiki/Online-Prediction-Framework)?
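[Editor's aside: the kind of metric being suggested here is a windowed prediction-accuracy score. The sketch below is plain Python, not the OPF's own MetricSpec machinery; the class and variable names are made up for illustration. It just shows what such a metric would report for 1-step character predictions.]

```python
from collections import deque

class WindowedAccuracy:
    """Hypothetical sketch of a windowed prediction-accuracy metric,
    similar in spirit to what OPF metrics report (not the OPF API)."""

    def __init__(self, window=100):
        # Keep only the most recent `window` results: 1 = correct, 0 = wrong.
        self.results = deque(maxlen=window)

    def update(self, predicted, actual):
        self.results.append(1 if predicted == actual else 0)

    def accuracy(self):
        if not self.results:
            return 0.0
        return sum(self.results) / float(len(self.results))

# Example: score 1-step character predictions against the actual next character.
metric = WindowedAccuracy(window=10)
text = "the ugly duckling"
predictions = list("the ugly duckxxxx")  # pretend model output
for pred, actual in zip(predictions[1:], text[1:]):
    metric.update(pred, actual)
```

A rolling window like this makes it easy to see whether accuracy trends up as more text is fed in, instead of eyeballing individual predictions.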
On Thu, Aug 29, 2013 at 2:59 PM, Chetan Surpur <[email protected]> wrote:

> Update:
>
> I ran a "medium"-sized swarm on the Ugly Duckling story that James Tauber
> formatted in the "HTM in Natural Language Processing" thread. Here are the
> parameters that got changed:
>
> Spatial pooler:
>
> n: 100 => 121
> w: 10 => 21
> synPermInactiveDec: 0.01 => 0.0058
>
> Temporal pooler:
>
> minThreshold: 12 => 9
> activationThreshold: 16 => 12
>
> I've updated the repo at https://github.com/chetan51/linguist with the
> swarmed model.
>
> It's running a little better, but it's hard to tell without more metrics
> (I'm currently just eyeballing the 1-10 step predictions for each input
> character).
>
> Interestingly, it's able to recognize subsequences fairly well in the Ugly
> Duckling story. The repetition seems to help the CLA find patterns more
> quickly than if I were to feed it more difficult-to-read text. More
> evidence that it might be a good idea to start training with children's
> books and increase the reading difficulty over time.
>
> I'm also getting the feeling that it requires a lot of training data
> before it can provide decent results. I'm thinking of setting up an EC2
> instance, letting it run on larger datasets for longer periods of time,
> and seeing whether there's a threshold of training data after which it
> starts producing useful results.
>
> On Wed, Aug 28, 2013 at 4:40 PM, Chetan Surpur <[email protected]> wrote:
>
>> That's a good suggestion. I'm actually setting up a swarm for this task
>> right now, and hopefully it'll come up with the best 'n' and 'w', among
>> other parameters.
>>
>> On Wed, Aug 28, 2013 at 11:04 AM, Erik Blas <[email protected]> wrote:
>>
>>> Have you tried tweaking 'n' (the total number of bits in the encoded
>>> output) and 'w' (if I remember correctly, the number of active bits
>>> used in the SDR) for the longer sentences?
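[Editor's aside: for readers reproducing this, the swarm's output is just a set of parameter overrides applied on top of the base model params. The sketch below uses an illustrative nested dict, not the exact nupic model-params schema; the key names are hypothetical, but the before/after values are the ones reported above.]

```python
# Base parameters before swarming (illustrative structure, not the real schema).
base_params = {
    "spParams": {"n": 100, "w": 10, "synPermInactiveDec": 0.01},
    "tpParams": {"minThreshold": 12, "activationThreshold": 16},
}

# Overrides the swarm suggested, per the message above.
swarm_overrides = {
    "spParams": {"n": 121, "w": 21, "synPermInactiveDec": 0.0058},
    "tpParams": {"minThreshold": 9, "activationThreshold": 12},
}

def apply_overrides(params, overrides):
    """Return a copy of params with per-section override values applied."""
    merged = {}
    for section, values in params.items():
        merged[section] = dict(values)  # copy so base_params stays untouched
        merged[section].update(overrides.get(section, {}))
    return merged

model_params = apply_overrides(base_params, swarm_overrides)
```

Keeping the swarmed values as a separate override dict makes it easy to diff against the defaults, which is useful when comparing runs.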
>>>
>>> On Tue, Aug 27, 2013 at 9:22 PM, Chetan Surpur <[email protected]> wrote:
>>>
>>>> With all this talk about Natural Language Processing and the hackathon,
>>>> I figured this is a good time to share a little project my friend and I
>>>> recently started using NuPIC. We call it Linguist, and our goal is an
>>>> AI that can read text unsupervised (from Wikipedia and the rest of the
>>>> internet), build a model of language, and provide better autocorrect /
>>>> autopredict for mobile keyboards (as its first useful application).
>>>>
>>>> So far, we've tried feeding characters, one at a time, to the CLA, each
>>>> one encoded as a category. We've watched it learn sentences the way the
>>>> melody-learning AI from the last hackathon learned notes.
>>>>
>>>> You can download what we have so far and try it yourself:
>>>> https://github.com/chetan51/linguist
>>>> It might also be a decent platform to start experimenting with NLP
>>>> tasks for the upcoming hackathon!
>>>>
>>>> We have a couple of interesting ideas, and a bunch of questions.
>>>>
>>>> Some ideas:
>>>>
>>>> We want to train it on public text from the internet to build a global
>>>> model, then install it on a user's phone and have it learn (possibly
>>>> with higher weight) from the user's own text messages and emails. This
>>>> would take an already intelligent model and personalize it to the
>>>> user's own style of writing and vocabulary.
>>>>
>>>> We're also thinking of using anomaly detection to fix spelling
>>>> mistakes, and probability thresholds to suggest the rest of the word,
>>>> phrase, or sentence without being annoying. We're hoping the CLA will
>>>> prove to be a good algorithm for this application, and we're very
>>>> curious to see how well it does.
>>>>
>>>> Some questions:
>>>>
>>>> While playing with it, we noticed that it learns sequences pretty
>>>> quickly, but patterns very slowly.
>>>> We repeated a short sentence many times, and after a couple of
>>>> repetitions it was able to predict the rest of the sentence fairly
>>>> correctly from every position in the sentence. But when we fed it long
>>>> text, such as novels from Project Gutenberg, its predictions were
>>>> almost totally incoherent.
>>>>
>>>> Could this be because the CLA is currently implemented as just a single
>>>> region, without hierarchies? For that matter, how well can a single
>>>> region do at predicting complex patterns like those in language, beyond
>>>> simple character transitions? Do we need hierarchy support before we'll
>>>> see any decent performance on this task?
>>>>
>>>> We're also not totally clear why a perfect run during the short
>>>> sentence-repetition exercise described above is sometimes followed by a
>>>> mistake in the next run. Why exactly, down to the level of neuronal
>>>> connections, can prediction accuracy go down with an additional
>>>> repetition of a pattern? Is it because the algorithm is stochastic?
>>>> We'd love any insight on that :)
>>>>
>>>> Finally, we'd like to invite interested parties to join us in exploring
>>>> this (and related) NLP applications of NuPIC. I'd love to learn faster
>>>> by working with other interested people and bouncing ideas off each
>>>> other. Let me know if you'd like to chat!
>>>>
>>>> Thank you for your time, and for your answers to my questions,
>>>> Chetan
>>>>
>>>> _______________________________________________
>>>> nupic mailing list
>>>> [email protected]
>>>> http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
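[Editor's aside: for anyone wanting to poke at the questions above without NuPIC installed, here's a self-contained sketch of the two pieces under discussion. All names are made up: `encode_char` is a toy stand-in for a category encoder (a block of 'w' active bits out of 'n' total per character, not NuPIC's actual CategoryEncoder), and the bigram model illustrates why pure single-step character transitions, with no longer context, produce incoherent predictions on long text.]

```python
import string
from collections import defaultdict, Counter

ALPHABET = string.ascii_lowercase + " "
W = 3  # active bits per category (the 'w' discussed earlier in the thread)

def encode_char(ch):
    """Toy category encoding: each character gets W active bits at a
    unique, non-overlapping offset (hypothetical, for illustration only)."""
    idx = ALPHABET.index(ch)
    n = len(ALPHABET) * W  # total bit count (the 'n' discussed earlier)
    bits = [0] * n
    bits[idx * W:(idx + 1) * W] = [1] * W
    return bits

def train_bigrams(text):
    """Count first-order character transitions: prev char -> next char."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(text, text[1:]):
        counts[prev][nxt] += 1
    return counts

def predict(counts, prev):
    """Predict the most frequent successor of a single character."""
    return counts[prev].most_common(1)[0][0]

model = train_bigrams("the cat sat on the mat ")
```

A first-order model sees only the previous character, so frequent contexts blur together as text grows; the CLA's high-order sequence memory is meant to keep those contexts separate, which is what the hierarchy question above is probing at.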
