It's near the top of my to-do list, actually!
On Fri, Aug 30, 2013 at 5:40 PM, Erik Blas <[email protected]> wrote:

> For gauging what's going on, have you tried using metrics (from the OPF
> docs: https://github.com/numenta/nupic/wiki/Online-Prediction-Framework)?
>
>
> On Thu, Aug 29, 2013 at 2:59 PM, Chetan Surpur <[email protected]> wrote:
>
>> Update:
>>
>> I ran a "medium"-sized swarm on the Ugly Duckling story that James Tauber
>> formatted in the "HTM in Natural Language Processing" thread. Here are the
>> parameters that were changed:
>>
>> Spatial pooler –
>>
>> n: 100 => 121
>> w: 10 => 21
>> synPermInactiveDec: 0.01 => 0.0058
>>
>> Temporal pooler –
>>
>> minThreshold: 12 => 9
>> activationThreshold: 16 => 12
>>
>> I've updated the repo at https://github.com/chetan51/linguist with the
>> swarmed model.
>>
>> It's running a little better, but it's hard to tell without more metrics
>> (I'm currently just eyeballing the 1-10 step predictions for each input
>> character).
>>
>> Interestingly, it's able to recognize subsequences in the Ugly Duckling
>> story fairly well. The repetition seems to help the CLA find patterns more
>> quickly than more difficult-to-read text would. That's more evidence that
>> it might be a good idea to start training on children's books and increase
>> the reading difficulty over time.
>>
>> I'm also getting the feeling that it needs a lot of training data before
>> it can produce decent results. I'm thinking of setting up an EC2 instance,
>> running it on larger datasets for longer periods of time, and seeing
>> whether there's a threshold of training data beyond which it starts
>> producing useful results.
>>
>>
>> On Wed, Aug 28, 2013 at 4:40 PM, Chetan Surpur <[email protected]> wrote:
>>
>>> That's a good suggestion. I'm actually setting up a swarm for this task
>>> right now, and hopefully it'll come up with the best 'n' and 'w', among
>>> other parameters.
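Since the update above mentions eyeballing the 1-10 step predictions for each character, here is a minimal sketch of a rolling next-character accuracy metric in plain Python. This is not the OPF metrics API; the function name and windowing choice are made up for illustration.

```python
from collections import deque

def rolling_accuracy(predictions, actuals, window=100):
    """Average 1-step prediction accuracy over a sliding window.

    predictions/actuals are parallel sequences of characters; returns a
    list holding the windowed accuracy after each step, so you can watch
    the score evolve instead of eyeballing individual predictions.
    """
    hits = deque(maxlen=window)  # keeps only the most recent `window` results
    out = []
    for predicted, actual in zip(predictions, actuals):
        hits.append(1.0 if predicted == actual else 0.0)
        out.append(sum(hits) / len(hits))
    return out

# Example: three predictions, the first two correct.
print(rolling_accuracy("cat", "cab"))  # [1.0, 1.0, 0.666...]
```

Something like this could be extended to the 1-10 step case by keeping one window per prediction horizon.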
>>>
>>>
>>> On Wed, Aug 28, 2013 at 11:04 AM, Erik Blas <[email protected]> wrote:
>>>
>>>> Have you tried tweaking 'n' (I keep forgetting what n refers to here,
>>>> but I believe it's the total number of bits) and 'w' (if I remember
>>>> correctly, the number of active bits used in the SDR) for the longer
>>>> sentences?
>>>>
>>>>
>>>> On Tue, Aug 27, 2013 at 9:22 PM, Chetan Surpur <[email protected]> wrote:
>>>>
>>>>> With all this talk about Natural Language Processing and the
>>>>> hackathon, I figured this is a good time to share a little project my
>>>>> friend and I recently started using NuPIC. We call it Linguist, and our
>>>>> goal is an AI that can read text unsupervised (from Wikipedia and the
>>>>> rest of the internet), build a model of language, and provide better
>>>>> autocorrect / autopredict for mobile keyboards (as its first useful
>>>>> application).
>>>>>
>>>>> So far, we've tried feeding characters, one at a time, to the CLA,
>>>>> each one encoded as a category. We've watched it learn sentences the way
>>>>> the melody-learning AI from the last hackathon learned notes.
>>>>>
>>>>> You can download what we have so far and try it yourself:
>>>>> https://github.com/chetan51/linguist
>>>>> It might also be a decent platform for experimenting with NLP tasks
>>>>> for the upcoming hackathon!
>>>>>
>>>>> We have a couple of interesting ideas, and a bunch of questions.
>>>>>
>>>>> Some ideas:
>>>>>
>>>>> We want to train it on public text from the internet to build a
>>>>> global model, then install it on a user's phone and have it learn
>>>>> (possibly with higher weight) from the user's own text messages and
>>>>> emails. This would take an already intelligent model and personalize it
>>>>> to the user's own style of writing and vocabulary.
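To make the 'n' and 'w' question above concrete: here is a toy sketch of a category-style character encoding, where each character gets w contiguous active bits out of n total. This is a simplified stand-in, not NuPIC's actual CategoryEncoder; the function name and bucket layout are assumptions for illustration.

```python
def encode_char(ch, alphabet, n=121, w=21):
    """Map a character to an SDR-like bit list: w contiguous active bits
    out of n total, one non-overlapping bucket per character.

    Uses the swarmed values from the thread (n=121, w=21) as defaults.
    Requires n >= w * len(alphabet) so buckets do not overlap.
    """
    assert n >= w * len(alphabet), "n too small for non-overlapping buckets"
    start = alphabet.index(ch) * w  # each character owns its own bucket
    bits = [0] * n
    for i in range(start, start + w):
        bits[i] = 1
    return bits

alphabet = "abcde"
sdr = encode_char("b", alphabet)
print(sum(sdr))    # 21 active bits, as set by w
print(sdr[21:42])  # the bucket belonging to 'b' is fully active
```

With a layout like this, raising w makes each character's representation more robust to noise, while n bounds how many distinct non-overlapping categories fit, which is presumably why the swarm traded the two off together.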
>>>>>
>>>>> We're also thinking of using anomaly detection to fix spelling
>>>>> mistakes, and probability thresholds to suggest the rest of the word,
>>>>> phrase, and sentence without being annoying. We're hoping that the CLA
>>>>> will prove to be a good algorithm for this application, and we're very
>>>>> curious to see how well it does.
>>>>>
>>>>> Some questions:
>>>>>
>>>>> While playing with it, we noticed that it learns sequences pretty
>>>>> quickly, but patterns very slowly. We repeated a short sentence many
>>>>> times, and after a couple of repetitions it was able to predict the
>>>>> rest of the sentence fairly accurately from every position. But when we
>>>>> fed it long text, such as novels from Project Gutenberg, its
>>>>> predictions were almost totally incoherent.
>>>>>
>>>>> Could this be because the CLA is currently implemented as just a
>>>>> single region, without hierarchies? For that matter, how well can a
>>>>> single region do at predicting complex patterns like those in language,
>>>>> beyond simple character transitions? Do we need hierarchy support
>>>>> before we'll see decent performance on this task?
>>>>>
>>>>> We're also not totally clear on why a perfect run of the short
>>>>> sentence-repetition exercise described above is sometimes followed by a
>>>>> mistake on the next run. Why exactly, down to the level of neuronal
>>>>> connections, can prediction accuracy go down after an additional
>>>>> repetition of a pattern? Is it because the algorithm is stochastic?
>>>>> We'd love any insight on that :)
>>>>>
>>>>> Finally, we'd like to invite interested parties to join us in
>>>>> exploring these (and related) NLP applications of NuPIC. We would learn
>>>>> faster by working with other interested people and bouncing ideas off
>>>>> each other. Let me know if you'd like to chat!
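The anomaly-detection and probability-threshold ideas above could be sketched like this in plain Python. The scores are stubs standing in for the model's actual per-character anomaly output and prediction probabilities, and both function names and thresholds are made up for illustration.

```python
def flag_typos(anomaly_scores, threshold=0.8):
    """Return positions whose anomaly score exceeds the threshold,
    i.e. places where the input surprised the model (possible typos)."""
    return [i for i, s in enumerate(anomaly_scores) if s > threshold]

def suggest(completions, min_confidence=0.6):
    """Offer a completion only when the top prediction is confident enough.

    completions maps candidate string -> predicted probability. Returns
    the best candidate, or None to stay quiet rather than annoy the user.
    """
    best, prob = max(completions.items(), key=lambda kv: kv[1])
    return best if prob >= min_confidence else None

# Stub per-character anomaly scores: position 2 looks like a typo.
scores = [0.1, 0.05, 0.95, 0.2]
print(flag_typos(scores))                  # [2]
print(suggest({"the": 0.7, "then": 0.2}))  # "the"
print(suggest({"the": 0.4, "then": 0.3}))  # None: not confident enough
```

The point of the `min_confidence` gate is exactly the "without being annoying" requirement: the keyboard stays silent unless the model is fairly sure.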
>>>>>
>>>>> Thank you for your time, and for your answers to my questions,
>>>>> Chetan
_______________________________________________
nupic mailing list
[email protected]
http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
