That's a good suggestion. I'm actually setting up a swarm for this task right now, and hopefully it will find the best values for 'n' and 'w', among other parameters.
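For reference, in NuPIC's encoders 'n' is the total number of bits in an encoded representation and 'w' is the number of those bits that are active. Below is a minimal, hypothetical sketch of how one character can be encoded as a category. It is not NuPIC's CategoryEncoder. Each category gets its own block of w active bits, and n is derived as w times the number of categories, so two different characters share no active bits. The function names `encode_category` and `overlap` and the 27-symbol alphabet are illustrative assumptions.

```python
import string

# Hypothetical alphabet (an assumption for illustration): lowercase letters plus space.
ALPHABET = list(string.ascii_lowercase + " ")


def encode_category(char, alphabet=ALPHABET, w=3):
    """Encode `char` as a bit list of length n = w * len(alphabet).

    Each category owns a separate, non-overlapping block of w active bits,
    so different characters share no active bits.
    """
    index = alphabet.index(char)  # raises ValueError for characters not in the alphabet
    n = w * len(alphabet)
    bits = [0] * n
    for i in range(index * w, (index + 1) * w):
        bits[i] = 1
    return bits


def overlap(a, b):
    """Count the active bits shared by two encodings."""
    return sum(x & y for x, y in zip(a, b))


if __name__ == "__main__":
    a = encode_category("a")
    b = encode_category("b")
    print(len(a), sum(a))                 # n and w: 81 3
    print(overlap(a, a), overlap(a, b))   # 3 0
```

In this scheme, raising w makes each character's representation larger and more redundant. Whether that helps on long text is the kind of question a swarm would have to answer empirically.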
On Wed, Aug 28, 2013 at 11:04 AM, Erik Blas <[email protected]> wrote:

> Have you tried tweaking the 'n' (I keep forgetting what n refers to
> here...) and 'w' (if I remember correctly this is a reference to the width
> of the number of active bits used in the SDR) for the longer sentences?
>
>
> On Tue, Aug 27, 2013 at 9:22 PM, Chetan Surpur <[email protected]> wrote:
>
>> With all this talk about Natural Language Processing and the hackathon, I
>> figured this is a good time to share a little project my friend and I have
>> recently started using NuPIC. We call it Linguist, and our goal for it is
>> an AI that can read text unsupervised (from Wikipedia and the rest of the
>> internet), build a model of language, and provide better autocorrect /
>> autopredict for mobile keyboards (as the first useful application for it).
>>
>> So far, we've tried feeding characters, one at a time, to the CLA, each
>> one encoded as a category. We've watched it learn sentences the way the
>> melody-learning AI from the last hackathon learned notes.
>>
>> You can download what we have so far, and try it yourself:
>> https://github.com/chetan51/linguist
>> It might also be a decent platform to start experimenting with NLP tasks
>> for the upcoming hackathon!
>>
>> We have a couple of interesting ideas, and a bunch of questions.
>>
>> Some ideas:
>>
>> We want to train it on public text from the internet to build a global
>> model, and then install it on a user's phone and have it learn (possibly
>> with higher weight) from the user's own text messages and emails. This
>> would take an already intelligent model and personalize it to the user's
>> own style of writing and vocabulary.
>>
>> We're also thinking of using anomaly detection to fix spelling mistakes,
>> and probability thresholds to suggest the rest of the word, phrase, and
>> sentence without being annoying. We're hoping that the CLA will prove to
>> be a good algorithm for this application, and we're very curious to see how
>> well it will do.
>>
>> Some questions:
>>
>> While playing with it, we noticed that it learns sequences pretty
>> quickly, but patterns very slowly. We repeated a short sentence many times,
>> and after a couple of repetitions it was able to predict the rest of the
>> sentence fairly accurately from every position in the sentence. But when
>> we fed it long text, such as novels from Project Gutenberg, its
>> predictions were almost totally incoherent.
>>
>> Could this be because the CLA is currently implemented as just a single
>> region, without hierarchies? For that matter, how well can a single region
>> predict complex patterns like those in language, beyond just simple
>> character transitions? Do we need hierarchy support before we'll see
>> any decent performance on this task?
>>
>> We're also not totally clear why a perfect run during the short
>> sentence-repetition exercise described above is sometimes followed by a
>> mistake in the next run. Why exactly, down to the level of individual
>> neuronal connections, can the prediction accuracy go down with an
>> additional repetition of a pattern? Is it because the algorithm is
>> stochastic? We'd love any insight on that :)
>>
>> Finally, we'd like to invite interested parties to join us in exploring
>> this (and related) NLP applications of NuPIC. I would love to learn faster
>> by working with other interested people and bouncing ideas off each other.
>> Let me know if you'd like to chat!
>>
>> Thank you for your time, and your answers to my questions,
>> Chetan
>>
>> _______________________________________________
>> nupic mailing list
>> [email protected]
>> http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
>>
