Have you tried tweaking 'n' (if I remember correctly, the total number of
bits in the SDR) and 'w' (the number of active bits) for the longer
sentences?
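For intuition, here is a toy sketch of how 'n' and 'w' interact (illustrative Python only, not NuPIC's encoder API; the parameter values are just ballpark assumptions): when n is much larger than w, randomly chosen SDRs barely overlap, which is what keeps distinct inputs distinguishable.

```python
# Toy illustration (not NuPIC's API): 'n' is the total number of bits
# in the SDR, 'w' the number of active bits.
import itertools
import random

def random_sdr(n, w, rng):
    """Pick w active bit positions out of n total."""
    return frozenset(rng.sample(range(n), w))

rng = random.Random(42)
n, w = 1024, 21  # values in the ballpark of NuPIC-style defaults
sdrs = [random_sdr(n, w, rng) for _ in range(100)]

# With n >> w, two random SDRs share very few active bits:
max_overlap = max(len(a & b) for a, b in itertools.combinations(sdrs, 2))
print("max pairwise overlap among 100 random SDRs:", max_overlap)
```

Shrinking n (or growing w) pushes that overlap up, so representations of unrelated characters start colliding, which could plausibly matter for the longer-sentence case.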




On Tue, Aug 27, 2013 at 9:22 PM, Chetan Surpur <[email protected]> wrote:

> With all this talk about Natural Language Processing and the hackathon, I
> figured this is a good time to share a little project my friend and I have
> recently started using NuPIC. We call it Linguist, and our goal for it is
> an AI that can read text unsupervised (from Wikipedia and the rest of the
> internet), build a model of language, and provide better autocorrect /
> autopredict for mobile keyboards (as the first useful application for it).
>
> So far, we've tried feeding characters, one at a time, to the CLA, each
> one encoded as a category. We've watched it learn sentences the way the
> melody-learning AI from the last hackathon learned notes.
>
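Interjecting with a toy sketch for anyone following along: the character-as-category encoding described above can be approximated like this. This is illustrative Python only, not NuPIC's actual category encoder; the block-of-bits layout, alphabet, and parameter values are assumptions for the example.

```python
# Toy sketch: encode each character as a category by giving it its own
# disjoint block of w active bits out of n total (illustrative only).
import string

ALPHABET = string.ascii_lowercase + " "

def encode_char(ch, n=512, w=16, alphabet=ALPHABET):
    """Return an n-bit list with w contiguous bits set for this character."""
    idx = alphabet.index(ch)
    start = idx * w
    assert start + w <= n, "n too small for this alphabet at this w"
    bits = [0] * n
    for i in range(start, start + w):
        bits[i] = 1
    return bits

sdr = encode_char("a")
print(sum(sdr), "active bits of", len(sdr))  # prints: 16 active bits of 512
```

Each character of the input text would be encoded this way and fed to the model one timestep at a time.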
> You can download what we have so far, and try it yourself:
> https://github.com/chetan51/linguist
> It might also be a decent platform to start experimenting with NLP tasks
> for the upcoming hackathon!
>
> We have a couple of interesting ideas, and a bunch of questions.
>
> Some ideas:
>
> We want to train it on public text from the internet to build a global
> model, then install it on a user's phone and have it learn (possibly with
> higher weight) from the user's own text messages and emails. This would
> take an already intelligent model and personalize it to the user's own
> style of writing and vocabulary.
>
> We're also thinking of using anomaly detection to fix spelling mistakes,
> and probability thresholds to suggest the rest of the word, phrase, or
> sentence without being annoying. We're hoping the CLA will prove to be a
> good algorithm for this application, and we're very curious to see how
> well it does.
>
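A quick sketch of the "without being annoying" part: only surface a suggestion when the model's confidence clears a threshold. The dictionary-of-probabilities interface below is hypothetical; in Linguist the distribution would come from the CLA's predictions.

```python
# Rough sketch of thresholded suggestions: show a completion only when
# the predicted probability clears a threshold, otherwise stay quiet.
# (Hypothetical interface; real predictions would come from the CLA.)

def suggest(predictions, threshold=0.6):
    """predictions: dict mapping candidate next characters to probabilities."""
    best = max(predictions, key=predictions.get)
    return best if predictions[best] >= threshold else None

print(suggest({"e": 0.85, "a": 0.10, "o": 0.05}))  # prints: e
print(suggest({"e": 0.40, "a": 0.35, "o": 0.25}))  # prints: None
```

The same threshold idea could extend to whole words or phrases by multiplying per-step probabilities along the predicted sequence.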
> Some questions:
>
> While playing with it, we noticed that it learns sequences pretty quickly,
> but patterns very slowly. We repeated a short sentence many times, and
> after a couple of repetitions it was able to predict the rest of the
> sentence fairly accurately from every position. But when we fed it long
> text, such as novels from Project Gutenberg, its predictions were almost
> totally incoherent.
>
> Could this be because the CLA is currently implemented as just a single
> region, without hierarchies? For that matter, how well can a single region
> do for predicting complex patterns like those in language, beyond just
> simple character transitions? Do we need hierarchy support before we'll see
> any decent performance on this task?
>
> We're also not totally clear on why a perfect run of the short
> sentence-repetition exercise described above is sometimes followed by a
> mistake on the next run. Why exactly, down to the level of neuronal
> connections, can prediction accuracy go down after an additional
> repetition of a pattern? Is it because the algorithm is stochastic? We'd
> love any insight on that :)
>
> Finally, we'd like to invite interested parties to join us in exploring
> this and related NLP applications of NuPIC. I would love to learn faster
> by working with other interested people and bouncing ideas off of each
> other. Let me know if you'd like to chat!
>
> Thank you for your time, and your answers to my questions,
> Chetan
>
> _______________________________________________
> nupic mailing list
> [email protected]
> http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
>
>
