That's a good suggestion. I'm actually setting up a swarm for this task
right now, and hopefully it'll come up with the best values for 'n' and
'w', among other parameters.
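For reference, in NuPIC's encoders 'w' is the number of active (on) bits
used to represent a single value, and 'n' is the total number of bits in
the encoding. Below is a minimal, hypothetical sketch in plain Python of a
category-style character encoding that shows how the two parameters relate.
It is not NuPIC's actual CategoryEncoder, and the function names
(encode_category, encode_char, overlap) are made up for illustration. It
assumes each category gets its own non-overlapping block of w bits, so
n = w * number_of_categories.

```python
def encode_category(index, num_categories, w):
    """Return a dense SDR (list of 0/1) for category `index`.

    Hypothetical sketch: each category gets its own block of `w`
    contiguous active bits, so the total width is n = w * num_categories
    and no two categories share an active bit.
    """
    if not 0 <= index < num_categories:
        raise ValueError("category index out of range")
    n = w * num_categories
    sdr = [0] * n
    start = index * w
    for i in range(start, start + w):
        sdr[i] = 1
    return sdr


def encode_char(ch, alphabet, w):
    # Map a character to its category index in a fixed alphabet;
    # raises ValueError if the character is not in the alphabet.
    return encode_category(alphabet.index(ch), len(alphabet), w)


def overlap(a, b):
    # Number of bit positions that are active in both SDRs.
    return sum(x & y for x, y in zip(a, b))


if __name__ == "__main__":
    alphabet = "abcdefghijklmnopqrstuvwxyz "
    a = encode_char("a", alphabet, w=21)
    b = encode_char("b", alphabet, w=21)
    print(len(a), sum(a), overlap(a, b))  # 567 21 0
```

Because no two characters share active bits in this scheme, the encoding
itself gives the CLA no notion of similarity between characters; any such
structure would have to be learned from the sequences.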


On Wed, Aug 28, 2013 at 11:04 AM, Erik Blas <[email protected]> wrote:

> Have you tried tweaking 'n' (I keep forgetting what n refers to here)
> and 'w' (if I remember correctly, this is the width, i.e. the number of
> active bits used in the SDR) for the longer sentences?
>
>
>
>
> On Tue, Aug 27, 2013 at 9:22 PM, Chetan Surpur <[email protected]> wrote:
>
>> With all this talk about Natural Language Processing and the hackathon, I
>> figured this was a good time to share a little project that my friend and
>> I recently started using NuPIC. We call it Linguist, and our goal for it is
>> an AI that can read text unsupervised (from Wikipedia and the rest of the
>> internet), build a model of language, and provide better autocorrect /
>> autopredict for mobile keyboards (as the first useful application for it).
>>
>> So far, we've tried feeding characters, one at a time, to the CLA, each
>> one encoded as a category. We've watched it learn sentences the way the
>> melody-learning AI from the last hackathon learned notes.
>>
>> You can download what we have so far, and try it yourself:
>> https://github.com/chetan51/linguist
>> It might also be a decent platform to start experimenting with NLP tasks
>> for the upcoming hackathon!
>>
>> We have a couple of interesting ideas, and a bunch of questions.
>>
>> Some ideas:
>>
>> We want to train it on public text from the internet to build a global
>> model, and then install it on a user's phone and have it learn (possibly
>> with higher weight) from the user's own text messages and emails. This
>> would take an already intelligent model and personalize it to the user's
>> own style of writing and vocabulary.
>>
>> We're also thinking of using anomaly detection to fix spelling mistakes,
>> and probability thresholds to suggest the rest of the word, phrase, and
>> sentence without being annoying. We're hoping that the CLA will prove to
>> be a good algorithm for this application, and we're very curious to see
>> how well it will do.
>>
>> Some questions:
>>
>> While playing with it, we noticed that it learns sequences pretty
>> quickly, but patterns very slowly. We repeated a short sentence many times,
>> and after a couple of repetitions it was able to predict the rest of the
>> sentence fairly accurately from every position in the sentence. But when
>> we fed it long text, such as novels from Project Gutenberg, its
>> predictions were almost totally incoherent.
>>
>> Could this be because the CLA is currently implemented as just a single
>> region, without hierarchies? For that matter, how well can a single region
>> do for predicting complex patterns like those in language, beyond just
>> simple character transitions? Do we need hierarchy support before we'll see
>> any decent performance on this task?
>>
>> We're also not totally clear why a perfect run during the short
>> sentence-repetition exercise described above is sometimes followed by a
>> mistake in the next run. Why exactly, down to the level of individual
>> neuronal connections, can the prediction accuracy go down with an
>> additional repetition of a pattern? Is it because the algorithm is
>> stochastic? We'd love any insight on that :)
>>
>> Finally, we'd like to invite interested parties to join us in exploring
>> this (and related) NLP applications of NuPIC. I would love to learn faster
>> by working with other interested people and bouncing ideas off each other.
>> Let me know if you'd like to chat!
>>
>> Thank you for your time and for your answers to my questions,
>> Chetan
>>
>> _______________________________________________
>> nupic mailing list
>> [email protected]
>> http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
>>
>>
>
