It's near the top of my todo list actually!

On Fri, Aug 30, 2013 at 5:40 PM, Erik Blas <[email protected]> wrote:

> For gauging what's going on, have you tried using metrics (from the opf
> docs: https://github.com/numenta/nupic/wiki/Online-Prediction-Framework)?
>
>
>
> On Thu, Aug 29, 2013 at 2:59 PM, Chetan Surpur <[email protected]> wrote:
>
>> Update:
>>
>> I ran a "medium"-sized swarm on the Ugly Duckling story that James Tauber
>> formatted in the "HTM in Natural Language Processing" thread. Here are the
>> parameters that got changed:
>>
>> Spatial pooler –
>>
>> n: 100 => 121
>> w: 10 => 21
>> synPermInactiveDec: 0.01 => 0.0058
>>
>> Temporal pooler –
>>
>> minThreshold: 12 => 9
>> activationThreshold: 16 => 12
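For reference, a hedged sketch of how these swarmed values might slot into an OPF-style model params dict. The exact nesting and key names should be checked against the params file the swarm actually generates; this fragment just follows the usual NuPIC naming conventions:

```python
# Fragment of an OPF-style model params dict with the swarmed values above.
# Key names and nesting are assumptions -- verify against the generated file.
SWARMED_PARAMS = {
    "encoders": {
        "character": {"n": 121, "w": 21},   # was n=100, w=10
    },
    "spParams": {
        "synPermInactiveDec": 0.0058,       # was 0.01
    },
    "tpParams": {
        "minThreshold": 9,                  # was 12
        "activationThreshold": 12,          # was 16
    },
}
```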
>>
>> I've updated the repo at https://github.com/chetan51/linguist with the
>> swarmed model.
>>
>> It's running a little better, but it's hard to tell without more
>> metrics (I'm currently just eyeballing the 1-10 step predictions for each
>> input character).
>>
>> Interestingly, it's able to recognize subsequences in the Ugly Duckling
>> story fairly well. The repetition seems to help the CLA find patterns more
>> quickly than more difficult-to-read text would. That's more evidence that
>> it might be a good idea to start training with children's books and
>> increase reading difficulty over time.
>>
>> I'm also getting the feeling that it needs a lot of training data before
>> it can produce decent results. I'm thinking of setting up an EC2 instance
>> and running it on larger datasets for longer periods of time, to see
>> whether there's a threshold of training data beyond which it starts
>> producing useful results.
>>
>>
>> On Wed, Aug 28, 2013 at 4:40 PM, Chetan Surpur <[email protected]> wrote:
>>
>>> That's a good suggestion. I'm actually setting up a swarm for this task
>>> right now, and hopefully it'll come up with the best 'n' and 'w', among
>>> other parameters.
>>>
>>>
>>> On Wed, Aug 28, 2013 at 11:04 AM, Erik Blas <[email protected]> wrote:
>>>
>>>> Have you tried tweaking 'n' (I keep forgetting what n refers to
>>>> here...) and 'w' (if I remember correctly, this refers to the number of
>>>> active bits used in the SDR) for the longer sentences?
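For what it's worth, my understanding (treat this as an assumption, not gospel): 'n' is the total number of bits in the encoded SDR, and 'w' is how many of those bits are active for any given category. A toy illustration of the idea:

```python
def encode_category(index, n=121, w=21):
    """Toy category encoder: give each category a contiguous block of w
    active bits inside an n-bit vector. Real NuPIC encoders are more
    careful about overlap and capacity -- this only shows what n and w mean."""
    if index * w + w > n:
        raise ValueError("not enough bits for this category index")
    bits = [0] * n
    for i in range(index * w, index * w + w):
        bits[i] = 1
    return bits

sdr = encode_category(2, n=121, w=21)
# len(sdr) == 121 total bits; sum(sdr) == 21 active bits, at positions 42..62
```

With n=121 and w=21, this toy scheme only fits 5 distinct categories, which hints at why the swarm trades n and w off against each other.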
>>>>
>>>>
>>>>
>>>>
>>>> On Tue, Aug 27, 2013 at 9:22 PM, Chetan Surpur <[email protected]> wrote:
>>>>
>>>>> With all this talk about Natural Language Processing and the
>>>>> hackathon, I figured this is a good time to share a little project my
>>>>> friend and I have recently started using NuPIC. We call it Linguist, and
>>>>> our goal for it is an AI that can read text unsupervised (from Wikipedia
>>>>> and the rest of the internet), build a model of language, and provide
>>>>> better autocorrect / autopredict for mobile keyboards (as the first useful
>>>>> application for it).
>>>>>
>>>>> So far, we've tried feeding characters, one at a time, to the CLA,
>>>>> each one encoded as a category. We've watched it learn sentences the way
>>>>> the melody-learning AI from the last hackathon learned notes.
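The feeding scheme described above can be sketched roughly like this (the record format and the "character" field name are assumptions for illustration; Linguist's actual input code may differ):

```python
# Hedged sketch of the feed loop: one category record per character.
def stream_characters(text):
    """Yield one record per character, the way characters are fed to the
    CLA one at a time, each encoded as a category."""
    for char in text:
        yield {"character": char}

records = list(stream_characters("the cat"))
# 7 records; each character becomes its own category input,
# e.g. records[0] == {"character": "t"}
```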
>>>>>
>>>>> You can download what we have so far, and try it yourself:
>>>>> https://github.com/chetan51/linguist
>>>>> It might also be a decent platform to start experimenting with NLP
>>>>> tasks for the upcoming hackathon!
>>>>>
>>>>> We have a couple of interesting ideas, and a bunch of questions.
>>>>>
>>>>> Some ideas:
>>>>>
>>>>> We want to train it on public text from the internet to build a global
>>>>> model, then install it on a user's phone and have it learn (possibly
>>>>> with higher weight) from the user's own text messages and emails. This
>>>>> would take an already intelligent model and personalize it to the
>>>>> user's own style of writing and vocabulary.
>>>>>
>>>>> We're also thinking of using anomaly detection to fix spelling
>>>>> mistakes, and probability thresholds to suggest the rest of the word,
>>>>> phrase, and sentence without being annoying. We're hoping the CLA will
>>>>> prove to be a good algorithm for this application, and we're very
>>>>> curious to see how well it will do.
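As a concrete sketch of the "suggest without being annoying" idea (the function name and the threshold value are placeholders, not anything from Linguist or NuPIC):

```python
def suggest_completion(predictions, threshold=0.6):
    """Given a dict mapping candidate next characters to predicted
    probabilities, suggest the best candidate only when the model is
    confident enough; otherwise stay quiet."""
    if not predictions:
        return None
    best_char = max(predictions, key=predictions.get)
    if predictions[best_char] >= threshold:
        return best_char
    return None  # below threshold: don't annoy the user

suggest_completion({"e": 0.8, "a": 0.1, "o": 0.1})    # -> "e"
suggest_completion({"e": 0.4, "a": 0.35, "o": 0.25})  # -> None
```

The threshold would presumably need tuning per user, and the same gate could apply at the word or phrase level.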
>>>>>
>>>>> Some questions:
>>>>>
>>>>> While playing with it, we noticed that it learns sequences pretty
>>>>> quickly, but patterns very slowly. We repeated a short sentence many
>>>>> times, and after a couple of repetitions it was able to predict the
>>>>> rest of the sentence fairly accurately from every position in the
>>>>> sentence. But when we fed it long text, such as novels from Project
>>>>> Gutenberg, its predictions were almost totally incoherent.
>>>>>
>>>>> Could this be because the CLA is currently implemented as just a
>>>>> single region, without hierarchies? For that matter, how well can a single
>>>>> region do for predicting complex patterns like those in language, beyond
>>>>> just simple character transitions? Do we need hierarchy support before
>>>>> we'll see any decent performance on this task?
>>>>>
>>>>> We're also not totally clear on why a perfect run of the short
>>>>> sentence-repetition exercise described above is sometimes followed by a
>>>>> mistake in the next run. Why exactly, down to the level of neuronal
>>>>> connections, can prediction accuracy go down with an additional
>>>>> repetition of a pattern? Is it because the algorithm is stochastic?
>>>>> We'd love any insight on that :)
>>>>>
>>>>> Finally, we'd like to invite interested parties to join us in
>>>>> exploring these (and related) NLP applications of NuPIC. I would love
>>>>> to learn faster by working with other interested people and bouncing
>>>>> ideas off each other. Let me know if you'd like to chat!
>>>>>
>>>>> Thank you for your time, and your answers to my questions,
>>>>> Chetan
>>>>>
>>>>> _______________________________________________
>>>>> nupic mailing list
>>>>> [email protected]
>>>>> http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>>
>
>
>
