For gauging what's going on, have you tried using metrics (from the opf
docs: https://github.com/numenta/nupic/wiki/Online-Prediction-Framework)?
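For reference, the OPF metrics boil down to things like a rolling average prediction error. Here is a self-contained sketch of that idea in plain Python (the window size and the hit/miss scoring are illustrative, not the OPF's exact implementation):

```python
from collections import deque

class AvgErrorMetric:
    """Rolling average prediction error over a fixed window,
    similar in spirit to the OPF's 'avg_err' metric."""

    def __init__(self, window=1000):
        self.errors = deque(maxlen=window)

    def update(self, predicted, actual):
        # Record 1.0 for a miss, 0.0 for a hit, then return the running average.
        self.errors.append(0.0 if predicted == actual else 1.0)
        return sum(self.errors) / len(self.errors)

metric = AvgErrorMetric(window=4)
for predicted, actual in [("a", "a"), ("b", "c"), ("d", "d"), ("e", "e")]:
    score = metric.update(predicted, actual)
print(score)  # 1 miss out of 4 -> 0.25
```

Tracking a number like this per step would make runs comparable, instead of eyeballing the predictions.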


On Thu, Aug 29, 2013 at 2:59 PM, Chetan Surpur <[email protected]> wrote:

> Update:
>
> I ran a "medium"-sized swarm on the Ugly Duckling story that James Tauber
> formatted in the "HTM in Natural Language Processing" thread. Here are the
> parameters that got changed:
>
> Spatial pooler –
>
> n: 100 => 121
> w: 10 => 21
> synPermInactiveDec: 0.01 => 0.0058
>
> Temporal pooler –
>
> minThreshold: 12 => 9
> activationThreshold: 16 => 12
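As a quick sanity check on the encoder values the swarm chose (pure arithmetic, not NuPIC code): `n` is the total bit count of the encoding and `w` the number of active bits, so their ratio is the encoding's sparsity.

```python
# Sparsity = active bits / total bits, before and after the swarm.
before_n, before_w = 100, 10
after_n, after_w = 121, 21

before_sparsity = before_w / before_n
after_sparsity = after_w / after_n

print(before_sparsity)           # 0.1 -> 10% of bits active
print(round(after_sparsity, 3))  # 0.174 -> about 17% active
```

So the swarm roughly doubled `w` while barely growing `n`, giving a denser encoding per character.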
>
> I've updated the repo at https://github.com/chetan51/linguist with the
> swarmed model.
>
> It's running a little bit better, but it's hard to tell without more
> metrics (I'm currently just eyeballing the 1-10 step predictions for each
> input character).
>
> Interestingly, it's able to recognize subsequences fairly well in the Ugly
> Duckling story. The repetition looks like it's helping the CLA find
> patterns more quickly than if I were to feed it more difficult-to-read
> text. More evidence that it might be a good idea to start training with
> children's books, and increase reading difficulty over time.
>
> I'm also getting the feeling that it requires a lot of training data
> before it can provide decent results. I'm thinking of setting up an EC2
> instance and having it run on larger datasets for longer periods of time,
> to see if there's a threshold of training data after which it will start
> producing useful results.
>
>
> On Wed, Aug 28, 2013 at 4:40 PM, Chetan Surpur <[email protected]> wrote:
>
>> That's a good suggestion. I'm actually setting up a swarm for this task
>> right now, and hopefully it'll come up with the best 'n' and 'w', among
>> other parameters.
>>
>>
>> On Wed, Aug 28, 2013 at 11:04 AM, Erik Blas <[email protected]> wrote:
>>
>>> Have you tried tweaking the 'n' (I keep forgetting what n refers to
>>> here..) and 'w' (if I remember correctly, this refers to the number of
>>> active bits used in the SDR) for the longer sentences?
>>>
>>>
>>>
>>>
>>> On Tue, Aug 27, 2013 at 9:22 PM, Chetan Surpur <[email protected]> wrote:
>>>
>>>> With all this talk about Natural Language Processing and the hackathon,
>>>> I figured this is a good time to share a little project my friend and I
>>>> have recently started using NuPIC. We call it Linguist, and our goal for it
>>>> is an AI that can read text unsupervised (from Wikipedia and the rest of
>>>> the internet), build a model of language, and provide better autocorrect /
>>>> autopredict for mobile keyboards (as the first useful application for it).
>>>>
>>>> So far, we've tried feeding characters, one at a time, to the CLA, each
>>>> one encoded as a category. We've watched it learn sentences the way the
>>>> melody-learning AI from the last hackathon learned notes.
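A category encoding of this kind can be sketched in a few lines: each character owns a disjoint run of `w` active bits in an `n`-bit vector, so distinct characters share no active bits (the alphabet and `w` below are illustrative, not the model's actual parameters):

```python
def encode_category(char, alphabet, w=3):
    """One-hot category encoding: each character gets its own
    block of w consecutive active bits in an n = w * len(alphabet)
    bit vector, so no two characters overlap."""
    n = w * len(alphabet)
    bits = [0] * n
    start = alphabet.index(char) * w
    for i in range(start, start + w):
        bits[i] = 1
    return bits

print(encode_category("b", "abc"))  # [0, 0, 0, 1, 1, 1, 0, 0, 0]
```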
>>>>
>>>> You can download what we have so far, and try it yourself:
>>>> https://github.com/chetan51/linguist
>>>> It might also be a decent platform to start experimenting with NLP
>>>> tasks for the upcoming hackathon!
>>>>
>>>> We have a couple interesting ideas, and a bunch of questions.
>>>>
>>>> Some ideas:
>>>>
>>>> We want to train it on public text from the internet to build a global
>>>> model, and then install it on a user's phone and have it learn (possibly
>>>> with higher weight) from the user's own text messages and emails. This would
>>>> take an already intelligent model and personalize it to the user's own
>>>> style of writing and vocabulary.
>>>>
>>>> We're also thinking of using anomaly detection to fix spelling
>>>> mistakes, and probability thresholds to suggest the rest of the word,
>>>> phrase, and sentence without being annoying. We're hoping that the CLA will
>>>> prove to be a good algorithm for this application, and we're very curious
>>>> to see how well it will do.
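One way such a threshold could work, as a toy sketch (the threshold value and the prediction format are made up for illustration, not from any existing autocorrect code):

```python
def suggest(predictions, threshold=0.8):
    """Return the most likely completion only when the model is
    confident enough; otherwise stay quiet to avoid annoying the user."""
    best, prob = max(predictions.items(), key=lambda kv: kv[1])
    return best if prob >= threshold else None

# Toy next-word distribution after the user types "thank "
print(suggest({"you": 0.9, "them": 0.07, "goodness": 0.03}))  # you
print(suggest({"cat": 0.5, "car": 0.5}))                      # None
```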
>>>>
>>>> Some questions:
>>>>
>>>> While playing with it, we noticed that it learns sequences pretty
>>>> quickly, but patterns very slowly. We repeated a short sentence many times,
>>>> and after a couple of repetitions it was able to predict the rest of the
>>>> sentence fairly accurately from every position in the sentence. But when we
>>>> fed it long text, such as novels from Project Gutenberg, its predictions
>>>> were almost totally incoherent.
>>>>
>>>> Could this be because the CLA is currently implemented as just a single
>>>> region, without hierarchies? For that matter, how well can a single region
>>>> do for predicting complex patterns like those in language, beyond just
>>>> simple character transitions? Do we need hierarchy support before we'll see
>>>> any decent performance on this task?
>>>>
>>>> We're also not totally clear why a perfect run during the short
>>>> sentence-repetition exercise as described above is sometimes followed by a
>>>> mistake in the next run. Why exactly, down to the level of details of
>>>> neuronal connections, can the prediction accuracy go down with an
>>>> additional repetition of a pattern? Is it because the algorithm is
>>>> stochastic? We'd love any insight on that :)
>>>>
>>>> Finally, we'd like to invite interested parties to join us in exploring
>>>> this (and related) NLP applications of NuPIC. I would love to learn faster
>>>> by working with other interested people and bounce ideas off of each other.
>>>> Let me know if you'd like to chat!
>>>>
>>>> Thank you for your time, and your answers to my questions,
>>>> Chetan
>>>>
>>>> _______________________________________________
>>>> nupic mailing list
>>>> [email protected]
>>>> http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
>>>>
>>>>
>>>
>>
>