Daniel,

What version of Python are you using? What version of NuPIC are you using? (cd into the checkout directory, run "git log -1", and paste the commit SHA.)
Thanks,
---------
Matt Taylor
OS Community Flag-Bearer
Numenta

On Wed, Oct 16, 2013 at 12:57 AM, Daniel Jachyra <[email protected]> wrote:
> Please, could you help me with the following error:
>
> nupic@nupic-vm:~/nupic_nlp-master$ ./run_association_experiment.py resources/animals.txt resources/vegetables.txt -p 100 -t 1000
> Prediction output for 1000 pairs of terms
>
> #COUNT  TERM ONE  TERM TWO  | TERM TWO PREDICTION
> --------------------------------------------------------------------
> Traceback (most recent call last):
>   File "./run_association_experiment.py", line 80, in <module>
>     main()
>   File "./run_association_experiment.py", line 76, in main
>     runner.random_dual_association(args[0], args[1])
>   File "/home/nupic/nupic_nlp-master/nupic_nlp/runner.py", line 65, in random_dual_association
>     self.associate(associations)
>   File "/home/nupic/nupic_nlp-master/nupic_nlp/runner.py", line 40, in associate
>     term2_prediction = self._feed_term(term1, fetch_result)
>   File "/home/nupic/nupic_nlp-master/nupic_nlp/runner.py", line 75, in _feed_term
>     predicted_bitmap = self.nupic.feed(sdr_array)
>   File "/home/nupic/nupic_nlp-master/nupic_nlp/nupic_words.py", line 24, in feed
>     predicted_cells = tp.getPredictedState()
>   File "/home/nupic/nta/eng/lib/python2.7/site-packages/nupic/research/TP10X2.py", line 296, in __getattr__
>     raise AttributeError("'TP' object has no attribute '%s'" % name)
> AttributeError: 'TP' object has no attribute 'getPredictedState'
>
> -----Original Message-----
> From: nupic [mailto:[email protected]] On Behalf Of Matthew Taylor
> Sent: Monday, October 07, 2013 1:41 AM
> To: NuPIC general mailing list.
> Subject: Re: [nupic-dev] NLP experiments with NuPIC
>
> I've added some work to my NuPIC / NLP repo that does POS predictions:
>
> https://github.com/rhyolight/nupic_nlp#parts-of-speech
>
> This experiment does not require the CEPT API, so anyone should be able to run it just by checking it out and installing.
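[Editor's note: for anyone hitting the same AttributeError — the traceback above shows that the C++-backed TP10X2 class raises AttributeError for getPredictedState, which the pure-Python TP exposes. A minimal defensive-lookup sketch (the two stand-in classes below are illustrations, not real NuPIC classes):

```python
class Tp10x2Like(object):
    """Stand-in mimicking TP10X2's behavior per the traceback above:
    unknown attribute lookups raise AttributeError."""
    def __getattr__(self, name):
        raise AttributeError("'TP' object has no attribute '%s'" % name)


class PyTpLike(object):
    """Stand-in for a TP implementation that does expose getPredictedState()."""
    def getPredictedState(self):
        return [0, 1, 0]  # dummy predicted-cell array


def predicted_state(tp):
    """Return the TP's predicted state, or None if this TP build
    lacks the method (getattr's default swallows the AttributeError)."""
    get = getattr(tp, "getPredictedState", None)
    if get is None:
        return None
    return get()


print(predicted_state(Tp10x2Like()))  # → None
print(predicted_state(PyTpLike()))    # → [0, 1, 0]
```

The real fix is to construct the pure-Python TP (or a TP build that implements the method) in nupic_words.py; the guard above just makes the failure explicit instead of a crash.]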
> It parses a given corpus, decodes all the parts-of-speech tags for each sentence, and uses a category encoder to pass the POS into NuPIC, predicting the next POS.
>
> Here is some example output:
>
> $ ./run_pos_experiment.py -t 06_how_thor_got_the_hammer.txt
> ...
> All      determiner   pronoun
> the      determiner   noun
> gods     noun         noun
> felt     past tense   .
> very     adverb       preposition
> sorry    adjective    proper noun
> for      preposition  noun
> little   adjective    pronoun
> Brok     proper noun  noun
> .        .            past tense
> They     pronoun      pronoun
> thought  past tense   past tense
> Loki     proper noun  pronoun
> '        past tense
> s        noun         noun
> things   noun         .
> were     past tense   .
> fine     noun         preposition
> .        .            .
> ...
>
> Column 1: input words
> Column 2: POS
> Column 3: predicted POS for the same word
>
> There are some interesting things here. NuPIC commonly predicts a pronoun as the first word after a sentence, because that's the most common word starting a sentence within the corpus. It also always predicts that a noun will follow a determiner, because they usually do.
>
> While NuPIC isn't doing great, it does tend to pick up small POS phrases, and is pretty good at predicting the ends of sentences. But this POS problem is not something I'd expect it to nail, frankly. It's not something a human can do well either. Each phrase is a tree, and at any point in the phrase it could branch in multiple directions. NuPIC is going to make its best guess, but will likely be wrong most of the time. A more interesting experiment would be to turn this into an anomaly experiment. Once it's been trained on some text, incoming nonsense grammar should trigger high anomaly scores.
>
> Another thing you might note is that NLTK doesn't tag all the words properly. Words like "bit" are commonly mis-categorized as a noun instead of a verb in phrases like "the horse bit the dog", and vice versa. If anyone is experienced with NLTK, I'd be happy to get some help improving POS tag accuracy.
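[Editor's note: a toy sketch of the category-encoding idea mentioned above. This is a simplified illustration, not NuPIC's actual SDRCategoryEncoder: each POS tag simply gets its own non-overlapping block of active bits, so distinct categories never share bits.

```python
def encode_category(category, categories, width_per_category=3):
    """Return a dense 0/1 list in which each category owns one
    contiguous, non-overlapping block of active bits."""
    total_width = width_per_category * len(categories)
    encoding = [0] * total_width
    start = categories.index(category) * width_per_category
    for i in range(start, start + width_per_category):
        encoding[i] = 1
    return encoding


pos_tags = ["noun", "verb", "determiner", "pronoun"]
print(encode_category("verb", pos_tags))
# → [0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0]
```

Because the blocks don't overlap, every tag's encoding is maximally distinct — which is what you want for categorical inputs with no natural similarity between values.]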
>
> I don't have time to continue these experiments, but I hope this lays some of the groundwork for anyone interested in the NLP focus of the Hackathon. I've added this to our list of NLP challenges on our wiki:
>
> https://github.com/numenta/nupic/wiki/Natural-Language-Processing#challenges
> ---------
> Matt Taylor
> OS Community Flag-Bearer
> Numenta
>
> On Thu, Oct 3, 2013 at 10:01 AM, Matthew Taylor <[email protected]> wrote:
>> Oh, by the way, keep in mind that I'm still a Python novice. Improvements, clarifications, and pull requests are welcome!
>> ---------
>> Matt Taylor
>> OS Community Flag-Bearer
>> Numenta
>>
>> On Thu, Oct 3, 2013 at 9:59 AM, Matthew Taylor <[email protected]> wrote:
>>> I've been putting together some experiments with NLP and CEPT's word SDRs. Thanks to Subutai and Francisco for your help with this.
>>>
>>> I've got some initial decent results, at least proving that we can take CEPT's SDRs as input for the CLA, get predicted SDRs back out, and get the "similar terms" for the SDR from CEPT's API.
>>>
>>> https://github.com/rhyolight/nupic_nlp
>>>
>>> The README on that repo is extensive, so if you are interested, please get a CEPT API key[1] and try it out with your own word associations.
>>> Here is an example (from the README):
>>>
>>> $ ./run_association_experiment.py resources/animals.txt resources/vegetables.txt -p 100 -t 1000
>>> Prediction output for 1000 pairs of terms
>>>
>>> #COUNT  TERM ONE    TERM TWO   | TERM TWO PREDICTION
>>> --------------------------------------------------------------------
>>> # 100   salmon      endive     | lentil
>>> # 101   crocodile   borage     |
>>> # 102   wolf        turmeric   | amaranth
>>> # 103   termite     chickweed  |
>>> # 104   quail       poke       |
>>> # 105   woodpecker  shallot    |
>>> # 106   echidna     caper      | tomato
>>> # 107   panther     guar       |
>>> # 108   ape         tomatillo  | chrysanthemum
>>> # 109   bee         cabbage    |
>>> # 110   seahorse    sorrel     |
>>> # 111   camel       tomatillo  | lemongrass
>>> # 112   rat         chives     |
>>> # 113   crab        yam        | turnip
>>>
>>> This script takes a random term from the first file and a random term from the second. It converts each term to an SDR through the CEPT API and feeds term #1 and term #2 into NuPIC, bypassing the spatial pooler and sending it right into the TP (as described in the hello_tp example[2]). The next prediction after feeding in term #1 is preserved and printed to the console. Then it resets the TP so that it can only learn that simple one->two relationship. In the sample above, NuPIC should only be predicting plants or vegetables, given that the association I'm training it on is "animal" --> "vegetable".
>>>
>>> This trivial example seems to be working rather well, although NuPIC doesn't always have a valid SDR prediction. The predictions it does create almost always seem to be some sort of plant. Even more interesting is that sometimes NuPIC predicts SDRs that resolve to words outside the range of the input values.
>>>
>>> Happy hacking!
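[Editor's note: a sketch of the sparse-to-dense step this pipeline performs before the SDR can be handed to the TP. It assumes (as the traceback's `sdr_array` variable suggests) that a word's SDR arrives from the CEPT API as a list of ON-bit indices within a fixed-size retina; `bitmap_to_dense` is a hypothetical helper name, not the repo's actual function.

```python
def bitmap_to_dense(on_bits, width):
    """Expand a sparse list of active-bit indices into a dense
    0/1 array of the given width, suitable for tp.compute()."""
    dense = [0] * width
    for i in on_bits:
        dense[i] = 1
    return dense


# Tiny 10-bit retina for illustration; CEPT's real retinas are much larger.
print(bitmap_to_dense([0, 5, 7], width=10))
# → [1, 0, 0, 0, 0, 1, 0, 1, 0, 0]
```

The experiment then feeds the dense arrays for term #1 and term #2 to the TP in sequence and resets it after each pair, so the only temporal structure it can learn is the one->two association.]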
>>> ---------
>>> Matt Taylor
>>> OS Community Flag-Bearer
>>> Numenta
>>>
>>> [1] https://cept.3scale.net/signup (YOU MUST upgrade your account to use the API endpoints this project requires; email [email protected] and tell him you're working on NuPIC NLP tasks and he'll upgrade you.)
>>> [2] https://github.com/numenta/nupic/blob/master/examples/tp/hello_tp.py

_______________________________________________
nupic mailing list
[email protected]
http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
