Please, could you help me with the following error?
nupic@nupic-vm:~/nupic_nlp-master$ ./run_association_experiment.py resources/animals.txt resources/vegetables.txt -p 100 -t 1000
Prediction output for 1000 pairs of terms
#COUNT TERM ONE TERM TWO | TERM TWO PREDICTION
--------------------------------------------------------------------
Traceback (most recent call last):
  File "./run_association_experiment.py", line 80, in <module>
    main()
  File "./run_association_experiment.py", line 76, in main
    runner.random_dual_association(args[0], args[1])
  File "/home/nupic/nupic_nlp-master/nupic_nlp/runner.py", line 65, in random_dual_association
    self.associate(associations)
  File "/home/nupic/nupic_nlp-master/nupic_nlp/runner.py", line 40, in associate
    term2_prediction = self._feed_term(term1, fetch_result)
  File "/home/nupic/nupic_nlp-master/nupic_nlp/runner.py", line 75, in _feed_term
    predicted_bitmap = self.nupic.feed(sdr_array)
  File "/home/nupic/nupic_nlp-master/nupic_nlp/nupic_words.py", line 24, in feed
    predicted_cells = tp.getPredictedState()
  File "/home/nupic/nta/eng/lib/python2.7/site-packages/nupic/research/TP10X2.py", line 296, in __getattr__
    raise AttributeError("'TP' object has no attribute '%s'" % name)
AttributeError: 'TP' object has no attribute 'getPredictedState'
-----Original Message-----
From: nupic [mailto:[email protected]] On Behalf Of Matthew Taylor
Sent: Monday, October 07, 2013 1:41 AM
To: NuPIC general mailing list.
Subject: Re: [nupic-dev] NLP experiments with NuPIC
I've added some work to my NuPIC / NLP repo that does POS predictions:
https://github.com/rhyolight/nupic_nlp#parts-of-speech
This experiment does not require the CEPT API, so anyone should be able to
run it just by checking it out and installing. It parses a given corpus,
decodes all the parts of speech tags for each sentence, and uses a category
encoder to pass the POS into NuPIC, predicting the next POS.
Here is some example output:
$ ./run_pos_experiment.py -t 06_how_thor_got_the_hammer.txt ...
All determiner pronoun
the determiner noun
gods noun noun
felt past tense .
very adverb preposition
sorry adjective proper noun
for preposition noun
little adjective pronoun
Brok proper noun noun
. . past tense
They pronoun pronoun
thought past tense past tense
Loki proper noun pronoun
' past tense
s noun noun
things noun .
were past tense .
fine noun preposition
. . .
...
Column 1: input words
Column 2: POS
Column 3: the POS NuPIC predicted for that word (its prediction from the previous step)
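The category encoder used to pass the POS into NuPIC can be sketched roughly like this. This is a minimal stand-in for illustration, not NuPIC's actual CategoryEncoder; the function and parameter names are made up:

```python
def make_category_encoder(categories, bits_per_category=3):
    """Map each category to its own non-overlapping block of active bits,
    so no two categories share bits and any overlap implies identity."""
    offsets = {c: i * bits_per_category for i, c in enumerate(categories)}
    width = bits_per_category * len(categories)

    def encode(category):
        bits = [0] * width
        start = offsets[category]
        bits[start:start + bits_per_category] = [1] * bits_per_category
        return bits

    return encode

encode = make_category_encoder(["determiner", "noun", "verb"])
print(encode("noun"))  # -> [0, 0, 0, 1, 1, 1, 0, 0, 0]
```

Because the blocks never overlap, decoding a predicted encoding back to a POS tag is just a matter of finding which block the active bits fall in.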
There are some interesting things here. NuPIC commonly predicts a pronoun as
the first word after a sentence, because that's the most common word
starting a sentence within the corpus. It also always predicts a noun will
follow a determiner, because they usually do.
While NuPIC isn't doing great, it does tend to pick up small POS phrases,
and is pretty good at predicting the ends of sentences. But this POS problem
is not something I'd expect it to nail, frankly. It's not something a human
can do well either. Each phrase is a tree, and at any point the phrase
could branch in multiple directions.
NuPIC is going to make its best guess, but will likely be wrong most of the
time. A more interesting experiment would be to turn this into an anomaly
experiment. Once it's been trained on some text, incoming nonsense grammar
should trigger high anomaly scores.
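The raw anomaly score idea can be sketched as the fraction of active columns that were not predicted on the previous step. This is a simplified standalone version for illustration, assuming SDRs are given as sets of column indices, not NuPIC's actual implementation:

```python
def anomaly_score(active_columns, predicted_columns):
    """Fraction of currently active columns the model did NOT predict:
    0.0 means fully expected input, 1.0 means total surprise."""
    active = set(active_columns)
    if not active:
        return 0.0
    unpredicted = active - set(predicted_columns)
    return len(unpredicted) / float(len(active))

# Grammatical input: most active columns were predicted -> low score.
print(anomaly_score({1, 2, 3, 4}, {1, 2, 3, 9}))  # -> 0.25
# Nonsense input: none of the active columns were predicted -> high score.
print(anomaly_score({5, 6, 7, 8}, {1, 2, 3, 4}))  # -> 1.0
```

Tracking this score over a stream of POS inputs would be one way to flag nonsense grammar after training.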
Another thing you might note is that NLTK doesn't tag all the words
properly. Words like "bit" are commonly mis-tagged as a noun instead of
a verb in phrases like "the horse bit the dog", and vice versa. If anyone is
experienced with NLTK, I'd be happy to get some help improving POS tag
accuracy.
I don't have time to continue these experiments, but I hope this lays some
of the groundwork for anyone interested in the NLP focus of the Hackathon.
I've added this to our list of NLP challenges on our wiki:
https://github.com/numenta/nupic/wiki/Natural-Language-Processing#challenges
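For the association experiment described in my earlier message below, resolving a predicted SDR back to a known term rests on the same overlap principle CEPT's "similar terms" lookup is based on. Here's a rough local sketch; this is a hypothetical helper for illustration, not the CEPT API:

```python
def best_matching_term(predicted_sdr, term_sdrs):
    """Return the known term whose SDR shares the most active bits with
    the predicted SDR, or None if nothing overlaps at all.
    SDRs are given as sets of active bit indices."""
    best_term, best_overlap = None, 0
    for term, sdr in term_sdrs.items():
        overlap = len(set(predicted_sdr) & set(sdr))
        if overlap > best_overlap:
            best_term, best_overlap = term, overlap
    return best_term

# Toy SDRs (made up): real CEPT SDRs are much larger and sparser.
terms = {
    "lentil": {2, 5, 9, 14},
    "turnip": {3, 5, 9, 21},
    "salmon": {40, 41, 42, 43},
}
print(best_matching_term({3, 5, 9, 30}, terms))  # -> turnip
```

Returning None when there is no overlap mirrors the blank predictions in the sample output below, where NuPIC produced no valid SDR.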
---------
Matt Taylor
OS Community Flag-Bearer
Numenta
On Thu, Oct 3, 2013 at 10:01 AM, Matthew Taylor <[email protected]> wrote:
> Oh by the way, keep in mind that I'm still a python novice.
> Improvements, clarifications, and pull requests are welcome!
> ---------
> Matt Taylor
> OS Community Flag-Bearer
> Numenta
>
>
> On Thu, Oct 3, 2013 at 9:59 AM, Matthew Taylor <[email protected]> wrote:
>> I've been putting together some experiments with NLP and CEPT's word
>> SDRs. Thanks to Subutai and Francisco for your help with this.
>>
>> I've got some initial decent results, at least proving that we can
>> take CEPT's SDRs as input for the CLA and get predicted SDRs back out
>> and get the "similar terms" for the SDR from CEPT's API.
>>
>> https://github.com/rhyolight/nupic_nlp
>>
>> The README on that repo is extensive, so if you are interested,
>> please get a CEPT API key[1] and try it out with your own word
>> associations.
>> Here is an example (from the README):
>>
>> $ ./run_association_experiment.py resources/animals.txt
>> resources/vegetables.txt -p 100 -t 1000
>> Prediction output for 1000 pairs of terms
>>
>> #COUNT TERM ONE TERM TWO | TERM TWO PREDICTION
>> --------------------------------------------------------------------
>> # 100 salmon endive | lentil
>> # 101 crocodile borage |
>> # 102 wolf turmeric | amaranth
>> # 103 termite chickweed |
>> # 104 quail poke |
>> # 105 woodpecker shallot |
>> # 106 echidna caper | tomato
>> # 107 panther guar |
>> # 108 ape tomatillo | chrysanthemum
>> # 109 bee cabbage |
>> # 110 seahorse sorrel |
>> # 111 camel tomatillo | lemongrass
>> # 112 rat chives |
>> # 113 crab yam | turnip
>>
>> This script takes a random term from the first file and a random term
>> from the second. It converts each term to an SDR through the CEPT API
>> and feeds term #1 and term #2 into NuPIC, bypassing the spatial
>> pooler and sending it right into the TP (as described in the hello_tp
>> example[2]). The next prediction after feeding in term #1 is
>> preserved and printed to the console. Then it resets the TP so that
>> it can only learn that simple one->two relationship. In the sample
>> above, NuPIC should only be predicting plants or vegetables, given
>> that the association I'm training it on is "animal" --> "vegetable".
>>
>> This trivial example seems to be working rather well, although NuPIC
>> doesn't always have a valid SDR prediction. The predictions it does
>> create almost always seem to be some sort of plant. Even more
>> interesting is that sometimes NuPIC predicts SDRs that resolve to
>> words outside the range of the input values.
>>
>> Happy hacking!
>> ---------
>> Matt Taylor
>> OS Community Flag-Bearer
>> Numenta
>>
>> [1] https://cept.3scale.net/signup (YOU MUST upgrade your account to
>> use the API endpoints this project requires, email [email protected]
>> and tell him you're working on NuPIC NLP tasks and he'll upgrade you.)
>> [2] https://github.com/numenta/nupic/blob/master/examples/tp/hello_tp.py
_______________________________________________
nupic mailing list
[email protected]
http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org