I've been putting together some experiments with NLP and CEPT's word
SDRs. Thanks to Subutai and Francisco for your help with this.

I've got some decent initial results, at least proving that we can
take CEPT's SDRs as input for the CLA, get predicted SDRs back out,
and resolve those predictions to "similar terms" through CEPT's API.
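CEPT's API does the SDR-to-terms resolution server-side, so the code
below is only a rough local illustration of the idea (the helper names
and the toy SDRs are mine, not CEPT's): if you represent an SDR as a
set of active bit indices, "similar" terms are the ones whose SDRs
share the most active bits.

```python
def overlap(sdr_a, sdr_b):
    """Number of active bits two SDRs have in common."""
    return len(set(sdr_a) & set(sdr_b))

def closest_terms(predicted_sdr, term_sdrs, n=3):
    """Rank known terms by bit overlap with a predicted SDR.

    term_sdrs: dict mapping term -> set of active bit indices
    (a stand-in for what CEPT's similar-terms endpoint returns).
    """
    ranked = sorted(term_sdrs,
                    key=lambda t: overlap(predicted_sdr, term_sdrs[t]),
                    reverse=True)
    return ranked[:n]
```

For example, a predicted SDR of {1, 2, 3} against {"lentil": {1, 2, 3, 4},
"turnip": {1, 9}} resolves to "lentil" first, since it shares three bits
rather than one.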

https://github.com/rhyolight/nupic_nlp

The README on that repo is extensive, so if you are interested, please
get a CEPT API key[1] and try it out with your own word associations.
Here is an example (from the README):

    $ ./run_association_experiment.py resources/animals.txt \
          resources/vegetables.txt -p 100 -t 1000
    Prediction output for 1000 pairs of terms

    #COUNT        TERM ONE        TERM TWO | TERM TWO PREDICTION
    --------------------------------------------------------------------
    #  100          salmon          endive |              lentil
    #  101       crocodile          borage |
    #  102            wolf        turmeric |            amaranth
    #  103         termite       chickweed |
    #  104           quail            poke |
    #  105      woodpecker         shallot |
    #  106         echidna           caper |              tomato
    #  107         panther            guar |
    #  108             ape       tomatillo |       chrysanthemum
    #  109             bee         cabbage |
    #  110        seahorse          sorrel |
    #  111           camel       tomatillo |          lemongrass
    #  112             rat          chives |
    #  113            crab             yam |              turnip

This script takes a random term from the first file and a random term
from the second. It converts each term to an SDR through the CEPT API
and feeds term #1 and term #2 into NuPIC, bypassing the spatial pooler
and sending the SDRs directly into the TP (as described in the hello_tp
example[2]). The prediction made after feeding in term #1 is saved
and printed to the console. The TP is then reset so that it only
learns the simple one->two relationship. In the sample above, NuPIC
should only be predicting plants or vegetables, given that the
association I'm training it on is "animal" --> "vegetable".

This trivial example seems to be working rather well, although NuPIC
doesn't always have a valid SDR prediction. The predictions it does
create almost always seem to be some sort of plant. Even more
interesting is that sometimes NuPIC predicts SDRs that resolve to
words that never appeared in either input file.

Happy hacking!
---------
Matt Taylor
OS Community Flag-Bearer
Numenta

[1] https://cept.3scale.net/signup (YOU MUST upgrade your account to
use the API endpoints this project requires; email [email protected],
tell him you're working on NuPIC NLP tasks, and he'll upgrade you.)
[2] https://github.com/numenta/nupic/blob/master/examples/tp/hello_tp.py

_______________________________________________
nupic mailing list
[email protected]
http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org