For those who don't want to use the API, and for evaluation purposes, I 
propose that we choose some reference text and I convert it into a sequence of 
SDRs. This file could be used for training.
I would also generate a list of all words contained in the text, together with 
their SDRs, to be used as a conversion table.
As a simple test measure we could feed a sequence of SDRs into a trained 
network and see whether the HTM makes the right prediction about the following 
word(s).
The last file to produce for a complete framework would be a list of, let's 
say, 100 word sequences with their correct continuations.
The word sequences could be, for example, the beginnings of phrases with more 
than n words (n being the number of steps that the CLA can predict ahead).
This could be the beginning of a measuring set-up that allows us to compare 
different CLA implementation flavors.
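
To make the proposed measure concrete, here is a minimal sketch in plain 
Python (no NuPIC dependency). SDRs are modeled as sets of active bit indices; 
`word_to_sdr` stands in for the generated conversion table, and `predict_next` 
is a placeholder for whatever prediction call a trained CLA implementation 
exposes -- both names are assumptions, not part of any existing API:

```python
def overlap(sdr_a, sdr_b):
    """Number of active bits shared by two SDRs (sets of bit indices)."""
    return len(sdr_a & sdr_b)

def closest_word(predicted_sdr, word_to_sdr):
    """Map a predicted SDR back to the word whose SDR overlaps it most."""
    return max(word_to_sdr, key=lambda w: overlap(predicted_sdr, word_to_sdr[w]))

def score(test_set, word_to_sdr, predict_next):
    """Fraction of word sequences whose correct continuation is predicted.

    test_set:     list of (word_sequence, correct_next_word) pairs,
                  i.e. the ~100 sequences proposed above.
    predict_next: callable taking a list of SDRs and returning a predicted
                  SDR (placeholder for the trained network's prediction).
    """
    hits = 0
    for sequence, continuation in test_set:
        sdr_sequence = [word_to_sdr[w] for w in sequence]
        predicted = predict_next(sdr_sequence)
        if closest_word(predicted, word_to_sdr) == continuation:
            hits += 1
    return hits / len(test_set)
```

Run against the same test set, `score` would yield one accuracy figure per 
CLA flavor, which is all the comparison set-up needs.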

Any suggestions for a text to choose?

Francisco

On 24.08.2013, at 17:12, Matthew Taylor wrote:

> Very cool, Francisco. Here is where you can get cept API credentials: 
> https://cept.3scale.net/signup
> 
> ---------
> Matt Taylor
> OS Community Flag-Bearer
> Numenta
> 
> 
> On Fri, Aug 23, 2013 at 5:07 PM, Francisco Webber <[email protected]> wrote:
> Just a short post scriptum:
> 
> The public version of our API doesn't actually contain the generic conversion 
> function. But if people from the HTM community want to experiment, just click 
> the "Request for Beta-Program" button and I will upgrade your accounts 
> manually.
> 
> Francisco
> 
> On 24.08.2013, at 01:59, Francisco Webber wrote:
> 
> > Jeff,
> > I thought about this already.
> > We have a REST API where you can send a word in and get the SDR back, and 
> > vice versa.
> > I invite all who want to experiment to try it out.
> > You just need to get credentials at our website: www.cept.at.
> >
> > In the mid-term it would be cool to create some sort of evaluation set that 
> > could be used to measure progress while improving the CLA.
> >
> > We are continuously improving our Retina but the version that is currently 
> > online works pretty well already.
> >
> > I hope that helps
> >
> > Francisco
> >
> > On 24.08.2013, at 01:46, Jeff Hawkins wrote:
> >
> >> Francisco,
> >> Your work is very cool.  Do you think it would be possible to make 
> >> available
> >> your word SDRs (or a sufficient subset of them) for experimentation?  I
> >> imagine there would be interest in the NuPIC community in training a CLA
> >> on text using your word SDRs.  You might get some useful results more
> >> quickly.  You could do this under a research only license or something like
> >> that.
> >> Jeff
> >>
> >> -----Original Message-----
> >> From: nupic [mailto:[email protected]] On Behalf Of Francisco
> >> Webber
> >> Sent: Wednesday, August 21, 2013 1:01 PM
> >> To: NuPIC general mailing list.
> >> Subject: Re: [nupic-dev] HTM in Natural Language Processing
> >>
> >> Hello,
> >> I am one of the founders of CEPT Systems and lead researcher of our retina
> >> algorithm.
> >>
> >> We have developed a method to represent words by a bitmap pattern capturing
> >> most of their "lexical semantics" (a text sensor). Our word-SDRs fulfill all
> >> the requirements for "good" HTM input data.
> >>
> >> - Words with similar meaning "look" similar
> >> - If you drop random bits in the representation the semantics remain intact
> >> - Only a small number (up to 5%) of bits are set in a word-SDR
> >> - Every bit in the representation corresponds to a specific semantic 
> >> feature
> >> of the language used
> >> - The Retina (sensory organ for a HTM) can be trained on any language
> >> - The retina training process is fully unsupervised.
> >>
> >> We have found that the word-SDRs by themselves (without using any HTM yet)
> >> can improve many NLP problems that are only poorly solved using the
> >> traditional statistical approaches.
> >> We use the SDRs to:
> >> - Create fingerprints of text documents, which allows us to compare them for
> >> semantic similarity using simple (Euclidean) similarity measures
> >> - Automatically detect polysemy and disambiguate multiple meanings
> >> - Characterize any text with context terms for automatic
> >> search-engine query expansion.
> >>
> >> We hope to successfully link-up our Retina to an HTM network to go beyond
> >> lexical semantics into the field of "grammatical semantics".
> >> This would hopefully lead to improved abstracting, conversation,
> >> question-answering, and translation systems.
> >>
> >> Our correct web address is www.cept.at (no kangaroos in Vienna ;-)
> >>
> >> I am interested in any form of cooperation to apply HTM technology to text.
> >>
> >> Francisco
> >>
> >> On 21.08.2013, at 20:16, Christian Cleber Masdeval Braz wrote:
> >>
> >>>
> >>> Hello.
> >>>
> >>> Like many of you here, I am pretty new to HTM technology.
> >>>
> >>> I am a researcher in Brazil and I am going to start my PhD program soon.
> >>> My field of interest is NLP and the extraction of knowledge from text. I am
> >>> thinking of using the ideas behind the Memory Prediction Framework to
> >>> investigate semantic information retrieval from the Web, and answering
> >>> questions in natural language. I intend to use the HTM implementation as a
> >>> base to do this.
> >>>
> >>> I would appreciate it a lot if someone could answer some questions:
> >>>
> >>> - Is there any research related to HTM and NLP? Could you point me to it?
> >>>
> >>> - Is HTM well suited to this problem? Could it learn, without supervision,
> >>> the grammar of a language, or just help with some aspects such as Named
> >>> Entity Recognition?
> >>>
> >>>
> >>>
> >>> Regards,
> >>>
> >>> Christian
> >>>
> >>>
> >>> _______________________________________________
> >>> nupic mailing list
> >>> [email protected]
> >>> http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org