I will make three suggestions. All are out of copyright, well known,
uncontroversial, and still taught in schools (at least in the US).

1. Robinson Crusoe - Daniel Defoe

http://www.gutenberg.org/ebooks/521

2. Great Expectations - Charles Dickens

http://www.gutenberg.org/ebooks/1400

3. The Time Machine - H.G. Wells

http://www.gutenberg.org/ebooks/35

Ian


On Sat, Aug 24, 2013 at 10:24 AM, Francisco Webber <[email protected]> wrote:

> For those who don't want to use the API, and for evaluation purposes, I
> would propose that we choose some reference text and I convert it into a
> sequence of SDRs. This file could be used for training.
> I would also generate a list of all words contained in the text, together
> with their SDRs, to be used as a conversion table.
> As a simple test measure, we could feed a sequence of SDRs into a trained
> network and see if the HTM makes the right prediction about the following
> word(s).
> The last file to produce for a complete framework would be a list of, let's
> say, 100 word sequences with their correct continuations.
> The word sequences could be, for example, the beginnings of phrases with
> more than n words (n being the number of steps that the CLA can predict
> ahead).
> This could be the beginning of a measuring setup that allows us to compare
> different CLA implementation flavors.
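
To make the proposed test measure concrete, here is a rough sketch of how
the scoring could work. Everything in it is an assumption for illustration:
the conversion table holds made-up bit indices rather than real CEPT SDRs,
`dummy_predict` is a stand-in for a trained network, and mapping a predicted
SDR back to a word by maximum bit overlap is just one plausible choice.

```python
from typing import Callable, Dict, FrozenSet, List, Sequence, Tuple

SDR = FrozenSet[int]  # an SDR stored as the set of its active bit indices


def overlap(a: SDR, b: SDR) -> int:
    """Number of active bits two SDRs share."""
    return len(a & b)


def closest_word(sdr: SDR, table: Dict[str, SDR]) -> str:
    """Map a predicted SDR back to the word whose SDR overlaps it most."""
    return max(table, key=lambda word: overlap(sdr, table[word]))


def accuracy(
    predict: Callable[[List[SDR]], SDR],
    table: Dict[str, SDR],
    test_set: Sequence[Tuple[List[str], str]],
) -> float:
    """Fraction of test sequences whose predicted next word is correct."""
    hits = 0
    for words, expected in test_set:
        predicted = predict([table[w] for w in words])
        if closest_word(predicted, table) == expected:
            hits += 1
    return hits / len(test_set)


# Toy conversion table with made-up bit indices (not real CEPT SDRs).
TABLE: Dict[str, SDR] = {
    "the": frozenset({1, 2, 3}),
    "time": frozenset({10, 11, 12}),
    "machine": frozenset({20, 21, 22}),
}

# Each entry: (beginning of a phrase, correct continuation).
TEST_SET = [(["the", "time"], "machine"), (["the"], "time")]


def dummy_predict(sequence: List[SDR]) -> SDR:
    """Stand-in for a trained network: always predicts a noisy 'machine'."""
    return frozenset({20, 21, 99})


score = accuracy(dummy_predict, TABLE, TEST_SET)
```

Any CLA implementation that can be wrapped as a `predict` function could be
scored against the same table and test set, which is what would make the
different flavors comparable.
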
>
> Any suggestions for a text to choose?
>
> Francisco
>
> On 24.08.2013, at 17:12, Matthew Taylor wrote:
>
> Very cool, Francisco. Here is where you can get cept API credentials:
> https://cept.3scale.net/signup
>
> ---------
> Matt Taylor
> OS Community Flag-Bearer
> Numenta
>
>
> On Fri, Aug 23, 2013 at 5:07 PM, Francisco Webber <[email protected]> wrote:
>
>> Just a short post scriptum:
>>
>> The public version of our API doesn't actually contain the generic
>> conversion function. But if people from the HTM community want to
>> experiment just click the "Request for Beta-Program" button and I will
>> upgrade your accounts manually.
>>
>> Francisco
>>
>> On 24.08.2013, at 01:59, Francisco Webber wrote:
>>
>> > Jeff,
>> > I thought about this already.
>> > We have a REST API where you can send a word in and get the SDR back,
>> > and vice versa.
>> > I invite all who want to experiment to try it out.
>> > You just need to get credentials at our website: www.cept.at.
>> >
>> > In the mid-term it would be cool to create some sort of evaluation set
>> > that could be used to measure progress while improving the CLA.
>> >
>> > We are continuously improving our Retina, but the version that is
>> > currently online works pretty well already.
>> >
>> > I hope that will help
>> >
>> > Francisco
>> >
>> > On 24.08.2013, at 01:46, Jeff Hawkins wrote:
>> >
>> >> Francisco,
>> >> Your work is very cool.  Do you think it would be possible to make
>> >> your word SDRs (or a sufficient subset of them) available for
>> >> experimentation?  I imagine there would be interest in the NuPIC
>> >> community in training a CLA on text using your word SDRs.  You might
>> >> get some useful results more quickly.  You could do this under a
>> >> research-only license or something like that.
>> >> Jeff
>> >>
>> >> -----Original Message-----
>> >> From: nupic [mailto:[email protected]] On Behalf Of
>> >> Francisco Webber
>> >> Sent: Wednesday, August 21, 2013 1:01 PM
>> >> To: NuPIC general mailing list.
>> >> Subject: Re: [nupic-dev] HTM in Natural Language Processing
>> >>
>> >> Hello,
>> >> I am one of the founders of CEPT Systems and lead researcher of our
>> >> retina algorithm.
>> >>
>> >> We have developed a method to represent each word by a bitmap pattern
>> >> capturing most of its "lexical semantics" (a text sensor). Our
>> >> word-SDRs fulfill all the requirements for "good" HTM input data.
>> >>
>> >> - Words with similar meaning "look" similar
>> >> - If you drop random bits in the representation, the semantics remain
>> >>   intact
>> >> - Only a small number (up to 5%) of bits are set in a word-SDR
>> >> - Every bit in the representation corresponds to a specific semantic
>> >>   feature of the language used
>> >> - The Retina (sensory organ for an HTM) can be trained on any language
>> >> - The Retina training process is fully unsupervised
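
The first three properties in the list above can be illustrated with a small
sketch. The SDR width of 16384 bits, the 300 active bits, and the
overlap-based similarity score are all assumptions chosen for the example,
and the SDRs are random bit sets, not output of the CEPT Retina.

```python
import random
from typing import FrozenSet

SDR = FrozenSet[int]  # an SDR stored as the set of its active bit indices
SDR_SIZE = 16384      # assumed width; the thread does not state one


def sparsity(sdr: SDR, size: int = SDR_SIZE) -> float:
    """Fraction of bits that are set."""
    return len(sdr) / size


def overlap_score(a: SDR, b: SDR) -> float:
    """Shared active bits divided by the smaller SDR's active-bit count."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def drop_bits(sdr: SDR, fraction: float, rng: random.Random) -> SDR:
    """Return a copy of the SDR with roughly `fraction` of its bits removed."""
    keep = round(len(sdr) * (1.0 - fraction))
    return frozenset(rng.sample(sorted(sdr), keep))


rng = random.Random(0)
word = frozenset(rng.sample(range(SDR_SIZE), 300))       # a made-up word SDR
noisy = drop_bits(word, 0.3, rng)                        # 30% of bits dropped
unrelated = frozenset(rng.sample(range(SDR_SIZE), 300))  # another random SDR

print(sparsity(word))                  # well under the 5% bound
print(overlap_score(word, noisy))      # damaged copy still matches fully
print(overlap_score(word, unrelated))  # random SDRs barely overlap
```

Because the SDRs are sparse, two unrelated patterns share almost no bits by
chance, so a copy with many bits dropped is still far closer to its original
than to anything else.
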
>> >>
>> >> We have found that the word-SDR by itself (without using any HTM
>> >> yet) can improve results on many NLP problems that are only poorly
>> >> solved by traditional statistical approaches.
>> >> We use the SDRs to:
>> >> - Create fingerprints of text documents, which allows us to compare
>> >>   them for semantic similarity using simple (Euclidean) similarity
>> >>   measures
>> >> - Automatically detect polysemy and disambiguate multiple meanings
>> >> - Characterize any text with context terms for automatic
>> >>   search-engine query expansion
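
As an illustration of the document-fingerprint idea, here is a minimal
sketch. How CEPT actually builds a fingerprint is not described in this
thread; the aggregation used here (count how often each bit is active across
the document's words and keep the most frequent bits) is purely an
assumption, as are the toy word SDRs.

```python
import math
from collections import Counter
from typing import Dict, FrozenSet, Iterable

SDR = FrozenSet[int]  # an SDR stored as the set of its active bit indices


def fingerprint(words: Iterable[str], table: Dict[str, SDR], max_bits: int) -> SDR:
    """Assumed aggregation: keep the bits most often active across the words."""
    counts: Counter = Counter()
    for word in words:
        counts.update(table.get(word, frozenset()))  # unknown words are skipped
    # Sort by descending count, then by bit index so ties are deterministic.
    ranked = sorted(counts, key=lambda bit: (-counts[bit], bit))
    return frozenset(ranked[:max_bits])


def euclidean_distance(a: SDR, b: SDR) -> float:
    """For binary vectors this is the square root of the differing-bit count."""
    return math.sqrt(len(a ^ b))


# Toy word SDRs with made-up bit indices (not real CEPT SDRs).
TABLE: Dict[str, SDR] = {
    "dog": frozenset({1, 2, 3}),
    "cat": frozenset({2, 3, 4}),
    "car": frozenset({7, 8, 9}),
    "truck": frozenset({8, 9, 10}),
}

doc_pets = fingerprint(["dog", "cat"], TABLE, max_bits=3)
doc_cat = fingerprint(["cat"], TABLE, max_bits=3)
doc_vehicles = fingerprint(["car", "truck"], TABLE, max_bits=3)

# The two pet documents end up closer to each other than to the vehicle one.
print(euclidean_distance(doc_pets, doc_cat))
print(euclidean_distance(doc_pets, doc_vehicles))
```
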
>> >>
>> >> We hope to successfully link up our Retina to an HTM network to go
>> >> beyond lexical semantics into the field of "grammatical semantics".
>> >> This would hopefully lead to improved abstracting, conversation,
>> >> question-answering, and translation systems.
>> >>
>> >> Our correct web address is www.cept.at (no kangaroos in Vienna ;-)
>> >>
>> >> I am interested in any form of cooperation to apply HTM technology
>> >> to text.
>> >>
>> >> Francisco
>> >>
>> >> On 21.08.2013, at 20:16, Christian Cleber Masdeval Braz wrote:
>> >>
>> >>>
>> >>> Hello.
>> >>>
>> >>> Like many of you here, I am pretty new to HTM technology.
>> >>>
>> >>> I am a researcher in Brazil and I am going to start my PhD program
>> >>> soon. My field of interest is NLP and the extraction of knowledge
>> >>> from text. I am thinking of using the ideas behind the Memory
>> >>> Prediction Framework to investigate semantic information retrieval
>> >>> from the Web and to answer questions in natural language. I intend
>> >>> to use the HTM implementation as the base for this.
>> >>>
>> >>> I would appreciate it a lot if someone could answer some questions:
>> >>>
>> >>> - Is there any research relating HTM and NLP? Could you point me
>> >>>   to it?
>> >>>
>> >>> - Is HTM suitable for addressing this problem? Could it learn,
>> >>>   without supervision, the grammar of a language, or would it just
>> >>>   help with some aspects, such as Named Entity Recognition?
>> >>>
>> >>>
>> >>>
>> >>> Regards,
>> >>>
>> >>> Christian
>> >>>
>> >>>
>> >>> _______________________________________________
>> >>> nupic mailing list
>> >>> [email protected]
>> >>> http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org