Matt,
you are right. Currently the part-of-speech (POS) information is not included 
explicitly within the word-SDRs.
But we do have the information as a meta-tag. This means that each word-SDR is 
linked to at least one POS type.
Some words, like “fire”, even have several tags, such as NOUN and VERB.
We use this feature mainly to filter the result list of the “similar terms” 
function.
In the next version of our word-retina we plan to generate distinct word-SDRs 
for every word-POS combination.
This means that “fire” used as a verb will have a different SDR pattern than 
“fire” used as a noun.
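
To illustrate the meta-tag filtering described above, here is a minimal, self-contained sketch. The word list, tag assignments, and function name are all made up for the example and are not the actual CEPT retina API:

```python
# Toy illustration of filtering a similar-terms result list by POS meta-tag.
# The vocabulary and tags below are hypothetical; in the real retina each
# word-SDR carries one or more POS tags as metadata.

SIMILAR_TERMS = ["fire", "blaze", "burn", "flame", "ignite"]

POS_TAGS = {
    "fire":   {"NOUN", "VERB"},   # ambiguous words carry several tags
    "blaze":  {"NOUN"},
    "burn":   {"VERB"},
    "flame":  {"NOUN"},
    "ignite": {"VERB"},
}

def filter_by_pos(terms, pos):
    """Keep only terms whose meta-tags include the requested POS type."""
    return [t for t in terms if pos in POS_TAGS.get(t, set())]

print(filter_by_pos(SIMILAR_TERMS, "NOUN"))  # ['fire', 'blaze', 'flame']
print(filter_by_pos(SIMILAR_TERMS, "VERB"))  # ['fire', 'burn', 'ignite']
```

Note that “fire” survives both filters because it carries both tags.
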
All morphological variants of words have their own SDRs. This means that when 
the CLA learns sequences of words, it actually learns sequences of 
morphological forms:
in “I eat an apple” and “he eats two apples”, 
“eat” has a different SDR than “eats”, and “apple” has a different SDR than 
“apples”.
You can verify this using the word-SDR viewer found here: 
http://www.cept.at/demo_retina_viewer.html
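
To make the point concrete, here is a toy sketch with invented miniature SDRs (real word-SDRs have thousands of bits with only ~2% active): each morphological form gets its own distinct pattern, while related forms can still share semantic bits.

```python
# Hypothetical miniature SDRs, represented as sets of active bit positions.
# The bit values are made up purely for illustration.
WORD_SDR = {
    "eat":    {3, 17, 42, 80},
    "eats":   {3, 17, 55, 91},   # distinct pattern, shares bits with "eat"
    "apple":  {5, 23, 60, 77},
    "apples": {5, 23, 66, 88},
}

def overlap(a, b):
    """Number of active bits two word-SDRs share."""
    return len(WORD_SDR[a] & WORD_SDR[b])

# Each morphological form is a distinct pattern ...
assert WORD_SDR["eat"] != WORD_SDR["eats"]
# ... yet related forms overlap more than unrelated ones.
print(overlap("eat", "eats"), overlap("eat", "apple"))  # 2 0
```
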

In the Fluent setup, each bit of the word-SDR is fed into its own 
(temporal-pooler) column.
The temporal pooler learns the transitions from one word-SDR to another. As 
long as you feed the system exclusively correct English sentences, it will 
only learn morphologically and syntactically correct (i.e. possible) 
transitions. This means that most predictions will tend to be morphologically 
and syntactically correct, or at least plausible. The question is now whether 
this mechanism is sufficient to reach a state where the system “knows” about 
morphology or syntax.
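
A drastically simplified sketch of that transition learning follows. This is a first-order toy model with made-up four-bit SDRs, not the actual CLA temporal pooler (whose cells learn high-order sequences with per-cell context):

```python
# Toy first-order transition memory over word-SDRs. Each active bit of a
# word-SDR is treated as one "column"; the model records which columns were
# seen next, and predicts the union of learned successors.
from collections import defaultdict

WORD_SDR = {"I": {1, 2}, "eat": {3, 4}, "an": {5, 6}, "apple": {7, 8}}

transitions = defaultdict(set)  # column -> columns observed to follow it

def learn(sentence):
    """Record, for every bit of each word, the bits of the next word."""
    for prev, nxt in zip(sentence, sentence[1:]):
        for col in WORD_SDR[prev]:
            transitions[col] |= WORD_SDR[nxt]

def predict(word):
    """Union of columns predicted to follow the given word's SDR."""
    out = set()
    for col in WORD_SDR[word]:
        out |= transitions[col]
    return out

learn(["I", "eat", "an", "apple"])
print(predict("eat"))  # {5, 6} -> the SDR of "an"
```

After training on only correct sentences, the model can only ever predict transitions it has actually seen, which is the intuition behind the predictions being "correct or at least plausible".
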
I personally don’t believe that this is the case (this is more like the 
“Google massive-data” approach). In my experience, evolutionary systems tend 
instead to specialize and become more efficient.
Such a specialization could be achieved by setting up a hierarchical structure 
and having the higher levels learn about syntax and morphology. But I don’t 
believe this either. I think that the CLA is built to handle semantics, not 
rules. I think that rule systems are implemented using independent regions that 
link up in higher regions. This would lead to a NOUN-region, a VERB-region, 
and so on. At a higher hierarchical level, a NOUN and a VERB CLA-region might 
merge into a PHRASE-region. In this layer, semantic information is expressed in 
phrases. It is conceivable that the anatomy of the different regions and their 
location on the cortical map are heavily influenced by genetics. This would 
explain why all human languages have comparable sets of syntactic rules: Noam 
Chomsky speaks of a universal grammar, Steven Pinker of a language instinct.

Francisco




On 07.03.2014, at 17:34, Matthew Taylor <[email protected]> wrote:

> Bert,
> 
> The SDRs coming out of CEPT are already semantically encoded using
> their proprietary algorithms [1]. Those encodings do not include POS.
> The sequence memory inside NuPIC will learn sequences between words
> based on those SDRs, but since they don't include POS or syntactic
> information, I doubt it can learn anything about syntax. But because
> the CEPT SDRs have other things encoded within them, you could feed
> Fluent <animal> <vegetable> <mineral> over and over and it will start
> predicting SDRs that match the types of SDR patterns it's seen. This
> hackathon demo [2] from last year might help explain.
> 
> [1] http://www.youtube.com/watch?v=hjMjhhmYKhI
> [2] http://www.youtube.com/watch?v=X4XjYXFRIAQ#t=3240 (54 minutes in)
> ---------
> Matt Taylor
> OS Community Flag-Bearer
> Numenta
> 
> 
> On Fri, Mar 7, 2014 at 5:21 AM, Bert Frederiks <[email protected]> wrote:
>>>> The word SDRs in CEPT that Fluent is using have no concept of part of
>>>> speech, so I doubt you would get the right types of words in the right
>> 
>> 
>> Trying to understand what you mean by this... Don't the SDRs automatically
>> become part of (hopefully) something language-like inside Fluent's neural
>> network? In other words... they should become part of speech/language by
>> using them in speech, not (as here) through
>> feeding it books? Call this ("social") process structuration.
>> 
>> 
>>>> places. I have done some experiments with parts of speech tagging
>>>> using the POS tags in NLTK as categories for NuPIC [1], and it does
>>>> pretty well at guessing what POS is coming next in a sentence, but
>>>> this is a very hard problem that can't be done by most humans well
>>>> either, because of the possibility of so many branches in human
>>>> speech.
>> 
>> 
>> I do not mean Fluent should be able to tag. I am interested in how many
>> hierarchical neural levels are needed to get a syntactically correct output,
>> even though the content may be absurd, like: "I was going to the ball and
>> the ball rolled down the stairs walking to the moon."
>> 
>> If you can make this then you have, I think, one prerequisite for speech, and
>> maybe this would not be the most difficult. Linguists now think we have
>> syntactical rules in our heads. It would be smashing to be able to show that
>> this is just the outcome of how HTM works!
>> 
>> If HTM is not enough then we may need to add something that has the function
>> of what psychologists call our short term memory (STM). This can hold up to
>> 7 items for 30 seconds. I am sure STM is needed for speech, but it would
>> make things a lot easier if it is not needed for a correct syntax. I guess
>> such an STM will itself be controlled by (part of) a HTM?
>> 
>> Bert
>> 
>> 
>>>> On Thu, Mar 6, 2014 at 9:35 AM, Bert Frederiks <[email protected]> wrote:
>>>>> 
>>>>> What would happen if one would feed Fluent with, say, books for children
>>>>> (to
>>>>> keep the task easy enough)? And then to have Fluent auto-associate from
>>>>> one
>>>>> word to the next? Would be very interesting. I would predict it shows
>>>>> psychotic sentences, but probably with correct syntax -- if true then
>>>>> this
>>>>> in itself (w/sh)ould be enough to end or change the jobs of most
>>>>> linguists,
>>>>> I guess. HTM is necessary but not enough for speech IMHO (if I
>>>>> understand
>>>>> well Jeff Hawkins thinks otherwise about this).
>>>>> 
>>>>> Bert
>>>>> 
>>>>> op 28-02-14 06:08, Chetan Surpur schreef:
>>>>>> 
>>>>>> 
>>>>>> Hi everyone,
>>>>>> 
>>>>>> I'm happy to introduce a project I've been working on this week. It's a
>>>>>> platform for language prediction, using NuPIC together with CEPT [1].
>>>>>> The
>>>>>> goal is to make it easy for anyone to build a language-based demo of
>>>>>> NuPIC
>>>>>> without having to know any of the internals of the CLA or CEPT.
>>>>>> 
>>>>>> In fact, I have not one, but /two/ little projects to open up to you.
>>>>>> 
>>>>>> 
>>>>>> The first is nupic.fluent [2], a python library. It builds off of
>>>>>> Subutai's and Matt's hackathon demos [3]. With it, you can create a
>>>>>> model,
>>>>>> feed it a word (also called a "term"), and get a prediction for the
>>>>>> next
>>>>>> one. It's very simple - and that's the point.
>>>>>> 
>>>>>> The second is nupic.fluent.server [4], a server-based API and sample
>>>>>> web
>>>>>> app using nupic.fluent at its core. You can use it to build a web-based
>>>>>> demo
>>>>>> of language prediction with NuPIC, something we invited the community
>>>>>> to
>>>>>> participate in during the last office hour [5].
>>>>>> 
>>>>>> But wait, there's more! I've hosted the Fluent server on an EC2
>>>>>> instance,
>>>>>> so you all can play with the Fluent web app right now. Enjoy:
>>>>>> 
>>>>>> http://bit.ly/nupic-fluent
>>>>>> 
>>>>>> Note that it's far from production-ready, and it may go down at any
>>>>>> time.
>>>>>> That link is just a little taste for now; I aim to host it in a more
>>>>>> permanent place soon.
>>>>>> 
>>>>>> Here is a screenshot of it in action:
>>>>>> 
>>>>>> [screenshot omitted in plain-text archive]
>>>>>> 
>>>>>> Lastly, I invite everyone in the community to come hack on this with
>>>>>> me;
>>>>>> it's under the same license as NuPIC. And of course, feel free to use
>>>>>> it in
>>>>>> your demos (but be wary, it's still very early and the API might/will
>>>>>> change).
>>>>>> 
>>>>>> Thanks,
>>>>>> Chetan
>>>>>> 
>>>>>> [1] http://www.cept.at/
>>>>>> [2] https://github.com/numenta/nupic.fluent
>>>>>> [3] http://numenta.org/blog/#demos
>>>>>> [4] https://github.com/numenta/nupic.fluent.server
>>>>>> [5] http://www.youtube.com/watch?v=67q75RnU58A&feature=share&t=37m16s
>> 
>> 
>> 
>> _______________________________________________
>> nupic mailing list
>> [email protected]
>> http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
> 


