
Currently, part-of-speech information is not included explicitly within the word-SDRs.

For my idea, I would not want this information in the SDRs; the whole point is that the neural network should discover it.

In the Fluent setup, each bit of the word-SDR is fed into its own (temporal-pooler) column. [...]

The temporal pooler learns the transitions from one word-SDR to another. As long as you feed the system exclusively correct English sentences, it will only learn specific, namely morphologically and syntactically correct (possible), transitions. This means that most predictions will tend to be morphologically and syntactically correct (or at least plausibly correct). The question is now whether this mechanism

This would be great.

is sufficient to reach a state where the system “knows” about morphology or syntax.

"Knowing" would not be the correct term. The nice thing for me would be that it is "mechanical"; we do not want to need a homunculus.

I personally don’t believe that this is the case (this is more like the “google-massive-data” approach). In my experience evolutionary systems rather tend to specialize and become more efficient.

If I understand correctly, you are saying, Francisco, that Fluent will produce correct syntax and morphology but... you want some "knowing" too? Do you mean "tagging" by this knowing?

Such a specialization could be achieved by setting up a hierarchical structure and having the higher levels learn about syntax and morphology. But I don't believe this either. I think that the CLA is built to handle semantics and not rules. I think that rule systems are implemented using independent regions that link up in higher regions. This would lead to a NOUN-region and a VERB-region and so on. At a higher hierarchical level, a NOUN and a VERB CLA-region might merge into a PHRASE-region. In this layer, semantic information is expressed in phrases. It is imaginable that the anatomy of the different regions and their location on the cortical map are heavily influenced by genetics. This would explain why all human languages have comparable sets of syntactic rules. Noam Chomsky speaks of the universal grammar, Steven Pinker of the language instinct.

How much of this is true is exactly the point for me. If you can make an HTM system that produces correct syntax, then I think you have proven that the ideas of a universal grammar or language instinct are wrong, or better, can be deduced from the CLA, which is a very interesting analysis in itself, I think.

One (or more :-) step further: to get sensible speech, one needs the semantics in the SDRs, and one must somehow mix this not only with the syntax but also with intentions and the social situation (after all, you are speaking to someone whose presumed knowledge you try to manipulate). This is a much harder problem. Here I expect some special hardwiring. In humans, evolution has created some special connections between HTMs for this, I agree, but I think most of its parts are HTMs. I have some ideas on this to inspire people, if you like...

On 07.03.2014, at 17:34, Matthew Taylor <[email protected]> wrote:

The SDRs coming out of CEPT are already semantically encoded using
their proprietary algorithms [1]. Those encodings do not include POS.
The sequence memory inside NuPIC will learn sequences between words
based on those SDRs, but since they don't include POS or syntactic
information, I doubt it can learn anything about syntax. But because
the CEPT SDRs have other things encoded within them, you could feed
Fluent <animal> <vegetable> <mineral> over and over and it will start
predicting SDRs that match the types of SDR patterns it's seen. This
hackathon demo [2] from last year might help explain.

[1] http://www.youtube.com/watch?v=hjMjhhmYKhI
[2] http://www.youtube.com/watch?v=X4XjYXFRIAQ#t=3240 (54 minutes in)
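The behavior Matthew describes, predicting the *type* of the next token after seeing repeated &lt;animal&gt; &lt;vegetable&gt; &lt;mineral&gt; input, can be illustrated with a deliberately simplified sketch. This is not the CLA or the CEPT encodings, just a first-order transition table over hand-assigned categories, and the mini-lexicon is invented for the example; it only shows why type-level prediction falls out of plain transition learning.

```python
# Toy sketch (NOT NuPIC's sequence memory): learn category-to-category
# transitions from token sequences, then predict the category of the
# next token -- even for a word never seen in that slot before.
from collections import defaultdict, Counter

CATEGORIES = {  # hypothetical mini-lexicon; real word-SDRs encode far more
    "dog": "animal", "cat": "animal",
    "carrot": "vegetable", "pea": "vegetable",
    "quartz": "mineral", "granite": "mineral",
}

def train(sentences):
    """Count category transitions across adjacent tokens."""
    transitions = defaultdict(Counter)
    for sentence in sentences:
        cats = [CATEGORIES[w] for w in sentence]
        for prev, nxt in zip(cats, cats[1:]):
            transitions[prev][nxt] += 1
    return transitions

def predict_next_category(transitions, word):
    """Return the most frequently observed category following this word's category."""
    counts = transitions[CATEGORIES[word]]
    return counts.most_common(1)[0][0] if counts else None

corpus = [["dog", "carrot", "quartz"], ["cat", "pea", "granite"]]
model = train(corpus)
print(predict_next_category(model, "cat"))  # -> "vegetable"
```

After two animal-vegetable-mineral sentences, any animal word predicts a vegetable next, which is the "right type of SDR pattern" effect in miniature.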

[2] is difficult to follow for a non-native English speaker like me, but I have seen your animal-vegetable presentation before, and indeed, one could see this as a very simple reproduction of a syntax. I will study this later.

Bert

The word SDRs in CEPT that Fluent is using have no concept of part of
speech, so I doubt you would get the right types of words in the right


Trying to understand what you mean by this... Don't the SDRs automatically
become part of (hopefully) something language-like inside Fluent's neural
network? In other words... they should become part of speech/language by
using them in speech, no? (Here, that is, through feeding it books.)
Call this ("social") process structuration.


places. I have done some experiments with parts of speech tagging
using the POS tags in NLTK as categories for NuPIC [1], and it does
pretty well at guessing what POS is coming next in a sentence, but
this is a very hard problem that can't be done by most humans well
either, because of the possibility of so many branches in human
speech.
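The experiment described here can be sketched in miniature. The hand-tagged three-sentence corpus below is invented for the illustration, and a bigram tag table stands in for the real setup (which used NLTK's POS tags as categories fed to NuPIC); the point is just to show what "guessing the next POS" means as a measurable task.

```python
# Toy illustration: use POS tags as the prediction target and measure how
# often the most frequent tag-to-tag transition guesses the next tag.
from collections import defaultdict, Counter

tagged = [  # (word, tag) pairs; a tiny, invented hand-tagged corpus
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("a", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("the", "DET"), ("bird", "NOUN"), ("sings", "VERB")],
]

# Count tag-to-tag transitions.
transitions = defaultdict(Counter)
for sent in tagged:
    tags = [t for _, t in sent]
    for prev, nxt in zip(tags, tags[1:]):
        transitions[prev][nxt] += 1

# Score: for each position, guess the most common successor tag.
correct = total = 0
for sent in tagged:
    tags = [t for _, t in sent]
    for prev, nxt in zip(tags, tags[1:]):
        guess = transitions[prev].most_common(1)[0][0]
        correct += guess == nxt
        total += 1
print(correct / total)  # -> 1.0 on this trivially regular corpus
```

On real text the many possible branches Matthew mentions make this accuracy far lower, which is exactly why the problem is hard.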


I do not mean Fluent should be able to tag. I am interested in how many
hierarchical neural levels are needed to get a syntactically correct output,
even though the content may be absurd, like: "I was going to the ball and
the ball rolled down the stairs walking to the moon."

If you can make this, then you have, I think, one prerequisite for speech, and
maybe this would not be the most difficult one. Linguists now think we have
syntactical rules in our heads. It would be smashing to be able to show that
this is just the outcome of how HTM works!

If HTM is not enough, then we may need to add something that has the function
of what psychologists call our short-term memory (STM). This can hold up to
7 items for 30 seconds. I am sure STM is needed for speech, but it would
make things a lot easier if it is not needed for correct syntax. I guess
such an STM will itself be controlled by (part of) an HTM?

Bert


On Thu, Mar 6, 2014 at 9:35 AM, Bert Frederiks <[email protected]> wrote:

What would happen if one were to feed Fluent with, say, books for children
(to keep the task easy enough)? And then have Fluent auto-associate from one
word to the next? That would be very interesting. I would predict it shows
psychotic sentences, but probably with correct syntax -- if true, then this
in itself (w/sh)ould be enough to end or change the jobs of most linguists,
I guess. HTM is necessary but not enough for speech IMHO (if I understand
Jeff Hawkins correctly, he thinks otherwise about this).

Bert

On 28-02-14 06:08, Chetan Surpur wrote:


Hi everyone,

I'm happy to introduce a project I've been working on this week. It's a
platform for language prediction, using NuPIC together with CEPT [1]. The
goal is to make it easy for anyone to build a language-based demo of NuPIC
without having to know any of the internals of the CLA or CEPT.

In fact, I have not one, but /two/ little projects to open up to you.

The first is nupic.fluent [2], a Python library. It builds off of
Subutai's and Matt's hackathon demos [3]. With it, you can create a model,
feed it a word (also called a "term"), and get a prediction for the next
one. It's very simple - and that's the point.
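To make that feed-a-term / get-a-prediction loop concrete, here is a self-contained toy stand-in. The class and method names below are illustrative only and are NOT the real nupic.fluent API; the library predicts over CEPT word-SDRs with NuPIC's sequence memory, whereas this sketch uses a plain word-transition table.

```python
# Toy stand-in for the described workflow: create a model, feed it one
# term at a time, and get back a prediction for the next term.
from collections import defaultdict, Counter

class ToyFluentModel:  # hypothetical name, not the nupic.fluent class
    def __init__(self):
        self._transitions = defaultdict(Counter)
        self._prev = None

    def feed_term(self, term):
        """Feed one word; return the predicted next word (or None)."""
        if self._prev is not None:
            self._transitions[self._prev][term] += 1  # learn the transition
        self._prev = term
        counts = self._transitions[term]
        return counts.most_common(1)[0][0] if counts else None

model = ToyFluentModel()
for word in "the cat sat on the mat the cat sat".split():
    prediction = model.feed_term(word)
print(prediction)  # -> "on": the model has seen "sat" followed by "on"
```

The real library replaces the transition table with online sequence learning over semantic SDRs, so predictions generalize to words with similar encodings rather than only exact repeats.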

The second is nupic.fluent.server [4], a server-based API and sample web
app using nupic.fluent at its core. You can use it to build a web-based
demo of language prediction with NuPIC, something we invited the community
to participate in during the last office hour [5].

But wait, there's more! I've hosted the Fluent server on an EC2 instance,
so you all can play with the Fluent web app right now. Enjoy:

http://bit.ly/nupic-fluent

Note that it's far from production-ready, and it may go down at any time.
That link is just a little taste for now; I aim to host it in a more
permanent place soon.

Here is a screenshot of it in action: [inline screenshot omitted]

Lastly, I invite everyone in the community to come hack on this with me;
it's under the same license as NuPIC. And of course, feel free to use it in
your demos (but be wary, it's still very early and the API might/will
change).

Thanks,
Chetan

[1] http://www.cept.at/
[2] https://github.com/numenta/nupic.fluent
[3] http://numenta.org/blog/#demos
[4] https://github.com/numenta/nupic.fluent.server
[5] http://www.youtube.com/watch?v=67q75RnU58A&feature=share&t=37m16s