(Topic changed to match content)
Currently, part-of-speech information is not included explicitly
within the word-SDRs.
For my idea I would not want this information in the SDRs. The whole idea
is that the neural network should discover this.
In the Fluent setup, each bit of the word-SDR is fed into its own
(temporal-pooler) column. [...]
The temporal pooler learns the transitions from one word-SDR to another.
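The transition-learning idea can be illustrated with a deliberately simplified sketch in plain Python. This is a toy stand-in, not the CLA temporal pooler or the nupic.fluent API: words are hypothetical sparse sets of active bit indices, and a first-order memory records which SDR follows which.

```python
# Toy illustration (NOT the real CLA/nupic.fluent): a first-order
# transition memory over sparse word representations.

class ToyTransitionMemory:
    def __init__(self):
        # Maps a previous SDR (as a frozenset of active bits)
        # to the SDR observed to follow it.
        self.transitions = {}

    def learn(self, sdr_sequence):
        """Record which SDR follows which in a training sequence."""
        for prev, nxt in zip(sdr_sequence, sdr_sequence[1:]):
            self.transitions[frozenset(prev)] = set(nxt)

    def predict(self, sdr):
        """Return the predicted next SDR (empty set if unseen)."""
        return self.transitions.get(frozenset(sdr), set())

# Hypothetical word-SDRs, written as indices of active bits:
cat = {1, 5, 9}
sat = {2, 6, 10}
mat = {3, 7, 11}

tm = ToyTransitionMemory()
tm.learn([cat, sat, mat])
print(tm.predict(cat))  # predicts the bits of "sat"
```

A memory like this, trained only on correct sentences, can by construction only reproduce transitions it has seen, which is the point being made above.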
As long as you feed the system exclusively correct English
sentences, it will only learn morphologically and syntactically correct
(possible) transitions. This means that most predictions will tend to be
morphologically and syntactically correct, or at least plausible.
This would be great.
The question now is whether this mechanism is sufficient to reach a state
where the system “knows” about morphology or syntax.
"knowing" would not be the correct term. The nice thing for me would be
that it is "mechanical"; we do not want to need a homuncules.
I personally don’t believe that this is the case (this is more like the
“google-massive-data” approach). In my experience, evolutionary systems
tend rather to specialize and become more efficient.
If I understand correctly, you say, Fransisco, that Fluent will produce
correct syntax and morphology but... you want some "knowing" too? Do you
mean "tagging" by this "knowing"?
Such a specialization could be achieved by setting up a hierarchical
structure and have the higher levels learn about syntax and morphology.
But I don’t believe this either. I think that the CLA is built to handle
semantics and not rules. I think that rule systems are implemented using
independent regions that link up in higher regions. This would lead to a
NOUN-region and a VERB-region and so on. At a higher hierarchical level a
NOUN and a VERB CLA-region might merge into a PHRASE-region. In this
layer, semantic information is expressed in phrases. It is imaginable
that the anatomy of the different regions and their location on the
cortical map are heavily influenced by genetics. This would explain why
all human languages have comparable sets of syntactic rules. Noam Chomsky
speaks of the universal grammar, Steven Pinker of the language instinct.
How much of this is true is exactly the point for me. If you can make an HTM
system that produces correct syntax, then I think you have proven that the
ideas of a universal grammar or language instinct are wrong, or better,
that they can be deduced from the CLA, which is a very interesting analysis
in itself, I think.
One (or more :-) step further: to get sensible speech one needs the
semantics in the SDRs, and one must somehow mix this not only with the syntax
but also with intentions and the social situation (after all, you are speaking
to someone, whose presumed knowledge you try to manipulate). This is a
much harder problem. Here I expect some special hardwiring. In humans
evolution has created some special connections between HTMs for this, I
agree, but I think most of its parts are HTMs. I have some ideas on this
to inspire people, if you like..
On 07.03.2014, at 17:34, Matthew Taylor <[email protected]> wrote:
The SDRs coming out of CEPT are already semantically encoded using
their proprietary algorithms [1]. Those encodings do not include POS.
The sequence memory inside NuPIC will learn sequences between words
based on those SDRs, but since they don't include POS or syntactic
information, I doubt it can learn anything about syntax. But because
the CEPT SDRs have other things encoded within them, you could feed
Fluent <animal> <vegetable> <mineral> over and over and it will start
predicting SDRs that match the types of SDR patterns it's seen. This
hackathon demo [2] from last year might help explain.
[1] http://www.youtube.com/watch?v=hjMjhhmYKhI
[2] http://www.youtube.com/watch?v=X4XjYXFRIAQ#t=3240 (54 minutes in)
[2] is difficult to follow for a non-native English speaker like me, but I
have seen your animal-vegetable presentation before and indeed, one could
see this as a very simple reproduction of a syntax. Will study this later.
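The gist of that animal/vegetable/mineral demo can be caricatured in a few lines of toy Python. This is my own simplified illustration, not the actual NuPIC demo; the word lists and categories below are made up for the example. After seeing the repeating category pattern, even a trivial transition counter starts predicting which *category* of word comes next.

```python
# Toy sketch of the <animal> <vegetable> <mineral> idea (NOT the real
# NuPIC/CEPT demo): count category-to-category transitions and predict
# which category of word should follow.
from collections import Counter, defaultdict

training = ["dog", "carrot", "quartz", "cat", "potato", "granite",
            "horse", "onion", "basalt"]
category = {"dog": "animal", "cat": "animal", "horse": "animal",
            "carrot": "vegetable", "potato": "vegetable", "onion": "vegetable",
            "quartz": "mineral", "granite": "mineral", "basalt": "mineral"}

# Count which category follows which in the training stream.
counts = defaultdict(Counter)
for prev, nxt in zip(training, training[1:]):
    counts[category[prev]][category[nxt]] += 1

def predict_category(word):
    """Most frequently observed category following this word's category."""
    return counts[category[word]].most_common(1)[0][0]

print(predict_category("dog"))    # -> vegetable
print(predict_category("onion"))  # -> mineral
```

One could indeed see this kind of pattern-following as a very simple reproduction of a syntax: the "rule" animal-vegetable-mineral is never stated, only induced from the stream.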
Bert
The word SDRs in CEPT that Fluent is using have no concept of part of
speech, so I doubt you would get the right types of words in the right
Trying to understand what you mean by this... Don't the SDRs automatically
become part of (hopefully) something language-like inside Fluent's neural
network? In other words... they should become part of speech/language by
using them in speech, not (here that is through
feeding it books)? Call this ("social") process structuration.
places. I have done some experiments with part-of-speech tagging,
using the POS tags in NLTK as categories for NuPIC [1], and it does
pretty well at guessing what POS is coming next in a sentence. But
this is a very hard problem that most humans can't do well either,
because of the possibility of so many branches in human speech.
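The flavor of that next-POS experiment can be sketched with a toy bigram counter. This is plain Python, not the actual NuPIC setup; the tagged sentence below uses hand-written Penn-Treebank-style tags as an example, rather than output from NLTK.

```python
# Toy sketch of next-POS prediction (NOT the real NuPIC/NLTK experiment):
# count POS-tag bigrams and predict the most likely next tag.
from collections import Counter, defaultdict

# Hand-tagged example data ("DT"=determiner, "NN"=noun, "VBD"=past-tense
# verb, "CC"=conjunction).
tagged = [("the", "DT"), ("dog", "NN"), ("chased", "VBD"), ("the", "DT"),
          ("cat", "NN"), ("and", "CC"), ("the", "DT"), ("cat", "NN"),
          ("ran", "VBD")]

bigrams = defaultdict(Counter)
for (_, t1), (_, t2) in zip(tagged, tagged[1:]):
    bigrams[t1][t2] += 1

def next_tag(tag):
    """Most frequent tag observed after `tag` in the training data."""
    return bigrams[tag].most_common(1)[0][0]

print(next_tag("DT"))  # after a determiner, a noun is most likely here
```

Real speech of course has far more branches than a bigram table can capture, which is exactly the difficulty mentioned above.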
I do not mean Fluent should be able to tag. I am interested in how many
hierarchical neural levels are needed to get a syntactically correct output,
even though the content may be absurd, like: "I was going to the ball and
the ball rolled down the stairs walking to the moon."
If you can make this, then you have, I think, one prerequisite for speech, and
maybe this would not be the most difficult. Linguists now think we have
syntactical rules in our heads. It would be smashing to be able to show that
this is just the outcome of how HTM works!
If HTM is not enough then we may need to add something that has the function
of what psychologists call our short term memory (STM). This can hold up to
7 items for 30 seconds. I am sure STM is needed for speech, but it would
make things a lot easier if it is not needed for correct syntax. I guess
such an STM would itself be controlled by (part of) an HTM?
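Such an STM could be sketched as a simple data structure. This is my own illustration of the psychological description (at most 7 items, each held for about 30 seconds), not an HTM mechanism; the clock is injectable so the decay can be tested without waiting.

```python
# Toy sketch of a short-term memory (an illustration, NOT an HTM
# mechanism): holds at most 7 items, each for at most 30 seconds.
import time

class ShortTermMemory:
    CAPACITY = 7       # "seven plus or minus two" items
    LIFETIME = 30.0    # seconds before an item decays

    def __init__(self, clock=time.monotonic):
        self._clock = clock   # injectable for testing
        self._items = []      # list of (timestamp, item), oldest first

    def store(self, item):
        self._decay()
        self._items.append((self._clock(), item))
        if len(self._items) > self.CAPACITY:
            self._items.pop(0)  # the oldest item is displaced

    def recall(self):
        self._decay()
        return [item for _, item in self._items]

    def _decay(self):
        now = self._clock()
        self._items = [(t, i) for t, i in self._items
                       if now - t < self.LIFETIME]

stm = ShortTermMemory()
for word in ["I", "was", "going", "to", "the", "ball", "and", "the"]:
    stm.store(word)
print(stm.recall())  # only the 7 most recent words survive
```

Whether the brain's STM is itself implemented by (part of) an HTM is exactly the open question; the sketch only pins down the behavior being talked about.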
Bert
On Thu, Mar 6, 2014 at 9:35 AM, Bert Frederiks <[email protected]> wrote:
What would happen if one would feed Fluent with, say, books for children
(to keep the task easy enough)? And then have Fluent auto-associate from
one word to the next? It would be very interesting. I would predict it
shows psychotic sentences, but probably with correct syntax -- if true,
then this in itself (w/sh)ould be enough to end or change the jobs of
most linguists, I guess. HTM is necessary but not enough for speech IMHO
(if I understand well, Jeff Hawkins thinks otherwise about this).
Bert
On 28-02-14 06:08, Chetan Surpur wrote:
Hi everyone,
I'm happy to introduce a project I've been working on this week. It's a
platform for language prediction, using NuPIC together with CEPT [1]. The
goal is to make it easy for anyone to build a language-based demo of NuPIC
without having to know any of the internals of the CLA or CEPT.
In fact, I have not one, but /two/ little projects to open up to you.
The first is nupic.fluent [2], a Python library. It builds off of
Subutai's and Matt's hackathon demos [3]. With it, you can create a model,
feed it a word (also called a "term"), and get a prediction for the next
one. It's very simple - and that's the point.
The second is nupic.fluent.server [4], a server-based API and sample web
app using nupic.fluent at its core. You can use it to build a web-based demo
of language prediction with NuPIC, something we invited the community to
participate in during the last office hour [5].
But wait, there's more! I've hosted the Fluent server on an EC2 instance,
so you all can play with the Fluent web app right now. Enjoy:
http://bit.ly/nupic-fluent
Note that it's far from production-ready, and it may go down at any time.
That link is just a little taste for now; I aim to host it in a more
permanent place soon.
Here is a screenshot of it in action:
[Inline image 1]
Lastly, I invite everyone in the community to come hack on this with me;
it's under the same license as NuPIC. And of course, feel free to use it in
your demos (but be wary, it's still very early and the API might/will
change).
Thanks,
Chetan
[1] http://www.cept.at/
[2] https://github.com/numenta/nupic.fluent
[3] http://numenta.org/blog/#demos
[4] https://github.com/numenta/nupic.fluent.server
[5] http://www.youtube.com/watch?v=67q75RnU58A&feature=share&t=37m16s
_______________________________________________
nupic mailing list
[email protected]
http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org