Re: Re: [agi] Understanding Natural Language

Matt Mahoney Sun, 26 Nov 2006 15:22:08 -0800

My point about artificial languages is that I don't believe they are of much 
use in helping to understand or solve the natural language modeling problem, 
which is central to AGI.  Ben mentioned one use, which is to use 
Lojban++ in combination with English to train an AGI in English.  In this case, 
Lojban++ serves to help ground the language, just as a 3-D modeling 
language could be used to describe the environment.  For that purpose, any 
language which is expressive enough and is familiar to the developer 
will do.


It is a different case where we require users to learn an artificial language 
because we don't know how to model natural language.  I don't see how this can 
lead to any significant insights.  There are already many examples of 
unambiguous and easy to parse programming languages (including superficially 
English-like languages such as COBOL and SQL) and formal knowledge 
representation languages (CycL, Prolog, etc).

An AGI has to deal with ambiguity and errors in language.  Consider the 
following sentence which I used earlier: "I could even invent a new branch of 
mathematics, introduce appropriate notation, and express ideas in it."  What 
does "it" refer to?  The solution in an artificial language would be either to 
forbid pronouns (as in most programming languages) or explicitly label it to 
make the meaning explicit.  But people don't want or need to do this.  They can 
figure it out by context.  If your AGI can't use context to solve such problems 
then you haven't solved the natural language modeling problem, and a vast body 
of knowledge will be inaccessible.

I think you will find that writing a Lojban parser will be trivial compared to 
writing an English to Lojban translator.
 
Andrii (lOkadin) Zvorygin <[EMAIL PROTECTED]> wrote:
>My initial reasoning was that right now many programs don't use AI,
>because programmers don't know, and the ones that do can't easily add
>code.

It is because language modeling is unsolved.  Computers would be much easier to 
use if we could talk to them in English.  But they do not understand.  We don't 
know how to make them understand.

But we are making progress.  Google will answer simple, natural language 
questions (although they don't advertise it).  The fact that others haven't 
done it suggests the problem requires vast computational resources and training 
data.



-- Matt Mahoney, [EMAIL PROTECTED]

----- Original Message ----
From: Andrii (lOkadin) Zvorygin <[EMAIL PROTECTED]>
To: [email protected]
Sent: Sunday, November 26, 2006 4:37:02 PM
Subject: Re: Re: [agi] Understanding Natural Language

On 11/25/06, Matt Mahoney <[EMAIL PROTECTED]> wrote:
> Andrii (lOkadin) Zvorygin <[EMAIL PROTECTED]> wrote:
> >> Even if we were able to constrain the grammar, you still have the
> problem that people will still make ungrammatical statements, misspell
> words, omit words, and so on.
> >Amazing you should mention such valid points against natural languages.
>
> This misses the point.  Where are you going to get 1 GB of Lojban text to 
> train your language model?
Well A:  I could just get IRC logs and mailing lists of the current
Lojban community.
B: point is to translate English into Lojban
C: I'm not training a language model. I'm creating a parser, then a
translator, then other things. The translator will have some elements
of an AI; probably Bayesian probability will be involved, but it's too
early to say. I may be on the wrong list discussing this.

>If you require that all text pass through a syntax checker for
>errors, you will greatly increase the cost of generating your training
>data.
Well A: There are rarely any errors -- unlike in a natural language
like say English.
B: Addressed above.

>This is not a trivial problem.
Which one? Maybe as a whole it's not trivial, but when you break it
down the little pieces are all individually trivial.

>It is a big part of why programmers can only write 10 lines of code
>per day on projects 1/1000 the size of a language model.
Monolithic programming is the paradigm of the past, which is one of the
reasons I'm creating this new development model.
>Then when you have built the model, you will still have a system that
>is intolerant of errors and hard to use.
Because of the nature of the development model -- designed after
functional programming languages -- it is going to be possible to add
functions anywhere in the process without interrupting the rest of the
functions, as it won't be changing the input other functions receive
(unless that is the intent).
Hard to use? Well, we'll see when I have a basic implementation. The
whole point is that it will be easy to use. Maybe it won't work out,
though -- can't see how. .iacu'i(skepticism)
>Your language model needs to have a better way to deal with
>inconsistency than to report errors and make more work for the user.
It can easily just check what the previous response of this user, or
of someone else who made a similar error, was when correcting.
Trivial once we get the implementation going.
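
One way to picture the lookup described above is a small table of past
corrections. This is only a minimal sketch: the class name CorrectionMemory
and its methods are hypothetical, not part of any existing implementation,
and it only helps with errors it has already seen verbatim.

```python
from collections import Counter, defaultdict


class CorrectionMemory:
    """Remember how earlier errors were corrected and reuse the commonest fix."""

    def __init__(self):
        # Maps an erroneous input to a tally of the corrections users chose.
        self._fixes = defaultdict(Counter)

    def record(self, wrong, corrected):
        """Store one observed correction (from this user or any other)."""
        self._fixes[wrong][corrected] += 1

    def suggest(self, wrong):
        """Return the most frequent past correction, or None if unseen."""
        fixes = self._fixes.get(wrong)
        if not fixes:
            return None
        return fixes.most_common(1)[0][0]


memory = CorrectionMemory()
memory.record("recieve", "receive")
memory.record("recieve", "receive")
memory.record("recieve", "relieve")
print(memory.suggest("recieve"))
print(memory.suggest("interupting"))
```

An exact-match table like this says nothing about an error nobody has made
before; that case would still need some notion of similarity between
errors.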
>
> >Lojban already exceeds many natural languages in its ability to express.
>
> How so?  In English I can use mathematical notation understood by others to 
> express complex ideas.  I could even invent a new branch of mathematics, 
> introduce appropriate notation, and express ideas in it.

Stops being English when it starts being "mathematical notation". In
English, mathematical notation usually has either non-standard or
ungrammatical formulation. .i.e. "left bracket one plus two right
bracket times four equals x?"
as opposed to the same sentence in Lojban "li vei pa su'i re ve'o pi'i
vo du da". Where each word corresponds to a math character (li declares
it a number).
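
The word-for-symbol correspondence in that sentence can be shown with a
small lookup table. This is a toy sketch covering only the ten words in the
example, using the glosses given in this message; the table and function
names are made up for illustration, and it is in no way a Lojban parser.

```python
# Toy lookup table for the words in the example sentence only.
LOJBAN_MATH_WORDS = {
    "li": "",      # declares that a number/expression follows; no symbol
    "vei": "(",
    "pa": "1",
    "su'i": "+",
    "re": "2",
    "ve'o": ")",
    "pi'i": "*",
    "vo": "4",
    "du": "=",
    "da": "x",     # glossed as "x" in the message
}


def lojban_math_to_symbols(sentence):
    """Replace each known Lojban word with its math symbol, in order."""
    symbols = []
    for word in sentence.split():
        if word not in LOJBAN_MATH_WORDS:
            raise ValueError(f"unknown word: {word!r}")
        symbol = LOJBAN_MATH_WORDS[word]
        if symbol:  # skip words such as "li" that have no symbol of their own
            symbols.append(symbol)
    return " ".join(symbols)


print(lojban_math_to_symbols("li vei pa su'i re ve'o pi'i vo du da"))
# prints: ( 1 + 2 ) * 4 = x
```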

There is no such distinction in Lojban,  cmavo and gismu for math can
be used anywhere in the language. I can/do add any mathematical
feature and use it while speaking. .i.e. "xu do du mi" meaning "are
you the identity of me?" The du is the same as the English "=". "Are
you equal to me" leaves it a little ambiguous and vague.

Please note that I am forced to use the limitations of the English
language while conveying all this, so the translations into English do
tend to be vague/ambiguous in comparison to the Lojban.

Oh and English doesn't have Attitudinals. They are two to three letter
emotional indicators, extremely valuable once you know how to use
them. io.o'a.u'i(respect for previous statement in a pride amusement
kind of way).
>
> There are cognitive limits to what natural language can express, such as the 
> inability to describe a person's face (as well as a picture would), or to 
> describe a novel odor, or to convey learned physical skills such as swimming 
> or riding a bicycle.
I agree.
>One could conceivably introduce notation to describe such things in
>any natural or artificial language, but that does not solve the
>problem.  Your neural circuitry has limits; it allows you to connect a
>face to a name but not to a description.  Any such notation might be
>usable by machines but not by humans.
What about adding notations for URIs?
la'o.ubu. http://community.livejournal.com/lojban/17618.html .ubu.PIXrami
the foreign named start quote URI end quote is a picture of me.
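
The la'o construct above brackets foreign text between two copies of a
delimiter word chosen by the writer (here "ubu"). A minimal sketch of
pulling the quoted URI back out, assuming the delimiters are written with
surrounding periods exactly as in the example; the function name and the
regular expression are illustrative only:

```python
import re

# la'o . <delimiter> . <foreign text> . <same delimiter> .
LAHO_QUOTE = re.compile(
    r"la'o\.(?P<delim>[a-z']+)\.\s*(?P<text>.*?)\s*\.(?P=delim)\."
)


def extract_foreign_quotes(sentence):
    """Return every piece of foreign text quoted with la'o in the sentence."""
    return [match.group("text") for match in LAHO_QUOTE.finditer(sentence)]


example = (
    "la'o.ubu. http://community.livejournal.com/lojban/17618.html "
    ".ubu.PIXrami"
)
print(extract_foreign_quotes(example))
```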

Recreating a scenario, say, may be more difficult, however. We'll figure
that out in the MMORPG that will be created to further the advancement
of the AI.

I'm thinking of all of this as not some little project, or training a
bunch of algorithms. I am creating a new development framework, though
that's too specific a term to describe it.

My initial reasoning was that right now many programs don't use AI,
because programmers don't know, and the ones that do can't easily add
code. I was thinking of making an AI library, but realized you'd have
to make one for every language. Then I thought about how we could unify
every language. Then I realized that there were non-standard conventions
everywhere. It was mentioned to me that Lojban existed; I found it,
learned it, and am now making a parser. This allows standards compliance
to be easier, as it is intuitive and "natural" to comply with standards
when you are simply extending the language that you speak by making new
sentences from words you already know. Sentences which are no
different from programs.

A general AI would emerge out of many hundreds of people using the
development framework and extending it to understand them. Eventually
it will understand its users and be able to do what they ask of it,
as what has been explained once doesn't have to be explained again.
This is of course assuming the small functions are connected to a
distributed network that redistributes functions to other computers
that look for them without any need for user interference (the MMORPG
will probably run on the same network).
>
> -- Matt Mahoney, [EMAIL PROTECTED]
>
> ----- Original Message ----
> From: Andrii (lOkadin) Zvorygin <[EMAIL PROTECTED]>
> To: [email protected]
> Sent: Saturday, November 25, 2006 5:01:04 AM
> Subject: Re: Re: [agi] Understanding Natural Language
>
> On 11/24/06, Matt Mahoney <[EMAIL PROTECTED]> wrote:
> > Andrii (lOkadin) Zvorygin <[EMAIL PROTECTED]> wrote:
> > >I  personally don't understand why everyone seems to insist on using
> > >ambiguous illogical languages to express things when there are viable
> > >alternative available.
> >
> > I think because an AGI needs to communicate in languages that people 
> > already know.
> I don't understand how artificial languages like Lojban contribute to
> this goal.
> We should focus our efforts instead on learning and modeling existing 
> languages.
> >
> > I understand that artificial languages like Lojban and Esperanto and 
> > Attempto have simple grammars.
> >I don't believe they would stay that way if they were widely used for
> person to person communication (as opposed to machine interfaces).
> Lojban grammar is easily extensible and forwards compatible.
> You can add features to the language through CMAvo and GISmu.
> Lojban already exceeds many natural languages in its ability to
> express.  There are very crucial parts of communication that English
> lacks such as logical connectives and attitudinals.
>
> >Languages evolve over time, both in individuals, and more slowly in
> social groups.
> Are you implying languages evolve faster in individuals?
> >A language model is not a simple set of rules.
> A natural language model is not.
> An artificial language is constructed with rules that were also
> created by individual -- as opposed to groups of -- humans. Lojban was
> especially designed to be logical, unlike Esperanto.
> Therefore making them recreatable by individual humans, and depending
> on your definition: "simple".
> >It is a probability distribution described by a large set of patterns
> such as words, word associations, grammatical structures and
> sentences.
> The approach of a world of blind to seeing is to feel at things.
> Sometimes they wonder if there is not another way.
> >Each time you read or hear a message, the probabilities for the
> observed patterns are increased a little and new patterns are added.
> >In a social setting, these probabilities tend to converge by
> consensus as this knowledge is shared.
> I agree this is a wonderful solution to predicting what the vocabulary
> of a language group is.
> >Formal definitions of artificial languages do not capture this type
> of knowledge, the thousands or millions of new words, idioms, shared
> knowledge and habits of usage.
> sa'u(simply speaking) Artificial languages lack a historic/cultural user base.
> Do I even need to reply to that? zo'o.ui.u'i(last statement humorously
> while happy in an amused kind of way)
> >
> > Even if we were able to constrain the grammar, you still have the problem 
> > that people will still make ungrammatical statements, misspell words, omit 
> > words, and so on.
> Amazing you should mention such valid points against natural languages.
>
> * ungrammatical statements:
> If they were ungrammatical they wouldn't parse in the universal Lojban
> parser(All Lojban parsers can be Universal Lojban parsers as long as
> they follow the few simple grammar rules).
> * misspell words:
>      In Lojban words have a very strict formation,
>      mu'a(for example): GISmu are either in (ccvcv or cvccv formation)
> all others are also syntactically unambiguous.
>      Additionally words in Lojban are specifically designed not to
> sound similar to each other, so chances are it  still looks/sounds
> just like the original word even when misspelled.
>      If a parse error occurs(rare for Lojban users, usually typos) the
> user can always be notified.
> * omit words:
> I gave an example of some GISmu before; basically they have
> predefined places, so you can always ask a specific question about
> omitted information by simply putting a "ma" for the SUMti (argument)
> which you wish to know, or "mo" for the SELbri (function).
> * and so on.
> >A language model must be equipped to deal with this.
> go'i.ui(repetition of your statement as confirmation and happiness)
> >It means evaluating lots of soft constraints from a huge database for
> error correction, just like we do to resolve ambiguity in natural
> language.
> If "It" can be substituted as "Resolving ambiguity in natural
> languages" OR(logical connective) "Resolve ambiguity in ambiguous
> languages", I agree.
> > -- Matt Mahoney, [EMAIL PROTECTED]
> mu'omi'eLOKadin(Over to you, my name is Lokadin.)
>
> -----
> This list is sponsored by AGIRI: http://www.agiri.org/email
> To unsubscribe or change your options, please go to:
> http://v2.listbox.com/member/?list_id=303


-- 
ta'o(by the way)  We With You Network at: http://lokiworld.org .i(and)
more on Lojban: http://lojban.org
mu'oimi'e lOkadin (Over, my name is lOkadin)

