My point about artificial languages is that I don't believe they are of much use in helping to understand or solve the natural language modeling problem, which is a central problem of AGI. Ben mentioned one use, which is to use Lojban++ in combination with English to train an AGI in English. In this case, Lojban++ serves to help ground the language, just as a 3-D modeling language could be used to describe the environment. For that purpose, any language that is expressive enough and familiar to the developer will do.
It is a different case when we require users to learn an artificial language because we don't know how to model natural language. I don't see how this can lead to any significant insights. There are already many examples of unambiguous and easy to parse programming languages (including superficially English-like languages such as COBOL and SQL) and formal knowledge representation languages (CycL, Prolog, etc.).

An AGI has to deal with ambiguity and errors in language. Consider the following sentence, which I used earlier: "I could even invent a new branch of mathematics, introduce appropriate notation, and express ideas in it." What does "it" refer to? The solution in an artificial language would be either to forbid pronouns (as in most programming languages) or to label the pronoun explicitly to make the meaning clear. But people don't want or need to do this. They can figure it out from context. If your AGI can't use context to solve such problems, then you haven't solved the natural language modeling problem, and a vast body of knowledge will be inaccessible.

I think you will find that writing a Lojban parser will be trivial compared to writing an English to Lojban translator.

Andrii (lOkadin) Zvorygin <[EMAIL PROTECTED]> wrote:
>My initial reasoning was that right now many programs don't use AI,
>because programmers don't know, and the ones that do can't easily add
>code.

It is because language modeling is unsolved. Computers would be much easier to use if we could talk to them in English. But they do not understand. We don't know how to make them understand. But we are making progress. Google will answer simple, natural language questions (although they don't advertise it). The fact that others haven't done it suggests the problem requires vast computational resources and training data.
-- Matt Mahoney, [EMAIL PROTECTED]

----- Original Message ----
From: Andrii (lOkadin) Zvorygin <[EMAIL PROTECTED]>
To: [email protected]
Sent: Sunday, November 26, 2006 4:37:02 PM
Subject: Re: Re: [agi] Understanding Natural Language

On 11/25/06, Matt Mahoney <[EMAIL PROTECTED]> wrote:
> Andrii (lOkadin) Zvorygin <[EMAIL PROTECTED]> wrote:
> >> Even if we were able to constrain the grammar, you still have the
> >> problem that people will still make ungrammatical statements, misspell
> >> words, omit words, and so on.
> >Amazing you should mention such valid points against natural languages.
>
> This misses the point. Where are you going to get 1 GB of Lojban text to
> train your language model?

Well,
A: I could just get IRC logs and mailing lists of the current Lojban community.
B: The point is to translate English into Lojban.
C: I'm not training a language model. I'm creating a parser, then a translator, then other things. The translator will have some elements of an AI; probably Bayesian probability will be involved, but it's too early to say.

I may be on the wrong list discussing this.

>If you require that all text pass through a syntax checker for errors, you will greatly increase the cost of generating your training data.

Well,
A: There are rarely any errors -- unlike in a natural language like, say, English.
B: Addressed above.

>This is not a trivial problem.

Which one? Maybe as a whole it's not trivial, but when you break it down the little pieces are all individually trivial.

>It is a big part of why programmers can only write 10 lines of code per day on projects 1/1000 the size of a language model.

Monolithic programming is the paradigm of the past, which is one of the reasons I'm creating this new development model.

>Then when you have built the model, you will still have a system that is intolerant of errors and hard to use.
Because of the nature of the development model -- designed after functional programming languages -- it will be possible to add functions anywhere in the process without interrupting the rest of the functions, as it won't change the input other functions receive (unless that is the intent). Hard to use? Well, we'll see when I have a basic implementation. The whole point is that it will be easy to use; maybe it won't work out, though I can't see how. .iacu'i(skepticism)

>Your language model needs to have a better way to deal with inconsistency than to report errors and make more work for the user.

It can easily just check what the previous response of this user, or of someone else who made a similar error, was when correcting. Trivial once we get the implementation going.

> >Lojban already exceeds many natural languages in its ability to express.
>
> How so? In English I can use mathematical notation understood by others to
> express complex ideas. I could even invent a new branch of mathematics,
> introduce appropriate notation, and express ideas in it.

It stops being English when it starts being "mathematical notation". In English, mathematical notation usually has either a non-standard or an ungrammatical formulation, .i.e. "left bracket one plus two right bracket times four equals x?" as opposed to the same sentence in Lojban, "li vei pa su'i re ve'o pi'i vo du da", where each word corresponds to a math character (li declares it a number). There is no such distinction in Lojban; cmavo and gismu for math can be used anywhere in the language. I can and do add any mathematical feature and use it while speaking, .i.e. "xu do du mi", meaning "are you the identity of me?"; the du is the same as the English "=". "Are you equal to me" leaves it a little ambiguous and vague.

Please note that I am forced to work within the limitations of the English language while conveying all this, so the translations into English do tend to be vague/ambiguous in comparison to the Lojban.
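
As a rough illustration of the word-for-symbol correspondence described above, here is a minimal Python sketch. The lookup table covers only the words that appear in the example sentence; the function name, the table name, and the choice to render "da" as x are assumptions made for illustration. It is a plain table lookup, not a Lojban parser.

```python
# Word-to-symbol table for the example sentence only.
# Rendering "da" as "x" follows the English gloss given in the message.
MATH_WORDS = {
    "vei": "(",   # left bracket
    "ve'o": ")",  # right bracket
    "pa": "1",
    "re": "2",
    "vo": "4",
    "su'i": "+",
    "pi'i": "*",
    "du": "=",
    "da": "x",
}


def lojban_math_to_symbols(sentence: str) -> str:
    """Translate a Lojban math expression word by word into symbols.

    A leading "li" (which marks what follows as a number) is dropped.
    Words missing from the table raise KeyError rather than being guessed.
    """
    words = sentence.split()
    if words and words[0] == "li":
        words = words[1:]
    return " ".join(MATH_WORDS[word] for word in words)


print(lojban_math_to_symbols("li vei pa su'i re ve'o pi'i vo du da"))
```

Because every word maps to exactly one symbol, the translation needs no context, which is the property being claimed for Lojban here.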
Oh, and English doesn't have attitudinals. They are two to three letter emotional indicators, extremely valuable once you know how to use them. .io.o'a.u'i(respect for previous statement in a pride amusement kind of way)

> There are cognitive limits to what natural language can express, such as the
> inability to describe a person's face (as well as a picture would), or to
> describe a novel odor, or to convey learned physical skills such as swimming
> or riding a bicycle.

I agree.

>One could conceivably introduce notation to describe such things in any natural or artificial language, but that does not solve the problem. Your neural circuitry has limits; it allows you to connect a face to a name but not to a description. Any such notation might be usable by machines but not by humans.

What about adding notation for URIs?

la'o.ubu. http://community.livejournal.com/lojban/17618.html .ubu.PIXrami
the foreign-named start-quote URI end-quote is a picture of me.

Recreating a scenario, say, may be more difficult, however. We'll figure that out in the MMORPG that will be created to further the advancement of the AI.

I'm thinking of all of this as not some little project, or training a bunch of algorithms. I am creating a new development framework, though that's too specific a term to describe it. My initial reasoning was that right now many programs don't use AI, because programmers don't know, and the ones that do can't easily add code. I was thinking of making an AI library, but realized you'd have to make one for every language. Then I thought about how we could unify every language. Then I realized that there were non-standard conventions everywhere. It was mentioned to me that Lojban existed; I found it, learned it, and am now making a parser. It allows standard compliance to be easier, as it is intuitive and "natural" to comply with standards when you are simply extending the language that you speak by making new sentences from words you already know.
Sentences which are no different than programs. A general AI would emerge out of many hundreds of people using the development framework and extending it to understand them. Eventually it will understand its users and be able to do what they ask of it, as what has been explained once doesn't have to be explained again -- this is of course assuming the small functions are connected to a distributed network that redistributes functions to other computers that look for them, without any need for user interference (the MMORPG will probably run on the same network).

> -- Matt Mahoney, [EMAIL PROTECTED]
>
> ----- Original Message ----
> From: Andrii (lOkadin) Zvorygin <[EMAIL PROTECTED]>
> To: [email protected]
> Sent: Saturday, November 25, 2006 5:01:04 AM
> Subject: Re: Re: [agi] Understanding Natural Language
>
> On 11/24/06, Matt Mahoney <[EMAIL PROTECTED]> wrote:
> > Andrii (lOkadin) Zvorygin <[EMAIL PROTECTED]> wrote:
> > >I personally don't understand why everyone seems to insist on using
> > >ambiguous illogical languages to express things when there are viable
> > >alternatives available.
> >
> > I think because an AGI needs to communicate in languages that people
> > already know.
> > I don't understand how artificial languages like Lojban contribute to
> > this goal.
> > We should focus our efforts instead on learning and modeling existing
> > languages.
> >
> > I understand that artificial languages like Lojban and Esperanto and
> > Attempto have simple grammars.
> > I don't believe they would stay that way if they were widely used for
> > person to person communication (as opposed to machine interfaces).
> Lojban grammar is easily extensible and forwards compatible.
> You can add features to the language through CMAvo and GISmu.
> Lojban already exceeds many natural languages in its ability to
> express. There are very crucial parts of communication that English
> lacks such as logical connectives and attitudinals.
>
> >Languages evolve over time, both in individuals, and more slowly in
> >social groups.
> Are you implying languages evolve faster in individuals?
> >A language model is not a simple set of rules.
> A natural language model is not.
> An artificial language is constructed with rules that were also
> created by individual -- as opposed to groups of -- humans. Lojban was
> especially designed to be logical, unlike Esperanto.
> Therefore making them recreatable by individual humans, and depending
> on your definition: "simple".
> >It is a probability distribution described by a large set of patterns
> >such as words, word associations, grammatical structures and
> >sentences.
> The approach of a world of blind to seeing is to feel at things.
> Sometimes they wonder if there is not another way.
> >Each time you read or hear a message, the probabilities for the
> >observed patterns are increased a little and new patterns are added.
> >In a social setting, these probabilities tend to converge by
> >consensus as this knowledge is shared.
> I agree this is a wonderful solution to predicting what the vocabulary
> of a language group is.
> >Formal definitions of artificial languages do not capture this type
> >of knowledge, the thousands or millions of new words, idioms, shared
> >knowledge and habits of usage.
> sa'u(simply speaking) Artificial languages lack a historic/cultural user base.
> Do I even need to reply to that? zo'o.ui.u'i(last statement humorously
> while happy in an amused kind of way)
> >
> > Even if we were able to constrain the grammar, you still have the problem
> > that people will still make ungrammatical statements, misspell words, omit
> > words, and so on.
> Amazing you should mention such valid points against natural languages.
>
> * ungrammatical statements:
> If they were ungrammatical they wouldn't parse in the universal Lojban
> parser (all Lojban parsers can be universal Lojban parsers as long as
> they follow the few simple grammar rules).
> * misspell words:
> In Lojban words have a very strict formation,
> mu'a(for example): GISmu are in either ccvcv or cvccv formation;
> all others are also syntactically unambiguous.
> Additionally, words in Lojban are specifically designed not to
> sound similar to each other, so chances are it still looks/sounds
> just like the original word even when misspelled.
> If a parse error occurs (rare for Lojban users, usually typos) the
> user can always be notified.
> * omit words:
> I gave an example of some GISmu before; basically they have
> predefined places, so you can always ask a specific question about
> omitted information by simply putting a "ma" for the SUMti (argument)
> which you wish to know, or "mo" for the SELbri (function).
> * and so on.
> >A language model must be equipped to deal with this.
> go'i.ui(repetition of your statement as confirmation and happiness)
> >It means evaluating lots of soft constraints from a huge database for
> >error correction, just like we do to resolve ambiguity in natural
> >language.
> If "It" can be substituted as "Resolving ambiguity in natural
> languages" OR(logical connective) "Resolve ambiguity in ambiguous
> languages", I agree.
> > -- Matt Mahoney, [EMAIL PROTECTED]
> mu'omi'eLOKadin(Over to you, my name is Lokadin.)
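
The "ccvcv or cvccv" formation mentioned in the quoted message can be checked mechanically. The following is a minimal Python sketch under stated assumptions: it tests only the consonant/vowel shape using the Lojban letters (vowels a e i o u, consonants b c d f g j k l m n p r s t v x z), and the function name is made up for illustration. Real gismu are subject to further restrictions on which consonant pairs are allowed, which this sketch does not check.

```python
import re

VOWELS = "aeiou"
CONSONANTS = "bcdfgjklmnprstvxz"

# Either CCVCV or CVCCV, anchored at both ends of the word.
GISMU_SHAPE = re.compile(
    rf"^(?:[{CONSONANTS}]{{2}}[{VOWELS}][{CONSONANTS}][{VOWELS}]"
    rf"|[{CONSONANTS}][{VOWELS}][{CONSONANTS}]{{2}}[{VOWELS}])$"
)


def has_gismu_shape(word: str) -> bool:
    """Return True if word has the CCVCV or CVCCV letter pattern.

    Shape check only: it does not verify that the word is in the
    official gismu list or that its consonant cluster is permitted.
    """
    return bool(GISMU_SHAPE.match(word))


for candidate in ["gismu", "klama", "sumti", "lojban", "hello"]:
    print(candidate, has_gismu_shape(candidate))
```

A typo that breaks the pattern (a dropped or doubled letter, say) is caught by the shape check alone, before any dictionary lookup, which is one way to read the claim that misspellings are easy to detect.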
> -----
> This list is sponsored by AGIRI: http://www.agiri.org/email
> To unsubscribe or change your options, please go to:
> http://v2.listbox.com/member/?list_id=303

--
ta'o(by the way) We With You Network at: http://lokiworld.org
.i(and) more on Lojban: http://lojban.org
mu'oimi'e lOkadin (Over, my name is lOkadin)
