I am very interested in parsing the constructions used in WordNet and Wiktionary glosses (i.e. definitions). Here are some samples from WordNet online http://wordnet.princeton.edu/perl/webwn . The glosses are parenthesized, and examples are in italics for those of you with rich text email editors.
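
As a rough illustration of the layout just described (lemmas, then a parenthesized gloss, then quoted examples), here is a minimal Python sketch. The function names and both regular expressions are hypothetical and are not part of WordNet or any library. It only handles glosses without nested parentheses, plus the simplest "basic unit of money" pattern.

```python
import re

# Hypothetical helpers for illustration; not part of WordNet or any library.
# Assumed layout of one entry line: lemmas, an optional " - ", "(gloss)",
# then zero or more double-quoted examples.
ENTRY = re.compile(
    r'^(?P<lemmas>[^()]*?)\s*-?\s*\((?P<gloss>[^()]*)\)\s*(?P<rest>.*)$'
)

# Assumed shape of the simple currency glosses.
CURRENCY = re.compile(
    r'^(?P<former>formerly )?the basic unit of money (?:in|on) (?P<place>[^;]+)'
    r'(?:; equal to (?P<count>\d+) (?P<subunit>\w+))?$'
)


def parse_entry(line):
    """Split one entry line into lemmas, gloss text, and quoted examples."""
    m = ENTRY.match(line.strip())
    if m is None:
        return None  # e.g. a gloss containing nested parentheses
    lemmas = [w.strip() for w in m.group("lemmas").split(",") if w.strip()]
    examples = re.findall(r'"([^"]*)"', m.group("rest"))
    return {"lemmas": lemmas, "gloss": m.group("gloss"), "examples": examples}


def parse_currency_gloss(gloss):
    """Return the slots of a 'basic unit of money' gloss, or None if it does not fit."""
    m = CURRENCY.match(gloss)
    if m is None:
        return None
    count = m.group("count")
    return {
        "former": m.group("former") is not None,
        "place": m.group("place"),
        "count": int(count) if count is not None else None,
        "subunit": m.group("subunit"),
    }


if __name__ == "__main__":
    entry = parse_entry(
        "lira, Italian lira (formerly the basic unit of money in Italy; "
        "equal to 100 centesimi)"
    )
    print(entry["lemmas"])
    print(parse_currency_gloss(entry["gloss"]))
```

A fixed pattern like this covers only the simplest glosses; anything it rejects returns None and would need a real grammar.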

(1) very simple patterns

ruble - (the basic unit of money in Tajikistan)
ruble, rouble (the basic unit of money in Russia)
lira, Maltese lira (the basic unit of money on Malta; equal to 100 cents)
lira, Turkish lira (the basic unit of money in Turkey)
lira, Italian lira (formerly the basic unit of money in Italy; equal to 100 centesimi)

(2) complex constructions

break (terminate) "She interrupted her pregnancy"; "break a lucky streak"; "break the cycle of poverty"
break, separate, split up, fall apart, come apart (become separated into pieces or fragments) "The figurine broke"; "The freshly baked loaf fell apart"
break (render inoperable or ineffective) "You broke the alarm clock when you took it apart!"
break, bust (ruin completely) "He busted my radio!"
break (destroy the integrity of; usually by force; cause to separate into pieces or fragments) "He broke the glass plate"; "She broke the match"
transgress, offend, infract, violate, go against, breach, break (act in disregard of laws, rules, contracts, or promises) "offend all laws of humanity"; "violate the basic laws or human civilization"; "break a law"; "break a promise"
break, break out, break away (move away or escape suddenly) "The horses broke from the stable"; "Three inmates broke jail"; "Nobody can break out--this prison is high security"
break (scatter or part) "The clouds broke after the heavy downpour"

Having a dialog system gives one the ability to query a contributing user about otherwise confusing or circular glosses. Plus, one can always recurse into a session to understand a word or phrase used in a containing gloss.

And after the commonly occurring word senses from Wiktionary / WordNet glosses are understood and incorporated in the KB, it's on to Wikipedia. For the latter, I'm closely monitoring the Cyc Foundation effort to link OpenCyc with Wikipedia topics.

-Steve

Stephen L. Reed
Artificial Intelligence Researcher
http://texai.org/blog
http://texai.org
3008 Oak Crest Ave.
Austin, Texas, USA 78704
512.791.7860

----- Original Message ----
From: William Pearson <[EMAIL PROTECTED]>
To: [email protected]
Sent: Thursday, January 10, 2008 3:04:34 PM
Subject: Re: [agi] Incremental Fluid Construction Grammar released

On 10/01/2008, Benjamin Goertzel <[EMAIL PROTECTED]> wrote:
> On Jan 10, 2008 10:26 AM, William Pearson <[EMAIL PROTECTED]> wrote:
> > On 10/01/2008, Benjamin Goertzel <[EMAIL PROTECTED]> wrote:
> > > > I'll be a lot more interested when people start creating NLP systems
> > > > that are syntactically and semantically processing statements *about*
> > > > words, sentences and other linguistic structures and adding syntactic
> > > > and semantic rules based on those sentences.
> >
> > Note the new emphasis ;-) Your example didn't have statements *about*
> > words, but new rules were inferred from word usage.
>
> Well, here's the thing.
>
> Dictionary text and English-grammar-textbook text are highly ambiguous and
> complex English... so you'll need a very sophisticated NLP system to be able
> to grok them...

Firstly, so what? Why not allow for the fact that there will hopefully be a sophisticated NLP system in the system at some point? Give it the hooks to use dictionary-style acquisition, even if it won't for the first x years of development. We are aiming for adult human-level in the end, right? Not just a 5-year-old.

It will make adding French or another language a whole lot quicker, when it comes to that level. Retrofitting the ability may or may not be easy at that stage. It would be better to figure out whether it is easy or not before settling on an architecture. My hunch is that it is not easy.

Secondly, I'm not buying that it is any more complex than dealing with other domains.
You easily get equal complexity dealing with non-linguistic stuff. For example,

  This is a battery
  A battery can be part of a machine
  Putting a battery in the battery holder gives the machine power

is as complex, if not more so, than

  un- is a prefix
  A prefix is the front part of a word
  Adding un- to a "word" is equivalent to saying "not word."

What the system does after processing these different sets of sentences is vastly different. That difference is worth exploring before settling on an architecture, IMO.

Not building the potential to have a capability into a baby-based AI, even if it is not initially used, means that when the AI is grown up it still won't be able to have that capability. Unless you are relying on it getting to the self-modifying-code phase before the asking-what-words-mean phase.

  Will

-----
This list is sponsored by AGIRI: http://www.agiri.org/email
To unsubscribe or change your options, please go to:
http://v2.listbox.com/member/?&
