Matt: To parse English you have to know that pizzas have pepperoni, that demonstrators advocate violence, that cats chase mice, and so on.  There is no neat, tidy algorithm that will generate all of this knowledge.  You can't do any better than to just write down all of these facts.  The data is not compressible.

James: You CAN actually, simply because there are patterns; anytime there are patterns, there is regularity, and the ability to compress things.  And those things are limited, even on a super-large scale.
  The problem is the irregular parts, which have to be handled, and the amount of bad data, which also has to be handled.
But a simple example is
ate a pepperoni pizza
ate a tuna pizza
ate a VEGAN SUPREME pizza
ate a Mexican pizza
ate a pineapple pizza

And we can see right off that these are different types of pizza toppings, and we can compress that into a frame easily:
Frame Pizza:
  can have Toppings: pepperoni, tuna, pineapple
  can be Type: vegan supreme, Mexican

This does take some work, and does require some good data, but it can be done.
We can take that further and gather probabilities and confidences for the Pizza frame, such that we can determine that a pepperoni pizza is the most likely if a random pizza is ordered.
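As a rough sketch of that counting step (in Python; the toy phrase list below, with pepperoni repeated, is an illustrative assumption, not real corpus data):

    import re
    from collections import Counter

    # Toy corpus of "ate a X pizza" phrases; a real run would pull these
    # from a large text collection.
    phrases = [
        "ate a pepperoni pizza", "ate a tuna pizza",
        "ate a vegan supreme pizza", "ate a Mexican pizza",
        "ate a pineapple pizza", "ate a pepperoni pizza",
        "ate a pepperoni pizza",
    ]

    pattern = re.compile(r"ate a (.+) pizza")
    counts = Counter()
    for p in phrases:
        m = pattern.match(p)
        if m:
            counts[m.group(1).lower()] += 1

    # Turn raw counts into probabilities for the Pizza frame.
    total = sum(counts.values())
    pizza_frame = {modifier: n / total for modifier, n in counts.items()}

    # Most likely pizza if a random one is ordered (here: pepperoni).
    print(max(pizza_frame, key=pizza_frame.get), pizza_frame)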

This does not give a perfect collection of information, but a lot can be garnered just from this.  It does not solve the AI problem, but it does give us a nice building block of knowledge to start working with.
 
  This is a much better method than hand-coding each piece, as Cyc has seen; they are now writing and using many algorithms that take advantage of statistical NLP and Google to assist in suggesting answers and to check the answers they already have in place.

There is a simple pattern between nouns and verbs as well that can be extracted with relative ease, and likewise between adjectives and nouns, and adverbs and verbs.

Ex:
  The dog eats, barks, growls, sniffs, attacks, alerts.

That gives us an initial store of information about a dog frame.

Then, given "Rover barked at the mailman.", we can programmatically narrow the possibilities for what Actor can fill the "bark" role, see that dogs bark and are the most likely to bark at the mailman, and give a probability and a confidence.
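A rough sketch of that narrowing step (the toy corpus counts and the simple sample-size confidence formula are illustrative assumptions on my part):

    from collections import defaultdict

    # Toy (actor, verb) observations; a real system would extract these
    # from parsed text.
    corpus = [
        ("dog", "eat"), ("dog", "bark"), ("dog", "bark"), ("dog", "bark"),
        ("dog", "growl"), ("dog", "sniff"), ("dog", "attack"), ("dog", "alert"),
        ("cat", "eat"), ("cat", "sniff"), ("cat", "attack"),
        ("mailman", "deliver"), ("mailman", "walk"),
    ]

    actor_verbs = defaultdict(lambda: defaultdict(int))
    verb_totals = defaultdict(int)
    for actor, verb in corpus:
        actor_verbs[actor][verb] += 1
        verb_totals[verb] += 1

    def who_does(verb):
        """Rank actors by P(actor | verb), with a crude sample-size confidence."""
        total = verb_totals[verb]
        ranked = sorted(((actor_verbs[a][verb] / total, a)
                         for a in actor_verbs if actor_verbs[a][verb]),
                        reverse=True)
        confidence = min(1.0, total / 10.0)  # more observations, more confidence
        return ranked, confidence

    # "Rover barked at the mailman": dogs fill the bark role with P = 1.0 here.
    print(who_does("bark"))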

One problem I have with your task of text compression is the stricture that it retain exactly the same text, as opposed to exactly the same information.
For a computer science data transmission issue the first is important, but for an AI issue the latter is more important.
"The dog sniffed the shoes." and "The dog smelled the shoes." are so close in meaning as to be acceptable representations of the same event, and many things can be reduced to their component parts, or even to a more common synonym or word root.
And it is much more important that the system be able to answer the question "What did the dog sniff/smell?" than that it keep the data exactly the same.
As long as the answers come out the same, the internal representation could be in Chinese or marks in the sand.
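A tiny sketch of that idea, reducing verbs to a shared root so both surface forms answer the same question (the synonym map and the subject-verb-object tuples are illustrative assumptions):

    # Reduce verbs to a canonical root before storing events.
    CANONICAL = {"sniff": "smell", "sniffed": "smell",
                 "smelled": "smell", "smell": "smell"}

    def norm(verb):
        return CANONICAL.get(verb, verb)

    # "The dog sniffed the shoes." and "The dog smelled the shoes."
    # collapse to the same stored event.
    events = [("dog", norm("sniffed"), "the shoes"),
              ("dog", norm("smelled"), "the shoes")]

    def answer(subj, verb):
        """What did <subj> <verb>?"""
        return [obj for s, v, obj in events if s == subj and v == norm(verb)]

    print(answer("dog", "sniff"))   # ['the shoes', 'the shoes']
    print(answer("dog", "smell"))   # same answer either way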

James Ratcliff


Matt Mahoney <[EMAIL PROTECTED]> wrote:
James Ratcliff <[EMAIL PROTECTED]> wrote:
>Many of these examples actually aren't hard, if you use some statistical information and a common sense knowledge base.
 
The problem is not that these examples are hard, but that there are millions of them.  To parse English you have to know that pizzas have pepperoni, that demonstrators advocate violence, that cats chase mice, and so on.  There is no neat, tidy algorithm that will generate all of this knowledge.  You can't do any better than to just write down all of these facts.  The data is not compressible.

I said millions, but we really don't know, maybe 10^9 bits.  We have a long history of underestimating the complexity of natural language, going back to SHRDLU, Eliza, and the 1959 BASEBALL program, all of which could parse simple sentences.  Cycorp is the only one who actually collected this much common human knowledge in a structured form.  They probably did not expect it would take 20 years of manual coding, only to discover you can't build the knowledge base first and then tack on a natural language interface later.  Something is still wrong.

We have many ways to represent knowledge: LISP lists, frame-slot, augmented first order logic, term logic, Bayesian, connectionist, NARS, Novamente, etc.  Humans can easily take sentences and convert them into the internal representation of any of these systems.  Yet none of these systems has solved the natural language interface problem.  Why is this?

You can't ignore information theory.  A Turing machine can't model another machine with greater Kolmogorov complexity.  The brain can't understand itself.  We want to build data structures where we can see how knowledge is represented so we can test and debug our systems.  Sorry, information theory doesn't allow it.  You can't have your AGI and understand it too. 

We need to think about opaque representations, systems we can train and test without looking inside, systems that work but we don't know how.  This will be hard, but we have already tried the easy ways.

-- Matt Mahoney, [EMAIL PROTECTED]


----- Original Message ----
From: James Ratcliff <[EMAIL PROTECTED]>
To: agi@v2.listbox.com
Sent: Tuesday, November 7, 2006 9:38:54 AM
Subject: Re: [agi] The concept of a KBMS


Matt Mahoney <[EMAIL PROTECTED]> wrote:
----- Original Message ----
From: YKY (Yan King Yin) <[EMAIL PROTECTED]>
To: agi@v2.listbox.com
Sent: Monday, November 6, 2006 7:49:06 PM
Subject: Re: [agi] The concept of a KBMS

>This is the specification of my logic:
>I conjecture that NL sentences can be easily translated to/from this form.

I conjecture it will be hard.

Here is why.  If it was easy to translate between natural language and an unambiguous, structured form, then it would be easy to translate between two natural languages, e.g. Russian -> Geniform -> English.  This problem is known to be hard.

What does the prepositional phrase "with" modify in "I ate pizza with {pepperoni, a fork, gusto, George}"?
It is simple to show that there is a type of pizza that is a pepperoni pizza, but not a fork pizza, etc.  The others all have different roles that are recognizable by the word type they have.
This would create frames similar to:
ate(Person, pepperoni pizza)
ate(Person, pizza, with Utensil)
ate(Person, pizza, with Feeling)
ate(Person, pizza, with Person)
So the eat action would show the different types of modifiers it expects, and when it saw something different it would try to fit it into one of the expected slots, or a new slot/frame definition would need to be created.
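A rough sketch of that slot-fitting step (the tiny type lexicon and the frame strings are illustrative assumptions):

    # Decide what "with X" modifies in "I ate pizza with X" by the
    # semantic type of X.
    TYPE_LEXICON = {
        "pepperoni": "Topping", "tuna": "Topping", "pineapple": "Topping",
        "fork": "Utensil", "spoon": "Utensil",
        "gusto": "Feeling", "pleasure": "Feeling",
        "george": "Person", "mary": "Person",
    }

    def attach(noun):
        t = TYPE_LEXICON.get(noun.lower())
        if t == "Topping":
            return "ate(Person, %s pizza)" % noun        # modifies the pizza
        if t in ("Utensil", "Feeling", "Person"):
            return "ate(Person, pizza, with %s)" % t     # modifies the eating
        return "unknown type: new slot/frame definition needed"

    for phrase in ["pepperoni", "a fork", "gusto", "George"]:
        print(phrase, "->", attach(phrase.split()[-1]))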


What does "they" refer to in (from Lenat) "The police arrested the demonstrators because they {feared, advocated} violence"?
This one is harder, but...
Statistically, on the first pass,
"police feared violence" has 62 instances, and
"demonstrators feared violence"  has 0
Then, expanding the "violence" term to attacks and riots, we see
police: 50 + 40
demonstrators: 0
So we have overwhelming evidence there for police fearing it. 
Grammatically we would assume the closest match, which is demonstrators, so the two have to be reconciled together to come up with police.
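A minimal sketch of that reconciliation (the counts are the ones quoted above; the small bonus for the grammatically closest candidate is an illustrative assumption):

    # Corpus counts for candidate-referent / verb / object triples.
    counts = {
        ("police", "feared", "violence"): 62,
        ("demonstrators", "feared", "violence"): 0,
        # expanding "violence" to near-synonyms:
        ("police", "feared", "attacks"): 50,
        ("police", "feared", "riots"): 40,
        ("demonstrators", "feared", "attacks"): 0,
        ("demonstrators", "feared", "riots"): 0,
    }

    def resolve(candidates, verb, objects, closest):
        """Pick the referent of "they" and give a rough probability."""
        scores = {}
        for c in candidates:
            evidence = sum(counts.get((c, verb, o), 0) for o in objects)
            scores[c] = evidence + (1 if c == closest else 0)  # grammar bonus
        total = sum(scores.values()) or 1
        best = max(scores, key=scores.get)
        return best, scores[best] / total

    print(resolve(["police", "demonstrators"], "feared",
                  ["violence", "attacks", "riots"], closest="demonstrators"))
    # -> ('police', ~0.99)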

Is the following sentence correct: "The cat caught a moose"?
This can actually be handled fairly well.  Looking at a frame of cat and moose, we can statistically see that it is a rare if not non-existent event that a cat catches a moose.  Now in theory this could be a sci-fi book where a huge cat did catch the moose, but that would have to be learned with more context information.
A frame for Cat catching would show about:
15% mouse
5% rats
5% bird
< 3% others
A general statement can be made that "cats catch small animals", and that matches most items.
It is mentioned once on the net, by an unreliable quotes page, that a "cat caught a moose",
and once in a fairy tale (The Violet Fairy Book - The Nunda) a cat caught a donkey.
But for general common sense these types of sources would be too far from the norm and are not in the corpora.
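A small sketch of that check (the frame percentages are roughly the ones above; the 1% plausibility threshold is an illustrative assumption):

    # Frequencies from a "Cat catching" frame; anything rarely or never
    # seen as the object is flagged as implausible.
    cat_catches = {"mouse": 0.15, "rat": 0.05, "bird": 0.05}

    def plausible(obj, threshold=0.01):
        return cat_catches.get(obj, 0.0) >= threshold

    for obj in ["mouse", "moose"]:
        verdict = "plausible" if plausible(obj) else "implausible (rare or unseen)"
        print("The cat caught a %s. ->" % obj, verdict)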

What does "it" refer to in "it is raining"?
What is the structured representation of "What?"
This is where it breaks down and the English language goes into a tornado of insanity.
I solve this simply, by staying away from it.  Almost all the data I work with is in the form of mostly-proper English from reasonable sources, such as news articles, which will mostly have proper grammar, and novels.
Other things will be needed here, such as a specialized Dialogue module to handle speech events, and an idiom and metaphor translation unit.

What does "it is raining" actually mean?
interesting: http://www.askoxford.com/asktheexperts/faq/aboutgrammar/it

-- Matt Mahoney, [EMAIL PROTECTED]





Thank You
James Ratcliff
http://falazar.com






_______________________________________
James Ratcliff - http://falazar.com


