James Ratcliff <[EMAIL PROTECTED]> wrote:
>Matt,
>  expand upon the first part as you said there please.
 
I argued earlier that a natural language model has a complexity of about 10^9 bits.  To be precise, let p(s) be a function that estimates the probability that string s will appear as a prefix in human discourse, such as the dialog in a Turing test between a judge and a human confederate.  If p(s) is a good estimate of the true probability for most s, then this model could be used to pass the Turing test as follows: if Q is the dialog so far, then the machine responds with an answer A selected randomly from the distribution p(A|Q) = p(QA)/p(Q).  I argue that the Kolmogorov complexity of a function p() accurate enough to pass the Turing test is about 10^9 bits.
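The sampling scheme can be made concrete with a toy sketch.  Here p(s) is estimated by a smoothed bigram model over a few sentences standing in for "human discourse" (the corpus and answer set are hypothetical, chosen only for illustration); the machine then samples A with probability p(QA)/p(Q):

```python
import math, random
from collections import Counter

# Tiny corpus standing in for human discourse (hypothetical data).
corpus = "the cat ate a mouse . the cat sat . a mouse ran ."
tokens = corpus.split()

vocab = sorted(set(tokens))
bigrams = Counter(zip(tokens, tokens[1:]))
unigrams = Counter(tokens)

def cond(w, prev):
    # p(w | prev) with add-one smoothing.
    return (bigrams[(prev, w)] + 1) / (unigrams[prev] + len(vocab))

def log_p(seq):
    # log p(s) under the chain rule over bigrams.
    return sum(math.log(cond(w, prev)) for prev, w in zip(seq, seq[1:]))

def respond(Q, answers):
    # p(A|Q) = p(QA)/p(Q); with log probabilities this ratio is
    # exp(log p(QA) - log p(Q)).
    q = Q.split()
    weights = [math.exp(log_p(q + a.split()) - log_p(q)) for a in answers]
    return random.choices(answers, weights=weights)[0]

print(respond("the cat", ["ate a mouse .", "ran ."]))
```

A real p() would of course need vastly more parameters; the point of the sketch is only the mechanics of turning a string-probability estimator into a dialog policy.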

My argument that a language model must be opaque rests on the premise that the human brain cannot understand itself, for the same reason that a Turing machine cannot simulate another Turing machine of greater Kolmogorov complexity.  This is not to say we can't build a brain.  There are simple learning algorithms that can store vast knowledge.  We can understand enough of the brain to describe its development, write an algorithm for the learning mechanism, and simulate its behavior.  But we cannot know all of the knowledge it has learned.  So we will be able to build an AGI and train it, but after training we cannot know everything it knows.  A transparent representation that implies otherwise is not possible.

Most AGI designs have the form of a data structure to represent knowledge, and functions to convert input to knowledge and knowledge to output:

  input --> knowledge representation --> output

Many knowledge representations have been proposed: frame-slot, first order logic, connectionist systems, etc.  These generally take the form of labeled graphs, where the vertices correspond to words, concepts, or system states, and the edges correspond to relations such as "is-a" or "contains", implications, probabilities, confidences, etc.  We argue for the correctness of these models by showing how facts such as "the cat ate a mouse" can be easily represented, and give many examples.
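A toy version of such a graph makes the point concrete.  In this sketch, the event vertex "eat-1" and the role labels (agent, patient) are hypothetical, chosen only for illustration, not taken from any particular system:

```python
# Labeled-graph representation of "the cat ate a mouse".
edges = [
    ("cat", "is-a", "animal"),      # taxonomic relation
    ("mouse", "is-a", "animal"),
    ("eat-1", "agent", "cat"),      # who did the eating
    ("eat-1", "action", "ate"),
    ("eat-1", "patient", "mouse"),  # what was eaten
]

def neighbors(vertex):
    # All (relation, target) pairs leaving a vertex.
    return [(rel, tgt) for src, rel, tgt in edges if src == vertex]

print(neighbors("eat-1"))
```

With a handful of edges per vertex the graph is easy to read; the argument below is about what happens when the edge count grows by several orders of magnitude.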

Here is the problem.  We know that the knowledge representation must have a complexity of about 10^9 bits; anything smaller cannot work.  When we give examples, we usually draw graphs with just a few edges per vertex, but this is not how the graph will look when training is complete.  Suppose there are 10^5 vertices, enough to represent a large vocabulary.  Then 10^9 bits spread over 10^5 vertices comes to about 10^4 bits per vertex, which at roughly one bit per edge means about 10^4 edges per vertex.  Building such a model by hand, or even trying to understand or debug it, would be hopeless.  I would call such a model opaque.
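The back-of-envelope arithmetic above can be checked directly (the one-bit-per-edge cost is the rough assumption stated in the text):

```python
# Rough counting argument: 10^9 bits of model spread over a
# vocabulary-sized graph of 10^5 vertices.
model_bits = 10**9          # estimated complexity of the language model
vertices = 10**5            # roughly vocabulary-sized vertex set

edges_total = model_bits    # assuming on the order of one bit per edge
edges_per_vertex = edges_total // vertices

print(edges_per_vertex)     # 10000
```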

It is natural for us to seek simple solutions, a "theory of everything".  After all, we are agents in the sense of Hutter's AIXI, following the provably optimal strategy of Occam's Razor.  But in our drive to simplify and understand, we are trying to compress the language model to an impossibly small size, always misled down a dead-end path by our initial successes with low-complexity toy systems.

-- Matt Mahoney, [EMAIL PROTECTED]


----- Original Message ----
From: James Ratcliff <[EMAIL PROTECTED]>
To: [email protected]
Sent: Friday, November 10, 2006 9:56:00 AM
Subject: Re: [agi] The crux of the problem

Matt,
  expand upon the first part as you said there please.

James

Matt Mahoney <[EMAIL PROTECTED]> wrote:
James,
Many of the solutions you describe can use information gathered from statistical models, which are opaque.  I need to elaborate on this, because I think opaque models will be fundamental to solving AGI.  We need to build models in a way that doesn't require access to the internals.  This requires a different approach than traditional knowledge representation.  It will require black box testing and performance metrics.  It will be less of an engineering approach, and more of an experimental one.

Information retrieval is a good example.  It is really simple.  You type a query, and the system matches the words in your query to words in each document and ranks the documents by TF*IDF (term frequency times log inverse document frequency).  This is an opaque model.  We normally build an index, but this is really just an optimization.  The language model is just the documents themselves.  There is no good theory to explain why it works.  It just does.
 
-- Matt Mahoney, [EMAIL PROTECTED]


