Matt,

I would appreciate it if you could separate your replies to different people into different e-mails.

Can you cite any lossy text compression models or models that separate deep knowledge from representation?

Yes. *Any* system composed of 1) a parser that reads text and places the knowledge into a standardized KR (knowledge representation) scheme and 2) a text generator that outputs any data in that KR stream as English text is separating knowledge from representation (because it is discarding the representation).
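To make the class concrete, here is a minimal sketch (purely hypothetical -- no particular system's KR scheme is implied, and the (subject, relation, value, unit) record is made up for illustration):

import re

def parse(text):
    """Parser: extract the knowledge into a standardized KR record."""
    m = re.match(r"(\w+) is ([\d.]+) (\w+) tall", text)
    subject, value, unit = m.groups()
    return (subject, "height", float(value), unit)   # the KR 'stream'

def generate(record):
    """Generator: output the KR record as English text (one canonical form)."""
    subject, relation, value, unit = record
    return f"{subject} is {value:g} {unit} tall"

kr = parse("John is 72 inches tall")
print(kr)             # ('John', 'height', 72.0, 'inches')
print(generate(kr))   # John is 72 inches tall

The original wording is the representation, and it is simply thrown away; only the record survives the round trip.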

Do you have any figures on how much information is needed to encode meaning vs. representation?

Well, in the class of systems cited above, there is no attempt to encode the representation, so your question is meaningless. My entire point is that by encoding the representation, you are wasting time and effort.

Do you have any figures on how much information is needed to encode meaning vs. representation in any system that does both? Do you even know of any system that attempts to maintain representation, can separate meaning from representation, and can provide such numbers? I'm sure not -- so why are you asking me? *You* are the one trying to do both, not me. I think that it's nonsensical.

Can you argue that the representation is at least half of the information?

Yes, I can. Take any case that involves a unit of measurement. The statements "John is x inches tall", "John is y centimeters tall", "John is a.b feet tall", "John is c.d meters tall", "John is 1/e miles tall" can be reproduced nearly ad infinitum. If you combine that with a second set of statements as to how tall Jane is, you only double the amount of knowledge but increase the number of possible statements geometrically (since you can put the John statements first, you can put the Jane statements first, you can compare their heights in all the different units, and you can include the comparison with one or both of the statements in any order). Actually, now that I think about it, you also need to consider the cases where you either repeat a single statement n times or repeat a variable number of statements a variable number of times.

So . . . YES, I can quite easily make the representation in text arbitrarily (approaching infinitely) larger than the knowledge encoded in it.
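A toy sketch of the arithmetic (the heights and the unit list are my own made-up example values):

from itertools import permutations

UNITS = {"inches": 1.0, "centimeters": 2.54, "feet": 1 / 12,
         "meters": 0.0254, "miles": 1 / 63360}

def renderings(name, inches):
    """One fact, many textual representations of it."""
    return [f"{name} is {inches * k:g} {u} tall" for u, k in UNITS.items()]

john = renderings("John", 72.0)
jane = renderings("Jane", 66.0)
print(len(john))    # 5 statements for 1 unit of knowledge

# Combining one John statement with one Jane statement, in either order:
pairs = [" ".join(p) for j in john for q in jane for p in permutations([j, q])]
print(len(pairs))   # 50 texts for only 2 units of knowledge

Doubling the knowledge took the representation count from 5 to 50, and that is before adding comparisons or repetition.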

For example, can you think of a 100 character sentence that can be expressed in 2^50 different ways without changing its meaning? (assuming 1 bpc entropy)

A 100 character sentence? Sure. The first time, I say it once. The second time, I say it twice in a row. The third time, three times . . . and so on, up to the (2^50 + 1)th time . . . .

And that's without doing anything else . . . .

Knowledge does *NOT* increase just because the amount of text increases. You can always add text without adding knowledge, and in that case 100% of the added encoding is due to the representation (which sounds like 100% added unnecessary effort to me).
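To put rough numbers on the repetition trick (assuming the 1 bpc figure from your question):

from math import log2

sentence_bits = 100 * 1          # 100 characters at the assumed 1 bpc
choices = 2 ** 50                # k = 1 .. 2^50 distinct texts
choice_bits = log2(choices)      # bits to say which k was used
longest = 100 * choices          # characters in the longest variant
print(choice_bits)               # 50.0
print(f"longest text: {longest:.3e} characters, knowledge: unchanged")

The text grows without bound while the knowledge stays fixed; every added character is pure representation.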


----- Original Message ----- From: "Matt Mahoney" <[EMAIL PROTECTED]>
To: <[email protected]>
Sent: Thursday, April 19, 2007 2:15 PM
Subject: Re: Goals of AGI (was Re: [agi] AGI interests)


--- David Clark <[EMAIL PROTECTED]> wrote:

Turing's test is obviously not sufficient for AGI. Why would an AGI waste its time learning to lie, miscompute numbers, simulate a forgetful memory, etc., to pass a test? Why would the creators of an AGI spend time and money to create the worst aspects of being human?

I agree, these are all good reasons to use a different test.

I use a simple metaphor for *understanding*. If information were X/Y pairs of numbers, and they were plotted on a graph, the Y intercept and slope of the resulting line would be *understanding*.

Now you are talking about compression. Encode the X,Y points as the X points plus a function for computing Y.
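A minimal sketch of what I mean (illustrative only, using a made-up data set):

import numpy as np

x = np.arange(10, dtype=float)
y = 3.0 * x + 7.0                       # data that really is a line

slope, intercept = np.polyfit(x, y, 1)  # the 'understanding': 2 numbers
residual = y - (slope * x + intercept)  # what the model fails to predict
print(slope, intercept)                 # ~3.0 ~7.0
print(np.allclose(residual, 0.0))       # True: Y costs nothing extra to encode

Store the X points plus the slope and intercept; when the understanding is exact, the residuals (and hence the cost of Y) are zero.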

> A common argument against compression as a test for AI is that humans
> don't compress like a zip program.  Compression requires a *deterministic*
> model.  A compressor codes string x using a code of length log 1/p(x) bits.
> The decompressor must also compute p(x) exactly to invert the code.  Humans
> can't do this because they use noisy neurons to compute p(x) that varies a
> bit each time.

Any test that requires the AGI to jump through hoops that a human (or any human) can't pass is a poor test. The idea isn't to make the potential AGI fail but to recognize when something approximating human-level intelligence is achieved. Making a test so hard that obviously intelligent and useful programs fail wouldn't have much value.

I don't think this is a hard hoop for a deterministic machine. The hard part is figuring out how to compute p(x). Once you can do this, computing it a second time is trivial.
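For instance, a toy adaptive order-0 byte model (an illustration, not a real compressor) shows why determinism is all that is required -- the decoder just repeats the same counting and gets the identical p(x):

from math import log2

def code_length_bits(data: bytes) -> float:
    """Ideal code length, the sum of log 1/p(x) bits, under an adaptive model."""
    counts = [1] * 256                   # Laplace-smoothed byte counts
    total = 256
    bits = 0.0
    for b in data:
        bits += log2(total / counts[b])  # log 1/p(x) for this symbol
        counts[b] += 1                   # deterministic update: a decoder
        total += 1                       # repeats it and gets the same p(x)
    return bits

text = b"the cat sat on the mat"
print(f"{code_length_bits(text) / len(text):.2f} bits per character")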


--- Andrew Babian <[EMAIL PROTECTED]> wrote:

It occurs to me the problem I'm having with this definition of AI as compression. There are two different tasks here: recognition of "sensory" data and reproduction of it. It sounds like this definition proposes that they are exactly equivalent, or that any recognition system is automatically invertible. I simply doubt that this can be true, based on a principle (which I have no proof for but I hold) that "meaning" -- something we use to recognize equivalence -- is just not the same for different perceptual events.

Another example I use to think about it is how difficult it is to draw a reproduction of a picture from memory, and how different that task is both from drawing a copy and from analyzing the elements in a picture. Reproducing visual information is different from conceptual scene decomposition.

I should have discussed my motivation for using lossy video compression as a test for AGI, as I did for lossless text. The idea is that lossy compression is not possible without an accurate model of human perception. Humans receive sensory information at about 1 Gb/s (b = bits, B = bytes), and somehow filter and compress this down to about 10 b/s by the time it reaches long-term memory.

A lossy compressor given input x must first compute the lossy function y = f(x) that models human perception, then compress y using a lossless model p(y). All lossy compressors work this way. For example, JPEG performs a color transform and downsamples the two chroma components because the eye is less sensitive to high spatial frequencies in chroma than in luma. It uses 3 primary colors because the eye has 3 types of cones. Thus, there is no need to distinguish the pure spectral yellow in a rainbow from the yellow you see on a monitor that results from mixing red and green. After the lossy transform (which also involves quantization that varies by spatial frequency), the remaining features are compressed losslessly (using e.g. run-length and Huffman coding).
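A toy sketch of the two-stage pipeline (a simplification for illustration, not the actual JPEG codec; the transform matrix is the standard YCbCr one):

import numpy as np

def rgb_to_ycbcr(rgb):
    """Color transform: separate luma (Y) from the two chroma channels."""
    m = np.array([[ 0.299,  0.587,  0.114],
                  [-0.169, -0.331,  0.500],
                  [ 0.500, -0.419, -0.081]])
    return rgb @ m.T

def lossy_f(rgb_image):
    """f(x): keep luma at full resolution, downsample chroma 2x2,
    since the eye is less sensitive to high spatial frequencies in chroma."""
    ycbcr = rgb_to_ycbcr(rgb_image)
    y  = ycbcr[..., 0]
    cb = ycbcr[0::2, 0::2, 1]    # 4:2:0-style subsampling
    cr = ycbcr[0::2, 0::2, 2]
    return y, cb, cr             # these would then be coded losslessly via p(y)

img = np.random.rand(8, 8, 3)        # stand-in for a real image
y, cb, cr = lossy_f(img)
print(y.shape, cb.shape, cr.shape)   # (8, 8) (4, 4) (4, 4)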

Humans are not capable of inverting visual perception (i.e. producing real-time video from memory), but nearly all computer models, whether lossy or lossless, can decompress at least as fast as they can compress, and often faster (e.g. JPEG and MPEG). So it was my assumption that this would not be a hardship in an AGI test, once the hard problem of computing p(f(.)) was solved. Decompression means computing p(y) again, then inverting f(.). I can't say for certain that inverting f(.) is not hard, but I don't believe it will be.


--- Mark Waser <[EMAIL PROTECTED]> wrote:

> If a sentence can be rewritten in 1000 different ways without changing
> its meaning, then that only adds 10 bits.

    Yes, provided that you have an efficient encoding/decoding scheme for that particular sentence. Now, what is the overhead for having efficient encoding/decoding schemes for *all* possible sentences?

    You state that "The amount of extra knowledge needed to encode the choice of representations is small." I strenuously disagree with this statement. While the number of bits required in the encoded text is small, the amount of extra knowledge required in the encoder and decoder is much, *MUCH* larger. What model did you have in mind that joins both deep knowledge and the very shallow lossless algorithms that you cite? I don't believe that you can cite *any* deep knowledge algorithm/model that doesn't suffer when you try to add losslessness.

Statistical language models such as n-gram backoff (aka PPM), distant-bigram models and LSA, and combinations thereof using information fusion approaches such as maximum entropy or context mixing, are all lossless. The knowledge learned by such models is the same size as the compressed output, typically 1 to 2 bits per character of the training data. These models are generally regarded as efficient: they learn thousands of times faster than humans. Of course the models are low level, modeling only semantics and simple, flat grammar, perhaps at the level of a 2 or 3 year old child. But there is currently nothing better.
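For illustration, here is a toy character-bigram model with backoff to order-0 (far simpler than PPM or context mixing, but lossless in the same sense), measuring its own code length:

from collections import defaultdict
from math import log2

class BigramBackoff:
    def __init__(self):
        self.uni = defaultdict(int)        # unigram counts
        self.bi = defaultdict(int)         # bigram counts
        self.ctx_total = defaultdict(int)  # count of each context
        self.total = 0

    def prob(self, ctx, c):
        """Mix the bigram estimate with a unigram backoff."""
        p0 = (self.uni[c] + 1) / (self.total + 256)   # smoothed order-0
        n = self.ctx_total[ctx]
        lam = n / (n + 1)                             # backoff weight
        p1 = self.bi[(ctx, c)] / n if n else 0.0
        return lam * p1 + (1 - lam) * p0

    def update(self, ctx, c):
        self.uni[c] += 1; self.total += 1
        self.bi[(ctx, c)] += 1; self.ctx_total[ctx] += 1

def bits_per_char(text):
    m, bits, ctx = BigramBackoff(), 0.0, "\0"
    for c in text:
        bits += log2(1 / m.prob(ctx, c))  # lossless: a decoder does the same
        m.update(ctx, c)
        ctx = c
    return bits / len(text)

print(f"{bits_per_char('the cat sat on the mat ' * 40):.2f} bpc")

The learned counts are all the knowledge there is, and the code length falls toward the 1-2 bpc range as the model picks up the text's regularities.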

Can you cite any lossy text compression models or models that separate deep knowledge from representation? Do you have any figures on how much information is needed to encode meaning vs. representation? Can you argue that the representation is at least half of the information? For example, can you think of a 100 character sentence that can be expressed in 2^50 different ways without changing its meaning? (assuming 1 bpc entropy)


-- Matt Mahoney, [EMAIL PROTECTED]
