Matt,

I would appreciate it if you could separate your replies to different people into different e-mails.

Can you cite any lossy text compression models or models that separate deep knowledge from representation?

Yes. *Any* system composed of 1) a parser that reads text and places the knowledge into a standardized KR (knowledge representation) scheme and 2) a text generator that outputs any data in that KR stream as English text is separating knowledge from representation (because it is discarding the representation).
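To make the class concrete, here is a minimal sketch (purely hypothetical -- no particular system's KR scheme is implied, and the (subject, relation, value, unit) record is made up for illustration):

import re

def parse(text):
    """Parser: extract the knowledge into a standardized KR record."""
    m = re.match(r"(\w+) is ([\d.]+) (\w+) tall", text)
    subject, value, unit = m.groups()
    return (subject, "height", float(value), unit)   # the KR 'stream'

def generate(record):
    """Generator: output the KR record as English text (one canonical form)."""
    subject, relation, value, unit = record
    return f"{subject} is {value:g} {unit} tall"

kr = parse("John is 72 inches tall")
print(kr)             # ('John', 'height', 72.0, 'inches')
print(generate(kr))   # John is 72 inches tall

The original wording is the representation, and it is simply thrown away; only the record survives the round trip.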

Do you have any figures on how much information is needed to encode meaning vs. representation?

Well, in the class of systems cited above, there is no attempt to encode the representation, so your question is meaningless. My entire point is that by encoding the representation, you are wasting time and effort.

Do you have any figures on how much information is needed to encode meaning vs. representation in any system that does both? Do you even know of any system that attempts to maintain representation, can separate meaning from representation, and can provide such numbers? I'm sure not -- so why are you asking me? *You* are the one trying to do both, not me. I think that it's nonsensical.

Can you argue that the representation is at least half of the information?

Yes, I can. Take any case that involves a unit of measurement. The statements "John is x inches tall", "John is y centimeters tall", "John is a.b feet tall", "John is c.d meters tall", "John is 1/e miles tall" can be reproduced nearly ad infinitum. If you combine that with a second set of statements as to how tall Jane is, you only double the amount of knowledge but increase the number of possible statements geometrically (since you can put the John statements first, you can put the Jane statements first, you can compare their heights in all the different units, and you can include the comparison with one or both of the statements in any order). Actually, now that I think about it, you also need to consider the cases where you either repeat a single statement n times or repeat a variable number of statements a variable number of times.

So . . . YES, I can quite easily make the representation in text arbitrarily (approaching infinitely) larger than the knowledge encoded in it.
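A toy sketch of the arithmetic (the heights and the unit list are my own made-up example values):

from itertools import permutations

UNITS = {"inches": 1.0, "centimeters": 2.54, "feet": 1 / 12,
         "meters": 0.0254, "miles": 1 / 63360}

def renderings(name, inches):
    """One fact, many textual representations of it."""
    return [f"{name} is {inches * k:g} {u} tall" for u, k in UNITS.items()]

john = renderings("John", 72.0)
jane = renderings("Jane", 66.0)
print(len(john))    # 5 statements for 1 unit of knowledge

# Combining one John statement with one Jane statement, in either order:
pairs = [" ".join(p) for j in john for q in jane for p in permutations([j, q])]
print(len(pairs))   # 50 texts for only 2 units of knowledge

Doubling the knowledge took the representation count from 5 to 50, and that is before adding comparisons or repetition.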

For example, can you think of a 100 character sentence that can be expressed in 2^50 different ways without changing its meaning? (assuming 1 bpc entropy)

A 100 character sentence? Sure. The first time, I say it once. The second time, I say it twice in a row. The third time, three times . . . and so on, up to the (2^50 + 1)th time . . . .

And that's without doing anything else . . . .

Knowledge does *NOT* increase just because the amount of text increases. You can always add text without adding knowledge, and in that case 100% of the added encoding is due to the representation (which sounds like 100% added unnecessary effort to me).
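To put rough numbers on the repetition trick (assuming the 1 bpc figure from your question):

from math import log2

sentence_bits = 100 * 1          # 100 characters at the assumed 1 bpc
choices = 2 ** 50                # k = 1 .. 2^50 distinct texts
choice_bits = log2(choices)      # bits to say which k was used
longest = 100 * choices          # characters in the longest variant
print(choice_bits)               # 50.0
print(f"longest text: {longest:.3e} characters, knowledge: unchanged")

The text grows without bound while the knowledge stays fixed; every added character is pure representation.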


----- Original Message ----- From: "Matt Mahoney" <[EMAIL PROTECTED]>
To: <[email protected]>
Sent: Thursday, April 19, 2007 2:15 PM
Subject: Re: Goals of AGI (was Re: [agi] AGI interests)


--- David Clark <[EMAIL PROTECTED]> wrote:

Turing's test is obviously not sufficient for AGI. Why would an AGI waste its time learning to lie, miscompute numbers, simulate a forgetful memory, etc., to pass a test? Why would the creators of an AGI spend time and money to create the worst aspects of being human?

I agree, these are all good reasons to use a different test.

I use a simple metaphor for *understanding*. If information were X/Y pairs of numbers, and they were plotted on a graph, the Y intercept and slope of the resulting line would be *understanding*.

Now you are talking about compression. Encode the X,Y points as the X points plus a function for computing Y.
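A minimal sketch of what I mean (illustrative only, using a made-up data set):

import numpy as np

x = np.arange(10, dtype=float)
y = 3.0 * x + 7.0                       # data that really is a line

slope, intercept = np.polyfit(x, y, 1)  # the 'understanding': 2 numbers
residual = y - (slope * x + intercept)  # what the model fails to predict
print(slope, intercept)                 # ~3.0 ~7.0
print(np.allclose(residual, 0.0))       # True: Y costs nothing extra to encode

Store the X points plus the slope and intercept; when the understanding is exact, the residuals (and hence the cost of Y) are zero.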

> A common argument against compression as a test for AI is that humans
> don't compress like a zip program.  Compression requires a *deterministic*
> model.  A compressor codes string x using a code of length log 1/p(x) bits.
> The decompressor must also compute p(x) exactly to invert the code.  Humans
> can't do this because they use noisy neurons to compute p(x) that varies a
> bit each time.

Any test that requires the AGI to jump through hoops that a human (or any human) can't pass is a poor test. The idea isn't to make the potential AGI fail but to recognize when something approximating human-level intelligence is achieved. Making a test so hard that obviously intelligent and useful programs fail wouldn't have much value.

I don't think this is a hard hoop for a deterministic machine. The hard part is figuring out how to compute p(x). Once you can do this, computing it a second time is trivial.
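For instance, a toy adaptive order-0 byte model (an illustration, not a real compressor) shows why determinism is all that is required -- the decoder just repeats the same counting and gets the identical p(x):

from math import log2

def code_length_bits(data: bytes) -> float:
    """Ideal code length, the sum of log 1/p(x) bits, under an adaptive model."""
    counts = [1] * 256                   # Laplace-smoothed byte counts
    total = 256
    bits = 0.0
    for b in data:
        bits += log2(total / counts[b])  # log 1/p(x) for this symbol
        counts[b] += 1                   # deterministic update: a decoder
        total += 1                       # repeats it and gets the same p(x)
    return bits

text = b"the cat sat on the mat"
print(f"{code_length_bits(text) / len(text):.2f} bits per character")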


--- Andrew Babian <[EMAIL PROTECTED]> wrote:

It occurs to me the problem I'm having with this definition of AI as compression. There are two different tasks here: recognition of "sensory" data and reproduction of it. It sounds like this definition proposes that they are exactly equivalent, or that any recognition system is automatically invertible. I simply doubt that this can be true, based on a principle (which I have no proof for but I hold) that "meaning" -- something we use to recognize equivalence -- is just not the same for different perceptual events.

Another example I use to think about it is how difficult it is to draw a reproduction of a picture from memory, and how different that task is both from drawing a copy and from analyzing the elements in a picture. Reproducing visual information is different from conceptual scene decomposition.

I should have discussed my motivation for using lossy video compression as a test for AGI, as I did for lossless text. The idea is that lossy compression is not possible without an accurate model of human perception. Humans receive sensory information at about 1 Gb/s (b = bits, B = bytes), and somehow filter and compress this down to about 10 b/s by the time it reaches long-term memory.

A lossy compressor given input x must first compute the lossy function y = f(x) that models human perception, then compress y using a lossless model p(y). All lossy compressors work this way. For example, JPEG performs a color transform and downsamples the two chroma components because the eye is less sensitive to high spatial frequencies in chroma than in luma. It uses 3 primary colors because the eye has 3 types of cones. Thus, there is no need to distinguish the pure spectral yellow in a rainbow from the yellow you see on a monitor that results from mixing red and green. After the lossy transform (which also involves quantization that varies by spatial frequency), the remaining features are compressed losslessly (using e.g. run-length and Huffman coding).
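A toy sketch of the two-stage pipeline (a simplification for illustration, not the actual JPEG codec; the transform matrix is the standard YCbCr one):

import numpy as np

def rgb_to_ycbcr(rgb):
    """Color transform: separate luma (Y) from the two chroma channels."""
    m = np.array([[ 0.299,  0.587,  0.114],
                  [-0.169, -0.331,  0.500],
                  [ 0.500, -0.419, -0.081]])
    return rgb @ m.T

def lossy_f(rgb_image):
    """f(x): keep luma at full resolution, downsample chroma 2x2,
    since the eye is less sensitive to high spatial frequencies in chroma."""
    ycbcr = rgb_to_ycbcr(rgb_image)
    y  = ycbcr[..., 0]
    cb = ycbcr[0::2, 0::2, 1]    # 4:2:0-style subsampling
    cr = ycbcr[0::2, 0::2, 2]
    return y, cb, cr             # these would then be coded losslessly via p(y)

img = np.random.rand(8, 8, 3)        # stand-in for a real image
y, cb, cr = lossy_f(img)
print(y.shape, cb.shape, cr.shape)   # (8, 8) (4, 4) (4, 4)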

Humans are not capable of inverting visual perception (i.e. producing real-time video from memory), but nearly all computer models, whether lossy or lossless, can decompress at least as fast as they can compress, and often faster (e.g. JPEG and MPEG). So it was my assumption that this would not be a hardship in an AGI test, once the hard problem of computing p(f(.)) was solved. Decompression means computing p(y) again, then inverting f(.). I can't say for certain that inverting f(.) is not hard, but I don't believe it will be.


--- Mark Waser <[EMAIL PROTECTED]> wrote:

> If a sentence can be rewritten in 1000 different ways without changing
> its meaning, then that only adds 10 bits.

    Yes, provided that you have an efficient encoding/decoding scheme for that particular sentence. Now, what is the overhead for having efficient encoding/decoding schemes for *all* possible sentences?

    You state that "The amount of extra knowledge needed to encode the choice of representations is small." I strenuously disagree with this statement. While the number of bits required in the encoded text is small, the amount of extra knowledge required in the encoder and decoder is much, *MUCH* larger. What model did you have in mind that joins both deep knowledge and the very shallow lossless algorithms that you cite? I don't believe that you can cite *any* deep knowledge algorithm/model that doesn't suffer when you try to add losslessness.

Statistical language models such as n-gram backoff (aka PPM), distant-bigram models and LSA, and combinations thereof using information fusion approaches such as maximum entropy or context mixing, are all lossless. The knowledge learned by such models is the same size as the compressed output, typically 1 to 2 bits per character of the training data. These models are generally regarded as efficient: they learn thousands of times faster than humans. Of course the models are low level, modeling only semantics and simple, flat grammar, perhaps at the level of a 2 or 3 year old child. But there is currently nothing better.
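For illustration, here is a toy character-bigram model with backoff to order-0 (far simpler than PPM or context mixing, but lossless in the same sense), measuring its own code length:

from collections import defaultdict
from math import log2

class BigramBackoff:
    def __init__(self):
        self.uni = defaultdict(int)        # unigram counts
        self.bi = defaultdict(int)         # bigram counts
        self.ctx_total = defaultdict(int)  # count of each context
        self.total = 0

    def prob(self, ctx, c):
        """Mix the bigram estimate with a unigram backoff."""
        p0 = (self.uni[c] + 1) / (self.total + 256)   # smoothed order-0
        n = self.ctx_total[ctx]
        lam = n / (n + 1)                             # backoff weight
        p1 = self.bi[(ctx, c)] / n if n else 0.0
        return lam * p1 + (1 - lam) * p0

    def update(self, ctx, c):
        self.uni[c] += 1; self.total += 1
        self.bi[(ctx, c)] += 1; self.ctx_total[ctx] += 1

def bits_per_char(text):
    m, bits, ctx = BigramBackoff(), 0.0, "\0"
    for c in text:
        bits += log2(1 / m.prob(ctx, c))  # lossless: a decoder does the same
        m.update(ctx, c)
        ctx = c
    return bits / len(text)

print(f"{bits_per_char('the cat sat on the mat ' * 40):.2f} bpc")

The learned counts are all the knowledge there is, and the code length falls toward the 1-2 bpc range as the model picks up the text's regularities.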

Can you cite any lossy text compression models or models that separate deep knowledge from representation? Do you have any figures on how much information is needed to encode meaning vs. representation? Can you argue that the representation is at least half of the information? For example, can you think of a 100 character sentence that can be expressed in 2^50 different ways without changing its meaning? (assuming 1 bpc entropy)


-- Matt Mahoney, [EMAIL PROTECTED]
