I don't disagree with "awesome compression abilities" as a test for
"advanced AGI"

However, I think that trying to achieve awesome compression by
incrementally improving current compressors is sorta like trying to
reach the moon by incrementally improving current pogo sticks ;-)

A different sort of architecture is needed to achieve awesome levels
of compression than to incrementally improve on the current levels...

Ben G

On 4/19/07, Matt Mahoney <[EMAIL PROTECTED]> wrote:
--- David Clark <[EMAIL PROTECTED]> wrote:

> Turing's test is obviously not sufficient for AGI.  Why would an AGI waste
> its time learning to lie, miscompute numbers, simulate a forgetful memory,
> etc., to pass a test?  Why would the creators of an AGI spend time and money
> to create the worst aspects of being human?

I agree, these are all good reasons to use a different test.

> I use a simple metaphor for *understanding*.  If information were X/Y pairs
> of numbers, and they were plotted on a graph, the Y intercept and slope of
> the resulting line would be *understanding*.

Now you are talking about compression.  Encode the X,Y points as the X points
plus a function for computing Y.
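
To make the metaphor concrete, here is a minimal sketch in Python (the
data and the least-squares fit are my illustration, not David's): the Y
values are replaced by two numbers that regenerate them exactly, so the
"understanding" doubles as a lossless code.

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]   # lies on the line y = 2x + 1

# Fit slope and intercept by least squares (exact here, because the
# points are collinear).
n = len(xs)
mx = sum(xs) / n
my = sum(ys) / n
slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
intercept = my - slope * mx

# The "compressed" form: the X values plus two numbers.
encoded = (xs, slope, intercept)

# Decoding recomputes every Y exactly; nothing was lost.
decoded = [slope * x + intercept for x in xs]
assert decoded == ys

If the points were noisy you would also store the residuals, and better
"understanding" would mean smaller residuals to store.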

> > A common argument against compression as a test for AI is that humans
> > don't compress like a zip program.  Compression requires a
> > *deterministic* model.  A compressor codes string x using a code of
> > length log 1/p(x) bits.  The decompressor must also compute p(x)
> > exactly to invert the code.  Humans can't do this because they use
> > noisy neurons to compute p(x), which varies a bit each time.
>
> Any test that requires the AGI to jump through hoops that a human (or any
> human) can't pass is a poor test.  The idea isn't to make the potential AGI
> fail but to recognize when something approximating human level intelligence
> is achieved.  A test so hard that obviously intelligent and useful programs
> would fail it wouldn't have much value.

I don't think this is a hard hoop for a deterministic machine.  The hard part
is figuring out how to compute p(x).  Once you can do this, computing it a
second time is trivial.
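
A toy illustration of why this matters (the frequency table is invented
for the example): both ends call the same deterministic function, so the
code lengths of about log2(1/p(x)) bits per symbol agree exactly.

import math

# A fixed, deterministic model: a letter-frequency table.  Compressor
# and decompressor both read it, so p(x) is bit-for-bit identical on
# both ends.
FREQ = {'e': 0.4, 't': 0.3, 'a': 0.2, 'o': 0.1}

def p(symbol):
    return FREQ[symbol]

def code_length_bits(message):
    # An ideal arithmetic coder spends about log2(1/p(x)) bits per
    # symbol of the message.
    return sum(math.log2(1.0 / p(c)) for c in message)

print(code_length_bits("tea"))   # ~5.38 bits

If noisy neurons perturbed p(x) between encoding and decoding, the
intervals an arithmetic coder derives from it would no longer match and
the code would not invert, which is exactly the objection quoted above.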


--- Andrew Babian <[EMAIL PROTECTED]> wrote:

> It occurs to me what the problem is that I'm having with this definition of
> AI as compression.  There are two different tasks here: recognition of
> "sensory" data and reproduction of it.  It sounds like this definition
> proposes that they are exactly equivalent, or that any recognition system is
> automatically invertible.  I simply doubt that this can be true, based on a
> principle (which I have no proof for, but which I hold) that
> "meaning"--something we use to recognize equivalence--is just not the same
> for different perceptual events.
>
> Another example I use to think about it is how difficult it is to draw a
> reproduction of a picture from memory, and how different that task is from
> drawing a copy or from analyzing the elements in a picture.  Reproducing
> visual information is different from conceptual scene decomposition.

I should have discussed my motivation for using lossy video compression as a
test for AGI, as I did for lossless text.  The idea is that lossy compression
is not possible without an accurate model of human perception.  Humans receive
sensory information at about 1 Gb/s (b = bits, B = bytes), and somehow filter
and compress this down to about 10 b/s by the time it reaches long-term
memory.

A lossy compressor given input x must first compute the lossy function y =
f(x) that models human perception, then compress y using a lossless model
p(y).  All lossy compressors work this way.  For example, JPEG performs a
color transform and downsamples the two chroma components because the eye is
less sensitive to high spatial frequencies in chroma than in luma.  It uses 3
primary colors because the eye has 3 types of cones.  Thus, there is no need
to distinguish the pure spectral yellow in a rainbow from the yellow you see
on a monitor that results from mixing red and green.  After the lossy
transform (which also involves quantization that varies by spatial frequency),
the remaining features are compressed losslessly (using e.g. run-length and
Huffman coding).
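
Here is a crude sketch of that two-stage pipeline (the luma/chroma
coefficients are the standard ones, but the flat pixel layout and the
2:1 chroma downsampling are simplifications of mine, and zlib stands in
for the run-length/Huffman stage):

import zlib

def f_lossy(rgb_pixels):
    """The lossy transform y = f(x): convert to luma/chroma, keep
    full-resolution luma, and keep chroma for only every other pixel."""
    out = []
    for i, (r, g, b) in enumerate(rgb_pixels):
        y  = int(0.299 * r + 0.587 * g + 0.114 * b)   # luma
        cb = int(128 + 0.564 * (b - y))               # blue chroma
        cr = int(128 + 0.713 * (r - y))               # red chroma
        out.append(y)
        if i % 2 == 0:            # chroma survives at half the rate
            out.extend([cb, cr])
    return bytes(out)

def compress(rgb_pixels):
    y = f_lossy(rgb_pixels)       # model of what the eye won't miss
    return zlib.compress(y)       # lossless coding of what remains

pixels = [(200, 180, 40)] * 16    # a flat yellow patch
print(len(compress(pixels)))      # far smaller than 16 * 3 raw bytes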

Humans are not capable of inverting visual perception (i.e. producing real
time video from memory), but nearly all computer models, whether lossy or
lossless, can decompress at least as fast as they can compress, and often
faster (e.g. JPEG and MPEG).  So it was my assumption that this would not be a
hardship in an AGI test, once the hard problem of computing p(f(.)) was
solved.  Decompression means computing p(y) again, then inverting f(.).  I
can't say for certain that inverting f(.) is not hard, but I don't believe it
will be.


--- Mark Waser <[EMAIL PROTECTED]> wrote:

> > If a sentence can be rewritten in 1000 different ways without changing
> > its meaning, then that only adds 10 bits.
>
>     Yes, provided that you have an efficient encoding/decoding scheme for
> that particular sentence.  Now, what is the overhead for having efficient
> encoding/decoding schemes for *all* possible sentences?
>
>     You state that "The amount of extra knowledge needed to encode the
> choice of representations is small."  I strenuously disagree with this
> statement.  While the number of bits required in the encoded text is small,
> the amount of extra knowledge required in the encoder and decoder is much,
> *MUCH* larger.  What model did you have in mind that joins both deep
> knowledge and the very shallow lossless algorithms that you cite?  I don't
> believe that you can cite *any* deep knowledge algorithm/model that doesn't
> suffer when you try to add losslessness.

Statistical language models such as n-gram backoff (aka PPM), distant-bigram
models and LSA, and combinations thereof using information fusion approaches
such as maximum entropy or context mixing, are all lossless.  The knowledge
learned by such models is the same size as the compressed output, typically 1
to 2 bits per character of the training data.  These models are generally
regarded as efficient: they learn thousands of times faster than humans.  Of
course the models are low level, modeling only semantics and a simple, flat
grammar, perhaps at the level of a 2- or 3-year-old child.  But there is
currently nothing better.
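
For a flavor of how such a model computes p(x), here is a toy adaptive
order-1 character model with backoff, loosely in the PPM family (a
sketch for illustration, nowhere near a real compressor):

import math
from collections import defaultdict

def bits_per_char(text, alphabet_size=256):
    """Predict each character from the previous one, backing off to a
    uniform model in contexts never seen before, and report the ideal
    code length in bits per character."""
    counts = defaultdict(lambda: defaultdict(int))
    total_bits = 0.0
    prev = ''
    for c in text:
        ctx = counts[prev]
        seen = sum(ctx.values())
        if ctx[c] > 0:
            p = ctx[c] / (seen + 1)     # order-1 estimate, reserving
                                        # 1/(seen+1) for an escape
        else:
            p = (1.0 / (seen + 1)) * (1.0 / alphabet_size)  # back off
        total_bits += math.log2(1.0 / p)
        ctx[c] += 1                     # learn as we go
        prev = c
    return total_bits / len(text)

print(bits_per_char("the cat sat on the mat, the cat sat on the mat"))

Because the model updates deterministically as it codes, a decompressor
running the same loop stays in sync with the compressor -- the
determinism requirement discussed earlier in the thread.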

Can you cite any lossy text compression models or models that separate deep
knowledge from representation?  Do you have any figures on how much
information is needed to encode meaning vs. representation?  Can you argue
that the representation is at least half of the information?  For example, can
you think of a 100 character sentence that can be expressed in 2^50 different
ways without changing its meaning? (assuming 1 bpc entropy)
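
For what it is worth, the arithmetic behind those two figures (my
reading of the numbers above):

import math

# Choosing one of 1000 equivalent phrasings costs about log2(1000) bits:
print(math.log2(1000))       # ~9.97, i.e. "only adds 10 bits"

# A 100-character sentence at 1 bit per character carries about 100
# bits in total, so for representation to be half the information the
# sentence would need 2^50 meaning-preserving variants:
print(2 ** 50)               # 1125899906842624, about 1.1e15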


-- Matt Mahoney, [EMAIL PROTECTED]
