The argument for lossy vs. lossless compression as a test for AI seems to be motivated by the fact that humans use lossy compression to store memory and cannot do lossless compression at all. The reason is that lossless compression requires the ability to do deterministic computation, while lossy compression does not. So this distinction is not important for machines.
The proof that an ideal language model implies passing the Turing test requires a lossless model. A lossy model has only partial knowledge of the distribution of strings in natural language dialogs. Without full knowledge, it cannot duplicate the distribution over equivalent representations of the same idea, so its output could be recognized as non-human even if the compression is ideal. For example, a lossy compressor might compress all of the following to the same code: "it is hot", "it is quite warm", "it is 107 degrees", "the burning desert sun seared my skin", etc. The distribution over these (nearly) equivalent expressions of the same idea is not uniform. Humans recognize that some expressions are more common than others, but an ideal lossy compressor is unable to regenerate that distribution. (If it could, it would be a lossless model.) For ideal lossy compression it only needs to know the sum of their probabilities, not the individual ones.
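To make that concrete, here is a toy Python sketch; the phrases and their relative frequencies are made up for illustration. A lossless coder spends about -log2 p(string) bits per string and can reproduce the original distribution exactly, while an ideal lossy coder spends only -log2 of the summed class probability but in doing so discards the per-string probabilities.

    import math, random

    # Assumed relative frequencies of equivalent phrasings (made-up numbers).
    p = {
        "it is hot": 0.0055,
        "it is quite warm": 0.0030,
        "it is 107 degrees": 0.0010,
        "the burning desert sun seared my skin": 0.0005,
    }

    # Lossless coding: about -log2 p(s) bits per string; decoding returns the
    # exact string, so sampling the model reproduces the original distribution.
    for s, prob in p.items():
        print(f"lossless: {-math.log2(prob):5.2f} bits for {s!r}")

    # Ideal lossy coding: one code for the whole class, -log2(sum p) bits.
    print(f"lossy: {-math.log2(sum(p.values())):5.2f} bits for the entire class")

    # The lossy decoder has lost the per-string probabilities; the best it can
    # do is emit an arbitrary representative, which no longer matches how often
    # humans actually use each phrasing.
    print("lossy reconstruction:", random.choice(list(p)))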
This example brings up another issue: who is to say whether two expressions represent the same idea? That problem itself requires AI.
The proper, objective way to avoid coding equivalent representations is to remove all noise (e.g. misspelled words, grammatical errors, arbitrary line breaks) from the data set and put it into a canonical form, so that there is only one way to represent the ideas within. This would remove any distinction between lossy and lossless compression. However, it would be a gargantuan task; it would take a lifetime to read 1 GB of text. But by using Wikipedia, most of this work has already been done. There are very few spelling or grammar errors due to extensive review, the style is fairly uniform, and line breaks occur only on paragraph boundaries.
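The mechanical part of that cleanup can be sketched in a few lines of Python. This is a minimal illustration only: fixing spelling and grammar, or collapsing paraphrases into one canonical phrasing, would itself require AI-level judgment and is not attempted here.

    import re

    def canonicalize(text):
        """Undo arbitrary line wrapping and normalize whitespace, keeping
        paragraph boundaries. Spelling, grammar, and choice of phrasing are
        deliberately left alone."""
        paragraphs = re.split(r"\n\s*\n", text)                  # split on blank lines
        paragraphs = [" ".join(p.split()) for p in paragraphs]   # unwrap each paragraph
        return "\n\n".join(p for p in paragraphs if p)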
Uncompressed video would be the absolute worst type of test data. Uncompressed video runs at about 10^8 to 10^9 bits per second, while the human brain has a long-term learning rate of around 10 bits per second. So all the rest is noise. How are you going to remove that prior to compression?
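The arithmetic behind those figures is simple; the frame size and rate below are assumptions chosen only for illustration.

    # Rough arithmetic: 640x480, 24-bit color, 30 frames/second is one plausible choice.
    width, height, bits_per_pixel, fps = 640, 480, 24, 30
    video_rate = width * height * bits_per_pixel * fps   # about 2.2e8 bits/second
    learning_rate = 10                                    # claimed long-term rate, bits/second
    print(f"uncompressed video: {video_rate:.1e} bits/s")
    print(f"roughly 1 useful bit per {video_rate // learning_rate:,} bits of input")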
There are no objective functions for comparing the quality of lossy decompression. For images, we have PSNR, which is computed from the RMS error of the pixel differences between the original and reconstructed images. But this is a poor measure. For example, if I increased the brightness of all pixels by 1%, you would not see any difference. However, if I increased the brightness of just the top half of the image by 1%, the mean squared error would be cut in half, so the PSNR score would actually be better, yet there would be an obvious horizontal line across the image. Any test of lossy quality has to be subjective.
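A small numerical sketch of that argument, using numpy; the image here is just random data standing in for a real picture.

    import numpy as np

    def psnr(original, distorted, max_value=255.0):
        """Peak signal-to-noise ratio in dB: 20 * log10(MAX / RMSE)."""
        mse = np.mean((original - distorted) ** 2)
        return 20 * np.log10(max_value / np.sqrt(mse))

    rng = np.random.default_rng(0)
    img = rng.integers(50, 200, size=(256, 256)).astype(float)  # stand-in image

    uniform = img * 1.01          # every pixel 1% brighter: visually imperceptible
    half = img.copy()
    half[:128, :] *= 1.01         # only the top half 1% brighter: visible seam

    print(f"uniform 1% shift : {psnr(img, uniform):.1f} dB")
    print(f"top-half 1% shift: {psnr(img, half):.1f} dB")
    # The half-image edit scores the higher (better) PSNR, yet looks worse.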
This is not to say that investigating how humans do lossy compression isn't an important field of study. I think it is essential to understanding how vision, hearing, and the other senses work and how that data is processed. We currently do not have good models describing how humans decide what to remember and what to discard.
But the Hutter prize is meant to motivate better language models, not vision or hearing or robotics. For that task, I think lossless text compression is the right approach.
-- Matt Mahoney, [EMAIL PROTECTED]
----- Original Message -----
From: boris <[EMAIL PROTECTED]>
To: [email protected]
Sent: Saturday, August 19, 2006 10:25:58 PM
Subject: [agi] Lossy *&* lossless compression
It's been said that we have to go after lossless compression because there's no way to objectively measure the quality of lossy compression. That makes sense only in the context of the dumb, indiscriminate transforms conventionally used for compression.
If compression is produced by pattern recognition, we can quantify lossless compression of individual patterns, which is a perfectly objective criterion for selectively *losing* insufficiently compressed patterns. To make Hutter's prize meaningful it must be awarded for compression of the *best* patterns, rather than of the whole data set. And, of course, linguistic/semantic data is a lousy place to start; it's already been heavily compressed by "algorithms" unknown to any autonomous system. An uncompressed movie would be a far, far better data sample. Also, the real criterion of intelligence is prediction, which is a *projected* compression of future data. The difference is that current compression is time-symmetrical, while prediction obviously isn't.