I want to first clarify my earlier proposed definition of AGI, and then
address the concerns posted in response to my claim of the equivalence of
compression and AI. I will propose just one specific application: an
operating system for personal computers. An AGI residing in your PC should
be able to do the same tasks as a human assistant, at least as fast and as
accurately. For example, you could ask your computer to write and submit a
research paper on some topic, to find consulting work and make money, or to
compose and play music; or you could hold up a scrambled Rubik's cube, show
5 of its sides to a camera mounted in your PC, and ask it to display what
is on the hidden side. Unlike current operating systems, there would be no
notion of files, programs, or a GUI. You ask it to do work and it obeys.
The interface would be an Internet connection and standard peripherals such
as a monitor, keyboard, mouse, speakers, and perhaps a microphone and
camera. A microphone (for speech) and camera (to identify the user and
interpret facial expressions) would have negligible effect on requirements,
because the AGI would need a vision system anyway to interpret video on the
Internet. The AGI may be distributed; it need not all reside locally.
An AGI should be complementary to the human mind, not a copy of one. Its
purpose would be to enhance communication between the user and the
computer, and thus with much of the world. The AGI should be able to
generate images that humans can interpret, even though image generation is
not a human capability. The AGI need not have a physical body or the means
to control one, but it must be able to interpret images or descriptions of
human actions such as running, eating, falling in love, etc.
I proposed text compression and video compression as tests. For text, the
AGI must losslessly compress 1 GB of text with no initial training (i.e.
the size of the decompressor is included) well enough to match humans in
tests of text prediction, about 1 bit per character. I chose 1 GB because
it is equivalent to human language exposure from infancy to adulthood.
Using more text is not allowed, because that would make the problem easier
by using brute force to compensate for a slow learning algorithm, e.g.
Google's 5-gram language model trained on 10 TB of text.
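For concreteness, matching 1 bit per character on 1 GB of text implies a
compressed size (decompressor included) of roughly 125 MB. A quick sanity
check of that arithmetic (not part of the benchmark itself):

```python
# Target compressed size for the text benchmark, assuming 10^9 bytes of
# text at about 1 bit per character (the human prediction entropy above).
text_chars = 10**9            # 1 GB of text, one byte per character
bits_per_char = 1.0
target_bytes = text_chars * bits_per_char / 8
print(target_bytes / 10**6)   # -> 125.0 (MB, decompressor size included)
```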
For video, I propose that the AGI be required to compress (lossily) 275 TB
of DVD-quality MPEG-2 video (or 14,000 TB of uncompressed video) to 800 MB.
The input represents 20 years of video at 16 hours per day, or 60,000
DVD-quality movies at 4.7 GB each (2 hours x 640 x 480 x 30 fps x 3 colors
x 8 bits, at 0.65 bits per pixel compression). The output is a rate of 10
bits per second, 16 hours per day, for 20 years. The quality of the
compression must be such that if an audience views a 2-hour movie, then 24
hours later views either the same movie or the movie after compression and
decompression (each with 50% probability), the audience cannot guess with
more than 75% accuracy which version it saw.
The video input size (275,000 GB) will be the maximum allowed, in order to
prevent brute force solutions.
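Since my figures above are round numbers, here is the arithmetic recomputed
from the stated assumptions (2-hour movies, 640 x 480, 30 fps, 3 color
bytes per pixel, 10 bits per second output). The results land in the same
order of magnitude as the figures above, though the exact totals differ
somewhat because of rounding:

```python
# Back-of-envelope recomputation of the video benchmark sizes.
movies = 60_000
dvd_bytes = 4.7e9                          # one DVD-quality MPEG-2 movie
input_bytes = movies * dvd_bytes           # ~2.8e14 bytes (~282 TB)

frames = 2 * 3600 * 30                     # 2 hours at 30 fps
pixels = 640 * 480
raw_bytes = movies * frames * pixels * 3   # uncompressed total, ~1.2e16 bytes
bpp = dvd_bytes * 8 / (frames * pixels)    # ~0.57 bits per pixel on DVD

seconds = 20 * 365 * 16 * 3600             # 20 years at 16 hours/day
output_bits = 10 * seconds                 # ~4.2e9 bits (~525 MB) at 10 bits/s
```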
For both text and video, the decompressor must finish in no more than 20
years (real time). There should be no time limit for compression.
My proposed video output rate of 10 bits per second is a rough guess. It
will depend on psychological tests of the rate of human long-term memory.
Unfortunately, I don't know of any equivalent of Shannon's text guessing
game [1] for measuring this. My estimate is based on:
1. Speech, transcribed and compressed as text, carries about 10 bits per
second. Intuitively, video should be similar or a little higher, both in
terms of what you learn and remember and in the allocation of neurons in
the cortex.

2. Landauer [2] estimated the capacity of human long-term memory at 10^9
bits across a wide range of modalities (words, images, music). Human life
expectancy is 2 x 10^9 seconds, giving 0.5 bits per second.

3. Standing [3] had subjects memorize 10,000 pictures, one every 5.6
seconds, over 5 days. Two days later they could recognize about 80% in
tests. This is about the result you would get if you reduced each picture
to a 16-bit feature vector and checked for matches. It works out to a
memory rate of about 0.3 bits per second.
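The arithmetic behind points 2 and 3, using the round figures above (the
5-day denominator for Standing's experiment is my assumption; counting only
viewing time would give a higher rate):

```python
# Rough long-term memory rates implied by Landauer [2] and Standing [3].
landauer_bits = 1e9                # estimated LTM capacity in bits
lifetime_s = 2e9                   # human life expectancy in seconds
landauer_rate = landauer_bits / lifetime_s              # 0.5 bits/s

pictures = 10_000
bits_per_picture = 16              # hypothetical feature-vector size
study_s = 5 * 24 * 3600            # 5 days, rest time included (assumption)
standing_rate = pictures * bits_per_picture / study_s   # ~0.37 bits/s
```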
Now let me address the responses to my earlier post.
--- "Kingma, D.P." <[EMAIL PROTECTED]> wrote:
I'm not convinced by this reasoning. First, the way individuals store
audiovisual information differs, simply because of slight differences in
brain development (nurture). Also, memory is condensed information about
the actual high-level sensory/experience information. The actual 45 kb
memory of a movie is therefore quite personal to the subject. Recall of a
photo/video is more like an impressionistic painting than an actual photo.
It is true that different people will notice different details, so we must
average over a large audience.
--- Mark Waser <[EMAIL PROTECTED]> wrote:
>> In http://cs.fit.edu/~mmahoney/compression/rationale.html I argue the
equivalence of text compression with AI.
We've had this argument before so I'll summarize . . . . Knowledge
compression may well be mostly equivalent to the "logical view" of AI.
Text, however, can express the same knowledge in a near infinitude of
different forms. Requiring an AI to decompress the same knowledge into a
variety of different forms based upon what was input is a tremendously
more difficult problem than AI without that requirement (and having that
requirement doesn't seem to have any benefit).
This is why I am proposing a lossy test for video compression. A lossless
video test would make no sense, because the vast majority of the
information content is imperceptible noise. That is not the case for text.
I could have used a lossy text test, with human subjects judging the
equivalence of the reproduced output, but it seemed like more trouble than
it is worth. The lossless test is fair because everyone still has to
encode the (incompressible) choice of representation.
--- James Ratcliff <[EMAIL PROTECTED]> wrote:
This is jumping ahead of ourselves as well... we really have to prioritize
and take small steps... We first have to get it to a basic understanding
of the words and the direct interaction of these words... and just from
text stories even, not movies, before we can go to global moral plots and
long-term thinking ahead.
I expect that a good video compressor will have to understand plots,
morals, human emotions, etc. But first we need good models of the lower
levels of visual processing. The nice thing about video compression is
that we can measure this too.
--- David Clark <[EMAIL PROTECTED]> wrote:
If a huge statistical database of valid English information could be
parsed (compressed or otherwise), it might be possible to predict with
some accuracy if a given sentence was likely to be grammatically correct
or not. This capability seems far removed from an AGI IMHO.
Thus my reasoning behind limiting the size of the training set. The AGI
needs to learn as well as a human from the same amount of data.
If a book is put in a computer and then I refer to that book by its title
1M times, what is my percentage compression? If you think it is high, then
show me where the intelligence lies in this reference. By using simple
references, humans compress huge amounts of data that would otherwise
consume storage our brains couldn't physically handle. The problem is that
this kind of compression is accomplished by understanding. If you can
crack the *understanding* part of compression then you might have an AGI,
but I fail to see how just compressing data will result in understanding.
Compression and understanding are not reciprocal concepts. If humans had
unlimited storage and compression of information weren't necessary,
wouldn't a human's *understanding* still confer intelligence on that human?
What is your definition of "understanding"? I know what it means in
people, but what does it mean in a computer? If you accept Turing's
definition of AI, then you have to accept the equivalence of passing the
Turing test with computing a probability distribution.
A common argument against compression as a test for AI is that humans
don't compress like a zip program. Compression requires a *deterministic*
model: a compressor codes a string x using a code of length log 1/p(x)
bits, and the decompressor must compute p(x) exactly to invert the code.
Humans can't do this because their noisy neurons compute a p(x) that
varies a bit each time.
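A minimal sketch of the point, not any particular compressor: the ideal
code length for a string is the sum of log2(1/p) over its symbols, and the
compressor and decompressor must run the same deterministic model update so
that both compute identical probabilities. Here p comes from a toy adaptive
order-0 byte model (my choice, for illustration only):

```python
import math

def ideal_code_length(data: bytes) -> float:
    """Bits needed to code `data` under an adaptive order-0 model."""
    counts = [1] * 256                 # Laplace-smoothed symbol counts
    total = 256
    bits = 0.0
    for b in data:
        bits += math.log2(total / counts[b])   # log2(1/p(symbol))
        counts[b] += 1                 # deterministic update; the decompressor
        total += 1                     # must perform exactly the same step
    return bits

# Skewed data is more predictable, so it codes shorter than uniform data.
print(ideal_code_length(b"aaaa") < ideal_code_length(b"abcd"))  # -> True
```

If either side perturbed its counts (as noisy neurons would), the computed
p would differ between coder and decoder and the code would not invert.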
References

[1] Shannon, Claude E. (1951), "Prediction and Entropy of Printed
English", Bell System Technical Journal 30(1), pp. 50-64.

[2] Landauer, Thomas K. (1986), "How much do people remember? Some
estimates of the quantity of learned information in long-term memory",
Cognitive Science 10, pp. 477-493.

[3] Standing, Lionel (1973), "Learning 10,000 Pictures", Quarterly Journal
of Experimental Psychology 25, pp. 207-222.
-- Matt Mahoney, [EMAIL PROTECTED]
-----
This list is sponsored by AGIRI: http://www.agiri.org/email