|
I reject your nonsensical claim.
>> If you claim that this is not in canonical form, then prove it. Specify
>> a criterion for canonical form, a pass/fail test.
By definition, a canonical form should not have duplication. Your data has
massive duplication (particularly when looked at on the knowledge level) and
is therefore not canonical. Simple enough for you?
>> Do you see my point now?
No, all I see is that you're so invested in lossless (at the bit-level)
compression that you're not even willing to try to work to get past it.
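Read literally, the "no duplication" criterion can be turned into a pass/fail test of the kind being requested. The following is only a minimal sketch under stated assumptions: it treats "duplication" as a verbatim repeat of `n` consecutive words, the function names and the default `n = 5` are hypothetical choices made for illustration, and it says nothing about duplication at the knowledge level, which is the harder part of the claim.

```python
def has_repeated_ngram(text: str, n: int = 5) -> bool:
    """Return True if any run of n consecutive words occurs more than once."""
    words = text.split()
    seen = set()
    for i in range(len(words) - n + 1):
        gram = tuple(words[i:i + n])
        if gram in seen:
            return True
        seen.add(gram)
    return False


def passes_no_duplication_test(text: str, n: int = 5) -> bool:
    """Pass/fail check: True ("yes") only when no n-word sequence repeats.

    Assumption: surface-level, verbatim repetition stands in for
    "duplication"; paraphrased or restated facts are not detected.
    """
    return not has_repeated_ngram(text, n)


if __name__ == "__main__":
    for sample in ["a b c d e f g", "a b c x a b c"]:
        print("yes" if passes_no_duplication_test(sample, n=3) else "no")
```

A test like this is cheap to run on an arbitrary string, but two sentences that state the same fact in different words would still pass it.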
----- Original Message -----
Sent: Saturday, August 26, 2006 9:40 PM
Subject: Re: [agi] Lossy *&* lossless compression
Suppose I claim that text8.zip available at
http://cs.fit.edu/~mmahoney/compression/textdata.html is in canonical form.
The procedure and a program for generating it are described at the bottom
of that page. The output consists of only the lowercase letters a-z and
spaces. If you claim that this is not in canonical form, then prove it.
Specify a criterion for canonical form, a pass/fail test. I want an
algorithm or a program, no hand waving or generalities. Input an arbitrary
string, output yes or no.
Do you see my point now?
-- Matt Mahoney, [EMAIL PROTECTED]
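As an illustration of the "input an arbitrary string, output yes or no" shape of test, here is a minimal sketch that checks the one property the message states about the text8 output: only the lowercase letters a-z and spaces. The function name is hypothetical, and this is a necessary condition for matching that output format, not a test of canonical form.

```python
import string

# Assumption: the only property checked is the one stated in the message,
# namely that the text uses nothing but lowercase a-z and spaces.
ALLOWED = frozenset(string.ascii_lowercase + " ")


def uses_text8_alphabet(text: str) -> bool:
    """Return True if every character is a lowercase a-z letter or a space."""
    return all(ch in ALLOWED for ch in text)


if __name__ == "__main__":
    for sample in ["the quick brown fox", "Hello, World!"]:
        print("yes" if uses_text8_alphabet(sample) else "no")
```

Any string over the 27-symbol alphabet passes, including random gibberish, which is why a character-set check alone does not settle whether a text is canonical.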
----- Original Message ----
From: Mark Waser <[EMAIL PROTECTED]>
To: [email protected]
Sent: Saturday, August 26, 2006 8:52:27 PM
Subject: Re: [agi] Lossy *&* lossless compression
>> I think that either putting Wikipedia in canonical form, or recognizing
>> that it is in canonical form, are two equally difficult problems. So the
>> problem does not go away easily.
Um. I think you missed my point. The compression program should be able to
take the Wikipedia in its current form and the decompression program should
be able to output it in canonical form. Make the contestants do all the
difficult work, not the judges. (and recognizing canonical form should be
easy; ensuring its completeness is likely to be a real problem, but that's
what you have the other contestants for . . . . :-)
----- Original Message -----
Sent: Saturday, August 26, 2006 5:33 PM
Subject: Re: [agi] Lossy *&* lossless compression
I think that either putting Wikipedia in canonical form, or recognizing
that it is in canonical form, are two equally difficult problems. So the
problem does not go away easily.
-- Matt Mahoney, [EMAIL PROTECTED]
----- Original Message ----
From: Mark Waser <[EMAIL PROTECTED]>
To: [email protected]
Sent: Saturday, August 26, 2006 4:51:07 PM
Subject: Re: [agi] Lossy *&* lossless compression
>> Mark suggested putting Wikipedia in a canonical form, which would remove
>> the distinction between lossless and lossy compression.
Hmmm. Interesting . . . . Actually, I didn't suggest exactly that -- though
I can see how you got that impression. I suggested that the decompression
program should output the Wikipedia in canonical form meaning that it would
be lossy as far as information is concerned (i.e. it loses the exact bit
sequence of the input) but it would be lossless as far as knowledge is
concerned. Putting the Wikipedia in a canonical form (or -- developing a
good canonical form to put the Wikipedia into) strikes me as the largest
part of the challenge (and thus, not something that you want to -- or
should -- take on as contest organizers).
----- Original Message -----
Sent: Saturday, August 26, 2006 3:29 PM
Subject: Re: [agi] Lossy *&* lossless compression
> First let me respond to Boris and Mark. I agree. Mark suggested putting
> Wikipedia in a canonical form, which would remove the distinction between
> lossless and lossy compression. This will be hard, but Boris made an
> important observation that useful data is generally compressible and
> useless data (noise) is not. I don't think the problem can be solved
> completely but there is clearly room for improvement.
>
> Eliezer suggests putting a model of the universe on a USB drive and then
> running the model to predict how many fingers he is holding up. Let's
> assume that is possible. Stephen Wolfram suggests the model, if one
> exists, might only be a few lines of code.
> http://en.wikipedia.org/wiki/A_New_Kind_of_Science
>
> But we must solve a few other problems first.
>
> 1. It may be hard to find such a model. We cannot tell whether the
> apparent randomness of quantum mechanics is truly random or generated by
> a deterministic but random-appearing process. This happens in
> cryptography. The only way to distinguish between true random data and an
> encrypted block of zero bits is to break the encryption. The former is
> not compressible, the latter is.
>
> 2. Assuming we solve this mystery of the universe and it turns out to be
> deterministic, we still have the problem of running the code on a
> computer that resides within the universe. If the universe is infinite,
> then it is possible because one Turing machine can simulate another. If
> the universe is finite (as quantum theory and the Big Bang suggest, also
> the lack of real Turing machines), then it is not possible because a
> state machine cannot simulate itself. Having the USB drive simulate all
> of the universe except itself would resolve this problem, but then if the
> USB drive resides outside the universe, how do we read the result?
>
> 3. Assuming we overcome this obstacle, it may be that the program will
> say how many fingers, but in that case the program also completely
> determines my behavior and might not allow me to answer.
>
> -- Matt Mahoney, [EMAIL PROTECTED]
>
> ----- Original Message ----
> From: Eliezer S. Yudkowsky <[EMAIL PROTECTED]>
> To: [email protected]
> Sent: Friday, August 25, 2006 8:08:02 PM
> Subject: Re: [agi] Lossy *&* lossless compression
>
> Matt Mahoney wrote:
>>
>> DEL has a lossy model, and nothing compresses smaller. Is it smarter
>> than PKZip?
>>
>> Let me state one more time why a lossless model has more knowledge.
>> If x and x' have the same meaning to a lossy compressor (they
>> compress to identical codes), then the lossy model only knows
>> p(x)+p(x'). A lossless model also knows p(x) and p(x'). You can
>> argue that if x and x' are not distinguishable then this extra
>> knowledge is not important. But all text strings are distinguishable
>> to humans.
>
> Suppose I give you a USB drive that contains a lossless model of the
> entire universe excluding the USB drive - a bitwise copy of all quark
> positions and field strengths.
>
> (Because deep in your heart, you know that underneath the atoms,
> underneath the quarks, at the uttermost bottom of reality, are tiny
> little XML files...)
>
> Let's say that you've got the entire database, and a Python interpreter
> that can process it at any finite speed you care to specify.
>
> Now write a program that looks at those endless fields of numbers, and
> says how many fingers I'm holding up behind my back.
>
> Looks like you'll have to compress that data first.
>
> --
> Eliezer S. Yudkowsky                          http://singinst.org/
> Research Fellow, Singularity Institute for Artificial Intelligence
>
> -------
> To unsubscribe, change your address, or temporarily deactivate your
> subscription, please go to http://v2.listbox.com/member/[EMAIL PROTECTED]
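The quoted argument that a lossy model only knows p(x)+p(x') while a lossless model also knows p(x) and p(x') can be illustrated with a toy example. This is a sketch under assumptions, not anything proposed in the thread: the three strings, their probabilities, and the choice of case-folding as the lossy mapping are all hypothetical.

```python
# Hypothetical lossless model: a probability for every distinct string.
lossless_model = {
    "the cat sat": 0.5,
    "The cat sat": 0.25,
    "the dog sat": 0.25,
}


def canonical(text: str) -> str:
    """Assumed lossy mapping: strings differing only in case share one code."""
    return text.lower()


def lossy_from_lossless(model: dict) -> dict:
    """Collapse a lossless model by summing probabilities within each class."""
    lossy = {}
    for text, p in model.items():
        code = canonical(text)
        lossy[code] = lossy.get(code, 0.0) + p
    return lossy


lossy_model = lossy_from_lossless(lossless_model)

if __name__ == "__main__":
    print(lossy_model)  # {'the cat sat': 0.75, 'the dog sat': 0.25}
```

The lossy model can be computed from the lossless one, but not the reverse: from 0.75 alone there is no way to recover the 0.5 and 0.25 split between the two spellings. Whether that extra information counts as knowledge is the point under dispute.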
|