I agree with your first two paragraphs.

The Wikipedia in its current form is a good, relatively compact form of that *knowledge*. It is, however, neither canonical nor the most accurate representation of that *knowledge* (as demonstrated by the fact that you could easily find a minor irrelevant deletion to make a "better" version).

If you're still at the information (i.e., bit) level, then your claim in the third paragraph is correct but not particularly useful for your goals relating to knowledge and artificial intelligence.
Mark
----- Original Message -----
Sent: Sunday, August 27, 2006 3:07 PM
Subject: Re: [agi] Lossy *&* lossless compression
I agree that putting Wikipedia in canonical form would be a good idea, if it were possible. My point is that testing whether a string is in canonical form is a hard problem, just as hard as actually putting it in that form. Or do you still disagree with that? The definition of a canonical form is that there is only one way to express an idea. When you say there is more than one canonical form, I assume you mean there is more than one way to choose among these representations. I agree. But how do we choose the "best" form? Keep in mind who will be doing the work. I claim that Wikipedia in its current form is a valid canonical form of that information and is the most accurate representation of it.
-- Matt Mahoney, [EMAIL PROTECTED]
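One direction of Matt's claim that testing is as hard as canonicalizing is easy to make concrete. This sketch assumes a hypothetical `canonicalize` function that is idempotent (applying it twice changes nothing); the toy version only lowercases and collapses whitespace, a surface-level stand-in for the knowledge-level canonicalizer the thread is actually about. Given any such function, a yes/no test costs one call plus a comparison.

```python
import re


def canonicalize(text: str) -> str:
    """Hypothetical toy canonicalizer: lowercase, collapse runs of whitespace
    to a single space, and trim the ends. Idempotent for ordinary text."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def is_canonical(text: str) -> bool:
    # With an idempotent canonicalizer, a string is in canonical form
    # exactly when canonicalizing it leaves it unchanged.
    return canonicalize(text) == text


print(is_canonical("the cat sat on the mat"))
print(is_canonical("The  cat sat\non the mat"))
```

The first call prints True and the second False. The sketch only covers the canonicalize-then-compare direction; it does not show how a yes/no tester could be turned back into a canonicalizer.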
----- Original Message ----
From: Mark Waser <[EMAIL PROTECTED]>
To: [email protected]
Sent: Sunday, August 27, 2006 12:36:25 PM
Subject: Re: [agi] Lossy *&* lossless compression
Matt,
Unless you raise the level of your replies, I won't bother responding any more.

You asked about canonical forms. I gave a very simple, very clear explanation of why your previous assertions were nonsensical. However, I will try one last time . . . .
>> I really only need to know if a string is a canonical representation of Wikipedia.
OK. A string is definitely NOT a canonical representation if substrings can be deleted and nothing is lost (this is just a rephrasing of my duplication statement -- nothing new at all). Clearly, your Wikipedia string is not a canonical representation of the knowledge in itself since major chunks can be deleted without losing any knowledge.
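Mark's criterion is stated at the knowledge level, which no program in this thread can check. A crude surface-level proxy, an assumption made for this sketch and not Mark's definition, rejects a string when some run of `n` consecutive words occurs twice, since the second copy could be deleted and regenerated from the first. The helper name is hypothetical, `n = 8` is an arbitrary default, and passing the check proves nothing about canonicity (paraphrased duplicates slip through).

```python
def has_repeated_ngram(text: str, n: int = 8) -> bool:
    """Hypothetical proxy check: True if any run of n consecutive words
    (split on whitespace) appears more than once in text."""
    words = text.split()
    seen = set()
    for i in range(len(words) - n + 1):
        gram = tuple(words[i:i + n])
        if gram in seen:
            return True  # a deletable verbatim duplicate: reject
        seen.add(gram)
    return False  # inconclusive: no verbatim duplicate of length n


doc = "the cat sat on the mat . the cat sat on the mat ."
print(has_repeated_ngram(doc, n=4))
print(has_repeated_ngram("the cat sat on the mat", n=4))
```

The first call prints True (the four-word run "the cat sat on" repeats) and the second prints False, which only means this proxy found nothing to delete.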
The rest of the definition is that the canonical string must contain all of the knowledge in the original string. Both of us realize that this is not easy to test (so you don't need snotty requests for an attachment/program that would effectively solve a large percentage of the open questions in knowledge representation and AGI).
>> Oh, wait... there can only be one canonical form.

False. Badly, badly wrong in fact. Would you like to retract this statement or should I assume that you're not worth conversing with until you've gotten up to speed enough to realize why this is such a silly statement?
----- Original Message -----
Sent: Sunday, August 27, 2006 12:00 PM
Subject: Re: [agi] Lossy *&* lossless compression
Mark,
I didn't get your attachment, the program that tells me if an arbitrary text string is in canonical form or not. Actually, if it will make it any easier, I really only need to know if a string is a canonical representation of Wikipedia. Oh, wait... there can only be one canonical form.
I guess then all you have to do is store the canonical form and compare the input with it. After you solve this simple, easy problem and send me the program, I will solve the much harder problem of converting Wikipedia to canonical form.
-- Matt Mahoney, [EMAIL PROTECTED]
----- Original Message ----
From: Mark Waser <[EMAIL PROTECTED]>
To: [email protected]
Sent: Sunday, August 27, 2006 11:30:44 AM
Subject: Re: [agi] Lossy *&* lossless compression
I reject your nonsensical claim.
>> If you claim that this is not in canonical form, then prove it. Specify a criterion for canonical form, a pass/fail test.
By definition, a canonical form should not have duplication. Your data has massive duplication (particularly when looked at on the knowledge level) and is therefore not canonical. Simple enough for you?
>> Do you see my point now?
No, all I see is that you're so invested in lossless (at the bit-level) compression that you're not even willing to try to work to get past it.
----- Original Message -----
Sent: Saturday, August 26, 2006 9:40 PM
Subject: Re: [agi] Lossy *&* lossless compression
Suppose I claim that text8.zip, available at http://cs.fit.edu/~mmahoney/compression/textdata.html, is in canonical form. The procedure and a program for generating it are described at the bottom of that page. The output consists of only the lowercase letters a-z and spaces. If you claim that this is not in canonical form, then prove it. Specify a criterion for canonical form, a pass/fail test. I want an algorithm or a program, no hand waving or generalities. Input an arbitrary string, output yes or no. Do you see my point now?
-- Matt Mahoney, [EMAIL PROTECTED]
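Matt asks for a decider of the form "input an arbitrary string, output yes or no". For the one concrete property he states about text8, its alphabet, such a decider is trivial. This sketch (function and constant names are hypothetical) checks only that property; it is a necessary condition for matching text8's format, not a test of canonical form, which is exactly the gap the two sides are arguing over.

```python
import re

# The one property of text8 stated in the thread: only a-z and spaces.
TEXT8_ALPHABET = re.compile(r"[a-z ]*")


def passes_alphabet_test(s: str) -> bool:
    """Hypothetical yes/no test: True iff s uses only lowercase a-z and spaces."""
    return TEXT8_ALPHABET.fullmatch(s) is not None


for candidate in [" the quick brown fox", "The Quick Brown Fox, 2006"]:
    print(repr(candidate), "yes" if passes_alphabet_test(candidate) else "no")
```

A heavily duplicated string such as "the the the" also passes, which is why a check of this kind cannot settle the canonical-form question.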
----- Original Message ----
From: Mark Waser <[EMAIL PROTECTED]>
To: [email protected]
Sent: Saturday, August 26, 2006 8:52:27 PM
Subject: Re: [agi] Lossy *&* lossless compression
>> I think that putting Wikipedia in canonical form and recognizing that it is in canonical form are two equally difficult problems. So the problem does not go away easily.
Um. I think you missed my point. The compression program should be able to take the Wikipedia in its current form and the decompression program should be able to output it in canonical form. Make the contestants do all the difficult work, not the judges. (And recognizing canonical form should be easy; ensuring its completeness is likely to be a real problem, but that's what you have the other contestants for . . . . :-)
----- Original Message -----
Sent: Saturday, August 26, 2006 5:33 PM
Subject: Re: [agi] Lossy *&* lossless compression
I think that putting Wikipedia in canonical form and recognizing that it is in canonical form are two equally difficult problems. So the problem does not go away easily.
-- Matt Mahoney, [EMAIL PROTECTED]
----- Original Message ----
From: Mark Waser <[EMAIL PROTECTED]>
To: [email protected]
Sent: Saturday, August 26, 2006 4:51:07 PM
Subject: Re: [agi] Lossy *&* lossless compression
>> Mark suggested putting Wikipedia in a canonical form, which would remove the distinction between lossless and lossy compression.
Hmmm. Interesting . . . .
Actually, I didn't suggest exactly that -- though I can see how you got that impression. I suggested that the decompression program should output the Wikipedia in canonical form, meaning that it would be lossy as far as information is concerned (i.e., it loses the exact bit sequence of the input) but lossless as far as knowledge is concerned. Putting the Wikipedia in a canonical form (or -- developing a good canonical form to put the Wikipedia into) strikes me as the largest part of the challenge (and thus, not something that you want to -- or should -- take on as contest organizers).
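Mark's proposal can be sketched as a compressor whose decompressor returns a canonical form rather than the original bytes. This sketch assumes a hypothetical toy `canonicalize` that treats letter case and whitespace as carrying no knowledge; zlib stands in for whatever lossless coder a contestant would actually use. The round trip is lossy at the bit level but exact with respect to the canonical form.

```python
import re
import zlib


def canonicalize(text: str) -> str:
    # Hypothetical toy canonicalizer: case and whitespace are treated as
    # carrying no knowledge. A real one would work at the knowledge level.
    return re.sub(r"\s+", " ", text.lower()).strip()


def compress(text: str) -> bytes:
    # Lossy step first (canonicalize), then lossless coding of the result.
    return zlib.compress(canonicalize(text).encode("utf-8"), 9)


def decompress(blob: bytes) -> str:
    # Returns the canonical form, not the original bit sequence.
    return zlib.decompress(blob).decode("utf-8")


original = "The  Wikipedia\nis LARGE."
restored = decompress(compress(original))
print(repr(restored))
print(restored == original, restored == canonicalize(original))
```

This prints 'the wikipedia is large.' and then False True: the exact input bytes are gone, but the canonical form survives, and compressing the restored text again reproduces it unchanged.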
----- Original Message -----
Sent: Saturday, August 26, 2006 3:29 PM
Subject: Re: [agi] Lossy *&* lossless compression
> First let me respond to Boris and Mark. I agree. Mark suggested putting Wikipedia in a canonical form, which would remove the distinction between lossless and lossy compression. This will be hard, but Boris made an important observation that useful data is generally compressible and useless data (noise) is not. I don't think the problem can be solved completely, but there is clearly room for improvement.
>
> Eliezer suggests putting a model of the universe on a USB drive and then running the model to predict how many fingers he is holding up. Let's assume that is possible. Stephen Wolfram suggests the model, if one exists, might only be a few lines of code.
> http://en.wikipedia.org/wiki/A_New_Kind_of_Science
>
> But we must solve a few other problems first.
>
> 1. It may be hard to find such a model. We cannot tell whether the apparent randomness of quantum mechanics is truly random or generated by a deterministic but random-appearing process. This happens in cryptography. The only way to distinguish between true random data and an encrypted block of zero bits is to break the encryption. The former is not compressible; the latter is.
>
> 2. Assuming we solve this mystery of the universe and it turns out to be deterministic, we still have the problem of running the code on a computer that resides within the universe. If the universe is infinite, then it is possible because one Turing machine can simulate another. If the universe is finite (as quantum theory and the Big Bang suggest, as does the lack of real Turing machines), then it is not possible because a state machine cannot simulate itself. Having the USB drive simulate all of the universe except itself would resolve this problem, but then if the USB drive resides outside the universe, how do we read the result?
>
> 3. Assuming we overcome this obstacle, it may be that the program will say how many fingers, but in that case the program also completely determines my behavior and might not allow me to answer.
>
> -- Matt Mahoney, [EMAIL PROTECTED]
>
> ----- Original Message ----
> From: Eliezer S. Yudkowsky <[EMAIL PROTECTED]>
> To: [email protected]
> Sent: Friday, August 25, 2006 8:08:02 PM
> Subject: Re: [agi] Lossy *&* lossless compression
>
> Matt Mahoney wrote:
>>
>> DEL has a lossy model, and nothing compresses smaller. Is it smarter than PKZip?
>>
>> Let me state one more time why a lossless model has more knowledge. If x and x' have the same meaning to a lossy compressor (they compress to identical codes), then the lossy model only knows p(x)+p(x'). A lossless model also knows p(x) and p(x'). You can argue that if x and x' are not distinguishable then this extra knowledge is not important. But all text strings are distinguishable to humans.
>
> Suppose I give you a USB drive that contains a lossless model of the entire universe excluding the USB drive - a bitwise copy of all quark positions and field strengths.
>
> (Because deep in your heart, you know that underneath the atoms, underneath the quarks, at the uttermost bottom of reality, are tiny little XML files...)
>
> Let's say that you've got the entire database, and a Python interpreter that can process it at any finite speed you care to specify.
>
> Now write a program that looks at those endless fields of numbers, and says how many fingers I'm holding up behind my back.
>
> Looks like you'll have to compress that data first.
>
> --
> Eliezer S. Yudkowsky http://singinst.org/
> Research Fellow, Singularity Institute for Artificial Intelligence
>
> -------
> To unsubscribe, change your address, or temporarily deactivate your subscription,
> please go to http://v2.listbox.com/member/[EMAIL PROTECTED]
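Matt's p(x)+p(x') argument, quoted in Eliezer's message, can be made concrete with two hypothetical lossless models over three strings (the probabilities are made up, chosen to be exactly representable in floating point). Assuming a lossy compressor that ignores letter case, both models collapse to the same lossy view: the sum p(x)+p(x') for x = "Hello world" and x' = "hello world" survives, but the split between them does not. The ideal code length -log2 p shows what a lossless coder spends to keep the distinction.

```python
import math
from collections import defaultdict


def lossy_key(s: str) -> str:
    # Assumed lossy equivalence: strings differing only in case share a code.
    return s.lower()


def lossy_view(model: dict[str, float]) -> dict[str, float]:
    """All a lossy model can know: total probability of each equivalence class."""
    merged = defaultdict(float)
    for s, prob in model.items():
        merged[lossy_key(s)] += prob
    return dict(merged)


def code_length_bits(prob: float) -> float:
    # Ideal (Shannon) code length for an outcome of probability prob.
    return -math.log2(prob)


# Two hypothetical lossless models with made-up, exactly representable numbers.
p = {"Hello world": 0.375, "hello world": 0.125, "goodbye": 0.5}
q = {"Hello world": 0.25, "hello world": 0.25, "goodbye": 0.5}

print(p == q, lossy_view(p) == lossy_view(q))
print(code_length_bits(p["Hello world"]), code_length_bits(lossy_view(p)["hello world"]))
```

The first line prints False True: different lossless models, identical lossy view. Under p, coding "Hello world" exactly costs about 1.415 bits, versus 1 bit for its case-insensitive class; the extra bits encode which variant occurred.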
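Boris's observation and the cryptography example in Matt's point 1 can be checked with a general-purpose compressor. This sketch runs zlib on three inputs: a block of zero bytes, random bytes from `os.urandom`, and the zero block XORed with a keystream. The keystream generator (SHA-256 over a key plus a counter) is a hypothetical toy for illustration, not a vetted cipher, and the key is made up. zlib shrinks the zeros dramatically but cannot shrink either of the other two, even though the "encrypted" block is fully described by a short key and a length.

```python
import hashlib
import os
import zlib


def toy_keystream(key: bytes, n: int) -> bytes:
    """Toy keystream: SHA-256(key || counter) blocks. Illustration only."""
    out = bytearray()
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:n])


N = 100_000
key = b"hypothetical key"
zeros = bytes(N)
noise = os.urandom(N)
encrypted_zeros = bytes(a ^ b for a, b in zip(zeros, toy_keystream(key, N)))

for name, data in [("zeros", zeros), ("noise", noise), ("encrypted zeros", encrypted_zeros)]:
    print(f"{name:16s} {len(zlib.compress(data, 9)):7d} bytes (from {N})")

# A model that knows the key regenerates the "encrypted" block from key and N.
print(encrypted_zeros == toy_keystream(key, N))
```

The design point is the last line: the compressor without the key sees noise, while a model holding the key needs only a few bytes, which is the sense in which the latter is compressible.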