>> You group the strings into a fixed set and a variable set and concatenate them. The variable set could be just "I only used red and yellow paint in the painting", and you compare the CDM replacing "yellow" with "white". Of course your compressor must be capable of abstract reasoning and have a world model.

Very nice example of "homunculus"/"turtles-all-the-way-down" reasoning.
>> The problem is that many people do not believe that text compression is related to AI (even though speech recognition researchers have been evaluating models by perplexity since the early 1990s).

I believe that it's related to AI . . . . but that the dumbest models kill intelligent models every time . . . . which then makes AI useless for text compression.

And bit-level text storage and reproduction is unnecessary for AI (and adds a lot of needless complexity) . . . .

So why are we combining the two?
----- Original Message -----
Sent: Tuesday, August 15, 2006 6:02 PM
Subject: Re: Mahoney/Sampo: [agi] Marcus Hutter's lossless compression of human knowledge prize

Mark wrote:
> Huh? By definition, the compressor with the best language model is the one with the highest compression ratio.

I'm glad we finally agree :-)
>> You could use Keogh's compression dissimilarity measure to test for inconsistency.

> I don't think so. Take the following strings: "I only used red and yellow paint in the painting", "I painted the rose in my favorite color", "My favorite color is pink", "Orange is created by mixing red and yellow", "Pink is created by mixing red and white". How is Keogh's measure going to help you with that?

You group the strings into a fixed set and a variable set and concatenate them. The variable set could be just "I only used red and yellow paint in the painting", and you compare the CDM replacing "yellow" with "white". Of course your compressor must be capable of abstract reasoning and have a world model.
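A minimal sketch of the mechanics, using zlib as a stand-in compressor (CDM is defined in the original message quoted below). zlib only sees byte patterns and has no world model, so the two scores will come out nearly identical; the point is just the fixed/variable-set procedure:

    import zlib

    def C(s: str) -> int:
        # Compressed size in bytes; zlib is a stand-in with no world model.
        return len(zlib.compress(s.encode("utf-8"), 9))

    def CDM(x: str, y: str) -> float:
        # Keogh's compression dissimilarity measure: C(xy) / (C(x) + C(y)).
        return C(x + y) / (C(x) + C(y))

    # Fixed set: the statements held constant.
    fixed = ". ".join([
        "I painted the rose in my favorite color",
        "My favorite color is pink",
        "Orange is created by mixing red and yellow",
        "Pink is created by mixing red and white",
    ])

    # Variable set: the one statement we mutate.
    original = "I only used red and yellow paint in the painting"
    variant = original.replace("yellow", "white")

    # A compressor capable of abstract reasoning should find the consistent
    # variant ("white") shares more information with the fixed set, giving
    # a lower CDM. With zlib, expect the scores to barely differ.
    print(CDM(fixed, original))
    print(CDM(fixed, variant))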
To answer Phil's post: Text compression is only near the theoretical limits for small files. For large files, there is progress to be made integrating known syntactic and semantic modeling techniques into general purpose compressors. The theoretical limit is about 1 bpc and we are not there yet. See the graph at http://cs.fit.edu/~mmahoney/dissertation/
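Bits per character is easy to measure directly. A minimal sketch ("sample.txt" is a placeholder for any large English text file you supply); a gzip-class compressor typically lands around 2-3 bpc, well above the roughly 1 bpc limit:

    import zlib

    # "sample.txt" is a placeholder: any large English text file.
    text = open("sample.txt", "rb").read()
    compressed = zlib.compress(text, 9)

    bpc = 8 * len(compressed) / len(text)
    print(f"{bpc:.2f} bits per character")  # gzip-class: typically ~2-3 bpc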
The proof that I gave that a language model implies passing the Turing test is for the ideal case where all people share identical models. The ideal case is deterministic. For the real case where models differ, passing the test is easier because a judge will attribute some machine errors to normal human variation. I discuss this in more detail at http://cs.fit.edu/~mmahoney/compression/rationale.html (text compression is equivalent to AI).

It is really hard to get funding for text compression research (or AI). I had to change my dissertation topic to network security in 1999 because my advisor had funding for that. As a postdoc I applied for a $50K NSF grant for a text compression contest. It was rejected, so I started one without funding (which we now have). The problem is that many people do not believe that text compression is related to AI (even though speech recognition researchers have been evaluating models by perplexity since the early 1990s).
-- Matt Mahoney, [EMAIL PROTECTED]
----- Original Message -----
From: Mark Waser <[EMAIL PROTECTED]>
To: [email protected]
Sent: Tuesday, August 15, 2006 5:00:47 PM
Subject: Re: Mahoney/Sampo: [agi] Marcus Hutter's lossless compression of human knowledge prize
>> You could use Keogh's compression dissimilarity measure to test for inconsistency.

I don't think so. Take the following strings: "I only used red and yellow paint in the painting", "I painted the rose in my favorite color", "My favorite color is pink", "Orange is created by mixing red and yellow", "Pink is created by mixing red and white". How is Keogh's measure going to help you with that?

The problem is that Keogh's measure is intended for data mining, where you have separate instances, not one big entwined Gordian knot.
>> Now if only we had some test to tell which compressors have the best language models...

Huh? By definition, the compressor with the best language model is the one with the highest compression ratio.
----- Original Message -----
Sent: Tuesday, August 15, 2006 3:54 PM
Subject: Re: Mahoney/Sampo: [agi] Marcus Hutter's lossless compression of human knowledge prize
You could use Keogh's compression dissimilarity measure to test for inconsistency. http://www.cs.ucr.edu/~eamonn/SIGKDD_2004_long.pdf

CDM(x,y) = C(xy) / (C(x) + C(y)), where x and y are strings, and C(x) means the compressed size of x (lossless). The measure ranges from about 0.5 if x = y to about 1.0 if x and y do not share any information. Then, CDM("it is hot", "it is very warm") < CDM("it is hot", "it is cold"), assuming your compressor uses a good language model.

Now if only we had some test to tell which compressors have the best language models...
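A minimal sketch of the measure, using zlib as the compressor (nothing like a good language model, so on strings as short as the hot/warm/cold examples the claimed ordering may well not hold; the range check works regardless):

    import zlib

    def C(s: str) -> int:
        # Compressed size of s in bytes; zlib stands in for a real compressor.
        return len(zlib.compress(s.encode("utf-8"), 9))

    def CDM(x: str, y: str) -> float:
        # Keogh's compression dissimilarity measure: C(xy) / (C(x) + C(y)).
        return C(x + y) / (C(x) + C(y))

    # Sanity checks on the range: ~0.5 when x == y, ~1.0 when unrelated.
    x = "the quick brown fox jumps over the lazy dog " * 50
    print(CDM(x, x))                   # near 0.5: C(xx) is about C(x)
    print(CDM(x, "0123456789" * 200))  # near 1.0: no shared structure

    # The ordering described above; zlib is too weak to guarantee it.
    print(CDM("it is hot", "it is very warm"))
    print(CDM("it is hot", "it is cold"))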
-- Matt Mahoney, [EMAIL PROTECTED]