>> If dumb models kill smart ones in text compression, then how do you know they are dumb?

They are dumb because they are inflexible and always use the same very simple rules. Fortunately, those "dumb" rules are good enough.
>> What is your objective test of "smart"?

My definition of smart is that it 1) is flexible, 2) shows a wide variety of behaviors, 3) uses many different rules based upon circumstances to optimize results, and, most importantly, 4) is successful. My objective test is that if it fails one or more of the above criteria, it is dumb.
>> The fact is that in speech recognition research, language models with a lower perplexity also have lower word error rates.

Yes, and what does that have to do with the price of tea in China? You're confusing yourself with irrelevant facts. Reflexes are dumb, but the person with good reflexes is far more likely to avoid/survive a car crash than someone who has to think when things go wrong. However, if two people both lack reflexes, the one with the faster thought process is more likely to avoid/survive a car crash. I'd still go with the reflexes every time, though.
>> We have "smart" statistical parsers that are 60% accurate when trained and tested on manually labeled text. So why haven't we solved the AI problem?

Because 60% accuracy sucks. Because what you mean by "accurate" has nothing to do with AI. Because 60% accuracy is NOT successful, and I would call your "smart" statistical parser dumb. I would also call it dumb because statistical parsers are generally one-trick ponies.
>> Who is smart and who is dumb?

Your "smart" parser is dumb because it doesn't work. The Google method is dumb because it is inflexible and always uses the same simple rules. Dumb often works, and "really smart" is smart enough to use dumb when it works.
----- Original Message -----
Sent: Wednesday, August 16, 2006 2:05 PM
Subject: Re: Mahoney/Sampo: [agi] Marcus Hutter's lossless compression of human knowledge prize
If dumb models kill smart ones in text compression, then how do you know they are dumb? What is your objective test of "smart"? The fact is that in speech recognition research, language models with a lower perplexity also have lower word error rates. We have "smart" statistical parsers that are 60% accurate when trained and tested on manually labeled text. So why haven't we solved the AI problem? Meanwhile, a "dumb" model like matching query words to document words enables Google to answer natural language queries, while our smart parsers choke when you misspell a word. Who is smart and who is dumb?

-- Matt Mahoney, [EMAIL PROTECTED]
----- Original Message -----
From: Mark Waser <[EMAIL PROTECTED]>
To: [email protected]
Sent: Wednesday, August 16, 2006 9:17:52 AM
Subject: Re: Mahoney/Sampo: [agi] Marcus Hutter's lossless compression of human knowledge prize
>> You group the strings into a fixed set and a variable set and concatenate them. The variable set could be just "I only used red and yellow paint in the painting", and you compare the CDM replacing "yellow" with "white". Of course your compressor must be capable of abstract reasoning and have a world model.

Very nice example of "homunculus"/"turtles-all-the-way-down" reasoning.
>> The problem is that many people do not believe that text compression is related to AI (even though speech recognition researchers have been evaluating models by perplexity since the early 1990's).

I believe that it's related to AI . . . . but that the dumbest models kill intelligent models every time . . . . which then makes AI useless for text compression.
And bit-level text storage and reproduction is unnecessary for AI (and adds a lot of needless complexity) . . . . So why are we combining the two?
----- Original Message -----
Sent: Tuesday, August 15, 2006 6:02 PM
Subject: Re: Mahoney/Sampo: [agi] Marcus Hutter's lossless compression of human knowledge prize
Mark wrote:
> Huh? By definition, the compressor with the best language model is the one with the highest compression ratio.

I'm glad we finally agree :-)
>> You could use Keogh's compression dissimilarity measure to test for inconsistency.

>> I don't think so. Take the following strings: "I only used red and yellow paint in the painting", "I painted the rose in my favorite color", "My favorite color is pink", "Orange is created by mixing red and yellow", "Pink is created by mixing red and white". How is Keogh's measure going to help you with that?

You group the strings into a fixed set and a variable set and concatenate them. The variable set could be just "I only used red and yellow paint in the painting", and you compare the CDM replacing "yellow" with "white". Of course your compressor must be capable of abstract reasoning and have a world model.
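(A minimal sketch of that substitution test, assuming zlib as a stand-in compressor; the helper names and the use of zlib are my own illustration, not Keogh's. Since zlib models byte patterns rather than meaning, don't expect the two numbers below to separate; that is exactly the point about needing a world model.)

    import zlib

    def C(s: str) -> int:
        """Lossless compressed size of s, in bytes."""
        return len(zlib.compress(s.encode("utf-8"), 9))

    def cdm(x: str, y: str) -> float:
        """Keogh's CDM(x, y) = C(xy) / (C(x) + C(y))."""
        return C(x + y) / (C(x) + C(y))

    # Fixed set: the statements held constant across both tests.
    fixed = ". ".join([
        "I painted the rose in my favorite color",
        "My favorite color is pink",
        "Orange is created by mixing red and yellow",
        "Pink is created by mixing red and white",
    ])

    # Variable set: the statement being perturbed.
    original = "I only used red and yellow paint in the painting"
    substituted = original.replace("yellow", "white")  # the consistent variant

    # A compressor with a real world model should give the consistent
    # variant the lower CDM against the fixed set.
    print("CDM with 'yellow':", cdm(fixed, original))
    print("CDM with 'white': ", cdm(fixed, substituted))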
To answer Phil's post: Text compression is only near the theoretical limits for small files. For large files, there is progress to be made integrating known syntactic and semantic modeling techniques into general purpose compressors. The theoretical limit is about 1 bpc and we are not there yet. See the graph at http://cs.fit.edu/~mmahoney/dissertation/

The proof that I gave that a language model implies passing the Turing test is for the ideal case where all people share identical models. The ideal case is deterministic. For the real case where models differ, passing the test is easier because a judge will attribute some machine errors to normal human variation. I discuss this in more detail at http://cs.fit.edu/~mmahoney/compression/rationale.html (text compression is equivalent to AI).
It is really hard to get funding for text compression research (or AI). I had to change my dissertation topic to network security in 1999 because my advisor had funding for that. As a postdoc I applied for a $50K NSF grant for a text compression contest. It was rejected, so I started one without funding (which we now have). The problem is that many people do not believe that text compression is related to AI (even though speech recognition researchers have been evaluating models by perplexity since the early 1990's).
-- Matt Mahoney, [EMAIL PROTECTED]
----- Original Message -----
From: Mark Waser <[EMAIL PROTECTED]>
To: [email protected]
Sent: Tuesday, August 15, 2006 5:00:47 PM
Subject: Re: Mahoney/Sampo: [agi] Marcus Hutter's lossless compression of human knowledge prize
>> You could use Keogh's compression dissimilarity measure to test for inconsistency.

I don't think so. Take the following strings: "I only used red and yellow paint in the painting", "I painted the rose in my favorite color", "My favorite color is pink", "Orange is created by mixing red and yellow", "Pink is created by mixing red and white". How is Keogh's measure going to help you with that?

The problem is that Keogh's measure is intended for data mining, where you have separate instances, not one big entwined Gordian knot.
>> Now if only we had some test to tell which compressors have the best language models...

Huh? By definition, the compressor with the best language model is the one with the highest compression ratio.
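(To make that definition concrete: a model that assigns probability p to a text can code it in about -log2(p) bits via arithmetic coding, so compressed size, cross-entropy, and perplexity are the same quantity on different scales. A sketch, with made-up illustrative numbers:)

    # A model assigning probability p to a text can code it in about
    # -log2(p) bits, so the best language model is, by definition,
    # the best compressor.

    def bits_to_perplexity(compressed_bits: float, num_words: int) -> float:
        """Word-level perplexity implied by a compressed size."""
        return 2.0 ** (compressed_bits / num_words)

    # Illustrative numbers only: 10,000 words compressed to 70,000 bits
    # (7 bits per word) implies a word perplexity of 2**7 = 128.
    print(bits_to_perplexity(70_000, 10_000))  # -> 128.0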
----- Original Message -----
Sent: Tuesday, August 15, 2006 3:54 PM
Subject: Re: Mahoney/Sampo: [agi] Marcus Hutter's lossless compression of human knowledge prize
You could use Keogh's compression dissimilarity measure to test for inconsistency: http://www.cs.ucr.edu/~eamonn/SIGKDD_2004_long.pdf

CDM(x, y) = C(xy) / (C(x) + C(y)), where x and y are strings and C(x) means the compressed size of x (lossless). The measure ranges from about 0.5 if x = y to about 1.0 if x and y do not share any information. Then CDM("it is hot", "it is very warm") < CDM("it is hot", "it is cold"), assuming your compressor uses a good language model. Now if only we had some test to tell which compressors have the best language models...
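(A quick way to see the formula in action, assuming zlib as a stand-in compressor; zlib has no language model, so the hot/warm/cold ordering is not guaranteed to come out as predicted on strings this short.)

    import zlib

    def C(s: str) -> int:
        """Lossless compressed size of s, in bytes."""
        return len(zlib.compress(s.encode("utf-8"), 9))

    def cdm(x: str, y: str) -> float:
        """CDM(x, y) = C(xy) / (C(x) + C(y)): about 0.5 when x = y,
        about 1.0 when x and y share no information."""
        return C(x + y) / (C(x) + C(y))

    print(cdm("it is hot", "it is hot"))        # near the low end
    print(cdm("it is hot", "it is very warm"))  # a good model: lower...
    print(cdm("it is hot", "it is cold"))       # ...than this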
-- Matt Mahoney, [EMAIL PROTECTED]