I think SHRDLU (Blocks World) would have been more interesting if its language 
model had been learned rather than programmed.  There is an important lesson here, 
and Winograd knew it: this route is a dead end.  Adult English has a complexity 
of about 10^9 bits (my estimate).  SHRDLU has a complexity of less than 7 x 
10^5 bits.  (I measured the upper bound by compressing the source code from 
http://hci.stanford.edu/winograd/shrdlu/code/ with paq8f).  One lesson I hope 
we learned is that there is no shortcut around complexity.  We have tried that 
route for 50 years.  There is no "simple" algorithm for AGI.  Even OpenCyc 1.0 
has a download size (zip) of 147 MB.

It does not help that words in SHRDLU are grounded in an artificial world.  Its 
failure to scale hints that approaches such as AGI-Sim will have similar 
problems.  You cannot simulate complexity.  I learned this not from studying 
language, but from my dissertation work in a seemingly unrelated area: network 
intrusion detection.  In 1998 and 1999 MIT Lincoln Labs and DARPA developed a 
data set of simulated network traffic with various simulated attacks and ran 
contests to see which intrusion detection systems were best at detecting them.  
They probably spent millions of dollars trying to make the traffic seem as 
realistic as possible, simulating hundreds of machines on a local network and 
thousands more on the Internet, and generating fake email using word bigram 
models, web page downloads from public sites, etc., based on studies of real 
traffic.  
My approach was to use anomaly detection: model normal traffic and flag 
anything unusual as suspicious.  The problem turned out to be ridiculously 
easy: look at the first few dozen bytes of each network packet and flag any 
byte value you haven't seen before in that position.  This easily beat every 
system in the original contest.  If only it had worked on real traffic.  The 
result of my studies was essentially to discredit the data set.  What happened 
here can be explained in terms of algorithmic complexity.  The program that 
generated the artificial traffic was much smaller than the "program" that 
generates real traffic, so inserting the attacks disproportionately increased 
the total complexity, making the traffic less predictable (or compressible).
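
For what it's worth, the detection rule above is simple enough to sketch in a 
few lines of Python.  This is only an illustration -- the names, the 48-byte 
cutoff, and the toy packets are mine, and the actual system was more elaborate:

```python
# Sketch of the byte-position anomaly idea: remember which byte values
# have appeared at each of the first N positions during training, then
# flag any packet containing a byte value never seen at its position.
# N = 48 and all names here are illustrative choices.

N = 48

def train(packets):
    """Build, for each position, the set of byte values seen in training."""
    seen = [set() for _ in range(N)]
    for pkt in packets:
        for i, b in enumerate(pkt[:N]):
            seen[i].add(b)
    return seen

def is_anomalous(seen, pkt):
    """True if any byte value in pkt is new for its position."""
    return any(b not in seen[i] for i, b in enumerate(pkt[:N]))

# Toy example: two "normal" packet prefixes, then test packets.
normal = [bytes([0x45, 0x00, 0x28]), bytes([0x45, 0x00, 0x34])]
model = train(normal)
print(is_anomalous(model, bytes([0x45, 0x00, 0x34])))  # False
print(is_anomalous(model, bytes([0x46, 0x00, 0x28])))  # True
```

On the simulated data this kind of rule looks brilliant; on real traffic the 
"normal" sets never stop growing, which is the whole point.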

In a similar way, SHRDLU performed well in its artificial, simple world.  But 
how would you measure its performance in the real world?  

If we are going to study AGI, we need a way to perform tests and measure 
results.  It is not just that we need to know what works and what doesn't.  The 
systems we build will be too complex for us to know what we have built.  How would you 
measure them?  The Turing test is the most widely accepted, but it is somewhat 
subjective and not really appropriate for an AGI with sensorimotor I/O.  I have 
proposed text compression.  It gives hard numbers, but it seems limited to 
measuring ungrounded language models.  What else would you use?  Suppose that 
in 10 years, NARS, Novamente, Cyc, and maybe several other
systems all claim to have solved the AGI problem.  How would you test
their claims?  How would you decide the winner?
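
To make the compression test concrete, here is one way a score could be 
computed.  This sketch uses zlib only because it ships with Python; a serious 
evaluation would use a strong compressor such as paq8f, and the numbers would 
differ, but the principle -- smaller compressed size means a better predictive 
model of the text -- is the same:

```python
import zlib

def compressed_bits(text):
    """Bits needed to store the deflate-compressed text (level 9)."""
    return 8 * len(zlib.compress(text.encode("utf-8"), 9))

# A repetitive sample compresses far below its raw size; the compressed
# size is an upper bound on the text's complexity under this compressor.
sample = "the quick brown fox jumps over the lazy dog " * 100
raw_bits = 8 * len(sample.encode("utf-8"))
print(compressed_bits(sample), "of", raw_bits, "bits")
```

The appeal is that the score is a hard number anyone can reproduce; the 
limitation, as noted above, is that it only exercises ungrounded language 
models.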
 
-- Matt Mahoney, [EMAIL PROTECTED]


-----
This list is sponsored by AGIRI: http://www.agiri.org/email
To unsubscribe or change your options, please go to:
http://v2.listbox.com/member/?list_id=303
