With regard to the computational requirements of AI, there is a very clear 
relationship: the quality of a language model improves as you add more time 
and memory, as shown in the following table:
http://cs.fit.edu/~mmahoney/compression/text.html

It also improves with the size of the training set, as shown in this graph:
http://cs.fit.edu/~mmahoney/dissertation/

Before you argue that text compression has nothing to do with AI, please read 
http://cs.fit.edu/~mmahoney/compression/rationale.html
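
The short version: compression is prediction.  Under arithmetic coding, a model 
that assigns probability P(x_i | x_1..x_{i-1}) to each symbol compresses a 
string x_1..x_n to about

  sum over i of  -log2 P(x_i | x_1..x_{i-1})  bits

(within a couple of bits of that total).  The compressed size is the model's 
cumulative prediction error measured in bits, so a smaller archive means a 
better language model.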

I recognize that language modeling is just one small aspect of AGI.  But 
compression gives us hard numbers to compare the work of over 80 researchers 
spanning decades.  The best performing systems push the hardware to its limits. 
 This, and the evolutionary arguments I gave earlier lead me to believe that 
AGI will require a lot of computing power.  Exactly how much, nobody knows.

Whether or not AGI can be accomplished most efficiently with neural networks is 
an open question.  But the one working system we know of is based on them, and 
we ought to study it.  One critical piece of missing knowledge is the density 
of synapses in the human brain.  I think this could be resolved by putting some 
brain tissue under an electron microscope, but I guess the number is just not 
important to neurobiologists.

I read Pei Wang's paper, http://nars.wang.googlepages.com/wang.AGI-CNN.pdf
Some of the shortcomings of neural networks it mentions apply only to classical 
(feedforward or symmetric) neural networks, not to asymmetric networks with 
recurrent circuits and time-delay elements, such as exist in the brain.  Such 
circuits allow short-term stable or oscillating states, which overcome some of 
those shortcomings; for example, the inability to train on multiple goals could 
be addressed by turning parts of the network on or off.  Also, it is not true 
that training has to be offline, using multiple passes as with backpropagation. 
Human language is structured so that layers can be trained progressively 
without the need to search over hidden units.  Word associations like 
"sun-moon" or "to-from" are linear.  Some of the top compressors mentioned 
above (paq8, WinRK) use online, single-pass neural networks to combine models, 
alternating prediction and training, roughly as sketched below.
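
To illustrate that last point, here is a minimal sketch of such a mixer: a 
one-layer network that combines the bit predictions of several models in the 
logistic domain and adjusts its weights by online gradient descent on coding 
cost, one bit at a time.  This is not the actual paq8 or WinRK code; the class 
name, learning rate, and weight initialization are just illustrative.

  #include <cmath>
  #include <cstddef>
  #include <vector>

  // Online, single-pass mixer.  Each input model supplies a probability
  // (strictly between 0 and 1) that the next bit is 1; the mixer combines
  // them and updates its weights after every bit is seen.
  struct Mixer {
    std::vector<double> w;   // one weight per input model
    std::vector<double> x;   // stretched inputs for the current bit
    double lr;               // learning rate (illustrative value)

    explicit Mixer(int n, double rate = 0.01) : w(n, 0.0), x(n, 0.0), lr(rate) {}

    static double stretch(double p) { return std::log(p / (1.0 - p)); }
    static double squash(double t)  { return 1.0 / (1.0 + std::exp(-t)); }

    // Prediction step: P(next bit = 1) given the models' probabilities.
    double predict(const std::vector<double>& probs) {
      double t = 0.0;
      for (std::size_t i = 0; i < w.size(); ++i) {
        x[i] = stretch(probs[i]);          // map (0,1) to (-inf,+inf)
        t += w[i] * x[i];
      }
      return squash(t);
    }

    // Training step: after the actual bit is seen, move the weights in the
    // direction that reduces the coding cost -log P(bit).  No second pass
    // over the data is needed.
    void update(double p, int bit) {
      double err = bit - p;                // error signal
      for (std::size_t i = 0; i < w.size(); ++i)
        w[i] += lr * err * x[i];
    }
  };

In use, the compressor calls predict(), codes the bit arithmetically with that 
probability, then calls update() with the bit actually seen, so prediction and 
training alternate and the data is processed in a single pass.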

But it is interesting that most of the remaining shortcomings are also 
shortcomings of human thought, such as the inability to insert or represent 
structured knowledge accurately.  This is evidence that our models are correct, 
though not that they are the best answer.  We don't want to duplicate the 
shortcomings of humans, slowing down our responses and inserting errors just 
to pass the Turing test (as in Turing's 1950 example).


-- Matt Mahoney, [EMAIL PROTECTED]

