--- On Fri, 9/5/08, Pei Wang <[EMAIL PROTECTED]> wrote:

> NARS indeed can learn semantics before syntax --- see
> http://nars.wang.googlepages.com/wang.roadmap.pdf

Yes, I see this corrects many of the problems with Cyc and with traditional 
language models. I didn't see a description of a mechanism for learning new 
terms in your other paper. Clearly this could be added, although I believe it 
should be a statistical process.

I am interested in determining the computational cost of language modeling. The 
evidence I have so far is that it is high. I believe the algorithmic complexity 
of a language model is about 10^9 bits. This figure is consistent with Turing's 
1950 prediction that AI would require that much memory, with Landauer's estimate 
of human long-term memory, and with the amount of language a person processes by 
adulthood, assuming an information content of 1 bit per character as Shannon 
estimated in 1950. This is why I use a 1 GB data set in my compression benchmark.
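As a back-of-the-envelope check on the 10^9-bit figure, here is a sketch of the exposure arithmetic. The per-day and word-length constants are my own rough assumptions, not measurements:

```python
# Rough estimate of language processed by adulthood.
# All constants below are illustrative assumptions.
words_per_day = 20_000   # speech heard plus text read (assumed)
chars_per_word = 6       # average word length including the space (assumed)
years = 18               # assumed span of language learning

chars = words_per_day * chars_per_word * 365 * years
bits = chars * 1         # ~1 bit per character, per Shannon's estimate
print(f"{chars:.2e} characters, {bits:.2e} bits")
```

Under these assumptions the total comes out on the order of 10^9 characters, i.e. roughly 10^9 bits at 1 bit per character, which is the 1 GB scale of the benchmark data set.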

However, there is a three-way tradeoff between CPU speed, memory, and model 
accuracy (as measured by compression ratio). I added two graphs to my benchmark 
at http://cs.fit.edu/~mmahoney/compression/text.html (below the main table) 
which show this clearly. In particular, the size-memory tradeoff is an almost 
perfectly straight line (with memory on a log scale) over tests of 104 
compressors. These tests suggest to me that CPU and memory are indeed 
bottlenecks to language modeling. The best models in my tests use simple 
semantic and grammatical models, well below adult human level. The three top 
programs on the memory graph map words to tokens using dictionaries that group 
semantically and syntactically related words together, but only one 
(paq8hp12any) uses a semantic space of more than one dimension. All have large 
vocabularies, although not implausibly large for an educated person. Other top 
programs, like nanozipltcb and WinRK, use smaller dictionaries and strictly 
lexical models. Lesser programs model only at the n-gram level.
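To make the "n-gram level" concrete, here is a minimal order-2 character model with add-one smoothing that estimates bits per character on its own training text. This is an in-sample sketch of the weakest class of model discussed above, not any of the benchmark programs:

```python
from collections import defaultdict
import math

def ngram_bits_per_char(text, order=2):
    """Estimate bits per character of `text` under an order-`order`
    character n-gram model with add-one (Laplace) smoothing, trained
    on the same text -- an optimistic, in-sample estimate."""
    counts = defaultdict(lambda: defaultdict(int))
    alphabet = sorted(set(text))
    for i in range(order, len(text)):
        counts[text[i - order:i]][text[i]] += 1
    total_bits = 0.0
    for i in range(order, len(text)):
        c = counts[text[i - order:i]]
        p = (c[text[i]] + 1) / (sum(c.values()) + len(alphabet))
        total_bits -= math.log2(p)
    return total_bits / (len(text) - order)

sample = "the quick brown fox jumps over the lazy dog " * 50
bpc = ngram_bits_per_char(sample)
print(f"{bpc:.2f} bits per character")
```

On repetitive text like this the estimate falls well below the 8 bits of raw ASCII, but a short character context cannot get near the ~1 bit per character that semantic and grammatical modeling makes possible on real text.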

I don't yet have an answer to my question, but I believe efficient human-level 
NLP will require hundreds of GB, or perhaps 1 TB, of memory. The slowest 
programs are already faster than real time, given that the equivalent learning 
in humans takes over a decade. I think you could use existing hardware in a 
speed-memory tradeoff to get real-time NLP, but it would not be practical for 
experiments, where each source code change requires training the model from 
scratch. Model development typically requires thousands of tests.
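As a rough sanity check on the speed claim, the compressor throughput and human learning span below are assumed numbers, not benchmark measurements:

```python
GB = 10**9                 # benchmark data set size, bytes
compressor_speed = 10_000  # hypothetical slow compressor, bytes/sec (assumed)
train_time_s = GB / compressor_speed      # one training run, seconds

human_years = 18           # rough human language-learning span (assumed)
human_s = human_years * 365 * 24 * 3600
speedup = human_s / train_time_s

tests = 1000               # order of magnitude of development tests
print(f"one training run: {train_time_s / 3600:.0f} h")
print(f"speedup over human learning: {speedup:.0f}x")
print(f"{tests} runs: {tests * train_time_s / 86400:.0f} days")
```

Even at this slow assumed speed a single run beats real time by a factor of thousands, yet a development cycle of a thousand retrain-from-scratch tests still stretches into years on one machine, which is the practical bottleneck described above.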


-- Matt Mahoney, [EMAIL PROTECTED]



-------------------------------------------
agi
Archives: https://www.listbox.com/member/archive/303/=now
RSS Feed: https://www.listbox.com/member/archive/rss/303/