--- On Fri, 9/5/08, Pei Wang <[EMAIL PROTECTED]> wrote:

> NARS indeed can learn semantics before syntax --- see
> http://nars.wang.googlepages.com/wang.roadmap.pdf
Yes, I see this corrects many of the problems with Cyc and with traditional language models. I didn't see a description of a mechanism for learning new terms in your other paper. Clearly this could be added, although I believe it should be a statistical process.

I am interested in determining the computational cost of language modeling. The evidence I have so far is that it is high. I believe the algorithmic complexity of an adult language model is on the order of 10^9 bits. This is consistent with Turing's 1950 prediction that AI would require this much memory, and with Landauer's estimate of human long-term memory, and it is about how much language a person processes by adulthood, assuming an information content of 1 bit per character as Shannon estimated in 1951. This is why I use a 1 GB data set in my compression benchmark.

However, there is a three-way tradeoff between CPU speed, memory, and model accuracy (as measured by compression ratio). I added two graphs to my benchmark at http://cs.fit.edu/~mmahoney/compression/text.html (below the main table) which show this clearly. In particular, the size-memory tradeoff is an almost perfectly straight line (with memory on a log scale) over tests of 104 compressors. These tests suggest to me that CPU and memory are indeed bottlenecks to language modeling.

The best models in my tests use simple semantic and grammatical models, well below adult human level. The three top programs on the memory graph map words to tokens using dictionaries that group semantically and syntactically related words together, but only one (paq8hp12any) uses a semantic space of more than one dimension. All have large vocabularies, although not implausibly large for an educated person. Other top programs, like nanozipltcb and WinRK, use smaller dictionaries and strictly lexical models. Lesser programs model only at the n-gram level.

I don't yet have an answer to my question, but I believe efficient human-level NLP will require hundreds of GB, or perhaps 1 TB, of memory.
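The 10^9-bit figure can be checked with back-of-envelope arithmetic. This is only an illustrative sketch: the exposure rate and word length below are my assumed round numbers, not measurements; only the ~1 bit per character entropy estimate comes from Shannon.

```python
# Rough consistency check of the ~10^9-bit estimate.
# Assumptions (illustrative, not measured): a person is exposed to
# ~15,000 words of language per day, words average ~6 characters
# including the space, and English carries ~1 bit per character
# (Shannon's entropy estimate).

words_per_day = 15_000   # assumed daily speech/reading exposure
chars_per_word = 6       # assumed average word length with space
bits_per_char = 1.0      # Shannon's estimate for English text
years = 20               # roughly "by adulthood"

chars = words_per_day * chars_per_word * 365 * years
bits = chars * bits_per_char
print(f"{bits:.2e} bits")  # prints 6.57e+08 bits
```

Under these assumptions the total comes out a few times 10^8 bits, i.e. the same order of magnitude as Turing's 10^9 bits and the 1 GB benchmark corpus.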
The slowest programs are already faster than real time, given that the equivalent learning in humans takes over a decade. I think you could use existing hardware in a speed-memory tradeoff to get real-time NLP, but it would not be practical for doing experiments, because each source code change requires training the model from scratch, and model development typically requires thousands of tests.

-- Matt Mahoney, [EMAIL PROTECTED]

-------------------------------------------
agi Archives: https://www.listbox.com/member/archive/303/=now
