RE: Language modeling (was Re: [agi] draft for comment)
From: Matt Mahoney [mailto:[EMAIL PROTECTED]]
--- On Sun, 9/7/08, John G. Rose [EMAIL PROTECTED] wrote (Subject: RE: Language modeling (was Re: [agi] draft for comment), To: agi@v2.listbox.com, Date: Sunday, September 7, 2008, 9:15 AM):

From: Matt Mahoney [mailto:[EMAIL PROTECTED]]
--- On Sat, 9/6/08, John G. Rose [EMAIL PROTECTED] wrote:

Compression in itself has the overriding goal of reducing storage bits.

Not the way I use it. The goal is to predict what the environment will do next. Lossless compression is a way of measuring how well we are doing.

Predicting the environment in order to determine which data to pack where, thus achieving higher compression ratio. Or compression as an integral part of prediction? Some types of prediction are inherently compressed I suppose.

Predicting the environment to maximize reward. Hutter proved that universal intelligence is a compression problem. The optimal behavior of an AIXI agent is to guess the shortest program consistent with observation so far. That's algorithmic compression.

Oh I see. Guessing shortest program = compression. OK right. But yeah like Pei said the word compression is misleading. It implies a reduction where you are actually increasing understanding :)

John
RE: Language modeling (was Re: [agi] draft for comment)
From: Matt Mahoney [mailto:[EMAIL PROTECTED]]
--- On Sat, 9/6/08, John G. Rose [EMAIL PROTECTED] wrote:

Compression in itself has the overriding goal of reducing storage bits.

Not the way I use it. The goal is to predict what the environment will do next. Lossless compression is a way of measuring how well we are doing.

Predicting the environment in order to determine which data to pack where, thus achieving higher compression ratio. Or compression as an integral part of prediction? Some types of prediction are inherently compressed I suppose.

John
RE: Language modeling (was Re: [agi] draft for comment)
--- On Sun, 9/7/08, John G. Rose [EMAIL PROTECTED] wrote (Subject: RE: Language modeling (was Re: [agi] draft for comment), To: agi@v2.listbox.com, Date: Sunday, September 7, 2008, 9:15 AM):

From: Matt Mahoney [mailto:[EMAIL PROTECTED]]
--- On Sat, 9/6/08, John G. Rose [EMAIL PROTECTED] wrote:

Compression in itself has the overriding goal of reducing storage bits.

Not the way I use it. The goal is to predict what the environment will do next. Lossless compression is a way of measuring how well we are doing.

Predicting the environment in order to determine which data to pack where, thus achieving higher compression ratio. Or compression as an integral part of prediction? Some types of prediction are inherently compressed I suppose.

Predicting the environment to maximize reward. Hutter proved that universal intelligence is a compression problem. The optimal behavior of an AIXI agent is to guess the shortest program consistent with observation so far. That's algorithmic compression.

-- Matt Mahoney, [EMAIL PROTECTED]
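To make the "shortest program consistent with observation" idea above concrete, here is a minimal Python sketch (an illustration only, not anything from the thread and not Hutter's construction): the "programs" are restricted to a toy class, a seed string repeated forever, and the predictor simply keeps the shortest seed that reproduces everything observed so far.

from itertools import product

def shortest_consistent_seed(observed, alphabet="ab", max_len=8):
    """Toy minimum-description induction: enumerate 'programs' (seed strings
    repeated forever) in order of length and return the first one whose
    output begins with the observed sequence."""
    n = len(observed)
    for length in range(1, max_len + 1):
        for combo in product(alphabet, repeat=length):
            seed = "".join(combo)
            if (seed * (n // length + 1))[:n] == observed:
                return seed
    return None

obs = "abababab"
seed = shortest_consistent_seed(obs)          # 'ab'
prediction = (seed * 100)[len(obs):len(obs) + 4]
print(seed, prediction)                       # the shortest consistent description predicts 'abab'

Real Solomonoff/AIXI induction quantifies over all programs for a universal machine and is uncomputable; the point of the toy is only that "compress the history so far" and "predict what comes next" are the same search.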
Re: [agi] draft for comment
Pei: As I said before, you give symbol a very narrow meaning, and insist that it is the only way to use it. In the current discussion, symbols are not 'X', 'Y', 'Z', but 'table', 'time', 'intelligence'. BTW, what images do you associate with the latter two? Since you prefer to use a person as an example, let me try the same. All of my experience about 'Mike Tintner' is symbolic, nothing visual, but it still makes you real enough to me...

I'm sorry if it sounds rude, Pei, but you attribute to symbols far too broad powers that they simply don't have - and demonstrably, scientifically, don't have. For example, you think that your experience of Mike Tintner - the rude guy - is entirely symbolic. Yes, all your experience of me has been mediated entirely via language/symbols - these posts. But by far the most important parts of it have actually been images. Ridiculous, huh?

Look at this sentence: "If you want to hear about it, you'll probably want to know where I was born, and what a lousy childhood I had, and how my parents were occupied before they had me, and all the David Copperfield crap, but if you want to know the truth, I don't really want to get into it." In 60 words, one of the great opening sentences of a novel, Salinger has created a whole character. How? He did it by creating a voice. He did it by what is called prosody (and also diction). No current AGI method has the least idea of how to process that prosody. But your brain does. Pei doesn't. But his/your brain does. And your experience of MT has similarly been heavily based on processing the *sound* images - the voice behind my words. Hence your "I'm sorry if it *sounds* rude". Words, even written words, aren't just symbols, they are sounds. And your brain hears those sounds and from their music can tell many, many things, including the emotions of the speaker, and whether they're being angry or ironic or rude.

Now, if you had had more of a literary/arts education, you would probably be alive to that dimension. But, as it is, you've missed it, and you're missing all kinds of dimensions of how symbols work. Similarly, if you had more of a visual education, and also more of a psychological developmental background, you wouldn't find time and intelligence so daunting to visualise. You would realise that it takes a great deal of time and preparatory sensory/imaginative experience to build up abstract concepts. You would realise that it takes time for an infant to come to use that word, and still more for a child to understand the word intelligence. I doubt that any child will understand time before they've seen a watch or clock, and that's what they will probably visualise time as, first. Your capacity to abstract time still further will have come from having become gradually acquainted with a whole range of time-measuring devices, and from seeing the word time and associating it with many other kinds of measurement, especially in relation to maths and science. Similarly, a person's concept of intelligence will come from seeing and hearing people solving problems in different ways - quickly and slowly, for example. It will be deeply grounded in sensory images and experience.

All the most abstract maths and logic that you may think totally abstract are similarly and necessarily grounded. Ben, in parallel to you, didn't realise that the decimal numeral system is digital, based on the hand, and so, a little less obviously, is the roman numeral system. Numbers and logic have to be built up out of experience.
[You might profit BTW by looking at Barsalou, [many of his papers online], to see how the mind modally simulates concepts - with lots of experimental evidence] I, as you know, am very ignorant about computers; but you are also very ignorant about all kinds of dimensions of how symbols work, and intelligence generally, that are absolutely essential for AGI. You can continue to look down on me, or you can open your mind, recognize that general intelligence can only be achieved by a confluence of disciplines way beyond the reach of any single individual, and see that maybe useful exchanges can take place.
Re: [agi] draft for comment
Mike, If you think your AGI know-how is superior to the know-how of those who have already built testable thinking machines, then why don't you try to build one yourself? Maybe you would learn more that way than by spending a significant amount of time trying to sort out the great incompatibilities between your views and the views of other AGI researchers. If you don't have the resources to build the system then, perhaps, you could just put together some architecture doc (including your definitions of important terms) for your as-simple-as-possible AGI. The talk could then be more specific/interesting/fruitful for everyone involved. Sorry if I'm missing something. I'm reading this list only occasionally. But when I get to your posts, I often see things very differently, and I know I'm not alone. I guess if you try to view things from a developer's perspective + if you systematically move forward improving a particular AGI design, your views would change drastically. Just my opinion. Regards, Jiri Jelinek
RE: Language modeling (was Re: [agi] draft for comment)
Thinking out loud here as I find the relationship between compression and intelligence interesting: Compression in itself has the overriding goal of reducing storage bits. Intelligence has coincidental compression. There is resource management there. But I do think that it is not ONLY coincidental. Knowledge has structure which can be organized and naturally can collapse into a lower complexity storage state. Things have order, based on physics and other mathematical relationships. The relationship between compression and stored knowledge and intelligence is intriguing. But knowledge can be compressed inefficiently to where it inhibits extraction and other operations so there are differences with compression and intelligence related to computational expense. Optimal intelligence would have a variational compression structure IOW some stuff needs fast access time with minimal decompression resource expenditure and other stuff has high storage priority but computational expense and access time are not a priority. And then when you say the word compression there is a complicity of utility. The result of a compressor that has general intelligence still has a goal of reducing storage bits. I think that compression can be a byproduct of the stored knowledge created by a general intelligence. But if you have a compressor with general intelligence built in and you assign it a goal of taking input data and reducing the storage space it still may result in a series of hacks because that may be the best way of accomplishing that goal. Sure there may be some new undiscovered hacks that require general intelligence to uncover. And a compressor that is generally intelligent may produce more rich lossily compressed data from varied sources. The best lossy compressor is probably generally intelligent. They are very similar as you indicate... but when you start getting real lossy, when you start asking questions from your lossy compressed data that are not related to just the uncompressed input there is a difference there. Compression itself is just one dimensional. Intelligence is multi. John -Original Message- From: Matt Mahoney [mailto:[EMAIL PROTECTED] Sent: Friday, September 05, 2008 6:39 PM To: agi@v2.listbox.com Subject: Re: Language modeling (was Re: [agi] draft for comment) --- On Fri, 9/5/08, Pei Wang [EMAIL PROTECTED] wrote: Like to many existing AI works, my disagreement with you is not that much on the solution you proposed (I can see the value), but on the problem you specified as the goal of AI. For example, I have no doubt about the theoretical and practical values of compression, but don't think it has much to do with intelligence. In http://cs.fit.edu/~mmahoney/compression/rationale.html I explain why text compression is an AI problem. To summarize, if you know the probability distribution of text, then you can compute P(A|Q) for any question Q and answer A to pass the Turing test. Compression allows you to precisely measure the accuracy of your estimate of P. Compression (actually, word perplexity) has been used since the early 1990's to measure the quality of language models for speech recognition, since it correlates well with word error rate. The purpose of this work is not to solve general intelligence, such as the universal intelligence proposed by Legg and Hutter [1]. That is not computable, so you have to make some arbitrary choice with regard to test environments about what problems you are going to solve. 
I believe the goal of AGI should be to do useful work for humans, so I am making a not so arbitrary choice to solve a problem that is central to what most people regard as useful intelligence. I had hoped that my work would lead to an elegant theory of AI, but that hasn't been the case. Rather, the best compression programs were developed as a series of thousands of hacks and tweaks, e.g. change a 4 to a 5 because it gives 0.002% better compression on the benchmark. The result is an opaque mess. I guess I should have seen it coming, since it is predicted by information theory (e.g. [2]). Nevertheless the architectures of the best text compressors are consistent with cognitive development models, i.e. phoneme (or letter) sequences - lexical - semantics - syntax, which are themselves consistent with layered neural architectures. I already described a neural semantic model in my last post. I also did work supporting Hutchens and Alder showing that lexical models can be learned from n- gram statistics, consistent with the observation that babies learn the rules for segmenting continuous speech before they learn any words [3]. I agree it should also be clear that semantics is learned before grammar, contrary to the way artificial languages are processed. Grammar requires semantics, but not the other way around. Search engines work using semantics only. Yet we cannot parse sentences like I ate pizza with Bob, I
Re: AI isn't cheap (was Re: Real vs. simulated environments (was Re: [agi] draft for comment.. P.S.))
Matt, I heartily disagree with your view as expressed here, and as stated to me by heads of CS departments and other high-ranking CS PhDs, nearly (but not quite) all of whom have lost the fire in the belly that we all once had for CS/AGI. I DO agree that CS is like every other technological endeavor, in that almost everything that can be done as a PhD thesis has already been done, but there is a HUGE gap between a PhD-thesis-scale project and what that same person can do with another few millions and a couple more years, especially if allowed to ignore the naysayers. The reply is even more complex than your well-documented statement, but I'll take my best shot at it, time permitting. Here, the angel is in the details.

On 9/5/08, Matt Mahoney [EMAIL PROTECTED] wrote: --- On Fri, 9/5/08, Steve Richfield [EMAIL PROTECTED] wrote: I think that a billion or so, divided up into small pieces to fund EVERY disparate approach to see where the low hanging fruit is, would go a LONG way in guiding subsequent billions. I doubt that it would take a trillion to succeed.

Sorry, the low hanging fruit was all picked by the early 1960's. By then we had neural networks [1,6,7,11,12],

... but we STILL do not have any sort of useful *unsupervised* NN, the equivalent of which seems to be needed for any good AGI. Note my recent postings about a potential theory of everything that would most directly hit unsupervised NN, providing not only a good way of operating these, but possibly the provably best way of operating.

natural language processing and language translation [2],

My Dr. Eliza is right there and showing that useful understanding outside of a precise context is almost certainly impossible. I regularly meet with the folks working on the Russian translator project, and rest assured, things are STILL advancing fairly rapidly. Here, there is continuing funding, and I expect that the Russian translator will eventually succeed (they already claim success).

models of human decision making [3],

These are curious, but I believe them to be emergent properties of processes that we don't understand at all, so they have no value other than for testing of future systems. Note that human decision making does NOT generally include many advanced sorts of logic that simply don't occur to ordinary humans, which is where an AGI could shine. Hence, understanding the human but not the non-human processes is nearly worthless.

automatic theorem proving [4,8,10],

Great for when you already have the answer - but what is it good for?!

natural language databases [5],

Which are only useful if/when the provably false presumption is true that NL understanding is generally possible.

game playing programs [9,13],

Not relevant for AGI.

optical character recognition [14],

Only recently have methods emerged that are truly font-independent. This SHOULD have been accomplished long ago (like shortly after your 1960 reference), but no one wanted to throw significant money at it. I nearly launched an OCR company (Cognitext) in 1981, but funding eventually failed *because* I had done the research and had a new (but *un*proven) method that was truly font-independent.

handwriting and speech recognition [15],

... both of which are now good enough for AI interaction (e.g. my Gracie speech I/O interface to Dr. Eliza), but NOT good enough for general dictation. Unfortunately, the methods used don't seem to shed much light on how the underlying processes work in us.

and important theoretical work [16,17,18].

Note again my call for work/help on what I call computing's theory of everything, leveraging off of principal component analysis.

Since then we have had mostly just incremental improvements.

YES. This only shows that the support process has long been broken, and NOT that there isn't a LOT of value that is just out of reach of PhD-sized projects.

Big companies like Google and Microsoft have strong incentives to develop AI

Internal politics at both (that I have personally run into) restrict expenditures to PROVEN methods, as a single technical failure spells doom for the careers of everyone working on them. Hence, their R&D is all D and no R.

and have billions to spend.

Not one dollar of which goes into what I would call genuine research.

Maybe the problem really is hard.

... and maybe it is just a little difficult. My own Dr. Eliza program has seemingly unbelievable NL-stated problem solving capabilities, but is built mostly on the same sort of 1960s technology you cited. Why wasn't it built before 1970? I see two simple reasons:

1. Joe Weizenbaum, in his *Computer Power and Human Reason,* explained why this approach could never work. That immediately made it impossible to get any related effort funded or acceptable in a university setting.

2. It took about a year to make a demonstrable real-world NL problem solving system, which would have been at the outer reaches of a PhD or casual personal project. I have
Re: Language modeling (was Re: [agi] draft for comment)
--- On Fri, 9/5/08, Pei Wang [EMAIL PROTECTED] wrote: Thanks for taking the time to explain your ideas in detail. As I said, our different opinions on how to do AI come from our very different understanding of intelligence. I don't take passing Turing Test as my research goal (as explained in http://nars.wang.googlepages.com/wang.logic_intelligence.pdf and http://nars.wang.googlepages.com/wang.AI_Definitions.pdf). I disagree with Hutter's approach, not because his SOLUTION is not computable, but because his PROBLEM is too idealized and simplified to be relevant to the actual problems of AI. I don't advocate the Turing test as the ideal test of intelligence. Turing himself was aware of the problem when he gave an example of a computer answering an arithmetic problem incorrectly in his famous 1950 paper: Q: Please write me a sonnet on the subject of the Forth Bridge. A: Count me out on this one. I never could write poetry. Q: Add 34957 to 70764. A: (Pause about 30 seconds and then give as answer) 105621. Q: Do you play chess? A: Yes. Q: I have K at my K1, and no other pieces. You have only K at K6 and R at R1. It is your move. What do you play? A: (After a pause of 15 seconds) R-R8 mate. I prefer a preference test, which a machine passes if you prefer to talk to it over a human. Such a machine would be too fast and make too few errors to pass a Turing test. For example, if you had to add two large numbers, I think you would prefer to use a calculator than ask someone. You could, I suppose, measure intelligence as the fraction of questions for which the machine gives the preferred answer, which would be 1/4 in Turing's example. If you know the probability distribution P of text, and therefore know the distribution P(A|Q) for any question Q and answer A, then to pass the Turing test you would randomly choose answers from this distribution. But to pass the preference test for all Q, you would choose A that maximizes P(A|Q) because the most probable answer is usually the correct one. Text compression measures progress toward either test. I believe that compression measures your definition of intelligence, i.e. adaptation given insufficient knowledge and resources. In my benchmark, there are two parts: the size of the decompression program, which measures the initial knowledge, and the compressed size, which measures prediction errors that occur as the system adapts. Programs must also meet practical time and memory constraints to be listed in most benchmarks. Compression is also consistent with Legg and Hutter's universal intelligence, i.e. expected reward of an AIXI universal agent in an environment simulated by a random program. Suppose you have a compression oracle that inputs any string x and outputs the shortest program that outputs a string with prefix x. Then this reduces the (uncomputable) AIXI problem to using the oracle to guess which environment is consistent with the interaction so far, and figuring out which future outputs by the agent will maximize reward. Of course universal intelligence is also not testable because it requires an infinite number of environments. Instead, we have to choose a practical data set. I use Wikipedia text, which has fewer errors than average text, but I believe that is consistent with my goal of passing the preference test. 
-- Matt Mahoney, [EMAIL PROTECTED]
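A small Python illustration of the distinction drawn in the message above (the probabilities are made up; this is not Matt's code), assuming a model already supplies a conditional distribution P(A|Q): a Turing-test-style responder samples an answer, reproducing human mistakes at human rates, while a preference-test responder returns argmax_A P(A|Q).

import random

# Hypothetical P(A|Q) for Turing's arithmetic question 34957 + 70764 = 105721:
# people usually get it right, but sometimes slip (Turing's example answer is wrong).
p_a_given_q = {"105721": 0.75, "105621": 0.15, "104721": 0.10}

def turing_style(p):
    """Imitate the human population: sample from P(A|Q)."""
    answers, weights = zip(*p.items())
    return random.choices(answers, weights=weights, k=1)[0]

def preference_style(p):
    """Give the most probable (usually correct) answer: argmax over P(A|Q)."""
    return max(p, key=p.get)

print(turing_style(p_a_given_q))       # occasionally '105621' or '104721', like a human
print(preference_style(p_a_given_q))   # always '105721'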
RE: Language modeling (was Re: [agi] draft for comment)
--- On Sat, 9/6/08, John G. Rose [EMAIL PROTECTED] wrote: Compression in itself has the overriding goal of reducing storage bits. Not the way I use it. The goal is to predict what the environment will do next. Lossless compression is a way of measuring how well we are doing. -- Matt Mahoney, [EMAIL PROTECTED]
Re: Language modeling (was Re: [agi] draft for comment)
I won't argue against your preference test here, since this is a big topic, and I've already made my position clear in the papers I mentioned. As for compression, yes, every intelligent system needs to 'compress' its experience in the sense of keeping the essence but using less space. However, it is clearly not lossless. It is not even what we usually call lossy compression, because what to keep and in what form is highly context-sensitive. Consequently, this process is not reversible --- no decompression, though the result can be applied in various ways. Therefore I prefer not to call it compression, to avoid confusing this process with the technical sense of compression, which is reversible, at least approximately. Legg and Hutter's universal intelligence definition is way too narrow to cover various attempts towards AI, even as an idealization. Therefore, I don't take it as a goal to aim at and approach as closely as possible. However, as I said before, I'd rather leave this topic for the future, when I have enough time to give it a fair treatment. Pei

On Sat, Sep 6, 2008 at 4:29 PM, Matt Mahoney [EMAIL PROTECTED] wrote: --- On Fri, 9/5/08, Pei Wang [EMAIL PROTECTED] wrote: Thanks for taking the time to explain your ideas in detail. As I said, our different opinions on how to do AI come from our very different understanding of intelligence. I don't take passing Turing Test as my research goal (as explained in http://nars.wang.googlepages.com/wang.logic_intelligence.pdf and http://nars.wang.googlepages.com/wang.AI_Definitions.pdf). I disagree with Hutter's approach, not because his SOLUTION is not computable, but because his PROBLEM is too idealized and simplified to be relevant to the actual problems of AI.

I don't advocate the Turing test as the ideal test of intelligence. Turing himself was aware of the problem when he gave an example of a computer answering an arithmetic problem incorrectly in his famous 1950 paper: Q: Please write me a sonnet on the subject of the Forth Bridge. A: Count me out on this one. I never could write poetry. Q: Add 34957 to 70764. A: (Pause about 30 seconds and then give as answer) 105621. Q: Do you play chess? A: Yes. Q: I have K at my K1, and no other pieces. You have only K at K6 and R at R1. It is your move. What do you play? A: (After a pause of 15 seconds) R-R8 mate. I prefer a preference test, which a machine passes if you prefer to talk to it over a human. Such a machine would be too fast and make too few errors to pass a Turing test. For example, if you had to add two large numbers, I think you would prefer to use a calculator than ask someone. You could, I suppose, measure intelligence as the fraction of questions for which the machine gives the preferred answer, which would be 1/4 in Turing's example. If you know the probability distribution P of text, and therefore know the distribution P(A|Q) for any question Q and answer A, then to pass the Turing test you would randomly choose answers from this distribution. But to pass the preference test for all Q, you would choose A that maximizes P(A|Q) because the most probable answer is usually the correct one. Text compression measures progress toward either test. I believe that compression measures your definition of intelligence, i.e. adaptation given insufficient knowledge and resources. In my benchmark, there are two parts: the size of the decompression program, which measures the initial knowledge, and the compressed size, which measures prediction errors that occur as the system adapts. Programs must also meet practical time and memory constraints to be listed in most benchmarks. Compression is also consistent with Legg and Hutter's universal intelligence, i.e. expected reward of an AIXI universal agent in an environment simulated by a random program. Suppose you have a compression oracle that inputs any string x and outputs the shortest program that outputs a string with prefix x. Then this reduces the (uncomputable) AIXI problem to using the oracle to guess which environment is consistent with the interaction so far, and figuring out which future outputs by the agent will maximize reward. Of course universal intelligence is also not testable because it requires an infinite number of environments. Instead, we have to choose a practical data set. I use Wikipedia text, which has fewer errors than average text, but I believe that is consistent with my goal of passing the preference test. -- Matt Mahoney, [EMAIL PROTECTED]
Re: AI isn't cheap (was Re: Real vs. simulated environments (was Re: [agi] draft for comment.. P.S.))
Steve, where are you getting your cost estimate for AGI? Is it a gut feeling, or something like the common management practice of I can afford $X so it will cost $X? My estimate of $10^15 is based on the value of the world economy, US $66 trillion per year and increasing 5% annually over the next 30 years, which is how long it will take for the internet to grow to the computational power of 10^10 human brains (at 10^15 bits and 10^16 OPS each) at the current rate of growth, doubling every couple of years. Even if you disagree with these numbers by a factor of 1000, it only moves the time to AGI by a few years, so the cost estimate hardly changes. And even if the hardware is free, you still have to program or teach about 10^16 to 10^17 bits of knowledge, assuming 10^9 bits of knowledge per brain [1] and 1% to 10% of this is not known by anyone else. Software and training costs are not affected by Moore's law. Even if we assume human level language understanding and perfect sharing of knowledge, the training cost will be 1% to 10% of your working life to train the AGI to do your job. Also, we have made *some* progress toward AGI since 1965, but it is mainly a better understanding of why it is so hard, e.g. - We know that general intelligence is not computable [2] or provable [3]. There is no neat theory. - From Cyc, we know that coding common sense is more than a 20 year effort. Lenat doesn't know how much more, but guesses it is maybe between 0.1% and 10% finished. - Google is the closest we have to AI after a half trillion dollar effort. 1. Landauer, Tom (1986), “How much do people remember? Some estimates of the quantity of learned information in long term memory”, Cognitive Science (10) pp. 477-493. 2. Hutter, Marcus (2003), A Gentle Introduction to The Universal Algorithmic Agent {AIXI}, in Artificial General Intelligence, B. Goertzel and C. Pennachin eds., Springer. http://www.idsia.ch/~marcus/ai/aixigentle.htm 3. Legg, Shane, (2006), Is There an Elegant Universal Theory of Prediction?, Technical Report IDSIA-12-06, IDSIA / USI-SUPSI, Dalle Molle Institute for Artificial Intelligence, Galleria 2, 6928 Manno, Switzerland. http://www.vetta.org/documents/IDSIA-12-06-1.pdf -- Matt Mahoney, [EMAIL PROTECTED] --- On Sat, 9/6/08, Steve Richfield [EMAIL PROTECTED] wrote: From: Steve Richfield [EMAIL PROTECTED] Subject: Re: AI isn't cheap (was Re: Real vs. simulated environments (was Re: [agi] draft for comment.. P.S.)) To: agi@v2.listbox.com Date: Saturday, September 6, 2008, 2:58 PM Matt, I heartily disagree with your view as expressed here, and as stated to my by heads of CS departments and other high ranking CS PhDs, nearly (but not quite) all of whom have lost the fire in the belly that we all once had for CS/AGI. I DO agree that CS is like every other technological endeavor, in that almost everything that can be done as a PhD thesis has already been done. but there is a HUGE gap between a PhD thesis scale project and what that same person can do with another few more millions and a couple more years, especially if allowed to ignore the naysayers. The reply is a even more complex than your well documented statement, but I'll take my best shot at it, time permitting. Here, the angel is in the details. On 9/5/08, Matt Mahoney [EMAIL PROTECTED] wrote: --- On Fri, 9/5/08, Steve Richfield [EMAIL PROTECTED] wrote: I think that a billion or so, divided up into small pieces to fund EVERY disparate approach to see where the low hanging fruit is, would go a LONG way in guiding subsequent billions. 
I doubt that it would take a trillion to succeed. Sorry, the low hanging fruit was all picked by the early 1960's. By then we had neural networks [1,6,7,11,12], ... but we STILL do not have any sort of useful unsupervised NN, the equivalent of which seems to be needed for any good AGI. Note my recent postings about a potential theory of everything that would most directly hit unsupervised NN, providing not only a good way of operating these, but possibly the provably best way of operating. natural language processing and language translation [2], My Dr. Eliza is right there and showing that useful understanding out of precise context is almost certainly impossible. I regularly meet with the folks working on the Russian translator project, and rest assured, things are STILL advancing fairly rapidly. Here, there is continuing funding, and I expect that the Russian translator will eventually succeed (they already claim success). models of human decision making [3], These are curious but I believe them to be an emergent properties of processes that we don't understand at all, so they have no value other than for testing of future systems. Note that human decision making does NOT generally include many advanced sorts of logic that simply don't occur to ordinary humans, which is where an AGI could shine
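The $10^15 figure near the top of Matt's message follows from a couple of back-of-envelope calculations; here is a reconstruction of the arithmetic in Python, using only the assumptions stated in the message (the reconstruction is an illustration, not Matt's own working):

# World economy ~$66 trillion/yr growing ~5%/yr, summed over the ~30 years it
# would take the internet, doubling every ~2 years, to reach 10^10 brains at
# 10^16 OPS each.
years = 30
gdp0 = 66e12
cumulative_output = sum(gdp0 * 1.05 ** t for t in range(years))
print(f"cumulative world output: ${cumulative_output:.2e}")   # ~4e15, i.e. order 10^15

target_ops = 1e10 * 1e16                  # 10^26 OPS for 10^10 human-brain equivalents
doublings = years / 2                     # one doubling every ~2 years
print(f"implied present capacity: {target_ops / 2 ** doublings:.1e} OPS")  # ~3e21

Both numbers are order-of-magnitude bookkeeping; the compounding growth, not the exact constants, is what drives the conclusion.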
Re: Language modeling (was Re: [agi] draft for comment)
--- On Sat, 9/6/08, Pei Wang [EMAIL PROTECTED] wrote: As for compression, yes every intelligent system needs to 'compress' its experience in the sense of keeping the essence but using less space. However, it is clearly not lossless. It is not even what we usually call lossy compression, because what to keep and in what form is highly context-sensitive. Consequently, this process is not reversible --- no decompression, though the result can be applied in various ways. Therefore I prefer not to call it compression to avoid confusing this process with the technical sense of compression, which is reversible, at least approximately.

I think you misunderstand my use of compression. The goal is modeling or prediction. Given a string, predict the next symbol. I use compression to estimate how accurate the model is. It is easy to show that if your model is accurate, then when you connect your model to an ideal coder (such as an arithmetic coder), compression will be optimal. You could actually skip the coding step, but it is cheap, so I use it so that there is no question of making a mistake in the measurement. If a bug in the coder produces a too-small output, then the decompression step won't reproduce the original file. In fact, many speech recognition experiments do skip the coding step in their tests and merely calculate what the compressed size would be. (More precisely, they calculate word perplexity, which is equivalent.) The goal of speech recognition is to find the text y that maximizes P(y|x) for utterance x. It is common to factor the model using Bayes law: P(y|x) = P(x|y)P(y)/P(x). We can drop P(x) since it is constant, leaving the acoustic model P(x|y) and language model P(y) to evaluate. We know from experiments that compression tests on P(y) correlate well with word error rates for the overall system.

Internally, all lossless compressors use lossy compression or data reduction to make predictions. Most commonly, a context is truncated and possibly hashed before looking up the statistics for the next symbol. The top lossless compressors in my benchmark use more sophisticated forms of data reduction, such as mapping upper and lower case letters together, or mapping groups of semantically or syntactically related words to the same context.

As a test, lossless compression is only appropriate for text. For other hard AI problems such as vision, art, and music, incompressible noise would overwhelm the human-perceptible signal. Theoretically you could compress video to 2 bits per second (the rate of human long term memory) by encoding it as a script. The decompressor would read the script and create a new movie. The proper test would be lossy compression, but this requires human judgment to evaluate how well the reconstructed data matches the original.

-- Matt Mahoney, [EMAIL PROTECTED]
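A minimal Python sketch of the measurement loop described above (an illustration, not the actual benchmark code): any predictive model can be scored by the ideal code length -log2 P(symbol) summed over the data, which is the size an arithmetic coder would produce, so the coding step itself can be skipped, exactly as in the perplexity measurements mentioned.

import math
from collections import defaultdict

def ideal_compressed_bits(text, order=2):
    """Adaptive order-N character model with add-one smoothing over 256 symbols.
    Returns the ideal code length in bits: the sum of -log2 p for each character,
    where p is the model's prediction made *before* seeing that character."""
    counts = defaultdict(lambda: defaultdict(int))
    totals = defaultdict(int)
    bits, context = 0.0, ""
    for ch in text:
        p = (counts[context][ch] + 1) / (totals[context] + 256)
        bits += -math.log2(p)
        counts[context][ch] += 1          # update the model only after predicting
        totals[context] += 1
        context = (context + ch)[-order:]
    return bits

sample = "the cat sat on the mat. " * 200
b = ideal_compressed_bits(sample)
print(f"{b / len(sample):.2f} bits/char vs 8 bits/char raw")   # better model => fewer bits

A better predictor yields a smaller number here, which is the sense in which lossless compression measures the model without the model itself having to be lossless.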
Language modeling (was Re: [agi] draft for comment)
--- On Thu, 9/4/08, Pei Wang [EMAIL PROTECTED] wrote: I guess you still see NARS as using model-theoretic semantics, so you call it symbolic and contrast it with system with sensors. This is not correct --- see http://nars.wang.googlepages.com/wang.semantics.pdf and http://nars.wang.googlepages.com/wang.AI_Misconceptions.pdf I mean NARS is symbolic in the sense that you write statements in Narsese like raven - bird 0.97, 0.92 (probability=0.97, confidence=0.92). I realize that the meanings of raven and bird are determined by their relations to other symbols in the knowledge base and that the probability and confidence change with experience. But in practice you are still going to write statements like this because it is the easiest way to build the knowledge base. You aren't going to specify the brightness of millions of pixels in a vision system in Narsese, and there is no mechanism I am aware of to collect this knowledge from a natural language text corpus. There is no mechanism to add new symbols to the knowledge base through experience. You have to explicitly add them. You have made this point on CPU power several times, and I'm still not convinced that the bottleneck of AI is hardware capacity. Also, there is no reason to believe an AGI must be designed in a biologically plausible way. Natural language has evolved to be learnable on a massively parallel network of slow computing elements. This should be apparent when we compare successful language models with unsuccessful ones. Artificial language models usually consist of tokenization, parsing, and semantic analysis phases. This does not work on natural language because artificial languages have precise specifications and natural languages do not. No two humans use exactly the same language, nor does the same human at two points in time. Rather, language is learnable by example, so that each message causes the language of the receiver to be a little more like that of the sender. Children learn semantics before syntax, which is the opposite order from which you would write an artificial language interpreter. An example of a successful language model is a search engine. We know that most of the meaning of a text document depends only on the words it contains, ignoring word order. A search engine matches the semantics of the query with the semantics of a document mostly by matching words, but also by matching semantically related words like water to wet. Here is an example of a computationally intensive but biologically plausible language model. A semantic model is a word-word matrix A such that A_ij is the degree to which words i and j are related, which you can think of as the probability of finding i and j together in a sliding window over a huge text corpus. However, semantic relatedness is a fuzzy identity relation, meaning it is reflexive, commutative, and transitive. If i is related to j and j to k, then i is related to k. Deriving transitive relations in A, also known as latent semantic analysis, is performed by singular value decomposition, factoring A = USV where S is diagonal, then discarding the small terms of S, which has the effect of lossy compression. Typically, A has about 10^6 elements and we keep only a few hundred elements of S. Fortunately there is a parallel algorithm that incrementally updates the matrices as the system learns: a 3 layer neural network where S is the hidden layer (which can grow) and U and V are weight matrices. [1]. 
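A compact batch version of the construction in the previous paragraph, as a hedged Python/numpy sketch (the toy corpus and variable names are made up; the incremental neural formulation is the one in the Gorrell paper cited as [1]): build the word-word co-occurrence matrix, take the SVD, and keep only the top singular values so that words with similar contexts end up close even when they never co-occur directly.

import numpy as np

docs = ["water makes things wet", "rain is water", "rain makes the street wet",
        "the sun is dry", "dry sand in hot sun"]
vocab = sorted({w for d in docs for w in d.split()})
idx = {w: i for i, w in enumerate(vocab)}

# A[i,j] = number of windows (here, whole toy sentences) containing both words
A = np.zeros((len(vocab), len(vocab)))
for d in docs:
    ws = d.split()
    for a in ws:
        for b in ws:
            if a != b:
                A[idx[a], idx[b]] += 1.0

U, S, Vt = np.linalg.svd(A)     # factor A = U S V^T
k = 2                           # keep only the k largest singular values (lossy)
emb = U[:, :k] * S[:k]          # low-dimensional word vectors

def sim(w1, w2):
    v1, v2 = emb[idx[w1]], emb[idx[w2]]
    return float(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-12))

print(sim("water", "rain"), sim("water", "dry"))   # 'water' lands near 'rain', far from 'dry'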
Traditional language processing has failed because the task of converting natural language statements like "ravens are birds" to formal language is itself an AI problem. It requires humans who have already learned what ravens are and how to form and recognize grammatically correct sentences, so they understand all of the hundreds of ways to express the same statement. You have to have human-level understanding of the logic to realize that "ravens are coming" doesn't mean ravens - coming. If you solve the translation problem, then you must have already solved the natural language problem. You can't take a shortcut directly to the knowledge base, tempting as it might be. You have to learn the language first, going through all the childhood stages. I would have hoped we had learned a lesson from Cyc.

1. Gorrell, Genevieve (2006), Generalized Hebbian Algorithm for Incremental Singular Value Decomposition in Natural Language Processing, Proceedings of EACL 2006, Trento, Italy. http://www.aclweb.org/anthology-new/E/E06/E06-1013.pdf

-- Matt Mahoney, [EMAIL PROTECTED]
Re: Language modeling (was Re: [agi] draft for comment)
On Fri, Sep 5, 2008 at 11:15 AM, Matt Mahoney [EMAIL PROTECTED] wrote: --- On Thu, 9/4/08, Pei Wang [EMAIL PROTECTED] wrote: I guess you still see NARS as using model-theoretic semantics, so you call it symbolic and contrast it with system with sensors. This is not correct --- see http://nars.wang.googlepages.com/wang.semantics.pdf and http://nars.wang.googlepages.com/wang.AI_Misconceptions.pdf

I mean NARS is symbolic in the sense that you write statements in Narsese like raven - bird 0.97, 0.92 (probability=0.97, confidence=0.92). I realize that the meanings of raven and bird are determined by their relations to other symbols in the knowledge base and that the probability and confidence change with experience. But in practice you are still going to write statements like this because it is the easiest way to build the knowledge base.

Yes.

You aren't going to specify the brightness of millions of pixels in a vision system in Narsese, and there is no mechanism I am aware of to collect this knowledge from a natural language text corpus.

Of course not. To have visual experience, there must be a device to convert visual signals into an internal representation in Narsese. I never suggested otherwise.

There is no mechanism to add new symbols to the knowledge base through experience. You have to explicitly add them.

New symbols either come from the outside in experience (experience can be verbal), or are composed by the concept-formation rules from existing ones. The latter case is explained in my book.

Natural language has evolved to be learnable on a massively parallel network of slow computing elements. This should be apparent when we compare successful language models with unsuccessful ones. Artificial language models usually consist of tokenization, parsing, and semantic analysis phases. This does not work on natural language because artificial languages have precise specifications and natural languages do not.

It depends on which aspect of the language you talk about. Narsese has precise specifications in syntax, but the meaning of the terms is a function of experience, and changes from time to time.

No two humans use exactly the same language, nor does the same human at two points in time. Rather, language is learnable by example, so that each message causes the language of the receiver to be a little more like that of the sender.

Same thing in NARS --- if two implementations of NARS have different experience, they will disagree on the meaning of a term. When they begin to learn natural language, it will also be true for grammar. Since I haven't done any concrete NLP yet, I don't expect you to believe me on the second point, but you cannot rule out that possibility just because no traditional system can do that.

Children learn semantics before syntax, which is the opposite order from which you would write an artificial language interpreter.

NARS indeed can learn semantics before syntax --- see http://nars.wang.googlepages.com/wang.roadmap.pdf

I won't comment on the following detailed statements, since I agree with your criticism of the traditional processing of formal language, but that is not how NARS handles languages. Don't think of NARS as another Cyc just because both use a formal language. The same "ravens are birds" is treated very differently in the two systems.

Pei

An example of a successful language model is a search engine. We know that most of the meaning of a text document depends only on the words it contains, ignoring word order.
A search engine matches the semantics of the query with the semantics of a document mostly by matching words, but also by matching semantically related words like water to wet. Here is an example of a computationally intensive but biologically plausible language model. A semantic model is a word-word matrix A such that A_ij is the degree to which words i and j are related, which you can think of as the probability of finding i and j together in a sliding window over a huge text corpus. However, semantic relatedness is a fuzzy identity relation, meaning it is reflexive, commutative, and transitive. If i is related to j and j to k, then i is related to k. Deriving transitive relations in A, also known as latent semantic analysis, is performed by singular value decomposition, factoring A = USV where S is diagonal, then discarding the small terms of S, which has the effect of lossy compression. Typically, A has about 10^6 elements and we keep only a few hundred elements of S. Fortunately there is a parallel algorithm that incrementally updates the matrices as the system learns: a 3 layer neural network where S is the hidden layer (which can grow) and U and V are weight matrices. [1]. Traditional language processing has failed because the task of converting natural language statements like ravens are birds to formal language is itself an AI problem. It
Re: Real vs. simulated environments (was Re: [agi] draft for comment.. P.S.)
Matt, FINALLY, someone here is saying some of the same things that I have been saying. With general agreement with your posting, I will make some comments... On 9/4/08, Matt Mahoney [EMAIL PROTECTED] wrote: --- On Thu, 9/4/08, Valentina Poletti [EMAIL PROTECTED] wrote: Ppl like Ben argue that the concept/engineering aspect of intelligence is independent of the type of environment. That is, given you understand how to make it in a virtual environment you can then tarnspose that concept into a real environment more safely. This is probably a good starting point, to avoid beating the world up during the debugging process. Some other ppl on the other hand believe intelligence is a property of humans only. Only people who haven't had a pet believe such things. I have seen too many animals find clever solutions to problems. So you have to simulate every detail about humans to get that intelligence. I'd say that among the two approaches the first one (Ben's) is safer and more realistic. The issue is not what is intelligence, but what do you want to create? In order for machines to do more work for us, they may need language and vision, which we associate with human intelligence. Not necessarily, as even text-interfaced knowledge engines can handily outperform humans in many complex problem solving tasks. The still open question is: What would best do what we need done but can NOT presently do (given computers, machinery, etc.). So far, the talk here on this forum has been about what we could do and how we might do it, rather than about what we NEED done. Right now, we NEED resources to work productively in the directions that we have been discussing, yet the combined intelligence of those here on this forum is apparently unable to solve even this seemingly trivial problem. Perhaps something more than raw intelligence is needed? But building artificial humans is not necessarily useful. We already know how to create humans, and we are doing so at an unsustainable rate. I suggest that instead of the imitation game (Turing test) for AI, we should use a preference test. If you prefer to talk to a machine vs. a human, then the machine passes the test. YES, like what is it that our AGI can do that we need done but can NOT presently do? Prediction is central to intelligence. If you can predict a text stream, then for any question Q and any answer A, you can compute the probability distribution P(A|Q) = P(QA)/P(Q). This passes the Turing test. More importantly, it allows you to output max_A P(QA), the most likely answer from a group of humans. This passes the preference test because a group is usually more accurate than any individual member. (It may fail a Turing test for giving too few wrong answers, a problem Turing was aware of in 1950 when he gave an example of a computer incorrectly answering an arithmetic problem). Unfortunately, this also tests the ability to incorporate the very misunderstandings that presently limit our thinking. We need to give credit for compression algorithms that cleans up our grammar, corrects our technical errors, etc., as these can probably be done in the process of better compressing the text. Text compression is equivalent to AI because we have already solved the coding problem. Given P(x) for string x, we know how to optimally and efficiently code x in log_2(1/P(x)) bits (e.g. arithmetic coding). 
Text compression has an advantage over the Turing or preference tests in that that incremental progress in modeling can be measured precisely and the test is repeatable and verifiable. If I want to test a text compressor, it is important to use real data (human generated text) rather than simulated data, i.e. text generated by a program. Otherwise, I know there is a concise code for the input data, which is the program that generated it. When you don't understand the source distribution (i.e. the human brain), the problem is much harder, and you have a legitimate test. Wouldn't it be better to understand the problem domain while ignoring human (mis)understandings? After all, if humans need an AGI to work in a difficult domain, it is probably made more difficult by incorporating human misunderstandings. Of course, humans state human problems, so it is important to be able to semantically communicate, but also useful to separate the communications from the problems. I understand that Ben is developing AI for virtual worlds. This might produce interesting results, but I wouldn't call it AGI. The value of AGI is on the order of US $1 quadrillion. It is a global economic system running on a smarter internet. I believe that any attempt to develop AGI on a budget of $1 million or $1 billion or $1 trillion is just wishful thinking. I think that a billion or so, divided up into small pieces to fund EVERY disparate approach to see where the low hanging fruit is, would go a LONG way in guiding subsequent billions. I doubt that it would
Re: Language modeling (was Re: [agi] draft for comment)
--- On Fri, 9/5/08, Pei Wang [EMAIL PROTECTED] wrote: NARS indeed can learn semantics before syntax --- see http://nars.wang.googlepages.com/wang.roadmap.pdf

Yes, I see this corrects many of the problems with Cyc and with traditional language models. I didn't see a description of a mechanism for learning new terms in your other paper. Clearly this could be added, although I believe it should be a statistical process.

I am interested in determining the computational cost of language modeling. The evidence I have so far is that it is high. I believe the algorithmic complexity of a model is 10^9 bits. This is consistent with Turing's 1950 prediction that AI would require this much memory, with Landauer's estimate of human long term memory, and is about how much language a person processes by adulthood assuming an information content of 1 bit per character as Shannon estimated in 1950. This is why I use a 1 GB data set in my compression benchmark.

However there is a 3 way tradeoff between CPU speed, memory, and model accuracy (as measured by compression ratio). I added two graphs to my benchmark at http://cs.fit.edu/~mmahoney/compression/text.html (below the main table) which shows this clearly. In particular the size-memory tradeoff is an almost perfectly straight line (with memory on a log scale) over tests of 104 compressors. These tests suggest to me that CPU and memory are indeed bottlenecks to language modeling. The best models in my tests use simple semantic and grammatical models, well below adult human level. The 3 top programs on the memory graph map words to tokens using dictionaries that group semantically and syntactically related words together, but only one (paq8hp12any) uses a semantic space of more than one dimension. All have large vocabularies, although not implausibly large for an educated person. Other top programs like nanozipltcb and WinRK use smaller dictionaries and strictly lexical models. Lesser programs model only at the n-gram level.

I don't yet have an answer to my question, but I believe efficient human-level NLP will require hundreds of GB or perhaps 1 TB of memory. The slowest programs are already faster than real time, given that equivalent learning in humans would take over a decade. I think you could use existing hardware in a speed-memory tradeoff to get real time NLP, but it would not be practical for doing experiments where each source code change requires training the model from scratch. Model development typically requires thousands of tests.

-- Matt Mahoney, [EMAIL PROTECTED]
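As a sanity check on the 10^9-bit figure and the 1 GB benchmark size above, a short back-of-envelope calculation (the exposure rates are rough assumptions for illustration, not numbers from the post):

# Suppose a person processes language at ~4 characters/second for ~5 hours/day.
chars_by_adulthood = 4 * 3600 * 5 * 365 * 18       # ~5e8 characters by age 18
bits = chars_by_adulthood * 1.0                     # ~1 bit/char (Shannon's 1950 estimate, cited above)
print(f"{bits:.1e} bits")                           # a few times 10^8, the order of the 10^9 figure

# 10^9 characters of raw text is 1 GB at 8 bits/char; near 1 bit/char it compresses
# toward ~125 MB, the same order as Landauer's long-term memory estimate.
print(f"{1e9 / 8 / 1e6:.0f} MB")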
Re: Language modeling (was Re: [agi] draft for comment)
On Fri, Sep 5, 2008 at 6:15 PM, Matt Mahoney [EMAIL PROTECTED] wrote: --- On Fri, 9/5/08, Pei Wang [EMAIL PROTECTED] wrote: NARS indeed can learn semantics before syntax --- see http://nars.wang.googlepages.com/wang.roadmap.pdf Yes, I see this corrects many of the problems with Cyc and with traditional language models. I didn't see a description of a mechanism for learning new terms in your other paper. Clearly this could be added, although I believe it should be a statistical process. I don't have a separate paper on term composition, so you'd have to read my book. It is indeed a statistical process, in the sense that most of the composed terms won't be useful, so will be forgot gradually. Only the useful patterns will be kept for long time in the form of compound terms. I am interested in determining the computational cost of language modeling. The evidence I have so far is that it is high. I believe the algorithmic complexity of a model is 10^9 bits. This is consistent with Turing's 1950 prediction that AI would require this much memory, with Landauer's estimate of human long term memory, and is about how much language a person processes by adulthood assuming an information content of 1 bit per character as Shannon estimated in 1950. This is why I use a 1 GB data set in my compression benchmark. I see your point, though I think to analyze this problem in terms of computational complexity is not the correct way to go, because this process does not follow a predetermined algorithm. Instead, language learning is an incremental process, without a well-defined beginning and ending. However there is a 3 way tradeoff between CPU speed, memory, and model accuracy (as measured by compression ratio). I added two graphs to my benchmark at http://cs.fit.edu/~mmahoney/compression/text.html (below the main table) which shows this clearly. In particular the size-memory tradeoff is an almost perfectly straight line (with memory on a log scale) over tests of 104 compressors. These tests suggest to me that CPU and memory are indeed bottlenecks to language modeling. The best models in my tests use simple semantic and grammatical models, well below adult human level. The 3 top programs on the memory graph map words to tokens using dictionaries that group semantically and syntactically related words together, but only one (paq8hp12any) uses a semantic space of more than one dimension. All have large vocabularies, although not implausibly large for an educated person. Other top programs like nanozipltcb and WinRK use smaller dictionaries and strictly lexical models. Lesser programs model only at the n-gram level. Like to many existing AI works, my disagreement with you is not that much on the solution you proposed (I can see the value), but on the problem you specified as the goal of AI. For example, I have no doubt about the theoretical and practical values of compression, but don't think it has much to do with intelligence. I don't think this kind of issue can be efficient handled by email discussion like this one. I've been thinking about to write a paper to compare my ideas with the ideas represented by AIXI, which is closely related to yours, though this project hasn't got enough priority in my to-do list. Hopefully I'll find the time to make myself clear on this topic. I don't yet have an answer to my question, but I believe efficient human-level NLP will require hundreds of GB or perhaps 1 TB of memory. 
The slowest programs are already faster than real time, given that equivalent learning in humans would take over a decade. I think you could use existing hardware in a speed-memory tradeoff to get real time NLP, but it would not be practical for doing experiments where each source code change requires training the model from scratch. Model development typically requires thousands of tests. I guess we are exploring very different paths in NLP, and now it is too early to tell which one will do better. Pei --- agi Archives: https://www.listbox.com/member/archive/303/=now RSS Feed: https://www.listbox.com/member/archive/rss/303/ Modify Your Subscription: https://www.listbox.com/member/?member_id=8660244id_secret=111637683-c8fa51 Powered by Listbox: http://www.listbox.com
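As a rough illustration of the statistical retention Pei describes (a toy sketch only -- the class, parameters, and numbers are invented for this example and are not the actual NARS mechanism): compound terms are proposed when their components co-occur, reinforced when they turn out to be useful, and otherwise decay until they are forgotten.

    from collections import defaultdict

    class CompoundMemory:
        def __init__(self, decay=0.95, threshold=0.05):
            self.usefulness = defaultdict(float)   # compound term -> usefulness score
            self.decay = decay
            self.threshold = threshold

        def observe(self, a, b):
            # seeing terms a and b together proposes the compound (a, b)
            self.usefulness[(a, b)] += 1.0

        def use(self, compound):
            # each successful use of a compound reinforces it
            self.usefulness[compound] += 1.0

        def step(self):
            # time passes: every compound decays, and weak ones are forgotten
            for k in list(self.usefulness):
                self.usefulness[k] *= self.decay
                if self.usefulness[k] < self.threshold:
                    del self.usefulness[k]

    mem = CompoundMemory()
    mem.observe("black", "bird")      # propose the compound "black bird"
    for _ in range(100):
        mem.step()                    # never used again, so it decays away
    print(dict(mem.usefulness))       # {} -- the unused compound was forgotten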
Re: Language modeling (was Re: [agi] draft for comment)
--- On Fri, 9/5/08, Pei Wang [EMAIL PROTECTED] wrote: Like to many existing AI works, my disagreement with you is not that much on the solution you proposed (I can see the value), but on the problem you specified as the goal of AI. For example, I have no doubt about the theoretical and practical values of compression, but don't think it has much to do with intelligence. In http://cs.fit.edu/~mmahoney/compression/rationale.html I explain why text compression is an AI problem. To summarize, if you know the probability distribution of text, then you can compute P(A|Q) for any question Q and answer A to pass the Turing test. Compression allows you to precisely measure the accuracy of your estimate of P. Compression (actually, word perplexity) has been used since the early 1990's to measure the quality of language models for speech recognition, since it correlates well with word error rate. The purpose of this work is not to solve general intelligence, such as the universal intelligence proposed by Legg and Hutter [1]. That is not computable, so you have to make some arbitrary choice with regard to test environments about what problems you are going to solve. I believe the goal of AGI should be to do useful work for humans, so I am making a not so arbitrary choice to solve a problem that is central to what most people regard as useful intelligence. I had hoped that my work would lead to an elegant theory of AI, but that hasn't been the case. Rather, the best compression programs were developed as a series of thousands of hacks and tweaks, e.g. change a 4 to a 5 because it gives 0.002% better compression on the benchmark. The result is an opaque mess. I guess I should have seen it coming, since it is predicted by information theory (e.g. [2]). Nevertheless the architectures of the best text compressors are consistent with cognitive development models, i.e. phoneme (or letter) sequences - lexical - semantics - syntax, which are themselves consistent with layered neural architectures. I already described a neural semantic model in my last post. I also did work supporting Hutchens and Alder showing that lexical models can be learned from n-gram statistics, consistent with the observation that babies learn the rules for segmenting continuous speech before they learn any words [3]. I agree it should also be clear that semantics is learned before grammar, contrary to the way artificial languages are processed. Grammar requires semantics, but not the other way around. Search engines work using semantics only. Yet we cannot parse sentences like I ate pizza with Bob, I ate pizza with pepperoni, I ate pizza with chopsticks, without semantics. My benchmark does not prove that there aren't better language models, but it is strong evidence. It represents the work of about 100 researchers who have tried and failed to find more accurate, faster, or less memory intensive models. The resource requirements seem to increase as we go up the chain from n-grams to grammar, contrary to symbolic approaches. This is my argument why I think AI is bound by lack of hardware, not lack of theory. 1. Legg, Shane, and Marcus Hutter (2006), A Formal Measure of Machine Intelligence, Proc. Annual machine learning conference of Belgium and The Netherlands (Benelearn-2006). Ghent, 2006. http://www.vetta.org/documents/ui_benelearn.pdf 2. 
Legg, Shane, (2006), Is There an Elegant Universal Theory of Prediction?, Technical Report IDSIA-12-06, IDSIA / USI-SUPSI, Dalle Molle Institute for Artificial Intelligence, Galleria 2, 6928 Manno, Switzerland. http://www.vetta.org/documents/IDSIA-12-06-1.pdf 3. M. Mahoney (2000), A Note on Lexical Acquisition in Text without Spaces, http://cs.fit.edu/~mmahoney/dissertation/lex1.html -- Matt Mahoney, [EMAIL PROTECTED] --- agi Archives: https://www.listbox.com/member/archive/303/=now RSS Feed: https://www.listbox.com/member/archive/rss/303/ Modify Your Subscription: https://www.listbox.com/member/?member_id=8660244id_secret=111637683-c8fa51 Powered by Listbox: http://www.listbox.com
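For readers unfamiliar with the perplexity measure mentioned above, a small illustrative sketch (toy unigram model, made-up sentences): perplexity is two raised to the average code length in bits per word, so a lower-perplexity language model and a better compressor are the same thing.

    import math
    from collections import Counter

    def unigram_perplexity(train_words, test_words):
        counts = Counter(train_words)
        vocab = len(counts) + 1                       # +1 slot for unseen words
        total = sum(counts.values())
        bits = 0.0
        for w in test_words:
            p = (counts[w] + 1) / (total + vocab)     # add-one smoothing
            bits += -math.log2(p)                     # code length of w under the model
        bits_per_word = bits / len(test_words)
        return 2 ** bits_per_word, bits_per_word

    train = "the cat sat on the mat".split()
    test = "the cat sat on the rug".split()
    ppl, bpw = unigram_perplexity(train, test)
    print(round(ppl, 2), "perplexity,", round(bpw, 2), "bits per word")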
AI isn't cheap (was Re: Real vs. simulated environments (was Re: [agi] draft for comment.. P.S.))
--- On Fri, 9/5/08, Steve Richfield [EMAIL PROTECTED] wrote: I think that a billion or so, divided up into small pieces to fund EVERY disparate approach to see where the low hanging fruit is, would go a LONG way in guiding subsequent billions. I doubt that it would take a trillion to succeed. Sorry, the low hanging fruit was all picked by the early 1960's. By then we had neural networks [1,6,7,11,12], natural language processing and language translation [2], models of human decision making [3], automatic theorem proving [4,8,10], natural language databases [5], game playing programs [9,13], optical character recognition [14], handwriting and speech recognition [15], and important theoretical work [16,17,18]. Since then we have had mostly just incremental improvements. Big companies like Google and Microsoft have strong incentives to develop AI and have billions to spend. Maybe the problem really is hard. References 1. Ashby, W. Ross (1960), Design for a Brain, 2’nd Ed., London: Wiley. Describes a 4 neuron electromechanical neural network. 2. Borko, Harold (1967), Automated Language Processing, The State of the Art, New York: Wiley. Cites 72 NLP systems prior to 1965, and the 1959-61 U.S. government Russian-English translation project. 3. Feldman, Julian (1961), Simulation of Behavior in the Binary Choice Experiment, Proceedings of the Western Joint Computer Conference 19:133-144 4. Gelernter, H. (1959), Realization of a Geometry-Theorem Proving Machine, Proceedings of an International Conference on Information Processing, Paris: UNESCO House, pp. 273-282. 5. Green, Bert F. Jr., Alice K. Wolf, Carol Chomsky, and Kenneth Laughery (1961), Baseball: An Automatic Question Answerer, Proceedings of the Western Joint Computer Conference, 19:219-224. 6. Hebb, D. O. (1949), The Organization of Behavior, New York: Wiley. Proposed the first model of learning in neurons: when two neurons fire simultaneously, the synapse between them becomes stimulating. 7. McCulloch, Warren S., and Walter Pitts (1943), A logical calculus of the ideas immanent in nervous activity, Buletin of Mathematical Biophysics (5) pp. 115-133. 8. Newell, Allen, J. C. Shaw, H. A. Simon (1957), Empirical Explorations with the Logic Theory Machine: A Case Study in Heuristics, Proceedings of the Western Joint Computer Conference, 15:218-239. 9. Newell, Allen, J. C. Shaw, and H. A. Simon (1958), Chess-Playing Programs and the Problem of Complexity, IBM Journal of Research and Development, 2:320-335. 10. Newell, Allen, H. A. Simon (1961), GPS: A Program that Simulates Human Thought, Lernende Automaten, Munich: R. Oldenbourg KG. 11. Rochester, N., J. J. Holland, L. H. Haibt, and Wl L. Duda (1956), Tests on a cell assembly theory of the action of the brain, using a large digital computer, IRE Transactions on Information Theory IT-2: pp. 80-93. 12. Rosenblatt, F. (1958), The perceptron: a probabilistic model for information storage and organization in the brain, Psychological Review (65) pp. 386-408. 13. Samuel, A. L. (1959), Some Studies in Machine Learning using the Game of Checkers, IBM Journal of Research and Development, 3:211-229. 14. Selfridge, Oliver G., Ulric Neisser (1960), Pattern Recognition by Machine, Scientific American, Aug., 203:60-68. 15. Uhr, Leonard, Charles Vossler (1963) A Pattern-Recognition Program that Generates, Evaluates, and Adjusts its own Operators, Computers and Thought, E. A. Feigenbaum and J. Feldman eds, New York: McGraw Hill, pp. 251-268. 16. Turing, A. 
M., (1950) Computing Machinery and Intelligence, Mind, 59:433-460. 17. Shannon, Claude, and Warren Weaver (1949), The Mathematical Theory of Communication, Urbana: University of Illinois Press. 18. Minsky, Marvin (1961), Steps toward Artificial Intelligence, Proceedings of the Institute of Radio Engineers, 49:8-30. -- Matt Mahoney, [EMAIL PROTECTED] --- agi Archives: https://www.listbox.com/member/archive/303/=now RSS Feed: https://www.listbox.com/member/archive/rss/303/ Modify Your Subscription: https://www.listbox.com/member/?member_id=8660244id_secret=111637683-c8fa51 Powered by Listbox: http://www.listbox.com
Re: Language modeling (was Re: [agi] draft for comment)
Matt, Thanks for taking the time to explain your ideas in detail. As I said, our different opinions on how to do AI come from our very different understanding of intelligence. I don't take passing Turing Test as my research goal (as explained in http://nars.wang.googlepages.com/wang.logic_intelligence.pdf and http://nars.wang.googlepages.com/wang.AI_Definitions.pdf). I disagree with Hutter's approach, not because his SOLUTION is not computable, but because his PROBLEM is too idealized and simplified to be relevant to the actual problems of AI. Even so, I'm glad that we can still agree on somethings, like semantics comes before syntax. In my plan for NLP, there won't be separate 'parsing' and 'semantic mapping' stages. I'll say more when I have concrete results to share. Pei On Fri, Sep 5, 2008 at 8:39 PM, Matt Mahoney [EMAIL PROTECTED] wrote: --- On Fri, 9/5/08, Pei Wang [EMAIL PROTECTED] wrote: Like to many existing AI works, my disagreement with you is not that much on the solution you proposed (I can see the value), but on the problem you specified as the goal of AI. For example, I have no doubt about the theoretical and practical values of compression, but don't think it has much to do with intelligence. In http://cs.fit.edu/~mmahoney/compression/rationale.html I explain why text compression is an AI problem. To summarize, if you know the probability distribution of text, then you can compute P(A|Q) for any question Q and answer A to pass the Turing test. Compression allows you to precisely measure the accuracy of your estimate of P. Compression (actually, word perplexity) has been used since the early 1990's to measure the quality of language models for speech recognition, since it correlates well with word error rate. The purpose of this work is not to solve general intelligence, such as the universal intelligence proposed by Legg and Hutter [1]. That is not computable, so you have to make some arbitrary choice with regard to test environments about what problems you are going to solve. I believe the goal of AGI should be to do useful work for humans, so I am making a not so arbitrary choice to solve a problem that is central to what most people regard as useful intelligence. I had hoped that my work would lead to an elegant theory of AI, but that hasn't been the case. Rather, the best compression programs were developed as a series of thousands of hacks and tweaks, e.g. change a 4 to a 5 because it gives 0.002% better compression on the benchmark. The result is an opaque mess. I guess I should have seen it coming, since it is predicted by information theory (e.g. [2]). Nevertheless the architectures of the best text compressors are consistent with cognitive development models, i.e. phoneme (or letter) sequences - lexical - semantics - syntax, which are themselves consistent with layered neural architectures. I already described a neural semantic model in my last post. I also did work supporting Hutchens and Alder showing that lexical models can be learned from n-gram statistics, consistent with the observation that babies learn the rules for segmenting continuous speech before they learn any words [3]. I agree it should also be clear that semantics is learned before grammar, contrary to the way artificial languages are processed. Grammar requires semantics, but not the other way around. Search engines work using semantics only. Yet we cannot parse sentences like I ate pizza with Bob, I ate pizza with pepperoni, I ate pizza with chopsticks, without semantics. 
My benchmark does not prove that there aren't better language models, but it is strong evidence. It represents the work of about 100 researchers who have tried and failed to find more accurate, faster, or less memory intensive models. The resource requirements seem to increase as we go up the chain from n-grams to grammar, contrary to symbolic approaches. This is my argument why I think AI is bound by lack of hardware, not lack of theory. 1. Legg, Shane, and Marcus Hutter (2006), A Formal Measure of Machine Intelligence, Proc. Annual machine learning conference of Belgium and The Netherlands (Benelearn-2006). Ghent, 2006. http://www.vetta.org/documents/ui_benelearn.pdf 2. Legg, Shane, (2006), Is There an Elegant Universal Theory of Prediction?, Technical Report IDSIA-12-06, IDSIA / USI-SUPSI, Dalle Molle Institute for Artificial Intelligence, Galleria 2, 6928 Manno, Switzerland. http://www.vetta.org/documents/IDSIA-12-06-1.pdf 3. M. Mahoney (2000), A Note on Lexical Acquisition in Text without Spaces, http://cs.fit.edu/~mmahoney/dissertation/lex1.html -- Matt Mahoney, [EMAIL PROTECTED] --- agi Archives: https://www.listbox.com/member/archive/303/=now RSS Feed: https://www.listbox.com/member/archive/rss/303/ Modify Your Subscription: https://www.listbox.com/member/?; Powered by
Re: [agi] draft for comment
Hi, What I think is that the set of patterns in perceptual and motoric data has radically different statistical properties than the set of patterns in linguistic and mathematical data ... and that the properties of the set of patterns in perceptual and motoric data are intrinsically better suited to the needs of a young, ignorant, developing mind. Sure it is. Systems with different sensory channels will never fully understand each other. I'm not saying that one channel (verbal) can replace another (visual), but that both of them (and many others) can give symbol/representation/concept/pattern/whatever-you-call-it meaning. No one is more real than the others. True, but some channels may -- due to the statistical properties of the data coming across them -- be more conducive to the development of AGI than others... All these different domains of pattern display what I've called a dual network structure ... a collection of hierarchies (of progressively more and more complex, hierarchically nested patterns) overlaid with a heterarchy (of overlapping, interrelated patterns). But the statistics of the dual networks in the different domains are different. I haven't fully plumbed the difference yet ... but, among the many differences is that in perceptual/motoric domains, you have a very richly connected dual network at a very low level of the overall dual network hierarchy -- i.e., there's a richly connected web of relatively simple stuff to understand ... and then these simple things are related to (hence useful for learning) the more complex things, etc. True, but can you say that the relations among words, or concepts, are simpler? I think the set of relations among words (considered in isolation, without their referents) is less rich than the set of relations among perceptions of a complex world, and far less rich than the set of relations among {perceptions of a complex world, plus words referring to these perceptions}. And I think that this lesser richness makes sequences of words a much worse input stream for a developing AGI. I realize that quantifying "less rich" in the above is a significant challenge, but I'm presenting my intuition anyway... Also, relatedly and just as critically, the set of perceptions regarding the body and its interactions with the environment, are well-structured to give the mind a sense of its own self. This primitive infantile sense of body-self gives rise to the more sophisticated phenomenal self of the child and adult mind, which gives rise to reflective consciousness, the feeling of will, and other characteristic structures of humanlike general intelligence. A stream of words doesn't seem to give an AI the same kind of opportunity for self-development. In this short paper, I make no attempt to settle all issues, but just to point out a simple fact --- a laptop has a body, and is not less embodied than Roomba or Mindstorms --- that seems to have been ignored in the previous discussion. I agree with your point, but I wonder if it's partially a straw man argument. The proponents of embodiment as a key aspect of AGI don't of course think that Cyc is disembodied in a maximally strong sense -- they know it interacts with the world via physical means. What they mean by embodied is something different.
I don't have the details at my fingertips, but I know that Maturana, Varela and Eleanor Rosch took some serious pains to carefully specify the sense in which they feel embodiment is critical to intelligence, and to distinguish their sense of embodiment from the trivial sense of communicating via physical signals. I suggest your paper should probably include a careful response to the characterization of embodiment presented in http://www.amazon.com/Embodied-Mind-Cognitive-Science-Experience/dp/0262720213 I note that I do not agree with the arguments of Varela, Rosch, Brooks, etc. I just think their characterization of embodiment is an interesting and nontrivial one, and I'm not sure NARS with a text stream as input would be embodied according to their definition... -- Ben --- agi Archives: https://www.listbox.com/member/archive/303/=now RSS Feed: https://www.listbox.com/member/archive/rss/303/ Modify Your Subscription: https://www.listbox.com/member/?member_id=8660244id_secret=111637683-c8fa51 Powered by Listbox: http://www.listbox.com
Re: [agi] draft for comment
Also, relatedly and just as critically, the set of perceptions regarding the body and its interactions with the environment, are well-structured to give the mind a sense of its own self. This primitive infantile sense of body-self gives rise to the more sophisticated phenomenal self of the child and adult mind, which gives rise to reflective consciousness, the feeling of will, and other characteristic structures of humanlike general intelligence. A stream of words doesn't seem to give an AI the same kind of opportunity for self-development To put it perhaps more clearly: I think that a standard laptop is too lacking in -- proprioceptive perception -- perception of its own relationship to other entities in the world around it to form a physical self-image based on its perceptions ... hence a standard laptop will not likely be driven by its experience to develop a phenomenal self ... hence, I suspect, no generally intelligent mind... -- Ben G --- agi Archives: https://www.listbox.com/member/archive/303/=now RSS Feed: https://www.listbox.com/member/archive/rss/303/ Modify Your Subscription: https://www.listbox.com/member/?member_id=8660244id_secret=111637683-c8fa51 Powered by Listbox: http://www.listbox.com
Re: [agi] draft for comment.. P.S.
That's if you aim at getting an AGI that is intelligent in the real world. I think some people on this list (including Ben, perhaps) might argue that for now - for safety purposes but also due to costs - it might be better to build an AGI that is intelligent in a simulated environment. People like Ben argue that the concept/engineering aspect of intelligence is *independent of the type of environment*. That is, given that you understand how to make it in a virtual environment you can then transpose that concept into a real environment more safely. Some other people, on the other hand, believe intelligence is a property of humans only. So you have to simulate every detail about humans to get that intelligence. I'd say that among the two approaches the first one (Ben's) is safer and more realistic. I am more concerned with the physics aspect of the whole issue I guess. --- agi Archives: https://www.listbox.com/member/archive/303/=now RSS Feed: https://www.listbox.com/member/archive/rss/303/ Modify Your Subscription: https://www.listbox.com/member/?member_id=8660244id_secret=111637683-c8fa51 Powered by Listbox: http://www.listbox.com
Re: [agi] draft for comment
On Thu, Sep 4, 2008 at 2:10 AM, Ben Goertzel [EMAIL PROTECTED] wrote: Sure it is. Systems with different sensory channels will never fully understand each other. I'm not saying that one channel (verbal) can replace another (visual), but that both of them (and many others) can give symbol/representation/concept/pattern/whatever-you-call-it meaning. No on is more real than others. True, but some channels may -- due to the statistical properties of the data coming across them -- be more conducive to the development of AGI than others... I haven't seen any evidence for that. For human intelligence, maybe, but for intelligence in general, I doubt it. I think the set of relations among words (considered in isolation, without their referents) is less rich than the set of relations among perceptions of a complex world, and far less rich than the set of relations among {perceptions of a complex world, plus words referring to these perceptions} Not necessarily. Actually some people may even make the opposite argument: relations among non-linguistic components in experience are basically temporal or spatial, while the relations among words and concepts have much more types. I won't go that far, but I guess in some sense all channels may have the same (potential) richness. And I think that this lesser richness makes sequences of words a much worse input stream for a developing AGI I realize that quantifying less rich in the above is a significant challenge, but I'm presenting my intuition anyway... If your condition is true, then your conclusion follows, but the problem is in that IF. Also, relatedly and just as critically, the set of perceptions regarding the body and its interactions with the environment, are well-structured to give the mind a sense of its own self. We can say the same for every input/out operation set of an intelligent system. SELF is defined by what the system can feel and do. This primitive infantile sense of body-self gives rise to the more sophisticated phenomenal self of the child and adult mind, which gives rise to reflective consciousness, the feeling of will, and other characteristic structures of humanlike general intelligence. Agree. A stream of words doesn't seem to give an AI the same kind of opportunity for self-development If the system just sits there and passively accept whatever words come into it, what you said is true. If the incoming words is causally related to its outgoing words, will you still say that? I agree with your point, but I wonder if it's partially a straw man argument. If you read Brooks or Pfeifer, you'll see that most of their arguments are explicitly or implicitly based on the myth that only a robot has a body, have real sensor, live in a real world, ... The proponents of embodiment as a key aspect of AGI don't of course think that Cyc is disembodied in a maximally strong sense -- they know it interacts with the world via physical means. What they mean by embodied is something different. Whether a system is embodied does not depends on hardware, but on semantics. I don't have the details at my finger tips, but I know that Maturana, Varela and Eleanor Rosch took some serious pains to carefully specify the sense in which they feel embodiment is critical to intelligence, and to distinguish their sense of embodiment from the trivial sense of communicating via physical signals. That is different. The embodiment school in CogSci doesn't focus on body (they know every human already has one), but on experience. 
However, they have their misconception about AI. As I mentioned, Barsalou and Lakoff both thought strong AI is unlikely because computer cannot have human experience --- I agree what they said except their narrow conception of intelligence (CogSci people tend to take intelligence as human intelligence). I suggest your paper should probably include a careful response to the characterization of embodiment presented in http://www.amazon.com/Embodied-Mind-Cognitive-Science-Experience/dp/0262720213 I note that I do not agree with the arguments of Varela, Rosch, Brooks, etc. I just think their characterization of embodiment is an interesting and nontrivial one, and I'm not sure NARS with a text stream as input would be embodied according to their definition... If I got the time (and motivation) to extend the paper into a journal paper, I'll double the length by discussing embodiment in CogSci. In the current version, as a short conference paper, I'd rather focus on embodiment in AI, and only attack the robot myth. Pei --- agi Archives: https://www.listbox.com/member/archive/303/=now RSS Feed: https://www.listbox.com/member/archive/rss/303/ Modify Your Subscription: https://www.listbox.com/member/?member_id=8660244id_secret=111637683-c8fa51 Powered by Listbox: http://www.listbox.com
Re: [agi] draft for comment
On Thu, Sep 4, 2008 at 2:12 AM, Ben Goertzel [EMAIL PROTECTED] wrote: Also, relatedly and just as critically, the set of perceptions regarding the body and its interactions with the environment, are well-structured to give the mind a sense of its own self. This primitive infantile sense of body-self gives rise to the more sophisticated phenomenal self of the child and adult mind, which gives rise to reflective consciousness, the feeling of will, and other characteristic structures of humanlike general intelligence. A stream of words doesn't seem to give an AI the same kind of opportunity for self-development To put it perhaps more clearly: I think that a standard laptop is too lacking in -- proprioceptive perception -- perception of its own relationship to other entities in the world around it Obviously you didn't consider the potential a laptop has with its network connection, which in theory can give it all kinds of perception by connecting it to some input/output device. Even if we exclude network, your conclusion is still problematic. Why a touchpad cannot provide proprioceptive perception? I agree it usually doesn't, because the way it is used, but that doesn't mean it cannot, under all possible usage. The same is true for keyboard. The current limitation of the standard computer is more in the way we use them than in the hardware itself. to form a physical self-image based on its perceptions ... hence a standard laptop will not likely be driven by its experience to develop a phenomenal self ... hence, I suspect, no generally intelligent mind... Of course it won't have a visual concept of self, but a system like NARS has the potential to grow into an intelligent operating system, with a notion of self based on what it can feel and do, as well as the causal relations among them --- If there is a file in this folder, then I should have felt it, it cannot be there because I've deleted the contents. I know some people won't agree there is a self in such a system, because it doesn't look like themselves. Too bad human intelligence is the only known example of intelligence ... Pei --- agi Archives: https://www.listbox.com/member/archive/303/=now RSS Feed: https://www.listbox.com/member/archive/rss/303/ Modify Your Subscription: https://www.listbox.com/member/?member_id=8660244id_secret=111637683-c8fa51 Powered by Listbox: http://www.listbox.com
Re: [agi] draft for comment
I agree with Pei in that a robot's experience is not necessarily more real than that of a, say, web-embedded agent - if anything it is closer to the *human* experience of the world. But who knows how limited our own sensory experience is anyhow. Perhaps a better intelligence would comprehend the world better through a different embodiment. However, could you guys be more specific regarding the statistical differences of different types of data? What kind of differences are you talking about specifically (mathematically)? And what about the differences at the various levels of the dual-hierarchy? Has any of your work or research suggested this hypothesis, and if so, which? --- agi Archives: https://www.listbox.com/member/archive/303/=now RSS Feed: https://www.listbox.com/member/archive/rss/303/ Modify Your Subscription: https://www.listbox.com/member/?member_id=8660244id_secret=111637683-c8fa51 Powered by Listbox: http://www.listbox.com
Re: [agi] draft for comment
Obviously you didn't consider the potential a laptop has with its network connection, which in theory can give it all kinds of perception by connecting it to some input/output device. yes, that's true ... I was considering the laptop w/ only a power cable as the AI system in question. Of course my point does not apply to a laptop that's being used as an on-board control system for an android robot, or a laptop that's connected to a network of sensors and actuators via the net, etc. Sorry I did not clarify my terms better! Similarly the human brain lacks much proprioception and control in isolation, and probably would not be able to achieve a high level of general intelligence without the right peripherals (such as the rest of the human body ;-) Even if we exclude network, your conclusion is still problematic. Why a touchpad cannot provide proprioceptive perception? I agree it usually doesn't, because the way it is used, but that doesn't mean it cannot, under all possible usage. The same is true for keyboard. The current limitation of the standard computer is more in the way we use them than in the hardware itself. I understand that a keyboard and touchpad do provide proprioceptive input, but I think it's too feeble, and too insensitively respondent to changes in the environment and the relation btw the laptop and the environment, to serve as the foundation for a robust self-model or a powerful general intelligence. to form a physical self-image based on its perceptions ... hence a standard laptop will not likely be driven by its experience to develop a phenomenal self ... hence, I suspect, no generally intelligent mind... Of course it won't have a visual concept of self, but a system like NARS has the potential to grow into an intelligent operating system, with a notion of self based on what it can feel and do, as well as the causal relations among them --- If there is a file in this folder, then I should have felt it, it cannot be there because I've deleted the contents. My suggestion is that the file system lacks the complexity of structure and dynamics to support the emergence of a robust self-model, and powerful general intelligence... Not in principle ... potentially a file system *could* display the needed complexity, but I don't think any file systems on laptops now come close... Whether the Internet as a whole contains the requisite complexity is a subtler question. I know some people won't agree there is a self in such a system, because it doesn't look like themselves. Too bad human intelligence is the only known example of intelligence ... I would call a self any internal, explicit model that a system creates that allows it to predict its own behaviors in a sufficient variety of contexts This need not have a visual aspect nor a great similarity to a human self. -- Ben --- agi Archives: https://www.listbox.com/member/archive/303/=now RSS Feed: https://www.listbox.com/member/archive/rss/303/ Modify Your Subscription: https://www.listbox.com/member/?member_id=8660244id_secret=111637683-c8fa51 Powered by Listbox: http://www.listbox.com
Re: [agi] draft for comment
Hi Pei, I think your point is correct that the notion of embodiment presented by Brooks and some other roboticists is naive. I'm not sure whether their actual conceptions are naive, or whether they just aren't presenting their foundational philosophical ideas clearly in their writings (being ultimately more engineering-oriented people, and probably not that accustomed to the philosophical style of discourse in which these sorts of definitional distinctions need to be more precisely drawn). I do think (in approximate concurrence with your paper) that ANY control system physically embodied in a physical system S, that has an input and output stream, and whose input and output stream possess correlation with the physical state of S, should be considered as psychologically embodied. Clearly, whether it's a robot or a laptop (w/o network connection if you like), such a system has the basic property of embodiment. Furthermore S doesn't need to be a physical system ... it could be a virtual system inside some virtual world (and then there's the question of what properties characterize a valid virtual world ... but let's leave that for another email thread...) However, I think that not all psychologically-embodied systems possess a sufficiently rich psychological-embodiment to lead to significantly general intelligence My suggestion is that a laptop w/o network connection or odd sensor-peripherals, probably does not have sufficiently rich correlations btw its I/O stream and its physical state, to allow it to develop a robust self-model of its physical self (which can then be used as a basis for a more general phenomenal self). I think that Varela and crew understood the value of this rich network of correlations, but mistakenly assumed it to be a unique property of biological systems... I realize that the points you made in your paper do not contradict the suggestions I've made in this email. I don't think anything significant in your paper is wrong, actually. It just seems to me not to address the most interesting aspects of the embodiment issue as related to AGI. -- Ben G On Thu, Sep 4, 2008 at 7:06 AM, Pei Wang [EMAIL PROTECTED] wrote: On Thu, Sep 4, 2008 at 2:10 AM, Ben Goertzel [EMAIL PROTECTED] wrote: Sure it is. Systems with different sensory channels will never fully understand each other. I'm not saying that one channel (verbal) can replace another (visual), but that both of them (and many others) can give symbol/representation/concept/pattern/whatever-you-call-it meaning. No on is more real than others. True, but some channels may -- due to the statistical properties of the data coming across them -- be more conducive to the development of AGI than others... I haven't seen any evidence for that. For human intelligence, maybe, but for intelligence in general, I doubt it. I think the set of relations among words (considered in isolation, without their referents) is less rich than the set of relations among perceptions of a complex world, and far less rich than the set of relations among {perceptions of a complex world, plus words referring to these perceptions} Not necessarily. Actually some people may even make the opposite argument: relations among non-linguistic components in experience are basically temporal or spatial, while the relations among words and concepts have much more types. I won't go that far, but I guess in some sense all channels may have the same (potential) richness. 
And I think that this lesser richness makes sequences of words a much worse input stream for a developing AGI I realize that quantifying less rich in the above is a significant challenge, but I'm presenting my intuition anyway... If your condition is true, then your conclusion follows, but the problem is in that IF. Also, relatedly and just as critically, the set of perceptions regarding the body and its interactions with the environment, are well-structured to give the mind a sense of its own self. We can say the same for every input/out operation set of an intelligent system. SELF is defined by what the system can feel and do. This primitive infantile sense of body-self gives rise to the more sophisticated phenomenal self of the child and adult mind, which gives rise to reflective consciousness, the feeling of will, and other characteristic structures of humanlike general intelligence. Agree. A stream of words doesn't seem to give an AI the same kind of opportunity for self-development If the system just sits there and passively accept whatever words come into it, what you said is true. If the incoming words is causally related to its outgoing words, will you still say that? I agree with your point, but I wonder if it's partially a straw man argument. If you read Brooks or Pfeifer, you'll see that most of their arguments are explicitly or implicitly based on the myth that only a robot has a
Re: [agi] draft for comment
However, could you guys be more specific regarding the statistical differences of different types of data? What kind of differences are you talking about specifically (mathematically)? And what about the differences at the various levels of the dual-hierarchy? Has any of your work or research suggested this hypothesis, if so which? Sorry I've been fuzzy on this ... I'm engaging in this email conversation in odd moments while at a conference (Virtual Worlds 2008, in Los Angeles...) Specifically I think that patterns interrelating the I/O stream of system S with the relation between the system S's embodiment and its environment, are important. It is these patterns that let S build a self-model of its physical embodiment, which then leads S to a more abstract self-model (aka Metzinger's phenomenal self) Considering patterns in the above category, it seems critical to have a rich variety of patterns at varying levels of complexity... so that the patterns at complexity level L are largely approximable as compositions of patterns at complexity less than L. This way a mind can incrementally build up its self-model via recognizing slightly complex self-related patterns, then acting based on these patterns, then recognizing somewhat more complex self-related patterns involving its recent actions, and so forth. It seems that a human body's sensors and actuators are suited to create and recognize patterns of the above sort whereas the sensors and actuators of a laptop w/o network cables or odd peripherals are not... -- Ben G --- agi Archives: https://www.listbox.com/member/archive/303/=now RSS Feed: https://www.listbox.com/member/archive/rss/303/ Modify Your Subscription: https://www.listbox.com/member/?member_id=8660244id_secret=111637683-c8fa51 Powered by Listbox: http://www.listbox.com
Re: [agi] draft for comment
On 9/4/08, Ben Goertzel [EMAIL PROTECTED] wrote: However, could you guys be more specific regarding the statistical differences of different types of data? What kind of differences are you talking about specifically (mathematically)? And what about the differences at the various levels of the dual-hierarchy? Has any of your work or research suggested this hypothesis, and if so, which? Sorry I've been fuzzy on this ... I'm engaging in this email conversation in odd moments while at a conference (Virtual Worlds 2008, in Los Angeles...) Specifically I think that patterns interrelating the I/O stream of system S with the relation between the system S's embodiment and its environment, are important. It is these patterns that let S build a self-model of its physical embodiment, which then leads S to a more abstract self-model (aka Metzinger's phenomenal self). So in short you are saying that the main difference between I/O data from a motor-embodied system (such as a robot or a human) and a laptop is the ability to interact with the data: make changes in its environment to systematically change the input? Considering patterns in the above category, it seems critical to have a rich variety of patterns at varying levels of complexity... so that the patterns at complexity level L are largely approximable as compositions of patterns at complexity less than L. This way a mind can incrementally build up its self-model via recognizing slightly complex self-related patterns, then acting based on these patterns, then recognizing somewhat more complex self-related patterns involving its recent actions, and so forth. Definitely. It seems that a human body's sensors and actuators are suited to create and recognize patterns of the above sort whereas the sensors and actuators of a laptop w/o network cables or odd peripherals are not... Agree. --- agi Archives: https://www.listbox.com/member/archive/303/=now RSS Feed: https://www.listbox.com/member/archive/rss/303/ Modify Your Subscription: https://www.listbox.com/member/?member_id=8660244id_secret=111637683-c8fa51 Powered by Listbox: http://www.listbox.com
Re: [agi] draft for comment
So in short you are saying that the main difference between I/O data by a motor embodyed system (such as robot or human) and a laptop is the ability to interact with the data: make changes in its environment to systematically change the input? Not quite ... but, to interact w/ the data in a way that gives rise to a hierarchy of nested, progressively more complex patterns that correlate the system and its environment (and that the system can recognize and act upon) ben --- agi Archives: https://www.listbox.com/member/archive/303/=now RSS Feed: https://www.listbox.com/member/archive/rss/303/ Modify Your Subscription: https://www.listbox.com/member/?member_id=8660244id_secret=111637683-c8fa51 Powered by Listbox: http://www.listbox.com
Re: [agi] draft for comment
Hi Ben, You may have stated this explicitly in the past, but I just want to clarify - you seem to be suggesting that a phenomenological self is important if not critical to the actualization of general intelligence. Is this your belief, and if so, can you provide a brief justification of that? (I happen to believe this myself.. just trying to understand your philosophy better.) Terren --- On Thu, 9/4/08, Ben Goertzel [EMAIL PROTECTED] wrote: However, I think that not all psychologically-embodied systems possess a sufficiently rich psychological-embodiment to lead to significantly general intelligence My suggestion is that a laptop w/o network connection or odd sensor-peripherals, probably does not have sufficiently rich correlations btw its I/O stream and its physical state, to allow it to develop a robust self-model of its physical self (which can then be used as a basis for a more general phenomenal self). --- agi Archives: https://www.listbox.com/member/archive/303/=now RSS Feed: https://www.listbox.com/member/archive/rss/303/ Modify Your Subscription: https://www.listbox.com/member/?member_id=8660244id_secret=111637683-c8fa51 Powered by Listbox: http://www.listbox.com
Real vs. simulated environments (was Re: [agi] draft for comment.. P.S.)
--- On Thu, 9/4/08, Valentina Poletti [EMAIL PROTECTED] wrote: People like Ben argue that the concept/engineering aspect of intelligence is independent of the type of environment. That is, given that you understand how to make it in a virtual environment you can then transpose that concept into a real environment more safely. Some other people, on the other hand, believe intelligence is a property of humans only. So you have to simulate every detail about humans to get that intelligence. I'd say that among the two approaches the first one (Ben's) is safer and more realistic. The issue is not what intelligence is, but what you want to create. In order for machines to do more work for us, they may need language and vision, which we associate with human intelligence. But building artificial humans is not necessarily useful. We already know how to create humans, and we are doing so at an unsustainable rate. I suggest that instead of the imitation game (Turing test) for AI, we should use a preference test. If you prefer to talk to a machine vs. a human, then the machine passes the test. Prediction is central to intelligence. If you can predict a text stream, then for any question Q and any answer A, you can compute the probability distribution P(A|Q) = P(QA)/P(Q). This passes the Turing test. More importantly, it allows you to output max_A P(QA), the most likely answer from a group of humans. This passes the preference test because a group is usually more accurate than any individual member. (It may fail a Turing test for giving too few wrong answers, a problem Turing was aware of in 1950 when he gave an example of a computer incorrectly answering an arithmetic problem). Text compression is equivalent to AI because we have already solved the coding problem. Given P(x) for string x, we know how to optimally and efficiently code x in log_2(1/P(x)) bits (e.g. arithmetic coding). Text compression has an advantage over the Turing or preference tests in that incremental progress in modeling can be measured precisely and the test is repeatable and verifiable. If I want to test a text compressor, it is important to use real data (human-generated text) rather than simulated data, i.e. text generated by a program. Otherwise, I know there is a concise code for the input data, which is the program that generated it. When you don't understand the source distribution (i.e. the human brain), the problem is much harder, and you have a legitimate test. I understand that Ben is developing AI for virtual worlds. This might produce interesting results, but I wouldn't call it AGI. The value of AGI is on the order of US $1 quadrillion. It is a global economic system running on a smarter internet. I believe that any attempt to develop AGI on a budget of $1 million or $1 billion or $1 trillion is just wishful thinking. -- Matt Mahoney, [EMAIL PROTECTED] --- agi Archives: https://www.listbox.com/member/archive/303/=now RSS Feed: https://www.listbox.com/member/archive/rss/303/ Modify Your Subscription: https://www.listbox.com/member/?member_id=8660244id_secret=111637683-c8fa51 Powered by Listbox: http://www.listbox.com
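A tiny sketch of the two identities used above, with a deliberately crude stand-in model (a memoryless distribution over 27 characters; any real predictor would replace the function P): the answer distribution follows from P(A|Q) = P(QA)/P(Q), and the ideal arithmetic-coding length of a string x is log2(1/P(x)) bits.

    import math

    def P(x, char_prob=1.0 / 27):
        # toy stand-in model: memoryless over 26 letters plus space;
        # a real system would substitute a learned text model here
        return char_prob ** len(x)

    def answer_probability(Q, A):
        return P(Q + A) / P(Q)            # P(A|Q) = P(QA)/P(Q)

    def code_length_bits(x):
        return math.log2(1.0 / P(x))      # ideal arithmetic-coding length of x

    Q, A = "what is 2+2? ", "four"
    print(answer_probability(Q, A))                      # the model's probability of this answer
    print(round(code_length_bits("hello world"), 1))     # about 52.3 bits under the toy model

The better the model P, the shorter the code lengths it assigns to real text and the sharper the answer distribution P(A|Q) it induces, which is the link between compression and the preference test being claimed here.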
Re: [agi] draft for comment
--- On Wed, 9/3/08, Pei Wang [EMAIL PROTECTED] wrote: TITLE: Embodiment: Who does not have a body? AUTHOR: Pei Wang ABSTRACT: In the context of AI, ``embodiment'' should not be interpreted as ``giving the system a body'', but as ``adapting to the system's experience''. Therefore, being a robot is neither a sufficient condition nor a necessary condition of being embodied. What really matters is the assumption about the environment for which the system is designed. URL: http://nars.wang.googlepages.com/wang.embodiment.pdf The paper seems to argue that embodiment applies to any system with inputs and outputs, and therefore all AI systems are embodied. However, there are important differences between symbolic systems like NARS and systems with external sensors such as robots and humans. The latter are analog, e.g. the light intensity of a particular point in the visual field, or the position of a joint in an arm. In humans, there is a tremendous amount of data reduction from the senses, from 137 million rods and cones in each eye each firing up to 300 pulses per second, down to 2 bits per second by the time our high level visual perceptions reach long term memory. AI systems have traditionally avoided this type of processing because they lacked the necessary CPU power. IMHO this has resulted in biologically implausible symbolic language models with only a small number of connections between concepts, rather than the tens of thousands of connections per neuron. Another aspect of embodiment (as the term is commonly used), is the false appearance of intelligence. We associate intelligence with humans, given that there are no other examples. So giving an AI a face or a robotic body modeled after a human can bias people to believe there is more intelligence than is actually present. -- Matt Mahoney, [EMAIL PROTECTED] --- agi Archives: https://www.listbox.com/member/archive/303/=now RSS Feed: https://www.listbox.com/member/archive/rss/303/ Modify Your Subscription: https://www.listbox.com/member/?member_id=8660244id_secret=111637683-c8fa51 Powered by Listbox: http://www.listbox.com
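The reduction factor implied by those figures can be made explicit; the numbers below are the ones quoted in the post, and the result should be read as an order-of-magnitude estimate only.

    photoreceptors_per_eye = 137e6        # rods + cones, per the figures above
    eyes = 2
    max_firing_rate_hz = 300              # "up to 300 pulses per second"
    ltm_rate_bits_per_s = 2               # bits/second reaching long-term memory

    raw_rate = photoreceptors_per_eye * eyes * max_firing_rate_hz
    print(f"raw sensory rate  ~{raw_rate:.1e} pulses/s")               # ~8.2e10
    print(f"reduction factor  ~{raw_rate / ltm_rate_bits_per_s:.1e}")  # ~4.1e10

That is a reduction of roughly ten orders of magnitude between the raw retinal signal and what is retained, which is the scale of processing the post argues symbolic systems have traditionally avoided.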
Re: [agi] draft for comment
On Thu, Sep 4, 2008 at 8:56 AM, Valentina Poletti [EMAIL PROTECTED] wrote: I agree with Pei in that a robot's experience is not necessarily more real than that of a, say, web-embedded agent - if anything it is closer to the human experience of the world. But who knows how limited our own sensory experience is anyhow. Perhaps a better intelligence would comprehend the world better through a different embodiment. Exactly, the world to a system is always limited by the system's I/O channels, and for systems with different I/O channels, their worlds are different in many aspects, but no one is more real than the others. However, could you guys be more specific regarding the statistical differences of different types of data? What kind of differences are you talking about specifically (mathematically)? And what about the differences at the various levels of the dual-hierarchy? Has any of your work or research suggested this hypothesis, and if so, which? It is Ben who suggested the statistical differences and the dual-hierarchy, while I'm still not convinced about their value. My own constructive work on this topic can be found in http://nars.wang.googlepages.com/wang.semantics.pdf Pei --- agi Archives: https://www.listbox.com/member/archive/303/=now RSS Feed: https://www.listbox.com/member/archive/rss/303/ Modify Your Subscription: https://www.listbox.com/member/?member_id=8660244id_secret=111637683-c8fa51 Powered by Listbox: http://www.listbox.com
Re: [agi] draft for comment
On Thu, Sep 4, 2008 at 9:35 AM, Ben Goertzel [EMAIL PROTECTED] wrote: I understand that a keyboard and touchpad do provide proprioceptive input, but I think it's too feeble, and too insensitively respondent to changes in the environment and the relation btw the laptop and the environment, to serve as the foundation for a robust self-model or a powerful general intelligence. Compared to what? Of course the human sensors are much more complicated, but many robot sensors are no better, so why they are considered as real, while keyboard and touchpad are not? Of course I'm not really arguing that keyboard and touchpad are all we'll need for AGI (I plan to play with robots myself), but that there is no fundamental difference between what we call 'robot' and what we call 'computer', as far as the 'embodiment' discussion is concerned. Robot is just special-purpose computer with I/O not designed for human users. Of course it won't have a visual concept of self, but a system like NARS has the potential to grow into an intelligent operating system, with a notion of self based on what it can feel and do, as well as the causal relations among them --- If there is a file in this folder, then I should have felt it, it cannot be there because I've deleted the contents. My suggestion is that the file system lacks the complexity of structure and dynamics to support the emergence of a robust self-model, and powerful general intelligence... Sure. I just used file managing as a simple example. What if the AI have full control of the system's hardware and software, and can use them in novel ways to solve all kinds of problems unknown to it previously, without human involvement? I would call a self any internal, explicit model that a system creates that allows it to predict its own behaviors in a sufficient variety of contexts This need not have a visual aspect nor a great similarity to a human self. I'd rather not call it a 'model', though won't argue on this topic --- 'embodiment' is already confusing enough, so 'self' is better to wait, otherwise someone will even add 'consciousness' into the discussion. ;-) Pei --- agi Archives: https://www.listbox.com/member/archive/303/=now RSS Feed: https://www.listbox.com/member/archive/rss/303/ Modify Your Subscription: https://www.listbox.com/member/?member_id=8660244id_secret=111637683-c8fa51 Powered by Listbox: http://www.listbox.com
Re: [agi] draft for comment
On Thursday 04 September 2008, Matt Mahoney wrote: Another aspect of embodiment (as the term is commonly used), is the false appearance of intelligence. We associate intelligence with humans, given that there are no other examples. So giving an AI a face or a robotic body modeled after a human can bias people to believe there is more intelligence than is actually present. I'm still waiting until you guys can show me a psychometric test that has a one-to-one correlation with the bioinformatics and neuroinformatics, and thus could be approached with a physical model down at the level of biophysics. Otherwise the 'false appearance of intelligence' is a truism - intelligence is false. What then? (Would you give up making brains and such systems? I'm just wondering. It's an interesting scenario.) - Bryan http://heybryan.org/ Engineers: http://heybryan.org/exp.html irc.freenode.net #hplusroadmap --- agi Archives: https://www.listbox.com/member/archive/303/=now RSS Feed: https://www.listbox.com/member/archive/rss/303/ Modify Your Subscription: https://www.listbox.com/member/?member_id=8660244id_secret=111637683-c8fa51 Powered by Listbox: http://www.listbox.com
Re: [agi] draft for comment
On Thu, Sep 4, 2008 at 10:04 AM, Ben Goertzel [EMAIL PROTECTED] wrote: Hi Pei, I think your point is correct that the notion of embodiment presented by Brooks and some other roboticists is naive. I'm not sure whether their actual conceptions are naive, or whether they just aren't presenting their foundational philosophical ideas clearly in their writings (being ultimately more engineering-oriented people, and probably not that accustomed to the philosophical style of discourse in which these sorts of definitional distinctions need to be drawn more precisely).

To a large extent, their position is a reaction to 'disembodied' symbolic AI, though they get the issue wrong. Symbolic AI is indeed 'disembodied', but not because computers have no body (or sensorimotor devices); rather, those systems are designed to ignore their body and their experience. Therefore, the solution should not be to get a (robotic) body, but to take experience into account.

I do think (in approximate concurrence with your paper) that ANY control system physically embodied in a physical system S, that has an input and output stream, and whose input and output stream possess correlation with the physical state of S, should be considered psychologically embodied. Clearly, whether it's a robot or a laptop (w/o network connection if you like), such a system has the basic property of embodiment.

Yes, though I'd neither say "possess correlation with the physical state" (which is the terminology of model-theoretic semantics), nor "psychologically embodied" (which still sounds like a second-rate substitute for "physically embodied").

Furthermore, S doesn't need to be a physical system ... it could be a virtual system inside some virtual world (and then there's the question of what properties characterize a valid virtual world ... but let's leave that for another email thread...)

Every system (in this discussion) is a physical system. It is just that sometimes we can ignore its physical properties.

However, I think that not all psychologically-embodied systems possess a sufficiently rich psychological embodiment to lead to significantly general intelligence. My suggestion is that a laptop w/o network connection or odd sensor-peripherals probably does not have sufficiently rich correlations between its I/O stream and its physical state to allow it to develop a robust model of its physical self (which can then be used as a basis for a more general phenomenal self).

That is a separate issue. If a system's I/O devices are very simple, it cannot produce rich behaviors. However, the problem is not caused by 'disembodiment'. We cannot say that a body must reach a certain complexity to be called a 'body'.

I think that Varela and crew understood the value of this rich network of correlations, but mistakenly assumed it to be a unique property of biological systems...

Agree.

I realize that the points you made in your paper do not contradict the suggestions I've made in this email. I don't think anything significant in your paper is wrong, actually. It just seems to me not to address the most interesting aspects of the embodiment issue as related to AGI.

Understood.

Pei
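Ben's working definition -- an I/O stream that "possesses correlation with the physical state of S" -- can be given an operational reading. The sketch below is my own illustration with made-up signals (the keystroke-rate and temperature variables are assumptions, not anyone's proposed measurement); it only shows the kind of quantity one would estimate.

import numpy as np

# Hypothetical once-per-second logs from the same machine:
# an I/O trace (e.g. keystroke or disk-write counts) and one
# physical state variable (e.g. CPU temperature).
rng = np.random.default_rng(0)
io_stream = rng.poisson(lam=5, size=1000).astype(float)
physical_state = 40.0 + 0.8 * io_stream + rng.normal(0.0, 1.0, 1000)

# Pearson correlation as a crude stand-in for "the I/O stream carries
# information about the physical state of S".
r = np.corrcoef(io_stream, physical_state)[0, 1]
print("correlation between I/O stream and physical state: %.2f" % r)

On this reading the disagreement is one of degree: a laptop's I/O-state correlation is weak and low-dimensional compared with a sensor-rich robot's, but it is not zero, which is Pei's point that the laptop still has a body.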
Re: [agi] draft for comment
On Thu, Sep 4, 2008 at 2:22 PM, Matt Mahoney [EMAIL PROTECTED] wrote: The paper seems to argue that embodiment applies to any system with inputs and outputs, and therefore all AI systems are embodied.

No. It argues that since every system has inputs and outputs, 'embodiment', as a non-trivial notion, should be interpreted as taking experience into account when the system behaves. Therefore, a traditional symbolic AI system like CYC is still disembodied.

However, there are important differences between symbolic systems like NARS and systems with external sensors such as robots and humans.

NARS, when implemented, has input/output, and therefore has external sensors. I guess you still see NARS as using model-theoretic semantics, so you call it 'symbolic' and contrast it with systems that have sensors. This is not correct --- see http://nars.wang.googlepages.com/wang.semantics.pdf and http://nars.wang.googlepages.com/wang.AI_Misconceptions.pdf

The latter are analog, e.g. the light intensity of a particular point in the visual field, or the position of a joint in an arm. In humans, there is a tremendous amount of data reduction from the senses, from 137 million rods and cones in each eye, each firing up to 300 pulses per second, down to 2 bits per second by the time our high-level visual perceptions reach long-term memory.

Within a certain accuracy, 'digital' and 'analog' have no fundamental difference. I hope you are not arguing that only analog systems can be embodied.

AI systems have traditionally avoided this type of processing because they lacked the necessary CPU power. IMHO this has resulted in biologically implausible symbolic language models with only a small number of connections between concepts, rather than the tens of thousands of connections per neuron.

You have made this point about CPU power several times, and I'm still not convinced that the bottleneck of AI is hardware capacity. Also, there is no reason to believe an AGI must be designed in a biologically plausible way.

Another aspect of embodiment (as the term is commonly used) is the false appearance of intelligence. We associate intelligence with humans, given that there are no other examples. So giving an AI a face or a robotic body modeled after a human can bias people to believe there is more intelligence than is actually present.

I agree with you on this point, though I will not argue it in the paper --- it would amount to calling the roboticists cheats, even though it is indeed the case that work in robotics gets public attention much more easily.

Pei
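Matt's data-reduction figure can be turned into back-of-envelope arithmetic. The input-side numbers are the ones he quotes; treating each pulse as roughly one bit is a simplifying assumption made here, so the resulting ratio is only an order-of-magnitude illustration.

# Rough scale of the sensory data reduction Matt describes.
receptors_per_eye = 137e6      # rods and cones (his figure)
peak_rate_hz = 300             # pulses per second per receptor (his figure)
eyes = 2
raw_rate = receptors_per_eye * peak_rate_hz * eyes   # ~8.2e10 pulses/s
stored_rate = 2                # bits/s reaching long-term memory (his figure)

# Assuming ~1 bit per pulse, the reduction factor is on the order of 10^10.
print("reduction factor ~ %.1e" % (raw_rate / stored_rate))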
[agi] draft for comment
TITLE: Embodiment: Who does not have a body?
AUTHOR: Pei Wang
ABSTRACT: In the context of AI, ``embodiment'' should not be interpreted as ``giving the system a body'', but as ``adapting to the system's experience''. Therefore, being a robot is neither a sufficient condition nor a necessary condition of being embodied. What really matters is the assumption about the environment for which the system is designed.
URL: http://nars.wang.googlepages.com/wang.embodiment.pdf
Re: [agi] draft for comment
Pei: "it is important to understand that linguistic experience and non-linguistic experience are both special cases of experience, and the latter is not more real than the former. In the previous discussions, many people implicitly suppose that linguistic experience is nothing but Dictionary-Go-Round [Harnad, 1990], and only non-linguistic experience can give symbols meaning. This is a misconception coming from traditional semantics, which determines meaning by referred object, so that an image of the object seems to be closer to the real thing than a verbal description [Wang, 2007]."

1. Of course the image is more real than the symbol or word. Simple test of what should be obvious: a) use any amount of symbols you like, incl. Narsese, to describe Pei Wang. Give your description to any intelligence, human or AI, and see if it can pick out Pei in a lineup of similar men. b) Give the same intelligence a photo of Pei - apply the same test. Guess which method will win. Only images can represent *INDIVIDUAL objects* - incl. Pei/Ben or this keyboard on my desk. And in the final analysis, only individual objects *are* real. There are no chairs or oranges, for example - those general concepts are, in the final analysis, useful fictions. There is only this chair here and that chair over there. And if you want to refer to them individually - so that you communicate successfully with another person/intelligence - you have no choice but to use images (flat or solid).

2. Symbols are abstract - they can't refer to anything unless you already know, via images, what they refer to. If you think not, please draw a cheggnut. Again, if I give you an image of a cheggnut, you will have no problem.

3. You talk of a misconception of semantics, but give no reason why it is such; you merely state that it is.

4. You leave out the most important thing of all - you argue that experience is composed of symbols and images. And...? Hey, there are also the real thing(s) - the real objects that they refer to. You certainly can't do science without looking at the real objects. And science is only a systematic version of all intelligence. That's how every functioning general intelligence is able to be intelligent about the world - by being grounded in the real world, composed of real objects, which it can go out and touch, walk round, look at and interact with. A box like NARS can't do that, can it? Do you realise what you're saying, Pei? To understand statements is to *realise* what they mean - what they refer to - to know that they refer to real objects, which you can really go and interact with and test - and to try (or have your brain try automatically) to connect those statements to real objects. When you or I are given words or images, "find this man [Pei]", or "cook a Chinese meal tonight", we know that those signs must be tested in the real world and are only valid if so tested. We know that it's possible that that man over there who looks v. like the photo may not actually be Pei, or that Pei may have left the country and be impossible to find. We know that it may be impossible to cook such a meal, because there's no such food around. And all such tests can only be conducted in the real world (and not, say, by going and looking at other texts or photos - living in a Web world). Your concept of AI is not so much un-grounded as unreal.

5. Why on earth do you think that evolution shows us general intelligences very successfully dealing with the problems of the world for over a billion years *without* any formal symbols? Why do infants take time to acquire language, and yet are able to survive without it? The conception of AI that you are advancing is the equivalent of Creationism - it both lacks and denies an evolutionary perspective on intelligence - a (correctly) cardinal sin in modern science.
Re: [agi] draft for comment
Pei, I have a different sort of reason for thinking embodiment is important ... it's a deeper reason that I think underlies the "embodiment is important because of symbol grounding" argument.

Linguistic data, mathematical data, visual data, motoric data etc. are all just bits ... and intelligence needs to work by recognizing patterns among these bits, especially patterns related to system goals. What I think is that the set of patterns in perceptual and motoric data has radically different statistical properties than the set of patterns in linguistic and mathematical data ... and that the properties of the set of patterns in perceptual and motoric data are intrinsically better suited to the needs of a young, ignorant, developing mind.

All these different domains of pattern display what I've called a dual network structure ... a collection of hierarchies (of progressively more and more complex, hierarchically nested patterns) overlaid with a heterarchy (of overlapping, interrelated patterns). But the statistics of the dual networks in the different domains are different. I haven't fully plumbed the difference yet ... but among the many differences is that in perceptual/motoric domains, you have a very richly connected dual network at a very low level of the overall dual network hierarchy -- i.e., there's a richly connected web of relatively simple stuff to understand ... and then these simple things are related to (hence useful for learning) the more complex things, etc.

In short, Pei, I agree that the arguments typically presented in favor of embodiment in AI suck. However, I think there are deeper factors going on which do imply a profound value of embodiment for AGI. Unfortunately, we currently lack a really appropriate scientific language for describing the differences in statistical organization between different pattern-sets, so it's almost as difficult to articulate these differences as it is to understand them...

-- Ben G

On Wed, Sep 3, 2008 at 4:58 PM, Pei Wang [EMAIL PROTECTED] wrote: TITLE: Embodiment: Who does not have a body? AUTHOR: Pei Wang ABSTRACT: In the context of AI, ``embodiment'' should not be interpreted as ``giving the system a body'', but as ``adapting to the system's experience''. Therefore, being a robot is neither a sufficient condition nor a necessary condition of being embodied. What really matters is the assumption about the environment for which the system is designed. URL: http://nars.wang.googlepages.com/wang.embodiment.pdf

-- Ben Goertzel, PhD
CEO, Novamente LLC and Biomind LLC
Director of Research, SIAI
[EMAIL PROTECTED]
"Nothing will ever be attempted if all possible objections must be first overcome" - Dr Samuel Johnson
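Ben's "dual network" picture -- a hierarchy of progressively more complex nested patterns overlaid with a heterarchy of lateral associations -- can be sketched as a data structure. The fragment below is only a minimal reading of the idea; the pattern names and the depth measure are invented for illustration and are not taken from Novamente.

# Each pattern lists the simpler patterns it is built from (hierarchy)
# and the patterns it overlaps or co-occurs with (heterarchy).
dual_network = {
    "reach":      {"parts": [],                      "associates": {"see cup"}},
    "close hand": {"parts": [],                      "associates": {"touch cup"}},
    "touch cup":  {"parts": [],                      "associates": {"close hand"}},
    "see cup":    {"parts": [],                      "associates": {"reach", "grasp cup"}},
    "grasp cup":  {"parts": ["reach", "close hand"], "associates": {"see cup"}},
}

def depth(name):
    """Hierarchical level of a pattern: how many layers of parts it nests."""
    parts = dual_network[name]["parts"]
    return 0 if not parts else 1 + max(depth(p) for p in parts)

# Ben's claim, restated in these terms: in perceptual/motoric domains the
# depth-0 patterns are already densely cross-linked via 'associates',
# which is what makes them learnable by an ignorant system; in linguistic
# and mathematical data the rich structure appears only higher up.
print(depth("grasp cup"))   # 1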
Re: [agi] draft for comment
Mike,

As I said before, you give 'symbol' a very narrow meaning, and insist that it is the only way to use it. In the current discussion, symbols are not 'X', 'Y', 'Z', but 'table', 'time', 'intelligence'. BTW, what images do you associate with the latter two?

Since you prefer to use a person as an example, let me try the same. All of my experience of 'Mike Tintner' is symbolic, nothing visual, but it still makes you real enough to me, and I've got more information about you than a photo of you could provide. For instance, this experience tells me that to argue this issue with you will very likely be a waste of time, which is something no photo could teach me. I still could not pick you out in a lineup, but that doesn't mean your name is meaningless to me. I'm sorry if this sounds rude --- I rarely talk to people in this tone, but you are exceptional, in my experience of personal communication.

Again, the meaning of your name, in my mind, is not the person it refers to, but its relations with other concepts in my experience, and this experience can be visual, verbal, or something else.

Pei
Re: [agi] draft for comment.. P.S.
I think I have an appropriate term for what I was trying to conceptualise. It is that intelligence has not only to be embodied, it has to be EMBEDDED in the real world - that's the only way it can test whether information about the world and real objects is really true. If you want to know whether Jane Doe is great at sex, you can't take anyone's word for it; you have to go to bed with her. (Comments on the term esp. welcome.)
Re: [agi] draft for comment
On Wed, Sep 3, 2008 at 6:24 PM, Ben Goertzel [EMAIL PROTECTED] wrote: What I think is that the set of patterns in perceptual and motoric data has radically different statistical properties than the set of patterns in linguistic and mathematical data ... and that the properties of the set of patterns in perceptual and motoric data are intrinsically better suited to the needs of a young, ignorant, developing mind.

Sure it is. Systems with different sensory channels will never fully understand each other. I'm not saying that one channel (verbal) can replace another (visual), but that both of them (and many others) can give a symbol/representation/concept/pattern/whatever-you-call-it meaning. No one is more real than the others.

All these different domains of pattern display what I've called a dual network structure ... a collection of hierarchies (of progressively more and more complex, hierarchically nested patterns) overlaid with a heterarchy (of overlapping, interrelated patterns). But the statistics of the dual networks in the different domains are different. I haven't fully plumbed the difference yet ... but among the many differences is that in perceptual/motoric domains, you have a very richly connected dual network at a very low level of the overall dual network hierarchy -- i.e., there's a richly connected web of relatively simple stuff to understand ... and then these simple things are related to (hence useful for learning) the more complex things, etc.

True, but can you say that the relations among words, or concepts, are simpler?

In short, Pei, I agree that the arguments typically presented in favor of embodiment in AI suck. However, I think there are deeper factors going on which do imply a profound value of embodiment for AGI. Unfortunately, we currently lack a really appropriate scientific language for describing the differences in statistical organization between different pattern-sets, so it's almost as difficult to articulate these differences as it is to understand them...

In this short paper, I make no attempt to settle all the issues, but just to point out a simple fact --- a laptop has a body, and is no less embodied than a Roomba or a Mindstorms robot --- that seems to have been ignored in the previous discussion.

Pei