RE: Language modeling (was Re: [agi] draft for comment)

2008-09-08 Thread John G. Rose
 From: Matt Mahoney [mailto:[EMAIL PROTECTED]
 
 --- On Sun, 9/7/08, John G. Rose [EMAIL PROTECTED] wrote:
 
  From: John G. Rose [EMAIL PROTECTED]
  Subject: RE: Language modeling (was Re: [agi] draft for comment)
  To: agi@v2.listbox.com
  Date: Sunday, September 7, 2008, 9:15 AM
   From: Matt Mahoney [mailto:[EMAIL PROTECTED]
  
   --- On Sat, 9/6/08, John G. Rose
  [EMAIL PROTECTED] wrote:
  
Compression in itself has the overriding goal of
  reducing
storage bits.
  
   Not the way I use it. The goal is to predict what the
  environment will
   do next. Lossless compression is a way of measuring
  how well we are
   doing.
  
 
  Predicting the environment in order to determine which data
  to pack where,
  thus achieving higher compression ratio. Or compression as
  an integral part
  of prediction? Some types of prediction are inherently
  compressed I suppose.
 
 Predicting the environment to maximize reward. Hutter proved that
 universal intelligence is a compression problem. The optimal behavior of
 an AIXI agent is to guess the shortest program consistent with
 observation so far. That's algorithmic compression.
 

Oh, I see. Guessing the shortest program = compression. OK, right. But yeah, like
Pei said, the word "compression" is misleading. It implies a reduction when you
are actually increasing understanding :)

John






RE: Language modeling (was Re: [agi] draft for comment)

2008-09-07 Thread John G. Rose
 From: Matt Mahoney [mailto:[EMAIL PROTECTED]
 
 --- On Sat, 9/6/08, John G. Rose [EMAIL PROTECTED] wrote:
 
  Compression in itself has the overriding goal of reducing
  storage bits.
 
 Not the way I use it. The goal is to predict what the environment will
 do next. Lossless compression is a way of measuring how well we are
 doing.
 

Predicting the environment in order to determine which data to pack where,
thus achieving higher compression ratio. Or compression as an integral part
of prediction? Some types of prediction are inherently compressed I suppose.


John





RE: Language modeling (was Re: [agi] draft for comment)

2008-09-07 Thread Matt Mahoney
--- On Sun, 9/7/08, John G. Rose [EMAIL PROTECTED] wrote:

 From: John G. Rose [EMAIL PROTECTED]
 Subject: RE: Language modeling (was Re: [agi] draft for comment)
 To: agi@v2.listbox.com
 Date: Sunday, September 7, 2008, 9:15 AM
  From: Matt Mahoney [mailto:[EMAIL PROTECTED]
  
  --- On Sat, 9/6/08, John G. Rose
 [EMAIL PROTECTED] wrote:
  
   Compression in itself has the overriding goal of
 reducing
   storage bits.
  
  Not the way I use it. The goal is to predict what the
 environment will
  do next. Lossless compression is a way of measuring
 how well we are
  doing.
  
 
 Predicting the environment in order to determine which data
 to pack where,
 thus achieving higher compression ratio. Or compression as
 an integral part
 of prediction? Some types of prediction are inherently
 compressed I suppose.

Predicting the environment to maximize reward. Hutter proved that universal 
intelligence is a compression problem. The optimal behavior of an AIXI agent is 
to guess the shortest program consistent with observation so far. That's 
algorithmic compression.
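
A toy sketch of "guess the shortest program consistent with observation so far" (the candidate programs and their bit lengths below are invented for illustration; true AIXI search over all programs is uncomputable):

candidates = [
    # (description length in bits, generator of the first n symbols) -- assumptions
    (8,  lambda n: [0] * n),                          # constant zeros
    (10, lambda n: [i % 2 for i in range(n)]),        # alternating 0, 1
    (16, lambda n: [(i * i) % 3 for i in range(n)]),  # squares mod 3
]

def predict_next(observed):
    # keep only programs whose output matches everything seen so far
    consistent = [(bits, gen) for bits, gen in candidates
                  if gen(len(observed)) == list(observed)]
    if not consistent:
        return None
    bits, gen = min(consistent, key=lambda c: c[0])   # shortest program wins
    return gen(len(observed) + 1)[-1]                 # predict its next output

print(predict_next([0, 1, 0, 1]))  # -> 0, from the 10-bit alternating program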

-- Matt Mahoney, [EMAIL PROTECTED]





Re: AI isn't cheap (was Re: Real vs. simulated environments (was Re: [agi] draft for comment.. P.S.))

2008-09-07 Thread Steve Richfield
/aixigentle.htm

 3. Legg, Shane, (2006), Is There an Elegant Universal Theory of
 Prediction?, Technical Report IDSIA-12-06, IDSIA / USI-SUPSI, Dalle Molle
 Institute for Artificial Intelligence, Galleria 2, 6928 Manno, Switzerland.

 http://www.vetta.org/documents/IDSIA-12-06-1.pdf


 -- Matt Mahoney, [EMAIL PROTECTED]

 --- On *Sat, 9/6/08, Steve Richfield [EMAIL PROTECTED]* wrote:

 From: Steve Richfield [EMAIL PROTECTED]
 Subject: Re: AI isn't cheap (was Re: Real vs. simulated environments (was
 Re: [agi] draft for comment.. P.S.))
 To: agi@v2.listbox.com
 Date: Saturday, September 6, 2008, 2:58 PM

  Matt,

 I heartily disagree with your view as expressed here, and as stated to me
 by heads of CS departments and other high-ranking CS PhDs, nearly (but not
 quite) all of whom have lost the fire in the belly that we all once had
 for CS/AGI.

 I DO agree that CS is like every other technological endeavor, in that
 almost everything that can be done as a PhD thesis has already been done.
 But there is a HUGE gap between a PhD-thesis-scale project and what that
 same person can do with another few million and a couple more years,
 especially if allowed to ignore the naysayers.

 The reply is even more complex than your well-documented statement, but
 I'll take my best shot at it, time permitting. Here, the angel is in the
 details.

  On 9/5/08, Matt Mahoney [EMAIL PROTECTED] wrote:

 --- On Fri, 9/5/08, Steve Richfield [EMAIL PROTECTED] wrote:
 I think that a billion or so, divided up into small pieces to fund EVERY
 disparate approach to see where the low hanging fruit is, would go a
 LONG way in guiding subsequent billions. I doubt that it would take a
 trillion to succeed.

 Sorry, the low hanging fruit was all picked by the early 1960's. By then
 we had neural networks [1,6,7,11,12],


 ... but we STILL do not have any sort of useful *unsupervised* NN, the
 equivalent of which seems to be needed for any good AGI. Note my recent
 postings about a potential theory of everything that would most directly
 hit unsupervised NN, providing not only a good way of operating these, but
 possibly the provably best way of operating.

 natural language processing and language translation [2],


 My Dr. Eliza is right there and showing that useful understanding out of
 precise context is almost certainly impossible. I regularly meet with the
 folks working on the Russian translator project, and rest assured, things
 are STILL advancing fairly rapidly. Here, there is continuing funding, and I
 expect that the Russian translator will eventually succeed (they already
 claim success).

 models of human decision making [3],


 These are curious, but I believe them to be emergent properties of
 processes that we don't understand at all, so they have no value other than
 for testing of future systems. Note that human decision making does NOT
 generally include many advanced sorts of logic that simply don't occur to
 ordinary humans, which is where an AGI could shine. Hence, understanding the
 human but not the non-human processes is nearly worthless.

 automatic theorem proving [4,8,10],


 Great for when you already have the answer - but what is it good for?!

 natural language databases [5],


 Which are only useful if/when the provably false presumption is true that
 NL understanding is generally possible.

 game playing programs [9,13],


 Not relevant for AGI.

 optical character recognition [14],


 Only recently have methods emerged that are truly font-independent. This
 SHOULD have been accomplished long ago (like shortly after your 1960
 reference), but no one wanted to throw significant money at it. I nearly
 launched an OCR company (Cognitext) in 1981, but funding eventually failed
 *because* I had done the research and had a new (but *un*proven) method
 that was truly font-independent.

 handwriting and speech recognition [15],


 ... both of which are now good enough for AI interaction (e.g. my Gracie
 speech I/O interface to Dr. Eliza), but NOT good enough for general
 dictation. Unfortunately, the methods used don't seem to shed much light on
 how the underlying processes work in us.

 and important theoretical work [16,17,18].


 Note again my call for work/help on what I call computing's theory of
 everything leveraging off of principal component analysis.

 Since then we have had mostly just incremental improvements.


 YES. This only shows that the support process has long been broken, and NOT
 that there isn't a LOT of value that is just out of reach of PhD-sized
 projects.

 Big companies like Google and Microsoft have strong incentives to develop
 AI


 Internal politics at both (that I have personally run into) restrict
 expenditures to PROVEN methods, as a single technical failure spells doom
 for the careers of everyone working on them. Hence, their R&D is all D and
 no R.

 and have billions to spend.


 Not one dollar of which goes into what I would call genuine research.

 Maybe the problem

Re: [agi] draft for comment

2008-09-07 Thread Mike Tintner

Pei:As I said before, you give symbol a very narrow meaning, and insist
that it is the only way to use it. In the current discussion,
symbols are not 'X', 'Y', 'Z', but 'table', 'time', 'intelligence'.
BTW, what images you associate with the latter two?

Since you prefer to use person as example, let me try the same. All of
my experience about 'Mike Tintner' is symbolic, nothing visual, but it
still makes you real enough to me...

I'm sorry if it sounds rude


Pei,

You attribute to symbols far too broad powers that they simply don't have - 
and demonstrably, scientifically, don't have.


For example, you think that your experience of Mike Tintner - the rude 
guy - is entirely symbolic. Yes, all your experience of me has been mediated 
entirely via language/symbols -these posts.  But by far the most important 
parts of it have actually been images. Ridiculous, huh?


Look at this sentence:

If you want to hear about it, you'll probably want to know where I was 
born, and what a lousy childhood I had, and how my parents were occupied 
before they had me, and all the David Copperfield crap, but if you want to 
know the truth, I don't really want to get into it.


In 60 words,  one of the great opening sentences of a novel, Salinger has 
created a whole character. How? He did it by creating a voice. He did it by 
what is called prosody (and also diction). No current AGI method has the 
least idea of how to process that prosody. But your brain does. Pei doesn't. 
But his/your brain does.


And your experience of MT has been heavily based similarly on processing the 
*sound* images - the voice behind my words. Hence your "I'm sorry if it 
*sounds* rude".


Words, even written words, aren't just symbols, they are sounds. And your 
brain hears those sounds and from their music can tell many, many things, 
including the emotions of the speaker, and whether they're being angry or 
ironic or rude.


Now, if you had had more of a literary/arts education, you would probably be 
alive to that dimension. But, as it is, you've missed it, and you're missing 
all kinds of dimensions of how symbols work.


Similarly, if you had more of a visual education, and also more of a 
psychological developmental background, you wouldn't find "time" and 
"intelligence" so daunting to visualise.


You would realise that it takes a great deal of time and preparatory 
sensory/imaginative experience to build up abstract concepts.


You would realise that it takes time for an infant to come to use that 
word, and still more for a child to understand the word "intelligence". I 
doubt that any child will understand "time" before they've seen a watch or 
clock, and that's what they will probably visualise time as, first. Your 
capacity to abstract time still further will have come from having become 
gradually acquainted with a whole range of time-measuring devices, and from 
seeing the word "time" and associating that with many other kinds of 
measurement, especially in relation to maths and science.


Similarly, a person's concept of "intelligence" will come from seeing and 
hearing people solving problems in different ways - quickly and slowly, for 
example. It will be deeply grounded in sensory images and experience.


All the most abstract maths and logic that you may think totally abstract 
are similarly and necessarily grounded. Ben, in parallel to you, didn't 
realise that the decimal numeral system is digital, based on the hand, and 
so, a little less obviously, is the roman numeral system. Numbers and logic 
have to be built up out of experience.


[You might profit BTW by looking at Barsalou, [many of his papers online], 
to see how the mind modally simulates concepts - with lots of experimental 
evidence]


I, as you know, am very ignorant about computers; but you are also very 
ignorant about all kinds of dimensions of how symbols work, and intelligence 
generally, that are absolutely essential for AGI. You can continue to look 
down on me, or you can open your mind, recognize that general intelligence 
can only be achieved by a confluence of disciplines way beyond the reach of 
any single individual, and see that maybe useful exchanges can take place. 







Re: [agi] draft for comment

2008-09-07 Thread Jiri Jelinek
Mike,

If you think your AGI know-how is superior to the know-how of those
who have already built testable thinking machines, then why don't you try to
build one yourself? Maybe you would learn more that way than by
spending a significant amount of time trying to sort out the great
incompatibilities between your views and the views of the other AGI
researchers. If you don't have the resources to build the system then,
perhaps, you could just put together some architecture doc (including
your definitions of important terms) for your as-simple-as-possible
AGI. The talk could then be more specific/interesting/fruitful for
everyone involved. Sorry if I'm missing something. I'm reading this
list only occasionally. But when I get to your posts, I often see
things very differently, and I know I'm not alone. I guess, if you try
to view things from a developer's perspective + if you systematically
move forward improving a particular AGI design, your views would
change drastically. Just my opinion..

Regards,
Jiri Jelinek




RE: Language modeling (was Re: [agi] draft for comment)

2008-09-06 Thread John G. Rose
Thinking out loud here as I find the relationship between compression and
intelligence interesting:

Compression in itself has the overriding goal of reducing storage bits.
Intelligence has coincidental compression. There is resource management
there. But I do think that it is not ONLY coincidental. Knowledge has
structure which can be organized and naturally can collapse into a lower
complexity storage state. Things have order, based on physics and other
mathematical relationships. The relationship between compression and stored
knowledge and intelligence is intriguing. But knowledge can be compressed
inefficiently to where it inhibits extraction and other operations so there
are differences with compression and intelligence related to computational
expense. Optimal intelligence would have a variational compression structure;
IOW, some stuff needs fast access time with minimal decompression resource
expenditure, and other stuff has high storage priority but computational
expense and access time are not a priority.

And then when you say the word compression there is a complicity of utility.
The result of a compressor that has general intelligence still has a goal of
reducing storage bits. I think that compression can be a byproduct of the
stored knowledge created by a general intelligence. But if you have a
compressor with general intelligence built in and you assign it a goal of
taking input data and reducing the storage space it still may result in a
series of hacks because that may be the best way of accomplishing that goal.


Sure there may be some new undiscovered hacks that require general
intelligence to uncover. And a compressor that is generally intelligent may
produce richer lossily compressed data from varied sources. The best
lossy compressor is probably generally intelligent. They are very similar as
you indicate... but when you start getting real lossy, when you start asking
questions from your lossy compressed data that are not related to just the
uncompressed input, there is a difference there. Compression itself is just
one-dimensional. Intelligence is multi.

John 



 -Original Message-
 From: Matt Mahoney [mailto:[EMAIL PROTECTED]
 Sent: Friday, September 05, 2008 6:39 PM
 To: agi@v2.listbox.com
 Subject: Re: Language modeling (was Re: [agi] draft for comment)
 
 --- On Fri, 9/5/08, Pei Wang [EMAIL PROTECTED] wrote:
 
  Like to many existing AI works, my disagreement with you is
  not that
  much on the solution you proposed (I can see the value),
  but on the
  problem you specified as the goal of AI. For example, I
  have no doubt
  about the theoretical and practical values of compression,
  but don't
  think it has much to do with intelligence.
 
 In http://cs.fit.edu/~mmahoney/compression/rationale.html I explain why
 text compression is an AI problem. To summarize, if you know the
 probability distribution of text, then you can compute P(A|Q) for any
 question Q and answer A to pass the Turing test. Compression allows you
 to precisely measure the accuracy of your estimate of P. Compression
 (actually, word perplexity) has been used since the early 1990's to
 measure the quality of language models for speech recognition, since it
 correlates well with word error rate.
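
A minimal sketch of how one model serves both roles (ranking answers by P(A|Q) and yielding a compressed size); the tiny corpus, character bigram model, add-alpha smoothing, and candidate answers are all illustrative assumptions:

import math
from collections import Counter

corpus = "the cat sat on the mat. the cat ate the fish."
pairs = Counter(zip(corpus, corpus[1:]))      # bigram counts
singles = Counter(corpus[:-1])                # context counts
vocab = len(set(corpus))

def p_next(prev, ch, alpha=0.5):
    # add-alpha smoothed bigram probability (smoothing is an assumption)
    return (pairs[(prev, ch)] + alpha) / (singles[prev] + alpha * vocab)

def log2p(text):
    return sum(math.log2(p_next(a, b)) for a, b in zip(text, text[1:]))

q = "the cat "
for a in ["sat", "ate", "xqz"]:
    print(a, log2p(q + a) - log2p(q))          # log2 P(A|Q): higher is a better answer

print("ideal code length of corpus:", -log2p(corpus), "bits")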
 
 The purpose of this work is not to solve general intelligence, such as
 the universal intelligence proposed by Legg and Hutter [1]. That is not
 computable, so you have to make some arbitrary choice with regard to
 test environments about what problems you are going to solve. I believe
 the goal of AGI should be to do useful work for humans, so I am making a
 not so arbitrary choice to solve a problem that is central to what most
 people regard as useful intelligence.
 
 I had hoped that my work would lead to an elegant theory of AI, but that
 hasn't been the case. Rather, the best compression programs were
 developed as a series of thousands of hacks and tweaks, e.g. change a 4
 to a 5 because it gives 0.002% better compression on the benchmark. The
 result is an opaque mess. I guess I should have seen it coming, since it
 is predicted by information theory (e.g. [2]).
 
 Nevertheless the architectures of the best text compressors are
 consistent with cognitive development models, i.e. phoneme (or letter)
 sequences -> lexical -> semantics -> syntax, which are themselves
 consistent with layered neural architectures. I already described a
 neural semantic model in my last post. I also did work supporting
 Hutchens and Alder showing that lexical models can be learned from n-
 gram statistics, consistent with the observation that babies learn the
 rules for segmenting continuous speech before they learn any words [3].
 
 I agree it should also be clear that semantics is learned before
 grammar, contrary to the way artificial languages are processed. Grammar
 requires semantics, but not the other way around. Search engines work
 using semantics only. Yet we cannot parse sentences like I ate pizza
 with Bob, I

Re: AI isn't cheap (was Re: Real vs. simulated environments (was Re: [agi] draft for comment.. P.S.))

2008-09-06 Thread Steve Richfield
Matt,

I heartily disagree with your view as expressed here, and as stated to me by
heads of CS departments and other high-ranking CS PhDs, nearly (but not
quite) all of whom have lost the fire in the belly that we all once had
for CS/AGI.

I DO agree that CS is like every other technological endeavor, in that
almost everything that can be done as a PhD thesis has already been done.
But there is a HUGE gap between a PhD-thesis-scale project and what that
same person can do with another few million and a couple more years,
especially if allowed to ignore the naysayers.

The reply is even more complex than your well-documented statement, but
I'll take my best shot at it, time permitting. Here, the angel is in the
details.

On 9/5/08, Matt Mahoney [EMAIL PROTECTED] wrote:

 --- On Fri, 9/5/08, Steve Richfield [EMAIL PROTECTED] wrote:
 I think that a billion or so, divided up into small pieces to fund EVERY
 disparate approach to see where the low hanging fruit is, would go a
 LONG way in guiding subsequent billions. I doubt that it would take a
 trillion to succeed.

 Sorry, the low hanging fruit was all picked by the early 1960's. By then we
 had neural networks [1,6,7,11,12],


... but we STILL do not have any sort of useful *unsupervised* NN, the
equivalent of which seems to be needed for any good AGI. Note my recent
postings about a potential theory of everything that would most directly
hit unsupervised NN, providing not only a good way of operating these, but
possibly the provably best way of operating.

natural language processing and language translation [2],


My Dr. Eliza is right there and showing that useful understanding out of
precise context is almost certainly impossible. I regularly meet with the
folks working on the Russian translator project, and rest assured, things
are STILL advancing fairly rapidly. Here, there is continuing funding, and I
expect that the Russian translator will eventually succeed (they already
claim success).

models of human decision making [3],


These are curious, but I believe them to be emergent properties of
processes that we don't understand at all, so they have no value other than
for testing of future systems. Note that human decision making does NOT
generally include many advanced sorts of logic that simply don't occur to
ordinary humans, which is where an AGI could shine. Hence, understanding the
human but not the non-human processes is nearly worthless.

automatic theorem proving [4,8,10],


Great for when you already have the answer - but what is it good for?!

natural language databases [5],


Which are only useful if/when the provably false presumption is true that NL
understanding is generally possible.

game playing programs [9,13],


Not relevant for AGI.

optical character recognition [14],


Only recently have methods emerged that are truly font-independent. This
SHOULD have been accomplished long ago (like shortly after your 1960
reference), but no one wanted to throw significant money at it. I nearly
launched an OCR company (Cognitext) in 1981, but funding eventually failed *
because* I had done the research and had a new (but *un*proven) method that
was truly font-independent.

handwriting and speech recognition [15],


... both of which are now good enough for AI interaction (e.g. my Gracie
speech I/O interface to Dr. Eliza), but NOT good enough for general
dictation. Unfortunately, the methods used don't seem to shed much light on
how the underlying processes work in us.

and important theoretical work [16,17,18].


Note again my call for work/help on what I call computing's theory of
everything leveraging off of principal component analysis.

Since then we have had mostly just incremental improvements.


YES. This only shows that the support process has long been broken, and NOT
that there isn't a LOT of value that is just out of reach of PhD-sized
projects.

Big companies like Google and Microsoft have strong incentives to develop AI


Internal politics at both (that I have personally run into) restrict
expenditures to PROVEN methods, as a single technical failure spells doom
for the careers of everyone working on them. Hence, their R&D is all D and
no R.

and have billions to spend.


Not one dollar of which goes into what I would call genuine research.

Maybe the problem really is hard.


... and maybe it is just a little difficult. My own Dr. Eliza program
has seemingly unbelievable NL-stated problem solving capabilities, but is
built mostly on the same sort of 1960s technology you cited. Why wasn't it
built before 1970? I see two simple reasons:
1.  Joe Weizenbaum, in his *Computer Power and Human Reason,* explained why
this approach could never work. That immediately made it impossible to get
any related effort funded or acceptable in a university setting.
2.  It took about a year to make a demonstrable real-world NL problem
solving system, which would have been at the outer reaches of a PhD or
casual personal project.

I have 

Re: Language modeling (was Re: [agi] draft for comment)

2008-09-06 Thread Matt Mahoney
--- On Fri, 9/5/08, Pei Wang [EMAIL PROTECTED] wrote:

 Thanks for taking the time to explain your ideas in detail.
 As I said,
 our different opinions on how to do AI come from our very
 different
 understanding of intelligence. I don't take
 passing Turing Test as
 my research goal (as explained in
 http://nars.wang.googlepages.com/wang.logic_intelligence.pdf
 and
 http://nars.wang.googlepages.com/wang.AI_Definitions.pdf). 
 I disagree
 with Hutter's approach, not because his SOLUTION is not
 computable,
 but because his PROBLEM is too idealized and simplified to
 be relevant
 to the actual problems of AI.

I don't advocate the Turing test as the ideal test of intelligence. Turing 
himself was aware of the problem when he gave an example of a computer 
answering an arithmetic problem incorrectly in his famous 1950 paper:

Q: Please write me a sonnet on the subject of the Forth Bridge.
A: Count me out on this one. I never could write poetry.
Q: Add 34957 to 70764.
A: (Pause about 30 seconds and then give as answer) 105621.
Q: Do you play chess?
A: Yes.
Q: I have K at my K1, and no other pieces.  You have only K at K6 and R at R1.  
It is your move.  What do you play?
A: (After a pause of 15 seconds) R-R8 mate.

I prefer a preference test, which a machine passes if you prefer to talk to 
it over a human. Such a machine would be too fast and make too few errors to 
pass a Turing test. For example, if you had to add two large numbers, I think 
you would prefer to use a calculator than ask someone. You could, I suppose, 
measure intelligence as the fraction of questions for which the machine gives 
the preferred answer, which would be 1/4 in Turing's example.

If you know the probability distribution P of text, and therefore know the 
distribution P(A|Q) for any question Q and answer A, then to pass the Turing 
test you would randomly choose answers from this distribution. But to pass the 
preference test for all Q, you would choose A that maximizes P(A|Q) because the 
most probable answer is usually the correct one. Text compression measures 
progress toward either test.
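
A minimal sketch of the distinction, assuming a hypothetical answer_distribution() stand-in for a real model; the candidate answers and probabilities are invented (34957 + 70764 is 105721, which Turing's machine deliberately gets wrong):

import random

def answer_distribution(question):
    # stand-in for P(A|Q) from a real language model; the numbers are made up
    return {"105721": 0.6, "105621": 0.3, "I never could do sums.": 0.1}

def turing_answer(question):
    # imitate the population: sample from P(A|Q), reproducing its errors
    dist = answer_distribution(question)
    return random.choices(list(dist), weights=list(dist.values()))[0]

def preference_answer(question):
    # give the most probable (usually correct) answer: argmax P(A|Q)
    dist = answer_distribution(question)
    return max(dist, key=dist.get)

q = "Add 34957 to 70764."
print(turing_answer(q), "|", preference_answer(q))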

I believe that compression measures your definition of intelligence, i.e. 
adaptation given insufficient knowledge and resources. In my benchmark, there 
are two parts: the size of the decompression program, which measures the 
initial knowledge, and the compressed size, which measures prediction errors 
that occur as the system adapts. Programs must also meet practical time and 
memory constraints to be listed in most benchmarks.
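
So the benchmark score is just the sum of two sizes; a sketch, with hypothetical file names:

import os

def benchmark_bits(decompressor_path, compressed_path):
    program = 8 * os.path.getsize(decompressor_path)  # initial knowledge
    data = 8 * os.path.getsize(compressed_path)       # accumulated prediction errors
    return program + data                             # smaller total = better model

# e.g. benchmark_bits("decomp.exe", "enwik9.cmp")     # both file names are hypothetical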

Compression is also consistent with Legg and Hutter's universal intelligence, 
i.e. expected reward of an AIXI universal agent in an environment simulated by 
a random program. Suppose you have a compression oracle that inputs any string 
x and outputs the shortest program that outputs a string with prefix x. Then 
this reduces the (uncomputable) AIXI problem to using the oracle to guess which 
environment is consistent with the interaction so far, and figuring out which 
future outputs by the agent will maximize reward.

Of course universal intelligence is also not testable because it requires an 
infinite number of environments. Instead, we have to choose a practical data 
set. I use Wikipedia text, which has fewer errors than average text, but I 
believe that is consistent with my goal of passing the preference test.


-- Matt Mahoney, [EMAIL PROTECTED]





RE: Language modeling (was Re: [agi] draft for comment)

2008-09-06 Thread Matt Mahoney
--- On Sat, 9/6/08, John G. Rose [EMAIL PROTECTED] wrote:

 Compression in itself has the overriding goal of reducing
 storage bits.

Not the way I use it. The goal is to predict what the environment will do next. 
Lossless compression is a way of measuring how well we are doing.

-- Matt Mahoney, [EMAIL PROTECTED]





Re: Language modeling (was Re: [agi] draft for comment)

2008-09-06 Thread Pei Wang
I won't argue against your "preference test" here, since this is a
big topic, and I've already made my position clear in the papers I
mentioned.

As for compression, yes every intelligent system needs to 'compress'
its experience in the sense of keeping the essence but using less
space. However, it is clearly not lossless. It is not even what we
usually call "lossy" compression, because what to keep and in what
form is highly context-sensitive. Consequently, this process is not
reversible --- no decompression, though the result can be applied in
various ways. Therefore I prefer not to call it compression to avoid
confusing this process with the technical sense of compression,
which is reversible, at least approximately.

Legg and Hutter's universal intelligence definition is way too
narrow to cover various attempts towards AI, even as an idealization.
Therefore, I don't take it as a goal to aim at and approach as closely
as possible. However, as I said before, I'd rather leave this
topic for the future, when I have enough time to give it a fair
treatment.

Pei

On Sat, Sep 6, 2008 at 4:29 PM, Matt Mahoney [EMAIL PROTECTED] wrote:
 --- On Fri, 9/5/08, Pei Wang [EMAIL PROTECTED] wrote:

 Thanks for taking the time to explain your ideas in detail.
 As I said,
 our different opinions on how to do AI come from our very
 different
 understanding of intelligence. I don't take
 passing Turing Test as
 my research goal (as explained in
 http://nars.wang.googlepages.com/wang.logic_intelligence.pdf
 and
 http://nars.wang.googlepages.com/wang.AI_Definitions.pdf).
 I disagree
 with Hutter's approach, not because his SOLUTION is not
 computable,
 but because his PROBLEM is too idealized and simplified to
 be relevant
 to the actual problems of AI.

 I don't advocate the Turing test as the ideal test of intelligence. Turing 
 himself was aware of the problem when he gave an example of a computer 
 answering an arithmetic problem incorrectly in his famous 1950 paper:

 Q: Please write me a sonnet on the subject of the Forth Bridge.
 A: Count me out on this one. I never could write poetry.
 Q: Add 34957 to 70764.
 A: (Pause about 30 seconds and then give as answer) 105621.
 Q: Do you play chess?
 A: Yes.
 Q: I have K at my K1, and no other pieces.  You have only K at K6 and R at 
 R1.  It is your move.  What do you play?
 A: (After a pause of 15 seconds) R-R8 mate.

 I prefer a preference test, which a machine passes if you prefer to talk to 
 it over a human. Such a machine would be too fast and make too few errors to 
 pass a Turing test. For example, if you had to add two large numbers, I think 
 you would prefer to use a calculator than ask someone. You could, I suppose, 
 measure intelligence as the fraction of questions for which the machine gives 
 the preferred answer, which would be 1/4 in Turing's example.

 If you know the probability distribution P of text, and therefore know the 
 distribution P(A|Q) for any question Q and answer A, then to pass the Turing 
 test you would randomly choose answers from this distribution. But to pass 
 the preference test for all Q, you would choose A that maximizes P(A|Q) 
 because the most probable answer is usually the correct one. Text compression 
 measures progress toward either test.

 I believe that compression measures your definition of intelligence, i.e. 
 adaptation given insufficient knowledge and resources. In my benchmark, there 
 are two parts: the size of the decompression program, which measures the 
 initial knowledge, and the compressed size, which measures prediction errors 
 that occur as the system adapts. Programs must also meet practical time and 
 memory constraints to be listed in most benchmarks.

 Compression is also consistent with Legg and Hutter's universal intelligence, 
 i.e. expected reward of an AIXI universal agent in an environment simulated 
 by a random program. Suppose you have a compression oracle that inputs any 
 string x and outputs the shortest program that outputs a string with prefix 
 x. Then this reduces the (uncomputable) AIXI problem to using the oracle to 
 guess which environment is consistent with the interaction so far, and 
 figuring out which future outputs by the agent will maximize reward.

 Of course universal intelligence is also not testable because it requires an 
 infinite number of environments. Instead, we have to choose a practical data 
 set. I use Wikipedia text, which has fewer errors than average text, but I 
 believe that is consistent with my goal of passing the preference test.


 -- Matt Mahoney, [EMAIL PROTECTED]







Re: AI isn't cheap (was Re: Real vs. simulated environments (was Re: [agi] draft for comment.. P.S.))

2008-09-06 Thread Matt Mahoney
Steve, where are you getting your cost estimate for AGI? Is it a gut feeling, 
or something like the common management practice of "I can afford $X, so it will 
cost $X"?

My estimate of $10^15 is based on the value of the world economy, US $66 
trillion per year and increasing 5% annually over the next 30 years, which is 
how long it will take for the internet to grow to the computational power of 
10^10 human brains (at 10^15 bits and 10^16 OPS each) at the current rate of 
growth, doubling every couple of years. Even if you disagree with these numbers 
by a factor of 1000, it only moves the time to AGI by a few years, so the cost 
estimate hardly changes.
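
The arithmetic behind that estimate, as a quick sketch (all the inputs are the round numbers quoted above):

economy, growth, years = 66e12, 0.05, 30
total_value = sum(economy * (1 + growth) ** t for t in range(years))
print("cumulative world economic output over 30 years: about %.1e dollars" % total_value)
# prints roughly 4.4e+15, i.e. on the order of $10^15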

And even if the hardware is free, you still have to program or teach about 
10^16 to 10^17 bits of knowledge, assuming 10^9 bits of knowledge per brain [1] 
and 1% to 10% of this is not known by anyone else. Software and training costs 
are not affected by Moore's law. Even if we assume human level language 
understanding and perfect sharing of knowledge, the training cost will be 1% to 
10% of your working life to train the AGI to do your job.

Also, we have made *some* progress toward AGI since 1965, but it is mainly a 
better understanding of why it is so hard, e.g.

- We know that general intelligence is not computable [2] or provable [3]. 
There is no neat theory.

- From Cyc, we know that coding common sense is more than a 20 year effort. 
Lenat doesn't know how much more, but guesses it is maybe between 0.1% and 10% 
finished.

- Google is the closest we have to AI after a half trillion dollar effort.

 

1. Landauer, Tom (1986), "How much do people remember? Some estimates of the 
quantity of learned information in long term memory", Cognitive Science (10), 
pp. 477-493.

2. Hutter, Marcus (2003), "A Gentle Introduction to The Universal Algorithmic 
Agent AIXI", in Artificial General Intelligence, B. Goertzel and C. Pennachin 
eds., Springer. http://www.idsia.ch/~marcus/ai/aixigentle.htm

3. Legg, Shane (2006), "Is There an Elegant Universal Theory of Prediction?", 
Technical Report IDSIA-12-06, IDSIA / USI-SUPSI, Dalle Molle Institute for 
Artificial Intelligence, Galleria 2, 6928 Manno, Switzerland. 
http://www.vetta.org/documents/IDSIA-12-06-1.pdf


-- Matt Mahoney, [EMAIL PROTECTED]

--- On Sat, 9/6/08, Steve Richfield [EMAIL PROTECTED] wrote:
From: Steve Richfield [EMAIL PROTECTED]
Subject: Re: AI isn't cheap (was Re: Real vs. simulated environments (was Re: 
[agi] draft for comment.. P.S.))
To: agi@v2.listbox.com
Date: Saturday, September 6, 2008, 2:58 PM

Matt,
 
I heartily disagree with your view as expressed here, and as stated to me by 
heads of CS departments and other high-ranking CS PhDs, nearly (but not 
quite) all of whom have lost the fire in the belly that we all once had for 
CS/AGI.

 
I DO agree that CS is like every other technological endeavor, in that almost 
everything that can be done as a PhD thesis has already been done. But there is 
a HUGE gap between a PhD-thesis-scale project and what that same person can do 
with another few million and a couple more years, especially if allowed 
to ignore the naysayers.

 
The reply is even more complex than your well-documented statement, but I'll 
take my best shot at it, time permitting. Here, the angel is in the details.
 
On 9/5/08, Matt Mahoney [EMAIL PROTECTED] wrote: 
--- On Fri, 9/5/08, Steve Richfield [EMAIL PROTECTED] wrote:

I think that a billion or so, divided up into small pieces to fund EVERY
disparate approach to see where the low hanging fruit is, would go a
LONG way in guiding subsequent billions. I doubt that it would take a

trillion to succeed.

Sorry, the low hanging fruit was all picked by the early 1960's. By then we had 
neural networks [1,6,7,11,12],
 
... but we STILL do not have any sort of useful unsupervised NN, the equivalent 
of which seems to be needed for any good AGI. Note my recent postings about a 
potential theory of everything that would most directly hit unsupervised NN, 
providing not only a good way of operating these, but possibly the provably 
best way of operating.


natural language processing and language translation [2],
 
My Dr. Eliza is right there and showing that useful understanding out of 
precise context is almost certainly impossible. I regularly meet with the folks 
working on the Russian translator project, and rest assured, things are STILL 
advancing fairly rapidly. Here, there is continuing funding, and I expect that 
the Russian translator will eventually succeed (they already claim success).


models of human decision making [3],
 
These are curious, but I believe them to be emergent properties of processes 
that we don't understand at all, so they have no value other than for testing 
of future systems. Note that human decision making does NOT generally include 
many advanced sorts of logic that simply don't occur to ordinary humans, which 
is where an AGI could shine

Re: Language modeling (was Re: [agi] draft for comment)

2008-09-06 Thread Matt Mahoney
--- On Sat, 9/6/08, Pei Wang [EMAIL PROTECTED] wrote:

 As for compression, yes every intelligent
 system needs to 'compress'
 its experience in the sense of keeping the essence
 but using less
 space. However, it is clearly not lossless. It is
 not even what we
 usually call "lossy" compression, because what to
 keep and in what
 form is highly context-sensitive. Consequently, this
 process is not
 reversible --- no decompression, though the result can be
 applied in
 various ways. Therefore I prefer not to call it compression
 to avoid
 confusing this process with the technical sense of
 compression,
 which is reversible, at least approximately.

I think you misunderstand my use of compression. The goal is modeling or 
prediction. Given a string, predict the next symbol. I use compression to 
estimate how accurate the model is. It is easy to show that if your model is 
accurate, then connecting it to an ideal coder (such as an arithmetic coder) 
yields optimal compression. You could actually skip the coding step, but it is 
cheap, so I use it so that there is no question of making a mistake in the 
measurement. If a bug in the coder produces an output that is too small, then 
the decompression step won't reproduce the original file.
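
A minimal sketch of the shortcut: an ideal coder would spend about -log2 P(symbol | context) bits per symbol, so summing that quantity gives the compressed size without actually coding (the fixed toy distribution is an assumption):

import math

def ideal_compressed_bits(text, model):
    # model(context, symbol) -> predicted probability of symbol given context
    return sum(-math.log2(model(text[:i], ch)) for i, ch in enumerate(text))

probs = {"a": 0.5, "b": 0.25, "c": 0.25}          # toy order-0 model
print(ideal_compressed_bits("abca", lambda ctx, ch: probs[ch]))  # 1+2+2+1 = 6.0 bits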

In fact, many speech recognition experiments do skip the coding step in their 
tests and merely calculate what the compressed size would be. (More precisely, 
they calculate word perplexity, which is equivalent). The goal of speech 
recognition is to find the text y that maximizes P(y|x) for utterance x. It is 
common to factor the model using Bayes law: P(y|x) = P(x|y)P(y)/P(x). We can 
drop P(x) since it is constant, leaving the acoustic model P(x|y) and language 
model P(y) to evaluate. We know from experiments that compression tests on P(y) 
correlate well with word error rates for the overall system.
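
A minimal sketch of word perplexity for the language model P(y); the unigram model, smoothing constant, and vocabulary size are illustrative assumptions:

import math
from collections import Counter

train = "the cat sat on the mat".split()
counts, total = Counter(train), len(train)

def p_word(w, alpha=0.5, vocab=1000):
    return (counts[w] + alpha) / (total + alpha * vocab)   # smoothed unigram P(y)

def perplexity(words):
    bits = sum(-math.log2(p_word(w)) for w in words)
    return 2 ** (bits / len(words))           # lower perplexity = better model

print(perplexity("the cat sat".split()))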

Internally, all lossless compressors use lossy compression or data reduction to 
make predictions. Most commonly, a context is truncated and possibly hashed 
before looking up the statistics for the next symbol. The top lossless 
compressors in my benchmark use more sophisticated forms of data reduction, 
such as mapping upper and lower case letters together, or mapping groups of 
semantically or syntactically related words to the same context.
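
A minimal sketch of the truncate-and-hash step (an order-3 context and a 2^16-entry table are arbitrary choices here):

from collections import Counter, defaultdict

table = defaultdict(Counter)

def ctx_key(history, order=3, buckets=1 << 16):
    return hash(history[-order:]) % buckets        # lossy data reduction

def update(history, next_ch):
    table[ctx_key(history)][next_ch] += 1

def predict(history):
    stats = table[ctx_key(history)]
    return stats.most_common(1)[0][0] if stats else None

text = "abracadabra"
for i in range(len(text) - 1):
    update(text[:i + 1], text[i + 1])
print(predict("abra"))   # 'c', the symbol that followed the context "bra" in training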

As a test, lossless compression is only appropriate for text. For other hard AI 
problems such as vision, art, and music, incompressible noise would overwhelm 
the human-perceptible signal. Theoretically you could compress video to 2 bits 
per second (the rate of human long term memory) by encoding it as a script. The 
decompressor would read the script and create a new movie. The proper test 
would be lossy compression, but this requires human judgment to evaluate how 
well the reconstructed data matches the original.


-- Matt Mahoney, [EMAIL PROTECTED]






Language modeling (was Re: [agi] draft for comment)

2008-09-05 Thread Matt Mahoney
--- On Thu, 9/4/08, Pei Wang [EMAIL PROTECTED] wrote:

 I guess you still see NARS as using model-theoretic
 semantics, so you
 call it symbolic and contrast it with system
 with sensors. This is
 not correct --- see
 http://nars.wang.googlepages.com/wang.semantics.pdf and
 http://nars.wang.googlepages.com/wang.AI_Misconceptions.pdf

I mean NARS is symbolic in the sense that you write statements in Narsese like 
"raven -> bird <0.97, 0.92>" (probability=0.97, confidence=0.92). I realize 
that the meanings of "raven" and "bird" are determined by their relations to 
other symbols in the knowledge base and that the probability and confidence 
change with experience. But in practice you are still going to write statements 
like this because it is the easiest way to build the knowledge base. You aren't 
going to specify the brightness of millions of pixels in a vision system in 
Narsese, and there is no mechanism I am aware of to collect this knowledge from 
a natural language text corpus. There is no mechanism to add new symbols to the 
knowledge base through experience. You have to explicitly add them.

 You have made this point on CPU power several
 times, and I'm still
 not convinced that the bottleneck of AI is hardware
 capacity. Also,
 there is no reason to believe an AGI must be designed in a
 biologically plausible way.

Natural language has evolved to be learnable on a massively parallel network of 
slow computing elements. This should be apparent when we compare successful 
language models with unsuccessful ones. Artificial language models usually 
consist of tokenization, parsing, and semantic analysis phases. This does not 
work on natural language because artificial languages have precise 
specifications and natural languages do not. No two humans use exactly the same 
language, nor does the same human at two points in time. Rather, language is 
learnable by example, so that each message causes the language of the receiver 
to be a little more like that of the sender.

Children learn semantics before syntax, which is the opposite order from which 
you would write an artificial language interpreter. An example of a successful 
language model is a search engine. We know that most of the meaning of a text 
document depends only on the words it contains, ignoring word order. A search 
engine matches the semantics of the query with the semantics of a document 
mostly by matching words, but also by matching semantically related words like 
water to wet.
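
A minimal sketch of that word-overlap view of retrieval, with a tiny hand-made table of related words standing in for a learned semantic model:

related = {"water": {"wet", "rain"}}   # assumed semantic associations, for illustration

def score(query, document):
    q, d = set(query.lower().split()), set(document.lower().split())
    direct = len(q & d)                                   # exact word matches
    semantic = sum(1 for w in q if related.get(w, set()) & d)
    return direct + 0.5 * semantic                        # weighting is arbitrary

docs = ["the street is wet after rain", "the desert is dry and hot"]
print(max(docs, key=lambda d: score("water on the street", d)))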

Here is an example of a computationally intensive but biologically plausible 
language model. A semantic model is a word-word matrix A such that A_ij is the 
degree to which words i and j are related, which you can think of as the 
probability of finding i and j together in a sliding window over a huge text 
corpus. However, semantic relatedness is a fuzzy identity relation, meaning it 
is reflexive, symmetric, and transitive. If i is related to j and j to k, 
then i is related to k. Deriving transitive relations in A, also known as 
latent semantic analysis, is performed by singular value decomposition, 
factoring A = USV where S is diagonal, then discarding the small terms of S, 
which has the effect of lossy compression. Typically, A has about 10^6 elements 
and we keep only a few hundred elements of S. Fortunately there is a parallel 
algorithm that incrementally updates the matrices as the system learns: a 3 
layer neural network where S is the hidden layer
 (which can grow) and U and V are weight matrices. [1].
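
A minimal batch-SVD sketch of the same idea on a hand-made 5x5 co-occurrence matrix (the words and counts are invented; the incremental neural-network version described in [1] replaces the batch decomposition below):

import numpy as np

words = ["water", "wet", "rain", "dry", "desert"]
A = np.array([               # A[i, j] ~ co-occurrence count of words i and j
    [0, 4, 3, 0, 0],
    [4, 0, 2, 1, 0],
    [3, 2, 0, 0, 1],
    [0, 1, 0, 0, 3],
    [0, 0, 1, 3, 0],
], dtype=float)

U, S, Vt = np.linalg.svd(A)
k = 2                                        # keep only the largest singular values
A_k = U[:, :k] @ np.diag(S[:k]) @ Vt[:k, :]  # lossy, low-rank reconstruction

i, j = words.index("water"), words.index("dry")
print("direct co-occurrence:", A[i, j], "  rank-2 estimate:", round(A_k[i, j], 2))
# the reconstruction assigns a nonzero relatedness even to pairs that never co-occur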

Traditional language processing has failed because the task of converting 
natural language statements like "ravens are birds" to formal language is 
itself an AI problem. It requires humans who have already learned what ravens 
are and how to form and recognize grammatically correct sentences so they 
understand all of the hundreds of ways to express the same statement. You have 
to have human-level understanding of the logic to realize that "ravens are coming" 
doesn't mean "ravens -> coming". If you solve the translation problem, then you 
must have already solved the natural language problem. You can't take a 
shortcut directly to the knowledge base, tempting as it might be. You have to 
learn the language first, going through all the childhood stages. I would have 
hoped we have learned a lesson from Cyc.

1. Gorrell, Genevieve (2006), Generalized Hebbian Algorithm for Incremental 
Singular Value Decomposition in Natural Language Processing, Proceedings of 
EACL 2006, Trento, Italy.
http://www.aclweb.org/anthology-new/E/E06/E06-1013.pdf

-- Matt Mahoney, [EMAIL PROTECTED]






Re: Language modeling (was Re: [agi] draft for comment)

2008-09-05 Thread Pei Wang
On Fri, Sep 5, 2008 at 11:15 AM, Matt Mahoney [EMAIL PROTECTED] wrote:
 --- On Thu, 9/4/08, Pei Wang [EMAIL PROTECTED] wrote:

 I guess you still see NARS as using model-theoretic
 semantics, so you
 call it symbolic and contrast it with system
 with sensors. This is
 not correct --- see
 http://nars.wang.googlepages.com/wang.semantics.pdf and
 http://nars.wang.googlepages.com/wang.AI_Misconceptions.pdf

 I mean NARS is symbolic in the sense that you write statements in Narsese 
 like "raven -> bird <0.97, 0.92>" (probability=0.97, confidence=0.92). I 
 realize that the meanings of "raven" and "bird" are determined by their 
 relations to other symbols in the knowledge base and that the probability and 
 confidence change with experience. But in practice you are still going to 
 write statements like this because it is the easiest way to build the 
 knowledge base.

Yes.

 You aren't going to specify the brightness of millions of pixels in a vision 
 system in Narsese, and there is no mechanism I am aware of to collect this 
 knowledge from a natural language text corpus.

Of course not. To have visual experience, there must be a device to
convert visual signals into internal representation in Narsese. I
never suggested otherwise.

 There is no mechanism to add new symbols to the knowledge base through 
 experience. You have to explicitly add them.

New symbols either come from the outside, in experience (experience
can be verbal), or are composed by the concept-formation rules from
existing ones. The latter case is explained in my book.

 Natural language has evolved to be learnable on a massively parallel network 
 of slow computing elements. This should be apparent when we compare 
 successful language models with unsuccessful ones. Artificial language models 
 usually consist of tokenization, parsing, and semantic analysis phases. This 
 does not work on natural language because artificial languages have precise 
 specifications and natural languages do not.

It depends on which aspect of the language you talk about. Narsese has
precise specifications in syntax, but the meaning of the terms is a
function of experience, and changes from time to time.

 No two humans use exactly the same language, nor does the same human at two 
 points in time. Rather, language is learnable by example, so that each 
 message causes the language of the receiver to be a little more like that of 
 the sender.

Same thing in NARS --- if two implementations of NARS have different
experience, they will disagree on the meaning of a term. When
they begin to learn natural language, it will also be true for
grammar. Since I haven't done any concrete NLP yet, I don't expect you
to believe me on the second point, but you cannot rule out that
possibility just because no traditional system can do that.

 Children learn semantics before syntax, which is the opposite order from 
 which you would write an artificial language interpreter.

NARS indeed can learn semantics before syntax --- see
http://nars.wang.googlepages.com/wang.roadmap.pdf

I won't comment on the following detailed statements, since I agree
with your criticism of the traditional processing of formal language,
but that is not how NARS handles languages. Don't think of NARS as
another Cyc just because both use a formal language. The same "ravens
are birds" statement is treated very differently in the two systems.

Pei


 An example of a successful language model is a search engine. We know that 
 most of the meaning of a text document depends only on the words it contains, 
 ignoring word order. A search engine matches the semantics of the query with 
 the semantics of a document mostly by matching words, but also by matching 
 semantically related words like water to wet.

 Here is an example of a computationally intensive but biologically plausible 
 language model. A semantic model is a word-word matrix A such that A_ij is 
 the degree to which words i and j are related, which you can think of as the 
 probability of finding i and j together in a sliding window over a huge text 
 corpus. However, semantic relatedness is a fuzzy identity relation, meaning 
 it is reflexive, symmetric, and transitive. If i is related to j and j to 
 k, then i is related to k. Deriving transitive relations in A, also known as 
 latent semantic analysis, is performed by singular value decomposition, 
 factoring A = USV where S is diagonal, then discarding the small terms of S, 
 which has the effect of lossy compression. Typically, A has about 10^6 
 elements and we keep only a few hundred elements of S. Fortunately there is a 
 parallel algorithm that incrementally updates the matrices as the system 
 learns: a 3 layer neural network where S is the hidden layer
  (which can grow) and U and V are weight matrices. [1].

 Traditional language processing has failed because the task of converting 
 natural language statements like "ravens are birds" to formal language is 
 itself an AI problem. It 

Re: Real vs. simulated environments (was Re: [agi] draft for comment.. P.S.)

2008-09-05 Thread Steve Richfield
Matt,

FINALLY, someone here is saying some of the same things that I have been
saying. With general agreement with your posting, I will make some
comments...

On 9/4/08, Matt Mahoney [EMAIL PROTECTED] wrote:

 --- On Thu, 9/4/08, Valentina Poletti [EMAIL PROTECTED] wrote:
 Ppl like Ben argue that the concept/engineering aspect of intelligence is
 independent of the type of environment. That is, given you understand how
 to make it in a virtual environment you can then transpose that concept
 into a real environment more safely.


This is probably a good starting point, to avoid beating the world up during
the debugging process.


 Some other ppl on the other hand believe intelligence is a property of
 humans only.


Only people who haven't had a pet believe such things. I have seen too many
animals find clever solutions to problems.

So you have to simulate every detail about humans to get
 that intelligence. I'd say that among the two approaches the first one
 (Ben's) is safer and more realistic.

 The issue is not what is intelligence, but what do you want to create? In
 order for machines to do more work for us, they may need language and
 vision, which we associate with human intelligence.


Not necessarily, as even text-interfaced knowledge engines can handily
outperform humans in many complex problem solving tasks. The still open
question is: What would best do what we need done but can NOT presently do
(given computers, machinery, etc.)? So far, the talk here on this forum
has been about what we could do and how we might do it, rather than about
what we NEED done.

Right now, we NEED resources to work productively in the directions that we
have been discussing, yet the combined intelligence of those here on this
forum is apparently unable to solve even this seemingly trivial problem.
Perhaps something more than raw intelligence is needed?

But building artificial humans is not necessarily useful. We already know
 how to create humans, and we are doing so at an unsustainable rate.

 I suggest that instead of the imitation game (Turing test) for AI, we
 should use a preference test. If you prefer to talk to a machine vs. a
 human, then the machine passes the test.


YES, like what is it that our AGI can do that we need done but can NOT
presently do?

Prediction is central to intelligence. If you can predict a text stream,
 then for any question Q and any answer A, you can compute the probability
 distribution P(A|Q) = P(QA)/P(Q). This passes the Turing test. More
 importantly, it allows you to output max_A P(QA), the most likely answer
 from a group of humans. This passes the preference test because a group is
 usually more accurate than any individual member. (It may fail a Turing test
 for giving too few wrong answers, a problem Turing was aware of in 1950 when
 he gave an example of a computer incorrectly answering an arithmetic
 problem).


Unfortunately, this also tests the ability to incorporate the very
misunderstandings that presently limit our thinking. We need to give credit
for compression algorithms that clean up our grammar, correct our
technical errors, etc., as these can probably be done in the process of
better compressing the text.

Text compression is equivalent to AI because we have already solved the
 coding problem. Given P(x) for string x, we know how to optimally and
 efficiently code x in log_2(1/P(x)) bits (e.g. arithmetic coding). Text
 compression has an advantage over the Turing or preference tests in that
 incremental progress in modeling can be measured precisely and the test
 is repeatable and verifiable.

 If I want to test a text compressor, it is important to use real data
 (human generated text) rather than simulated data, i.e. text generated by a
 program. Otherwise, I know there is a concise code for the input data, which
 is the program that generated it. When you don't understand the source
 distribution (i.e. the human brain), the problem is much harder, and you
 have a legitimate test.


Wouldn't it be better to understand the problem domain while ignoring human
(mis)understandings? After all, if humans need an AGI to work in a difficult
domain, it is probably made more difficult by incorporating human
misunderstandings.

Of course, humans state human problems, so it is important to be able to
semantically communicate, but also useful to separate the communications
from the problems.

I understand that Ben is developing AI for virtual worlds. This might
 produce interesting results, but I wouldn't call it AGI. The value of AGI is
 on the order of US $1 quadrillion. It is a global economic system running on
 a smarter internet. I believe that any attempt to develop AGI on a budget of
 $1 million or $1 billion or $1 trillion is just wishful thinking.


I think that a billion or so, divided up into small pieces to fund EVERY
disparate approach to see where the low hanging fruit is, would go a LONG
way in guiding subsequent billions. I doubt that it would take a trillion to
succeed.

Re: Language modeling (was Re: [agi] draft for comment)

2008-09-05 Thread Matt Mahoney
--- On Fri, 9/5/08, Pei Wang [EMAIL PROTECTED] wrote:

 NARS indeed can learn semantics before syntax --- see
 http://nars.wang.googlepages.com/wang.roadmap.pdf

Yes, I see this corrects many of the problems with Cyc and with traditional 
language models. I didn't see a description of a mechanism for learning new 
terms in your other paper. Clearly this could be added, although I believe it 
should be a statistical process.

I am interested in determining the computational cost of language modeling. The 
evidence I have so far is that it is high. I believe the algorithmic complexity 
of a model is 10^9 bits. This is consistent with Turing's 1950 prediction that 
AI would require this much memory, with Landauer's estimate of human long term 
memory, and is about how much language a person processes by adulthood assuming 
an information content of 1 bit per character as Shannon estimated in 1950. 
This is why I use a 1 GB data set in my compression benchmark.
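
A rough consistency check on those numbers: at the estimated 1 bit per 
character,

    10^9 bits ~= 10^9 characters ~= 1 GB of plain text,

which is roughly the volume of language processed by adulthood, and is why a 
1 GB test set is a natural size for the benchmark.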

However, there is a 3-way tradeoff between CPU speed, memory, and model accuracy 
(as measured by compression ratio). I added two graphs to my benchmark at 
http://cs.fit.edu/~mmahoney/compression/text.html (below the main table) which 
show this clearly. In particular, the size-memory tradeoff is an almost 
perfectly straight line (with memory on a log scale) over tests of 104 
compressors. These tests suggest to me that CPU and memory are indeed 
bottlenecks to language modeling. The best models in my tests use simple 
semantic and grammatical models, well below adult human level. The 3 top 
programs on the memory graph map words to tokens using dictionaries that group 
semantically and syntactically related words together, but only one 
(paq8hp12any) uses a semantic space of more than one dimension. All have large 
vocabularies, although not implausibly large for an educated person. Other top 
programs like nanozipltcb and WinRK use smaller dictionaries and
 strictly lexical models. Lesser programs model only at the n-gram level.
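
As a sketch of what that straight-line size-memory tradeoff looks like in 
practice, one can fit compressed size against log2(memory); the numbers below 
are invented for illustration and are not taken from the benchmark table:

    import numpy as np

    # Hypothetical (memory, compressed size) pairs, both in MB -- illustrative
    # only, not actual benchmark entries.
    memory_mb = np.array([16, 64, 256, 1024, 4096], dtype=float)
    size_mb = np.array([280, 250, 222, 195, 168], dtype=float)

    # Fit size ~= a + b * log2(memory): a straight line on a log-memory axis.
    b, a = np.polyfit(np.log2(memory_mb), size_mb, 1)
    print("size ~= %.1f + (%.2f) * log2(memory in MB)" % (a, b))

    # A (very naive) extrapolation: memory needed to reach a target size.
    target = 120.0  # MB, hypothetical
    print("extrapolated memory: ~%.0f GB" % (2 ** ((target - a) / b) / 1024))

Such a fit is only descriptive, of course; it says nothing about whether the 
trend continues beyond the tested range.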

I don't yet have an answer to my question, but I believe efficient human-level 
NLP will require hundreds of GB or perhaps 1 TB of memory. The slowest programs 
are already faster than real time, given that equivalent learning in humans 
would take over a decade. I think you could use existing hardware in a 
speed-memory tradeoff to get real time NLP, but it would not be practical for 
doing experiments where each source code change requires training the model 
from scratch. Model development typically requires thousands of tests.
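
To make the "faster than real time" comparison concrete: a decade is about 
3*10^8 seconds, so a human absorbing ~10^9 bytes of language over that period 
averages only

    10^9 bytes / (3*10^8 s) ~= 3 bytes per second,

and any compressor that sustains more than a few bytes per second is ahead of 
that rate.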


-- Matt Mahoney, [EMAIL PROTECTED]



---
agi
Archives: https://www.listbox.com/member/archive/303/=now
RSS Feed: https://www.listbox.com/member/archive/rss/303/
Modify Your Subscription: 
https://www.listbox.com/member/?member_id=8660244id_secret=111637683-c8fa51
Powered by Listbox: http://www.listbox.com


Re: Language modeling (was Re: [agi] draft for comment)

2008-09-05 Thread Pei Wang
On Fri, Sep 5, 2008 at 6:15 PM, Matt Mahoney [EMAIL PROTECTED] wrote:
 --- On Fri, 9/5/08, Pei Wang [EMAIL PROTECTED] wrote:

 NARS indeed can learn semantics before syntax --- see
 http://nars.wang.googlepages.com/wang.roadmap.pdf

 Yes, I see this corrects many of the problems with Cyc and with traditional 
 language models. I didn't see a description of a mechanism for learning new 
 terms in your other paper. Clearly this could be added, although I believe it 
 should be a statistical process.

I don't have a separate paper on term composition, so you'd have to
read my book. It is indeed a statistical process, in the sense that
most of the composed terms won't be useful, so they will be forgotten
gradually. Only the useful patterns will be kept for a long time in
the form of compound terms.
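
A toy sketch of that kind of usefulness-driven forgetting (purely
illustrative -- this is not the actual NARS mechanism, and all names here are
made up):

    # Compound terms gain priority when they prove useful and decay otherwise;
    # terms whose priority falls below a threshold are dropped (forgotten).
    DECAY = 0.9
    THRESHOLD = 0.05
    memory = {}  # compound term -> priority

    def compose(a, b):
        term = "(%s*%s)" % (a, b)
        memory.setdefault(term, 0.1)  # new compounds start with low priority
        return term

    def reward(term, amount=0.3):
        # called when a compound term takes part in a successful inference
        memory[term] = min(1.0, memory.get(term, 0.0) + amount)

    def decay_and_forget():
        for term in list(memory):
            memory[term] *= DECAY
            if memory[term] < THRESHOLD:
                del memory[term]  # gradually forgotten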

 I am interested in determining the computational cost of language modeling. 
 The evidence I have so far is that it is high. I believe the algorithmic 
 complexity of a model is 10^9 bits. This is consistent with Turing's 1950 
 prediction that AI would require this much memory, with Landauer's estimate 
 of human long term memory, and is about how much language a person processes 
 by adulthood assuming an information content of 1 bit per character as 
 Shannon estimated in 1950. This is why I use a 1 GB data set in my 
 compression benchmark.

I see your point, though I think analyzing this problem in terms of
computational complexity is not the right way to go, because this
process does not follow a predetermined algorithm. Instead, language
learning is an incremental process, without a well-defined beginning
and ending.

 However there is a 3 way tradeoff between CPU speed, memory, and model 
 accuracy (as measured by compression ratio). I added two graphs to my 
 benchmark at http://cs.fit.edu/~mmahoney/compression/text.html (below the 
 main table) which shows this clearly. In particular the size-memory tradeoff 
 is an almost perfectly straight line (with memory on a log scale) over tests 
 of 104 compressors. These tests suggest to me that CPU and memory are indeed 
 bottlenecks to language modeling. The best models in my tests use simple 
 semantic and grammatical models, well below adult human level. The 3 top 
 programs on the memory graph map words to tokens using dictionaries that 
 group semantically and syntactically related words together, but only one 
 (paq8hp12any) uses a semantic space of more than one dimension. All have 
 large vocabularies, although not implausibly large for an educated person. 
 Other top programs like nanozipltcb and WinRK use smaller dictionaries and
  strictly lexical models. Lesser programs model only at the n-gram level.

As with many existing AI works, my disagreement with you is not so
much on the solution you proposed (I can see the value), but on the
problem you specified as the goal of AI. For example, I have no doubt
about the theoretical and practical value of compression, but I don't
think it has much to do with intelligence. I don't think this kind of
issue can be efficiently handled by an email discussion like this one.
I've been thinking about writing a paper to compare my ideas with the
ideas represented by AIXI, which is closely related to yours, though
this project hasn't got enough priority on my to-do list. Hopefully
I'll find the time to make myself clear on this topic.

 I don't yet have an answer to my question, but I believe efficient 
 human-level NLP will require hundreds of GB or perhaps 1 TB of memory. The 
 slowest programs are already faster than real time, given that equivalent 
 learning in humans would take over a decade. I think you could use existing 
 hardware in a speed-memory tradeoff to get real time NLP, but it would not be 
 practical for doing experiments where each source code change requires 
 training the model from scratch. Model development typically requires 
 thousands of tests.

I guess we are exploring very different paths in NLP, and now it is
too early to tell which one will do better.

Pei


---
agi
Archives: https://www.listbox.com/member/archive/303/=now
RSS Feed: https://www.listbox.com/member/archive/rss/303/
Modify Your Subscription: 
https://www.listbox.com/member/?member_id=8660244id_secret=111637683-c8fa51
Powered by Listbox: http://www.listbox.com


Re: Language modeling (was Re: [agi] draft for comment)

2008-09-05 Thread Matt Mahoney
--- On Fri, 9/5/08, Pei Wang [EMAIL PROTECTED] wrote:

 Like to many existing AI works, my disagreement with you is
 not that
 much on the solution you proposed (I can see the value),
 but on the
 problem you specified as the goal of AI. For example, I
 have no doubt
 about the theoretical and practical values of compression,
 but don't
 think it has much to do with intelligence.

In http://cs.fit.edu/~mmahoney/compression/rationale.html I explain why text 
compression is an AI problem. To summarize, if you know the probability 
distribution of text, then you can compute P(A|Q) for any question Q and answer 
A to pass the Turing test. Compression allows you to precisely measure the 
accuracy of your estimate of P. Compression (actually, word perplexity) has 
been used since the early 1990's to measure the quality of language models for 
speech recognition, since it correlates well with word error rate.
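
Spelling out the connection between the two measures (standard definitions, 
not specific to any particular benchmark): if a model compresses a test text 
of N words to a total of B bits, then

    cross-entropy H = B / N   (bits per word)
    word perplexity = 2^H

so a smaller compressed size, a lower cross-entropy, and a lower perplexity 
are three views of the same quantity.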

The purpose of this work is not to solve general intelligence, such as the 
universal intelligence proposed by Legg and Hutter [1]. That is not computable, 
so you have to make some arbitrary choice with regard to test environments 
about what problems you are going to solve. I believe the goal of AGI should be 
to do useful work for humans, so I am making a not so arbitrary choice to solve 
a problem that is central to what most people regard as useful intelligence.

I had hoped that my work would lead to an elegant theory of AI, but that hasn't 
been the case. Rather, the best compression programs were developed as a series 
of thousands of hacks and tweaks, e.g. change a 4 to a 5 because it gives 
0.002% better compression on the benchmark. The result is an opaque mess. I 
guess I should have seen it coming, since it is predicted by information theory 
(e.g. [2]).

Nevertheless the architectures of the best text compressors are consistent with 
cognitive development models, i.e. phoneme (or letter) sequences -> lexical -> 
semantics -> syntax, which are themselves consistent with layered neural 
architectures. I already described a neural semantic model in my last post. I 
also did work supporting Hutchens and Alder showing that lexical models can be 
learned from n-gram statistics, consistent with the observation that babies 
learn the rules for segmenting continuous speech before they learn any words 
[3].
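
A minimal sketch of the idea that word boundaries can fall out of n-gram 
statistics alone (a deliberately crude illustration, not the method of [3]): 
train character-bigram counts on text with the spaces removed, then propose a 
boundary wherever the character-to-character transition probability is low.

    from collections import defaultdict

    def train_bigrams(text):
        counts = defaultdict(lambda: defaultdict(int))
        for a, b in zip(text, text[1:]):
            counts[a][b] += 1
        return counts

    def segment(text, counts, threshold=0.1):
        out = [text[0]]
        for a, b in zip(text, text[1:]):
            total = sum(counts[a].values()) or 1
            p = counts[a][b] / total   # estimated P(next char = b | current char = a)
            if p < threshold:
                out.append(" ")        # unlikely transition -> propose a word boundary
            out.append(b)
        return "".join(out)

    corpus = "the cat sat on the mat the cat ate the rat"
    model = train_bigrams(corpus.replace(" ", ""))
    print(segment("thecatatethemat", model))

With this little data the proposed segmentation is of course crude; the point 
is only that boundary information is present in the statistics.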

I agree it should also be clear that semantics is learned before grammar, 
contrary to the way artificial languages are processed. Grammar requires 
semantics, but not the other way around. Search engines work using semantics 
only. Yet we cannot parse sentences like "I ate pizza with Bob", "I ate pizza 
with pepperoni", "I ate pizza with chopsticks", without semantics.

My benchmark does not prove that there aren't better language models, but it is 
strong evidence. It represents the work of about 100 researchers who have tried 
and failed to find more accurate, faster, or less memory intensive models. The 
resource requirements seem to increase as we go up the chain from n-grams to 
grammar, contrary to symbolic approaches. This is my argument why I think AI is 
bound by lack of hardware, not lack of theory.

1. Legg, Shane, and Marcus Hutter (2006), A Formal Measure of Machine 
Intelligence, Proc. Annual machine learning conference of Belgium and The 
Netherlands (Benelearn-2006). Ghent, 2006.  
http://www.vetta.org/documents/ui_benelearn.pdf

2. Legg, Shane, (2006), Is There an Elegant Universal Theory of Prediction?,  
Technical Report IDSIA-12-06, IDSIA / USI-SUPSI, Dalle Molle Institute for 
Artificial Intelligence, Galleria 2, 6928 Manno, Switzerland.
http://www.vetta.org/documents/IDSIA-12-06-1.pdf

3. M. Mahoney (2000), A Note on Lexical Acquisition in Text without Spaces, 
http://cs.fit.edu/~mmahoney/dissertation/lex1.html


-- Matt Mahoney, [EMAIL PROTECTED]



---
agi
Archives: https://www.listbox.com/member/archive/303/=now
RSS Feed: https://www.listbox.com/member/archive/rss/303/
Modify Your Subscription: 
https://www.listbox.com/member/?member_id=8660244id_secret=111637683-c8fa51
Powered by Listbox: http://www.listbox.com


AI isn't cheap (was Re: Real vs. simulated environments (was Re: [agi] draft for comment.. P.S.))

2008-09-05 Thread Matt Mahoney
--- On Fri, 9/5/08, Steve Richfield [EMAIL PROTECTED] wrote:
I think that a billion or so, divided up into small pieces to fund EVERY
disparate approach to see where the low hanging fruit is, would go a
LONG way in guiding subsequent billions. I doubt that it would take a
trillion to succeed.

Sorry, the low hanging fruit was all picked by the early 1960's. By then we had 
neural networks [1,6,7,11,12], natural language processing and language 
translation [2], models of human decision making [3], automatic theorem proving 
[4,8,10], natural language databases [5], game playing programs [9,13], optical 
character recognition [14], handwriting and speech recognition [15], and 
important theoretical work [16,17,18]. Since then we have had mostly just 
incremental improvements.

Big companies like Google and Microsoft have strong incentives to develop AI 
and have billions to spend. Maybe the problem really is hard.

References

1. Ashby, W. Ross (1960), Design for a Brain, 2nd Ed., London: Wiley. 
Describes a 4 neuron electromechanical neural network.

2. Borko, Harold (1967), Automated Language Processing, The State of the Art, 
New York: Wiley.  Cites 72 NLP systems prior to 1965, and the 1959-61 U.S. 
government Russian-English translation project.

3. Feldman, Julian (1961), Simulation of Behavior in the Binary Choice 
Experiment, Proceedings of the Western Joint Computer Conference 19:133-144

4. Gelernter, H. (1959), Realization of a Geometry-Theorem Proving Machine, 
Proceedings of an International Conference on Information Processing, Paris: 
UNESCO House, pp. 273-282.

5. Green, Bert F. Jr., Alice K. Wolf, Carol Chomsky, and Kenneth Laughery 
(1961), Baseball: An Automatic Question Answerer, Proceedings of the Western 
Joint Computer Conference, 19:219-224.

6. Hebb, D. O. (1949), The Organization of Behavior, New York: Wiley.  Proposed 
the first model of learning in neurons: when two neurons fire simultaneously, 
the synapse between them becomes stimulating.

7. McCulloch, Warren S., and Walter Pitts (1943), A logical calculus of the 
ideas immanent in nervous activity, Bulletin of Mathematical Biophysics (5) pp. 
115-133.

8. Newell, Allen, J. C. Shaw, H. A. Simon (1957), Empirical Explorations with 
the Logic Theory Machine: A Case Study in Heuristics, Proceedings of the 
Western Joint Computer Conference, 15:218-239.

9. Newell, Allen, J. C. Shaw, and H. A. Simon (1958), Chess-Playing Programs 
and the Problem of Complexity, IBM Journal of Research and Development, 
2:320-335.

10. Newell, Allen, H. A. Simon (1961), GPS: A Program that Simulates Human 
Thought, Lernende Automaten, Munich: R. Oldenbourg KG.

11. Rochester, N., J. H. Holland, L. H. Haibt, and W. L. Duda (1956), Tests on 
a cell assembly theory of the action of the brain, using a large digital 
computer, IRE Transactions on Information Theory IT-2: pp. 80-93. 

12. Rosenblatt, F. (1958), The perceptron: a probabilistic model for 
information storage and organization in the brain, Psychological Review (65) 
pp. 386-408.

13. Samuel, A. L. (1959), Some Studies in Machine Learning using the Game of 
Checkers, IBM Journal of Research and Development, 3:211-229.

14. Selfridge, Oliver G., Ulric Neisser (1960), Pattern Recognition by 
Machine, Scientific American, Aug., 203:60-68.

15. Uhr, Leonard, Charles Vossler (1963) A Pattern-Recognition Program that 
Generates, Evaluates, and Adjusts its own Operators, Computers and Thought, E. 
A. Feigenbaum and J. Feldman eds, New York: McGraw Hill, pp. 251-268.

16. Turing, A. M., (1950) Computing Machinery and Intelligence, Mind, 
59:433-460.

17. Shannon, Claude, and Warren Weaver (1949), The Mathematical Theory of 
Communication, Urbana: University of Illinois Press. 

18. Minsky, Marvin (1961), Steps toward Artificial Intelligence, Proceedings 
of the Institute of Radio Engineers, 49:8-30. 


-- Matt Mahoney, [EMAIL PROTECTED]



---
agi
Archives: https://www.listbox.com/member/archive/303/=now
RSS Feed: https://www.listbox.com/member/archive/rss/303/
Modify Your Subscription: 
https://www.listbox.com/member/?member_id=8660244id_secret=111637683-c8fa51
Powered by Listbox: http://www.listbox.com


Re: Language modeling (was Re: [agi] draft for comment)

2008-09-05 Thread Pei Wang
Matt,

Thanks for taking the time to explain your ideas in detail. As I said,
our different opinions on how to do AI come from our very different
understanding of intelligence. I don't take passing the Turing Test as
my research goal (as explained in
http://nars.wang.googlepages.com/wang.logic_intelligence.pdf and
http://nars.wang.googlepages.com/wang.AI_Definitions.pdf).  I disagree
with Hutter's approach, not because his SOLUTION is not computable,
but because his PROBLEM is too idealized and simplified to be relevant
to the actual problems of AI.

Even so, I'm glad that we can still agree on some things, like
"semantics comes before syntax". In my plan for NLP, there won't be
separate 'parsing' and 'semantic mapping' stages. I'll say more when I
have concrete results to share.

Pei

On Fri, Sep 5, 2008 at 8:39 PM, Matt Mahoney [EMAIL PROTECTED] wrote:
 --- On Fri, 9/5/08, Pei Wang [EMAIL PROTECTED] wrote:

 Like to many existing AI works, my disagreement with you is
 not that
 much on the solution you proposed (I can see the value),
 but on the
 problem you specified as the goal of AI. For example, I
 have no doubt
 about the theoretical and practical values of compression,
 but don't
 think it has much to do with intelligence.

 In http://cs.fit.edu/~mmahoney/compression/rationale.html I explain why text 
 compression is an AI problem. To summarize, if you know the probability 
 distribution of text, then you can compute P(A|Q) for any question Q and 
 answer A to pass the Turing test. Compression allows you to precisely measure 
 the accuracy of your estimate of P. Compression (actually, word perplexity) 
 has been used since the early 1990's to measure the quality of language 
 models for speech recognition, since it correlates well with word error rate.

 The purpose of this work is not to solve general intelligence, such as the 
 universal intelligence proposed by Legg and Hutter [1]. That is not 
 computable, so you have to make some arbitrary choice with regard to test 
 environments about what problems you are going to solve. I believe the goal 
 of AGI should be to do useful work for humans, so I am making a not so 
 arbitrary choice to solve a problem that is central to what most people 
 regard as useful intelligence.

 I had hoped that my work would lead to an elegant theory of AI, but that 
 hasn't been the case. Rather, the best compression programs were developed as 
 a series of thousands of hacks and tweaks, e.g. change a 4 to a 5 because it 
 gives 0.002% better compression on the benchmark. The result is an opaque 
 mess. I guess I should have seen it coming, since it is predicted by 
 information theory (e.g. [2]).

 Nevertheless the architectures of the best text compressors are consistent 
 with cognitive development models, i.e. phoneme (or letter) sequences - 
 lexical - semantics - syntax, which are themselves consistent with layered 
 neural architectures. I already described a neural semantic model in my last 
 post. I also did work supporting Hutchens and Alder showing that lexical 
 models can be learned from n-gram statistics, consistent with the observation 
 that babies learn the rules for segmenting continuous speech before they 
 learn any words [3].

 I agree it should also be clear that semantics is learned before grammar, 
 contrary to the way artificial languages are processed. Grammar requires 
 semantics, but not the other way around. Search engines work using semantics 
 only. Yet we cannot parse sentences like I ate pizza with Bob, I ate pizza 
 with pepperoni, I ate pizza with chopsticks, without semantics.

 My benchmark does not prove that there aren't better language models, but it 
 is strong evidence. It represents the work of about 100 researchers who have 
 tried and failed to find more accurate, faster, or less memory intensive 
 models. The resource requirements seem to increase as we go up the chain from 
 n-grams to grammar, contrary to symbolic approaches. This is my argument why 
 I think AI is bound by lack of hardware, not lack of theory.

 1. Legg, Shane, and Marcus Hutter (2006), A Formal Measure of Machine 
 Intelligence, Proc. Annual machine learning conference of Belgium and The 
 Netherlands (Benelearn-2006). Ghent, 2006.  
 http://www.vetta.org/documents/ui_benelearn.pdf

 2. Legg, Shane, (2006), Is There an Elegant Universal Theory of Prediction?,  
 Technical Report IDSIA-12-06, IDSIA / USI-SUPSI, Dalle Molle Institute for 
 Artificial Intelligence, Galleria 2, 6928 Manno, Switzerland.
 http://www.vetta.org/documents/IDSIA-12-06-1.pdf

 3. M. Mahoney (2000), A Note on Lexical Acquisition in Text without Spaces, 
 http://cs.fit.edu/~mmahoney/dissertation/lex1.html


 -- Matt Mahoney, [EMAIL PROTECTED]



 ---
 agi
 Archives: https://www.listbox.com/member/archive/303/=now
 RSS Feed: https://www.listbox.com/member/archive/rss/303/
 Modify Your Subscription: https://www.listbox.com/member/?;
 Powered by Listbox: http://www.listbox.com

Re: [agi] draft for comment

2008-09-04 Thread Ben Goertzel
Hi,


  What I think is that the set of patterns in perceptual and motoric data
 has
  radically different statistical properties than the set of patterns in
  linguistic and mathematical data ... and that the properties of the set
 of
  patterns in perceptual and motoric data is intrinsically better suited to
  the needs of a young, ignorant, developing mind.

 Sure it is. Systems with different sensory channels will never fully
 understand each other. I'm not saying that one channel (verbal) can
 replace another (visual), but that both of them (and many others) can
 give symbol/representation/concept/pattern/whatever-you-call-it
 meaning. No one is more real than the others.


True, but some channels may -- due to the statistical properties of the data
coming across them -- be more conducive to the development of AGI than
others...




  All these different domains of pattern display what I've called a dual
  network structure ... a collection of hierarchies (of progressively more
  and more complex, hierarchically nested patterns) overlayed with a
  heterarchy (of overlapping, interrelated patterns).  But the statistics
 of
  the dual networks in the different domains is different.  I haven't fully
  plumbed the difference yet ... but, among the many differences is that in
  perceptual/motoric domains, you have a very richly connected dual network
 at
  a very low level of the overall dual network hierarchy -- i.e., there's a
  richly connected web of relatively simple stuff to understand ... and
 then
  these simple things are related to (hence useful for learning) the more
  complex things, etc.

 True, but can you say that the relations among words, or concepts, are
 simpler?



I think the set of relations among words (considered in isolation, without
their referents) is less rich than the set of relations among perceptions
of a complex world, and far less rich than the set of relations among
{perceptions of a complex world, plus words referring to these
perceptions}

And I think that this lesser richness makes sequences of words a much worse
input stream for a developing AGI

I realize that quantifying "less rich" in the above is a significant
challenge, but I'm presenting my intuition anyway...

Also, relatedly and just as critically, the set of perceptions regarding the
body and its interactions with the environment, are well-structured to give
the mind a sense of its own self.  This primitive infantile sense of
body-self gives rise to the more sophisticated phenomenal self of the child
and adult mind, which gives rise to reflective consciousness, the feeling of
will, and other characteristic structures of humanlike general
intelligence.  A stream of words doesn't seem to give an AI the same kind of
opportunity for self-development




 In this short paper, I make no attempt to settle all issues, but just
 to point out a simple fact --- a laptop has a body, and is not less
 embodied than Roomba or Mindstorms --- that seems to have been ignored in
 the previous discussion.


I agree with your point, but I wonder if it's partially a straw man
argument.  The proponents of embodiment as a key  aspect of AGI don't of
course think that Cyc is disembodied in a maximally strong sense -- they
know it interacts with the world via physical means.  What they mean by
embodied is something different.

I don't have the details at my finger tips, but I know that Maturana, Varela
and Eleanor Rosch took some serious pains to carefully specify the sense in
which they feel embodiment is critical to intelligence, and to distinguish
their sense of embodiment from the trivial sense of communicating via
physical signals.

I suggest your paper should probably include a careful response to the
characterization of embodiment presented in

http://www.amazon.com/Embodied-Mind-Cognitive-Science-Experience/dp/0262720213

I note that I do not agree with the arguments of Varela, Rosch, Brooks,
etc.  I just think their characterization of embodiment is an interesting
and nontrivial one, and I'm not sure NARS with a text stream as input would
be embodied according to their definition...

-- Ben



---
agi
Archives: https://www.listbox.com/member/archive/303/=now
RSS Feed: https://www.listbox.com/member/archive/rss/303/
Modify Your Subscription: 
https://www.listbox.com/member/?member_id=8660244id_secret=111637683-c8fa51
Powered by Listbox: http://www.listbox.com


Re: [agi] draft for comment

2008-09-04 Thread Ben Goertzel

 Also, relatedly and just as critically, the set of perceptions regarding
 the body and its interactions with the environment, are well-structured to
 give the mind a sense of its own self.  This primitive infantile sense of
 body-self gives rise to the more sophisticated phenomenal self of the child
 and adult mind, which gives rise to reflective consciousness, the feeling of
 will, and other characteristic structures of humanlike general
 intelligence.  A stream of words doesn't seem to give an AI the same kind of
 opportunity for self-development



To put it perhaps more clearly: I think that a standard laptop is too
lacking in

-- proprioceptive perception

-- perception of its own relationship to other entities in the world around
it

to form a physical self-image based on its perceptions ... hence a standard
laptop will not likely be driven by its experience to develop a phenomenal
self ... hence, I suspect, no generally intelligent mind...

-- Ben G



---
agi
Archives: https://www.listbox.com/member/archive/303/=now
RSS Feed: https://www.listbox.com/member/archive/rss/303/
Modify Your Subscription: 
https://www.listbox.com/member/?member_id=8660244id_secret=111637683-c8fa51
Powered by Listbox: http://www.listbox.com


Re: [agi] draft for comment.. P.S.

2008-09-04 Thread Valentina Poletti
That's if you aim at getting an AGI that is intelligent in the real world. I
think some people on this list (incl Ben perhaps) might argue that for now -
for safety purposes but also due to costs - it might be better to build an
AGI that is intelligent in a simulated environment.

Ppl like Ben argue that the concept/engineering aspect of intelligence
is *independent
of the type of environment*. That is, given you understand how to make it in
a virtual environment you can then transpose that concept into a real
environment more safely.

Some other ppl on the other hand believe intelligence is a property of
humans only. So you have to simulate every detail about humans to get that
intelligence. I'd say that among the two approaches the first one (Ben's) is
safer and more realistic.

I am more concerned with the physics aspect of the whole issue I guess.



---
agi
Archives: https://www.listbox.com/member/archive/303/=now
RSS Feed: https://www.listbox.com/member/archive/rss/303/
Modify Your Subscription: 
https://www.listbox.com/member/?member_id=8660244id_secret=111637683-c8fa51
Powered by Listbox: http://www.listbox.com


Re: [agi] draft for comment

2008-09-04 Thread Pei Wang
On Thu, Sep 4, 2008 at 2:10 AM, Ben Goertzel [EMAIL PROTECTED] wrote:

 Sure it is. Systems with different sensory channels will never fully
 understand each other. I'm not saying that one channel (verbal) can
 replace another (visual), but that both of them (and many others) can
 give symbol/representation/concept/pattern/whatever-you-call-it
 meaning. No one is more real than the others.

 True, but some channels may -- due to the statistical properties of the data
 coming across them -- be more conducive to the development of AGI than
 others...

I haven't seen any evidence for that. For human intelligence, maybe,
but for intelligence in general, I doubt it.

 I think the set of relations among words (considered in isolation, without
 their referents) is less rich than the set of relations among perceptions
 of a complex world, and far less rich than the set of relations among
 {perceptions of a complex world, plus words referring to these
 perceptions}

Not necessarily. Actually some people may even make the opposite
argument: relations among non-linguistic components in experience are
basically temporal or spatial, while the relations among words and
concepts have many more types. I won't go that far, but I guess in
some sense all channels may have the same (potential) richness.

 And I think that this lesser richness makes sequences of words a much worse
 input stream for a developing AGI

 I realize that quantifying less rich in the above is a significant
 challenge, but I'm presenting my intuition anyway...

If your condition is true, then your conclusion follows, but the
problem is in that IF.

 Also, relatedly and just as critically, the set of perceptions regarding the
 body and its interactions with the environment, are well-structured to give
 the mind a sense of its own self.

We can say the same for every input/output operation set of an
intelligent system. SELF is defined by what the system can feel and
do.

 This primitive infantile sense of
 body-self gives rise to the more sophisticated phenomenal self of the child
 and adult mind, which gives rise to reflective consciousness, the feeling of
 will, and other characteristic structures of humanlike general
 intelligence.

Agree.

 A stream of words doesn't seem to give an AI the same kind of
 opportunity for self-development

If the system just sits there and passively accepts whatever words come
into it, what you said is true. If the incoming words are causally
related to its outgoing words, will you still say that?

 I agree with your point, but I wonder if it's partially a straw man
 argument.

If you read Brooks or Pfeifer, you'll see that most of their arguments
are explicitly or implicitly based on the myth that only a robot has
a body, has real sensors, lives in a real world, ...

 The proponents of embodiment as a key  aspect of AGI don't of
 course think that Cyc is disembodied in a maximally strong sense -- they
 know it interacts with the world via physical means.  What they mean by
 embodied is something different.

Whether a system is embodied does not depend on hardware, but on semantics.

 I don't have the details at my finger tips, but I know that Maturana, Varela
 and Eleanor Rosch took some serious pains to carefully specify the sense in
 which they feel embodiment is critical to intelligence, and to distinguish
 their sense of embodiment from the trivial sense of communicating via
 physical signals.

That is different. The embodiment school in CogSci doesn't focus on
the body (they know every human already has one), but on experience.
However, they have their own misconceptions about AI. As I mentioned,
Barsalou and Lakoff both thought strong AI is unlikely because a
computer cannot have human experience --- I agree with what they said,
except for their narrow conception of intelligence (CogSci people tend
to take intelligence to mean human intelligence).

 I suggest your paper should probably include a careful response to the
 characterization of embodiment presented in

 http://www.amazon.com/Embodied-Mind-Cognitive-Science-Experience/dp/0262720213

 I note that I do not agree with the arguments of Varela, Rosch, Brooks,
 etc.  I just think their characterization of embodiment is an interesting
 and nontrivial one, and I'm not sure NARS with a text stream as input would
 be embodied according to their definition...

If I get the time (and motivation) to extend the paper into a journal
paper, I'll double the length by discussing embodiment in CogSci. In
the current version, as a short conference paper, I'd rather focus on
embodiment in AI, and only attack the robot myth.

Pei


---
agi
Archives: https://www.listbox.com/member/archive/303/=now
RSS Feed: https://www.listbox.com/member/archive/rss/303/
Modify Your Subscription: 
https://www.listbox.com/member/?member_id=8660244id_secret=111637683-c8fa51
Powered by Listbox: http://www.listbox.com


Re: [agi] draft for comment

2008-09-04 Thread Pei Wang
On Thu, Sep 4, 2008 at 2:12 AM, Ben Goertzel [EMAIL PROTECTED] wrote:

 Also, relatedly and just as critically, the set of perceptions regarding
 the body and its interactions with the environment, are well-structured to
 give the mind a sense of its own self.  This primitive infantile sense of
 body-self gives rise to the more sophisticated phenomenal self of the child
 and adult mind, which gives rise to reflective consciousness, the feeling of
 will, and other characteristic structures of humanlike general
 intelligence.  A stream of words doesn't seem to give an AI the same kind of
 opportunity for self-development

 To put it perhaps more clearly: I think that a standard laptop is too
 lacking in

 -- proprioceptive perception

 -- perception of its own relationship to other entities in the world around
 it

Obviously you didn't consider the potential a laptop has with its
network connection, which in theory can give it all kinds of
perception by connecting it to some input/output device.

Even if we exclude the network, your conclusion is still problematic.
Why can't a touchpad provide proprioceptive perception? I agree it
usually doesn't, because of the way it is used, but that doesn't mean
it cannot, under all possible usages. The same is true for the
keyboard. The current limitation of the standard computer is more in
the way we use it than in the hardware itself.

 to form a physical self-image based on its perceptions ... hence a standard
 laptop will not likely be driven by its experience to develop a phenomenal
 self ... hence, I suspect, no generally intelligent mind...

Of course it won't have a visual concept of self, but a system like
NARS has the potential to grow into an intelligent operating system,
with a notion of self based on what it can feel and do, as well as
the causal relations among them --- "If there is a file in this
folder, then I should have felt it", "it cannot be there because I've
deleted the contents."

I know some people won't agree there is a self in such a system,
because it doesn't look like their own. Too bad human intelligence is
the only known example of intelligence ...

Pei


---
agi
Archives: https://www.listbox.com/member/archive/303/=now
RSS Feed: https://www.listbox.com/member/archive/rss/303/
Modify Your Subscription: 
https://www.listbox.com/member/?member_id=8660244id_secret=111637683-c8fa51
Powered by Listbox: http://www.listbox.com


Re: [agi] draft for comment

2008-09-04 Thread Valentina Poletti
I agree with Pei in that a robot's experience is not necessarily more real
than that of a, say, web-embedded agent - if anything it is closer to the *
human* experience of the world. But who knows how limited our own sensory
experience is anyhow. Perhaps a better intelligence would comprehend the
world better through a different embodiment.

However, could you guys be more specific regarding the statistical
differences of different types of data? What kind of differences are you
talking about specifically (mathematically)? And what about the differences
at the various levels of the dual-hierarchy? Has any of your work or
research suggested this hypothesis, if so which?



---
agi
Archives: https://www.listbox.com/member/archive/303/=now
RSS Feed: https://www.listbox.com/member/archive/rss/303/
Modify Your Subscription: 
https://www.listbox.com/member/?member_id=8660244id_secret=111637683-c8fa51
Powered by Listbox: http://www.listbox.com


Re: [agi] draft for comment

2008-09-04 Thread Ben Goertzel

 Obviously you didn't consider the potential a laptop has with its
 network connection, which in theory can give it all kinds of
 perception by connecting it to some input/output device.


yes, that's true ... I was considering the laptop w/ only a power cable as
the AI system in question.  Of course my point does not apply to a laptop
that's being used as an on-board control system for an android robot, or a
laptop that's connected to a network of sensors and actuators via the net,
etc.  Sorry I did not clarify my terms better!

Similarly the human brain lacks much proprioception and control in
isolation, and probably would not be able to achieve a high level of general
intelligence without the right peripherals (such as the rest of the human
body ;-)


Even if we exclude network, your conclusion is still problematic. Why
 a touchpad cannot provide proprioceptive perception? I agree it
 usually doesn't, because the way it is used, but that doesn't mean it
 cannot, under all possible usage. The same is true for keyboard. The
 current limitation of the standard computer is more in the way we use
 them than in the hardware itself.


I understand that a keyboard and touchpad do provide proprioceptive input,
but I think it's too feeble, and too insensitively responsive to changes in
the environment and the relation btw the laptop and the environment, to
serve as the foundation for a robust self-model or a powerful general
intelligence.




  to form a physical self-image based on its perceptions ... hence a
 standard
  laptop will not likely be driven by its experience to develop a
 phenomenal
  self ... hence, I suspect, no generally intelligent mind...

 Of course it won't have a visual concept of self, but a system like
 NARS has the potential to grow into an intelligent operating system,
 with a notion of self based on what it can feel and do, as well as
 the causal relations among them --- If there is a file in this
 folder, then I should have felt it, it cannot be there because I've
 deleted the contents.


My suggestion is that the file system lacks the complexity of structure and
dynamics to support the emergence of a robust self-model, and powerful
general intelligence...

Not in principle ... potentially a file system *could* display the needed
complexity, but I don't think any file systems on laptops now come close...

Whether the Internet as a whole contains the requisite complexity is a
subtler question.



 I know some people won't agree there is a self in such a system,
 because it doesn't look like themselves. Too bad human intelligence is
 the only known example of intelligence ...


I would call a self any internal, explicit model that a system creates
that allows it to predict its own behaviors in a sufficient variety of
contexts  This need not have a visual aspect nor a great similarity to a
human self.

-- Ben



---
agi
Archives: https://www.listbox.com/member/archive/303/=now
RSS Feed: https://www.listbox.com/member/archive/rss/303/
Modify Your Subscription: 
https://www.listbox.com/member/?member_id=8660244id_secret=111637683-c8fa51
Powered by Listbox: http://www.listbox.com


Re: [agi] draft for comment

2008-09-04 Thread Ben Goertzel
Hi Pei,

I think your point is correct that the notion of embodiment presented by
Brooks and some other roboticists is naive.  I'm not sure whether their
actual conceptions are naive, or whether they just aren't presenting their
foundational philosophical ideas clearly in their writings (being ultimately
more engineering-oriented people, and probably not that accustomed to the
philosophical style of discourse in which these sorts of definitional
distinctions need to be more precisely drawn).  I do think (in approximate
concurrence with your paper) that ANY control system physically embodied in
a physical system S, that has an input and output stream, and whose input
and output stream possess correlation with the physical state of S, should
be considered as psychologically embodied.  Clearly, whether it's a robot
or a laptop (w/o network connection if you like), such a system has the
basic property of embodiment.  Furthermore S doesn't need to be a physical
system ... it could be a virtual system inside some virtual world (and
then there's the question of what properties characterize a valid virtual
world ... but let's leave that for another email thread...)

However, I think that not all psychologically-embodied systems possess a
sufficiently rich psychological-embodiment to lead to significantly general
intelligence  My suggestion is that a laptop w/o network connection or
odd sensor-peripherals, probably does not have sufficiently rich
correlations btw its I/O stream and its physical state, to allow it to
develop a robust self-model of its physical self (which can then be used as
a basis for a more general phenomenal self).

I think that Varela and crew understood the value of this rich network of
correlations, but mistakenly assumed it to be a unique property of
biological systems...

I realize that the points you made in your paper do not contradict the
suggestions I've made in this email.  I don't think anything significant in
your paper is wrong, actually.  It just seems to me not to address the most
interesting aspects of the embodiment issue as related to AGI.

-- Ben G

On Thu, Sep 4, 2008 at 7:06 AM, Pei Wang [EMAIL PROTECTED] wrote:

 On Thu, Sep 4, 2008 at 2:10 AM, Ben Goertzel [EMAIL PROTECTED] wrote:
 
  Sure it is. Systems with different sensory channels will never fully
  understand each other. I'm not saying that one channel (verbal) can
  replace another (visual), but that both of them (and many others) can
  give symbol/representation/concept/pattern/whatever-you-call-it
  meaning. No one is more real than the others.
 
  True, but some channels may -- due to the statistical properties of the
 data
  coming across them -- be more conducive to the development of AGI than
  others...

 I haven't seen any evidence for that. For human intelligence, maybe,
 but for intelligence in general, I doubt it.

  I think the set of relations among words (considered in isolation,
 without
  their referents) is less rich than the set of relations among
 perceptions
  of a complex world, and far less rich than the set of relations among
  {perceptions of a complex world, plus words referring to these
  perceptions}

 Not necessarily. Actually some people may even make the opposite
 argument: relations among non-linguistic components in experience are
 basically temporal or spatial, while the relations among words and
 concepts have much more types. I won't go that far, but I guess in
 some sense all channels may have the same (potential) richness.

  And I think that this lesser richness makes sequences of words a much
 worse
  input stream for a developing AGI
 
  I realize that quantifying less rich in the above is a significant
  challenge, but I'm presenting my intuition anyway...

 If your condition is true, then your conclusion follows, but the
 problem is in that IF.

  Also, relatedly and just as critically, the set of perceptions regarding
 the
  body and its interactions with the environment, are well-structured to
 give
  the mind a sense of its own self.

 We can say the same for every input/out operation set of an
 intelligent system. SELF is defined by what the system can feel and
 do.

  This primitive infantile sense of
  body-self gives rise to the more sophisticated phenomenal self of the
 child
  and adult mind, which gives rise to reflective consciousness, the feeling
 of
  will, and other characteristic structures of humanlike general
  intelligence.

 Agree.

  A stream of words doesn't seem to give an AI the same kind of
  opportunity for self-development

 If the system just sits there and passively accept whatever words come
 into it, what you said is true. If the incoming words is causally
 related to its outgoing words, will you still say that?

  I agree with your point, but I wonder if it's partially a straw man
  argument.

 If you read Brooks or Pfeifer, you'll see that most of their arguments
 are explicitly or implicitly based on the myth that only a robot has
 a 

Re: [agi] draft for comment

2008-09-04 Thread Ben Goertzel

 However, could you guys be more specific regarding the statistical
 differences of different types of data? What kind of differences are you
 talking about specifically (mathematically)? And what about the differences
 at the various levels of the dual-hierarchy? Has any of your work or
 research suggested this hypothesis, if so which?



Sorry I've been fuzzy on this ... I'm engaging in this email conversation in
odd moments while at a conference (Virtual Worlds 2008, in Los Angeles...)

Specifically, I think that patterns interrelating the I/O stream of system S
with the relation between system S's embodiment and its environment are
important.  It is these patterns that let S build a self-model of its
physical embodiment, which then leads S to a more abstract self-model (aka
Metzinger's phenomenal self)

Considering patterns in the above category, it seems critical to have a rich
variety of patterns at varying levels of complexity... so that the patterns
at complexity level L are largely approximable as compositions of patterns
at complexity less than L.  This way a mind can incrementally build up its
self-model via recognizing slightly complex self-related patterns, then
acting based on these patterns, then recognizing somewhat more complex
self-related patterns involving its recent actions, and so forth.

It seems that a human body's sensors and actuators are suited to create and
recognize patterns of the above sort whereas the sensors and actuators of a
laptop w/o network cables or odd peripherals are not...

-- Ben G



---
agi
Archives: https://www.listbox.com/member/archive/303/=now
RSS Feed: https://www.listbox.com/member/archive/rss/303/
Modify Your Subscription: 
https://www.listbox.com/member/?member_id=8660244id_secret=111637683-c8fa51
Powered by Listbox: http://www.listbox.com


Re: [agi] draft for comment

2008-09-04 Thread Valentina Poletti
On 9/4/08, Ben Goertzel [EMAIL PROTECTED] wrote:



 However, could you guys be more specific regarding the statistical
 differences of different types of data? What kind of differences are you
 talking about specifically (mathematically)? And what about the differences
 at the various levels of the dual-hierarchy? Has any of your work or
 research suggested this hypothesis, if so which?



 Sorry I've been fuzzy on this ... I'm engaging in this email conversation
 in odd moments while at a conference (Virtual Worlds 2008, in Los
 Angeles...)

 Specifically I think that patterns interrelating the I/O stream of system S
 with the relation between the system S's embodiment and its environment, are
 important.  It is these patterns that let S build a self-model of its
 physical embodiment, which then leads S to a more abstract self-model (aka
 Metzinger's phenomenal self)

 So in short you are saying that the main difference between I/O data by
a motor-embodied system (such as a robot or human) and a laptop is the ability
to interact with the data: make changes in its environment to systematically
change the input?

  Considering patterns in the above category, it seems critical to have a
 rich variety of patterns at varying levels of complexity... so that the
 patterns at complexity level L are largely approximable as compositions of
 patterns at complexity less than L.  This way a mind can incrementally build
 up its self-model via recognizing slightly complex self-related patterns,
 then acting based on these patterns, then recognizing somewhat more complex
 self-related patterns involving its recent actions, and so forth.


Definitely.

  It seems that a human body's sensors and actuators are suited to create
 and recognize patterns of the above sort whereas the sensors and actuators
 of a laptop w/o network cables or odd peripherals are not...


Agree.



---
agi
Archives: https://www.listbox.com/member/archive/303/=now
RSS Feed: https://www.listbox.com/member/archive/rss/303/
Modify Your Subscription: 
https://www.listbox.com/member/?member_id=8660244id_secret=111637683-c8fa51
Powered by Listbox: http://www.listbox.com


Re: [agi] draft for comment

2008-09-04 Thread Ben Goertzel
 So in short you are saying that the main difference between I/O data by
 a motor-embodied system (such as a robot or human) and a laptop is the ability
 to interact with the data: make changes in its environment to systematically
 change the input?


Not quite ... but, to interact w/ the data in a way that gives rise to a
hierarchy of nested, progressively more complex patterns that correlate the
system and its environment (and that the system can recognize and act upon)

ben



---
agi
Archives: https://www.listbox.com/member/archive/303/=now
RSS Feed: https://www.listbox.com/member/archive/rss/303/
Modify Your Subscription: 
https://www.listbox.com/member/?member_id=8660244id_secret=111637683-c8fa51
Powered by Listbox: http://www.listbox.com


Re: [agi] draft for comment

2008-09-04 Thread Terren Suydam

Hi Ben,

You may have stated this explicitly in the past, but I just want to clarify - 
you seem to be suggesting that a phenomenological self is important if not 
critical to the actualization of general intelligence. Is this your belief, and 
if so, can you provide a brief justification of that?  (I happen to believe 
this myself.. just trying to understand your philosophy better.)

Terren

--- On Thu, 9/4/08, Ben Goertzel [EMAIL PROTECTED] wrote:
However, I think that not all psychologically-embodied systems possess a 
sufficiently rich psychological-embodiment to lead to significantly general 
intelligence  My suggestion is that a laptop w/o network connection or odd 
sensor-peripherals, probably does not have sufficiently rich correlations btw 
its I/O stream and its physical state, to allow it to develop a robust 
self-model of its physical self (which can then be used as a basis for a more 
general phenomenal self).  






  


---
agi
Archives: https://www.listbox.com/member/archive/303/=now
RSS Feed: https://www.listbox.com/member/archive/rss/303/
Modify Your Subscription: 
https://www.listbox.com/member/?member_id=8660244id_secret=111637683-c8fa51
Powered by Listbox: http://www.listbox.com


Real vs. simulated environments (was Re: [agi] draft for comment.. P.S.)

2008-09-04 Thread Matt Mahoney
--- On Thu, 9/4/08, Valentina Poletti [EMAIL PROTECTED] wrote:
Ppl like Ben argue that the concept/engineering aspect of intelligence is
independent of the type of environment. That is, given you understand how
to make it in a virtual environment you can then transpose that concept
into a real environment more safely.

Some other ppl on the other hand believe intelligence is a property of
humans only. So you have to simulate every detail about humans to get
that intelligence. I'd say that among the two approaches the first one
(Ben's) is safer and more realistic.

The issue is not what is intelligence, but what do you want to create? In order 
for machines to do more work for us, they may need language and vision, which 
we associate with human intelligence. But building artificial humans is not 
necessarily useful. We already know how to create humans, and we are doing so 
at an unsustainable rate.

I suggest that instead of the imitation game (Turing test) for AI, we should 
use a preference test. If you prefer to talk to a machine vs. a human, then the 
machine passes the test.

Prediction is central to intelligence. If you can predict a text stream, then 
for any question Q and any answer A, you can compute the probability 
distribution P(A|Q) = P(QA)/P(Q). This passes the Turing test. More 
importantly, it allows you to output max_A P(QA), the most likely answer from a 
group of humans. This passes the preference test because a group is usually 
more accurate than any individual member. (It may fail a Turing test for giving 
too few wrong answers, a problem Turing was aware of in 1950 when he gave an 
example of a computer incorrectly answering an arithmetic problem).
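
A minimal sketch of how P(A|Q) = P(QA)/P(Q) falls out of any model that 
assigns probabilities to text; the tiny add-one-smoothed character bigram 
model below is just a stand-in for a real language model:

    import math
    from collections import defaultdict

    class CharBigramModel:
        # Toy add-one-smoothed character bigram model.
        def __init__(self, corpus):
            self.counts = defaultdict(lambda: defaultdict(int))
            self.alphabet = sorted(set(corpus))
            for a, b in zip(corpus, corpus[1:]):
                self.counts[a][b] += 1

        def prob(self, ch, prev):
            row = self.counts[prev]
            return (row[ch] + 1) / (sum(row.values()) + len(self.alphabet))

    def log2_prob(model, text, start=" "):
        lp, prev = 0.0, start
        for ch in text:
            lp += math.log2(model.prob(ch, prev))
            prev = ch
        return lp

    # log2 P(A|Q) = log2 P(QA) - log2 P(Q)
    def log2_cond(model, q, a):
        return log2_prob(model, q + a) - log2_prob(model, q)

    m = CharBigramModel("the cat sat on the mat ")
    q, answers = "the cat sat on the ", ["mat", "qqq"]
    print(max(answers, key=lambda a: log2_cond(m, q, a)))  # prints: mat

Choosing max_A P(QA) is the same as choosing max_A P(A|Q), since P(Q) does 
not depend on A.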

Text compression is equivalent to AI because we have already solved the coding 
problem. Given P(x) for string x, we know how to optimally and efficiently code 
x in log_2(1/P(x)) bits (e.g. arithmetic coding). Text compression has an 
advantage over the Turing or preference tests in that incremental progress 
in modeling can be measured precisely and the test is repeatable and verifiable.
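
A minimal sketch of the "coding is solved" part: once a model assigns P(x), 
the ideal code length is log_2(1/P(x)) bits, and an arithmetic coder gets 
within a couple of bits of that. The order-0 model below is only a stand-in 
for a real text model:

    import math
    from collections import Counter

    def ideal_code_length_bits(text):
        # Order-0 model estimated from the text itself;
        # log2(1/P(x)) = -sum over characters of log2 p(c).
        counts = Counter(text)
        n = len(text)
        return -sum(c * math.log2(c / n) for c in counts.values())

    x = "the cat sat on the mat"
    bits = ideal_code_length_bits(x)
    print("%.1f bits ideal vs %d bits as raw 8-bit text" % (bits, 8 * len(x)))

A better model assigns the actual text a higher P(x), and the ideal code 
length -- the compressed size -- drops accordingly.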

If I want to test a text compressor, it is important to use real data (human 
generated text) rather than simulated data, i.e. text generated by a program. 
Otherwise, I know there is a concise code for the input data, which is the 
program that generated it. When you don't understand the source distribution 
(i.e. the human brain), the problem is much harder, and you have a legitimate 
test.

I understand that Ben is developing AI for virtual worlds. This might produce 
interesting results, but I wouldn't call it AGI. The value of AGI is on the 
order of US $1 quadrillion. It is a global economic system running on a smarter 
internet. I believe that any attempt to develop AGI on a budget of $1 million 
or $1 billion or $1 trillion is just wishful thinking.

-- Matt Mahoney, [EMAIL PROTECTED]




---
agi
Archives: https://www.listbox.com/member/archive/303/=now
RSS Feed: https://www.listbox.com/member/archive/rss/303/
Modify Your Subscription: 
https://www.listbox.com/member/?member_id=8660244id_secret=111637683-c8fa51
Powered by Listbox: http://www.listbox.com


Re: [agi] draft for comment

2008-09-04 Thread Matt Mahoney
--- On Wed, 9/3/08, Pei Wang [EMAIL PROTECTED] wrote:

 TITLE: Embodiment: Who does not have a body?
 
 AUTHOR: Pei Wang
 
 ABSTRACT: In the context of AI, ``embodiment''
 should not be
 interpreted as ``giving the system a body'', but as
 ``adapting to the
 system's experience''. Therefore, being a robot
 is neither a
 sufficient condition nor a necessary condition of being
 embodied. What
 really matters is the assumption about the environment for
 which the
 system is designed.
 
 URL: http://nars.wang.googlepages.com/wang.embodiment.pdf

The paper seems to argue that embodiment applies to any system with inputs and 
outputs, and therefore all AI systems are embodied. However, there are 
important differences between symbolic systems like NARS and systems with 
external sensors such as robots and humans. The latter are analog, e.g. the 
light intensity of a particular point in the visual field, or the position of a 
joint in an arm. In humans, there is a tremendous amount of data reduction from 
the senses, from 137 million rods and cones in each eye each firing up to 300 
pulses per second, down to 2 bits per second by the time our high level visual 
perceptions reach long term memory.
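
Taking those figures at face value, the implied reduction is enormous:

    2 eyes * 1.37*10^8 receptors * up to 300 pulses/second ~= 10^10-10^11 events per second in,
    versus ~2 bits per second retained,

a reduction of roughly ten orders of magnitude before anything reaches long 
term memory.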

AI systems have traditionally avoided this type of processing because they 
lacked the necessary CPU power. IMHO this has resulted in biologically 
implausible symbolic language models with only a small number of connections 
between concepts, rather than the tens of thousands of connections per neuron.

Another aspect of embodiment (as the term is commonly used), is the false 
appearance of intelligence. We associate intelligence with humans, given that 
there are no other examples. So giving an AI a face or a robotic body modeled 
after a human can bias people to believe there is more intelligence than is 
actually present.


-- Matt Mahoney, [EMAIL PROTECTED]





Re: [agi] draft for comment

2008-09-04 Thread Pei Wang
On Thu, Sep 4, 2008 at 8:56 AM, Valentina Poletti [EMAIL PROTECTED] wrote:
 I agree with Pei in that a robot's experience is not necessarily more real
 than that of a, say, web-embedded agent - if anything it is closer to the
 human experience of the world. But who knows how limited our own sensory
 experience is anyhow. Perhaps a better intelligence would comprehend the
 world better through a different embodiment.

Exactly, the world to a system is always limited by the system's I/O
channels, and for systems with different I/O channels, their worlds
are different in many aspects, but none is more real than the
others.

 However, could you guys be more specific regarding the statistical
 differences of different types of data? What kind of differences are you
 talking about specifically (mathematically)? And what about the differences
 at the various levels of the dual-hierarchy? Has any of your work or
 research suggested this hypothesis, if so which?

It is Ben who suggested the statistical differences and the
dual-hierarchy, while I'm still not convinced about their value.

My own constructive work on this topic can be found in
http://nars.wang.googlepages.com/wang.semantics.pdf

Pei




Re: [agi] draft for comment

2008-09-04 Thread Pei Wang
On Thu, Sep 4, 2008 at 9:35 AM, Ben Goertzel [EMAIL PROTECTED] wrote:

 I understand that a keyboard and touchpad do provide proprioceptive input,
 but I think it's too feeble, and too insensitively respondent to changes in
 the environment and the relation btw the laptop and the environment, to
 serve as the foundation for a robust self-model or a powerful general
 intelligence.

Compared to what? Of course the human sensors are much more
complicated, but many robot sensors are no better, so why are they
considered 'real' while keyboard and touchpad are not?

Of course I'm not really arguing that keyboard and touchpad are all
we'll need for AGI (I plan to play with robots myself), but that there
is no fundamental difference between what we call 'robot' and what we
call 'computer', as far as the 'embodiment' discussion is concerned.
A robot is just a special-purpose computer with I/O not designed for human
users.

 Of course it won't have a visual concept of self, but a system like
 NARS has the potential to grow into an intelligent operating system,
 with a notion of self based on what it can feel and do, as well as
 the causal relations among them --- If there is a file in this
 folder, then I should have felt it, it cannot be there because I've
 deleted the contents.

 My suggestion is that the file system lacks the complexity of structure and
 dynamics to support the emergence of a robust self-model, and powerful
 general intelligence...

Sure. I just used file managing as a simple example. What if the AI
has full control of the system's hardware and software, and can use
them in novel ways to solve all kinds of problems unknown to it
previously, without human involvement?

 I would call a self any internal, explicit model that a system creates
 that allows it to predict its own behaviors in a sufficient variety of
 contexts  This need not have a visual aspect nor a great similarity to a
 human self.

I'd rather not call it a 'model', though won't argue on this topic ---
'embodiment' is already confusing enough, so 'self' is better to wait,
otherwise someone will even add 'consciousness' into the discussion.
;-)

Pei




Re: [agi] draft for comment

2008-09-04 Thread Bryan Bishop
On Thursday 04 September 2008, Matt Mahoney wrote:
 Another aspect of embodiment (as the term is commonly used), is the
 false appearance of intelligence. We associate intelligence with
 humans, given that there are no other examples. So giving an AI a
 face or a robotic body modeled after a human can bias people to
 believe there is more intelligence than is actually present.

I'm still waiting until you guys could show me a psychometric test that 
has a one-to-one correlation with the bioinformatics and 
neuroinformatics and then thus could be approached with a physical 
model down at the biophysics. Otherwise the 'false appearance of 
intelligence' is a truism - intelligence is false. What then? (Would 
you give up making brains and such systems? I'm just wondering. It's an 
interesting scenario.)

- Bryan

http://heybryan.org/
Engineers: http://heybryan.org/exp.html
irc.freenode.net #hplusroadmap




Re: [agi] draft for comment

2008-09-04 Thread Pei Wang
On Thu, Sep 4, 2008 at 10:04 AM, Ben Goertzel [EMAIL PROTECTED] wrote:

 Hi Pei,

 I think your point is correct that the notion of embodiment presented by
 Brooks and some other roboticists is naive.  I'm not sure whether their
 actual conceptions are naive, or whether they just aren't presenting their
 foundational philosophical ideas clearly in their writings (being ultimately
 more engineering-oriented people, and probably not that accustomed to the
 philosophical style of discourse in which these sorts of definitional
 distinctions need to be more precisely drawn).

To a large extent, their position is a reaction to the 'disembodied'
symbolic AI, though they get the issue wrong. Symbolic AI is indeed
'disembodied', not because computers have no body (or sensorimotor
devices), but because the systems are designed to ignore their body and
their experience.

Therefore, the solution should not be to get a (robotic) body, but
to take experience into account.

 I do think (in approximate
 concurrence with your paper) that ANY control system physically embodied in
 a physical system S, that has an input and output stream, and whose input
 and output stream possess correlation with the physical state of S, should
 be considered as psychologically embodied.  Clearly, whether it's a robot
 or a laptop (w/o network connection if you like), such a system has the
 basic property of embodiment.

Yes, though I'd say neither possess correlation with the physical
state (which is the terminology of model-theoretic semantics) nor
psychologically embodied (which still sounds like a second-rate
substitute for physically embodied).

 Furthermore S doesn't need to be a physical
 system ... it could be a virtual system inside some virtual world (and
 then there's the question of what properties characterize a valid virtual
 world ... but let's leave that for another email thread...)

Every system (in this discussion) is a physical system. It is just
that sometimes we can ignore its physical properties.

 However, I think that not all psychologically-embodied systems possess a
 sufficiently rich psychological-embodiment to lead to significantly general
 intelligence  My suggestion is that a laptop w/o network connection or
 odd sensor-peripherals, probably does not have sufficiently rich
 correlations btw its I/O stream and its physical state, to allow it to
 develop a robust self-model of its physical self (which can then be used as
 a basis for a more general phenomenal self).

That is a separate issue.  If a system's I/O devices are very simple,
it cannot produce rich behaviors. However, the problem is not caused
by 'disembodiment'. We cannot say that a body must reach a certain
complexity to be called a 'body'.

 I think that Varela and crew understood the value of this rich network of
 correlations, but mistakenly assumed it to be a unique property of
 biological systems...

Agree.

 I realize that the points you made in your paper do not contradict the
 suggestions I've made in this email.  I don't think anything significant in
 your paper is wrong, actually.  It just seems to me not to address the most
 interesting aspects of the embodiment issue as related to AGI.

Understand.

Pei




Re: [agi] draft for comment

2008-09-04 Thread Pei Wang
On Thu, Sep 4, 2008 at 2:22 PM, Matt Mahoney [EMAIL PROTECTED] wrote:

 The paper seems to argue that embodiment applies to any system with inputs 
 and outputs, and therefore all AI systems are embodied.

No. It argues that since every system has inputs and outputs,
'embodiment', as a non-trivial notion, should be interpreted as taking
experience into account when the system behaves. Therefore, traditional
symbolic AI systems, like CYC, are still disembodied.

 However, there are important differences between symbolic systems like NARS 
 and systems with external sensors such as robots and humans.

NARS, when implemented, has input/output, and therefore has external sensors.

I guess you still see NARS as using model-theoretic semantics, so you
call it symbolic and contrast it with systems with sensors. This is
not correct --- see
http://nars.wang.googlepages.com/wang.semantics.pdf and
http://nars.wang.googlepages.com/wang.AI_Misconceptions.pdf

 The latter are analog, e.g. the light intensity of a particular point in the 
 visual field, or the position of a joint in an arm. In humans, there is a 
 tremendous amount of data reduction from the senses, from 137 million rods 
 and cones in each eye each firing up to 300 pulses per second, down to 2 bits 
 per second by the time our high level visual perceptions reach long term 
 memory.

Within a certain accuracy, 'digital' and 'analog' have no fundamental
difference. I hope you are not arguing that only analog systems can be
embodied.

 AI systems have traditionally avoided this type of processing because they 
 lacked the necessary CPU power. IMHO this has resulted in biologically 
 implausible symbolic language models with only a small number of connections 
 between concepts, rather than the tens of thousands of connections per neuron.

You have made this point on CPU power several times, and I'm still
not convinced that the bottleneck of AI is hardware capacity. Also,
there is no reason to believe an AGI must be designed in a
biologically plausible way.

 Another aspect of embodiment (as the term is commonly used), is the false 
 appearance of intelligence. We associate intelligence with humans, given that 
 there are no other examples. So giving an AI a face or a robotic body modeled 
 after a human can bias people to believe there is more intelligence than is 
 actually present.

I agree with you on this point, though I will not argue so in the paper
--- it would be like calling the roboticists cheaters, even though it is
indeed the case that work in robotics gets public attention much more
easily.

Pei




[agi] draft for comment

2008-09-03 Thread Pei Wang
TITLE: Embodiment: Who does not have a body?

AUTHOR: Pei Wang

ABSTRACT: In the context of AI, ``embodiment'' should not be
interpreted as ``giving the system a body'', but as ``adapting to the
system's experience''. Therefore, being a robot is neither a
sufficient condition nor a necessary condition of being embodied. What
really matters is the assumption about the environment for which the
system is designed.

URL: http://nars.wang.googlepages.com/wang.embodiment.pdf




Re: [agi] draft for comment

2008-09-03 Thread Mike Tintner

Pei: it is important to understand that both linguistic experience and
non-linguistic experience are both special cases of experience, and the
latter is not more real than the former. In the previous discussions, many
people implicitly suppose that linguistic experience is nothing but
Dictionary-Go-Round [Harnad, 1990], and only non-linguistic experience can
give symbols meaning. This is a misconception coming from traditional
semantics, which determines meaning by referred object, so that an image of
the object seems to be closer to the real thing than a verbal description
[Wang, 2007].

1. Of course the image is more real than the symbol or word.

Simple test of what should be obvious: a) use any amount of symbols you 
like, incl. Narsese, to describe Pei Wang. Give your description to any 
intelligence, human or AI, and see if it can pick out Pei in a lineup of 
similar men.


b) give the same intelligence a photo of Pei -  apply the same test.

Guess which method will win.

Only images can represent *INDIVIDUAL objects* - incl. Pei/Ben or this 
keyboard on my desk. And in the final analysis, only individual objects *are* 
real. There are no "chairs" or "oranges", for example - those general 
concepts are, in the final analysis, useful fictions. There is only this 
chair here and that chair over there. And if you want to refer to them 
individually - so that you communicate successfully with another 
person/intelligence - you have no choice but to use images (flat or solid).


2. Symbols are abstract - they can't refer to anything unless you already 
know, via images, what they refer to. If you think not, please draw a 
cheggnut. Again, if I give you an image of a cheggnut, you will have no 
problem.


3. You talk of a misconception of semantics, but give no reason why it is 
such; you merely state that it is.


4. You leave out the most important thing of all - you argue that experience 
is composed of symbols and images. And...? Hey, there's also the real 
thing(s). The real objects that they refer to. You certainly can't do 
science without looking at the real objects. And science is only a 
systematic version of all intelligence. That's how every functioning 
general intelligence is able to be intelligent about the world - by being 
grounded in the real world, composed of real objects, which it can go out 
and touch, walk round, look at and interact with. A box like NARS can't do 
that, can it?


Do you realise what you're saying, Pei? To understand statements is to 
*realise* what they mean - what they refer to - to know that they refer to 
real objects, which you can really go and interact with and test - and to 
try (or have your brain try automatically) to connect those statements to 
real objects.


When you or I are given words or images - "find this man [Pei]", or "cook a 
Chinese meal tonight" - we know that those signs must be tested in the real 
world and are only valid if so tested. We know that it's possible that that 
man over there who looks v. like the photo may not actually be Pei, or that 
Pei may have left the country and be impossible to find. We know that it may 
be impossible to cook such a meal, because there's no such food around. And 
all such tests can only be conducted in the real world (and not, say, by 
going and looking at other texts or photos - living in a Web world).


Your concept of AI is not so much un-grounded as unreal.

5. Why on earth do you think that evolution shows us general intelligences 
very successfully dealing with the problems of the world for over a billion 
years *without* any formal symbols? Why do infants take time to acquire 
language and are therefore able to survive without it?


The conception of AI that you are advancing is the equivalent of 
Creationism - it both lacks and denies an evolutionary perspective on 
intelligence - a (correctly) cardinal sin in modern science.









Re: [agi] draft for comment

2008-09-03 Thread Ben Goertzel
Pei,

I have a different sort of reason for thinking embodiment is important ...
it's a deeper reason that I think underlies the embodiment is important
because of symbol grounding argument.

Linguistic data, mathematical data, visual data, motoric data etc. are all
just bits ... and intelligence needs to work by recognizing patterns among
these bits, especially patterns related to system goals.

What I think is that the set of patterns in perceptual and motoric data has
radically different statistical properties than the set of patterns in
linguistic and mathematical data ... and that the properties of the set of
patterns in perceptual and motoric data is intrinsically better suited to
the needs of a young, ignorant, developing mind.

All these different domains of pattern display what I've called a dual
network structure ... a collection of hierarchies (of progressively more
and more complex, hierarchically nested patterns) overlayed with a
heterarchy (of overlapping, interrelated patterns).  But the statistics of
the dual networks in the different domains is different.  I haven't fully
plumbed the difference yet ... but, among the many differences is that in
perceptual/motoric domains, you have a very richly connected dual network at
a very low level of the overall dual network hierarchy -- i.e., there's a
richly connected web of relatively simple stuff to understand ... and then
these simple things are related to (hence useful for learning) the more
complex things, etc.

In short, Pei, I agree that the arguments typically presented in favor of
embodiment in AI suck.  However, I think there are deeper factors going on
which do imply a profound value of embodiment for AGI.  Unfortunately, we
currently lack a really appropriate scientific language for describing the
differences in statistical organization between different pattern-sets, so
it's almost as difficult to articulate these differences as it is to
understand them...

-- Ben G

On Wed, Sep 3, 2008 at 4:58 PM, Pei Wang [EMAIL PROTECTED] wrote:

 TITLE: Embodiment: Who does not have a body?

 AUTHOR: Pei Wang

 ABSTRACT: In the context of AI, ``embodiment'' should not be
 interpreted as ``giving the system a body'', but as ``adapting to the
 system's experience''. Therefore, being a robot is neither a
 sufficient condition nor a necessary condition of being embodied. What
 really matters is the assumption about the environment for which the
 system is designed.

 URL: http://nars.wang.googlepages.com/wang.embodiment.pdf






-- 
Ben Goertzel, PhD
CEO, Novamente LLC and Biomind LLC
Director of Research, SIAI
[EMAIL PROTECTED]

Nothing will ever be attempted if all possible objections must be first
overcome  - Dr Samuel Johnson





Re: [agi] draft for comment

2008-09-03 Thread Pei Wang
Mike,

As I said before, you give 'symbol' a very narrow meaning, and insist
that it is the only way to use it. In the current discussion,
symbols are not 'X', 'Y', 'Z', but 'table', 'time', 'intelligence'.
BTW, what images do you associate with the latter two?

Since you prefer to use a person as an example, let me try the same. All of
my experience about 'Mike Tintner' is symbolic, nothing visual, but it
still makes you real enough to me, and I've got more information about
you than a photo of you can provide. For instance, this experience
tells me that to argue this issue with you will very likely be a waste
of time, which is something that no photo can teach me. I still cannot
pick you out in a lineup, but it doesn't mean your name is meaningless
to me.

I'm sorry if it sounds rude --- I rarely talk to people in this tone,
but you are exceptional, in my experience of personal communication.
Again, the meaning of your name, in my mind, is not the person it
refers to, but its relations with other concepts in my experience; this
experience can be visual, verbal, or something else.

Pei

On Wed, Sep 3, 2008 at 6:07 PM, Mike Tintner [EMAIL PROTECTED] wrote:
 Pei: it is important to understand that both linguistic experience and
 non-linguistic experience are both special cases of experience, and the
 latter is not more real than the former. In the previous discussions, many
 people implicitly suppose that linguistic experience is nothing but
 Dictionary-Go-Round [Harnad, 1990], and only non-linguistic experience can
 give symbols meaning. This is a misconception coming from traditional
 semantics, which determines meaning by referred object, so that an image of
 the object seems to be closer to the real thing than a verbal description
 [Wang, 2007].

 1. Of course the image is more real than the symbol or word.

 Simple test of what should be obvious: a) use any amount of symbols you
 like, incl. Narsese, to describe Pei Wang. Give your description to any
 intelligence, human or AI, and see if it can pick out Pei in a lineup of
 similar men.

 b) give the same intelligence a photo of Pei -  apply the same test.

 Guess which method will win.

 Only images can represent *INDIVIDUAL objects* - incl Pei/Ben or this
 keyboard on my desk. And in the final analysis, only individual objects *are*
 real. There are no chairs or oranges for example - those general
 concepts are, in the final analysis, useful fictions. There is only this
 chair here and that chair over there. And if you want to refer to them,
 individually, - so that you communicate successfully with another
 person/intelligence - you have no choice but to use images, (flat or solid).

 2. Symbols are abstract - they can't refer to anything unless you already
 know, via images, what they refer to. If you think not, please draw a
 cheggnut. Again, if I give you an image of a cheggnut, you will have no
 problem.

 3. You talk of a misconception of semantics, but give no reason why it is
 such, merely state it is.

 4. You leave out the most important thing of all - you argue that experience
 is composed of symbols and images. And...?  Hey, there's also the real
 thing(s). The real objects that they refer to. You certainly can't do
 science without looking at the real objects. And science is only a
 systematic version of all intelligence. That's how every  functioning
 general intelligence is able to be intelligent about the world - by being
 grounded in the real world, composed of real objects, which it can go out
 and touch, walk round, look at and interact with. A box like Nars can't do
 that, can it?

 Do you realise what you're saying, Pei? To understand statements is to
 *realise* what they mean - what they refer to - to know that they refer to
 real objects, which you can really go and interact with and test - and to
 try (or have your brain try automatically) to connect those statements to
 real objects.

 When you or I are given words or images, find this man [Pei], or cook a
 Chinese meal tonight, we know that those signs must be tested in the real
 world and are only valid if so tested. We know that it's possible that that
 man over there who looks v. like the photo may not actually be Pei, or that
 Pei may have left the country and be impossible to find. We know that it may
 be impossible to cook such a meal, because there's no such food around. -
 And all such tests can only be conducted in the real world (and not say by
 going and looking at other texts or photos - living in a Web world).

 Your concept of AI is not so much un-grounded as unreal.

 5. Why on earth do you think that evolution shows us general intelligences
 very successfully dealing with the problems of the world for over a billion
 years *without* any formal symbols? Why do infants take time to acquire
 language and are therefore able to survive without it?

 The conception of AI that you are advancing is the equivalent of Creationism
 - it both lacks and denies an evolutionary perspective on intelligence - a
 (correctly) cardinal sin in modern science.

Re: [agi] draft for comment.. P.S.

2008-09-03 Thread Mike Tintner
I think I have an appropriate term for what I was trying to conceptualise. 
It is that intelligence has not only to be embodied, but it has to be 
EMBEDDED in the real world - that's the only way it can test whether 
information about the world and real objects is really true. If you want to 
know whether Jane Doe is great at sex, you can't take anyone's word for it; 
you have to go to bed with her. [Comments on the term esp. welcome.]







Re: [agi] draft for comment

2008-09-03 Thread Pei Wang
On Wed, Sep 3, 2008 at 6:24 PM, Ben Goertzel [EMAIL PROTECTED] wrote:

 What I think is that the set of patterns in perceptual and motoric data has
 radically different statistical properties than the set of patterns in
 linguistic and mathematical data ... and that the properties of the set of
 patterns in perceptual and motoric data is intrinsically better suited to
 the needs of a young, ignorant, developing mind.

Sure it is. Systems with different sensory channels will never fully
understand each other. I'm not saying that one channel (verbal) can
replace another (visual), but that both of them (and many others) can
give symbol/representation/concept/pattern/whatever-you-call-it
meaning. No one is more real than the others.

 All these different domains of pattern display what I've called a dual
 network structure ... a collection of hierarchies (of progressively more
 and more complex, hierarchically nested patterns) overlayed with a
 heterarchy (of overlapping, interrelated patterns).  But the statistics of
 the dual networks in the different domains is different.  I haven't fully
 plumbed the difference yet ... but, among the many differences is that in
 perceptual/motoric domains, you have a very richly connected dual network at
 a very low level of the overall dual network hierarchy -- i.e., there's a
 richly connected web of relatively simple stuff to understand ... and then
 these simple things are related to (hence useful for learning) the more
 complex things, etc.

True, but can you say that the relations among words, or concepts, are simpler?

 In short, Pei, I agree that the arguments typically presented in favor of
 embodiment in AI suck.  However, I think there are deeper factors going on
 which do imply a profound value of embodiment for AGI.  Unfortunately, we
 currently lack a really appropriate scientific language for describing the
 differences in statistical organization between different pattern-sets, so
 it's almost as difficult to articulate these differences as it is to
 understand them...

In this short paper, I make no attempt to settle all issues, but just
to point out a simple fact --- a laptop has a body, and is no less
embodied than a Roomba or Mindstorms --- that seems to have been ignored
in the previous discussion.

Pei

