Re: [singularity] Motivational Systems that are stable

2006-10-30 Thread Richard Loosemore

Mitchell Porter wrote:


Richard Loosemore:

In fact, if it knew all about its own design (and it would, 
eventually), it would check to see just how possible it might be for 
it to accidentally convince itself to disobey its prime directive,


But it doesn't have a prime directive, does it? It has large numbers
of constraints affecting its decisions.


Well ... I have used "prime directive" to mean the motives that the 
motivational system gives to the system.  This would initially be the 
very simple motives of attachment, affection, etc., but would later 
develop into more sophisticated versions of the same.


The large numbers of constraints come in at the level of the mechanics 
by which the motivational system governs the rest of the system.




I would agree absolutely that emergent stability sounds possible, but
(1) one needs to say much more about the necessary and sufficient
conditions (2) one needs to define Friendliness and specialize to that
case. (And I hope you'd agree with these extra points.)


If by (1) you mean we need to know more about the implementation 
details, then, yes of course!  I am trying to establish a general 
principle to guide research.  I can see a number of the details, but not 
the complete picture yet.


Defining friendliness is more a matter of figuring out what motivational 
primitives give us what we want.  In other words, I agree with you, but 
the way we produce the definition will not necessarily involve writing 
down the actual laws of friendliness in explicit terms.  We need to do 
experimental and theoretical work to see how the initial motivational 
seeds control later behavior.


I do apologize for not being able to explain more of what is in my head 
here:  to do that properly I have to set up a lot of background, and be 
meticulous.  I am doing that, but it is more appropriate for a book than 
an essay on a list.  I'm working as fast as I can, given a limited time 
budget.



Richard Loosemore







Re: [agi] Re: [singularity] Motivational Systems that are stable

2006-10-30 Thread Richard Loosemore


Ben,

I guess the issue I have with your critique is that you say that I have 
given no details, no rigorous argument, just handwaving, etc.


But you are being contradictory:  on the one hand you say that the 
proposal is vague/underspecified/does not give any arguments ... but 
then, having said that, you go on to make specific criticisms and say 
that it is wrong on this or that point.


I don't think you can have it both ways.  Either you don't see an 
argument, and rest your case, or you do see an argument and want to 
critique it.  You are trying to do both:  you repeatedly make broad 
accusations about the quality of the proposal ("some very hand-wavy, 
intuitive suggestions", "you have not given any sort of rigorous 
argument", "... your intuitive suggestions ...", "you did not give any 
details as to why you think your proposal will 'work'", etc., etc.), but 
then go on to make specific points about what is wrong with it.


Now, if the specific points you make were valid criticisms, I could 
perhaps overlook the inconsistency and just address the criticisms.  But 
that is exactly what I just did, and your specific criticisms, as I 
explained in the last message, were mostly about issues that had nothing 
to do with the general class of architectures I proposed, but only with 
weird cases or weird issues that had no bearing on my case.


Since you just dropped most of those issues (except one, which I will 
address in a moment), I must assume that you accept that I have given a 
good reply to each of them.  But instead of conceding that the argument 
I gave must therefore have some merit, you repeat -- even more 
insistently than before -- that there is nothing in the argument, that 
it is all just vague handwaving etc.


No fair!

This kind of response:

  -  Your argument is either too vague or I don't understand it.

Would be fine, and I would just try to clarify it in the future.

But this response:

  -  This is all just handwaving, with no details and no argument.
  -  It is also a wrong argument, for these reasons:
  -  [Reasons that are mostly just handwaving or irrelevant].

Is not so good.

*

I will say something about the specific point you make about my claim 
that as time goes on the system will check new ideas against previous 
ones to make sure that new ones are consistent with ALL the old ones, so 
therefore it will become more and more stable.


What you have raised is a minor technical issue, together with some 
confusion about what exactly I meant:


The ideas being checked against all previous ideas are *not* the 
incoming general learned concepts ([cup], [salt], [cricket], [democracy], 
[sneezes], etc.) but the concepts related to planned actions and the 
system's base of moral/ethical/motivational concerns.  Broadly speaking, 
it is when there is a new "perhaps I should do this ..." idea that the 
comparison starts.  I did actually say this, but it was a little 
obscurely worded.


Now, when I said "checked for consistency against all previous ideas" I 
was speaking rather loosely (my bad).  Obviously I would not do this by 
an exhaustive comparison [please:  I don't need to have it explained to 
me that this is O(n^2)! :-) ].  The mechanism would work something like 
a parallel terraced scan:  issues are represented at different levels of 
granularity, and if any kind of inconsistency is detected at one of the 
high (low-granularity) levels, it provokes a focussing on the problem 
and an elaboration of everything involved in the idea, which can then 
bring in many more considerations, potentially resulting in a complete 
comparison on that one issue.  In addition, the system would use various 
other (Monte-Carlo-esque) techniques for taking random looks at the 
implications of an issue, to catch problems that might otherwise slip 
past the top-level scan.


Specific example.  The system thinks that maybe selling its mother into 
the white slave trade is a good way to make money.  But this very idea 
causes simple associations with [white slave trade] to kick in (for 
example [misery], [brutality], [betrayal], and so on).  These simple 
associations get connected with [mother], and in a moment the system 
finds that the concept [unhappy mother] sends a big fat negative signal 
back to the motivational system, waking up the module that is 
responsible for the [social group attachment] motivation.  Pretty soon 
this triggers a full-scale reexamination of the entire idea, and when 
examined in detail it is found to be inconsistent with the system's 
prime motivations.
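
The same example in toy code form, if that helps (again, every name is 
invented, and this is only meant to show the flow of the associations):

# Toy version of the example above; every name is invented for illustration.
ASSOCIATIONS = {
    "white slave trade": {"misery", "brutality", "betrayal"},
    "mother": {"attachment figure"},
}
NEGATIVE_FOR_ATTACHMENT = {"misery", "brutality", "betrayal"}

class AttachmentModule:
    """Stand-in for the [social group attachment] motivation module."""
    def full_reexamination(self, concepts):
        # The elaborated check finds the plan inconsistent with the system's
        # prime motivations, so it is rejected.
        return False

def evaluate_proposal(concepts, attachment_module):
    """Spread simple associations out from the concepts in a proposed action;
    if the combination produces a strong negative signal, wake the relevant
    motivational module and force a full re-examination."""
    activated = set(concepts)
    for c in concepts:
        activated |= ASSOCIATIONS.get(c, set())

    # [mother] together with strongly negative associations -> big negative signal.
    if "mother" in activated and activated & NEGATIVE_FOR_ATTACHMENT:
        return attachment_module.full_reexamination(concepts)
    return True

# evaluate_proposal({"mother", "white slave trade"}, AttachmentModule()) returns False.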


So although you made a reasonable point, this is a technical difficulty 
that can be handled quite easily.


I note that you did anticipate this reply, when you said "Some heuristic 
shortcuts must be used to decrease the number of comparisons, and such 
heuristics introduce the possibility of error...", and then also "The 
kind of distributed system you are describing seems NOT to solve the 
computational problem of verifying the consistency of each new knowledge 
item with each other knowledge item."

Re: Re: [agi] Re: [singularity] Motivational Systems that are stable

2006-10-30 Thread Ben Goertzel

Hi Richard,

Let me go back to the start of this dialogue...

Ben Goertzel wrote:

Loosemore wrote:

 The motivational system of some types of AI (the types you would
 classify as tainted by complexity) can be made so reliable that the
 likelihood of them becoming unfriendly would be similar to the
 likelihood of the molecules of an Ideal Gas suddenly deciding to split
 into two groups and head for opposite ends of their container.


Wow!  This is a very strong hypothesis ... I really doubt this
kind of certainty is possible for any AI with radically increasing
intelligence ... let alone a complex-system-type AI with highly
indeterminate internals...

I don't expect you to have a proof for this assertion, but do you have
an argument at all?


Your subsequent responses have shown that you do have an argument, but
not anything close to a proof.

And, your argument has not convinced me, so far.  Parts of it seem
vague to me, but based on my limited understanding of your argument, I
am far from convinced that AI systems of the type you describe, under
conditions of radically improving intelligence, can be made "so
reliable that the likelihood of them becoming unfriendly would be
similar to the likelihood of the molecules of an Ideal Gas suddenly
deciding to split into two groups and head for opposite ends of their
container."

At this point, my judgment is that carrying on this dialogue further
is not the best expenditure of my time.  Your emails are long and
complex mixtures of vague and precise statements, and it takes a long
time for me to read them and respond to them with even a moderate
level of care.

I remain interested in your ideas and if you write a paper or book on
your ideas I will read it as my schedule permits.  But I will now opt
out of this email thread.

Thanks,
Ben



RE: [singularity] Motivational Systems that are stable

2006-10-29 Thread Mitchell Porter


Richard Loosemore:

In fact, if it knew all about its own design (and it would, eventually), it 
would check to see just how possible it might be for it to accidentally 
convince itself to disobey its prime directive,


But it doesn't have a prime directive, does it? It has large numbers
of constraints affecting its decisions.

I would agree absolutely that emergent stability sounds possible, but
(1) one needs to say much more about the necessary and sufficient
conditions (2) one needs to define Friendliness and specialize to that
case. (And I hope you'd agree with these extra points.)





Re: [agi] Re: [singularity] Motivational Systems that are stable

2006-10-29 Thread Ben Goertzel

Hi,


There is something about the gist of your response that seemed strange
to me, but I think I have put my finger on it:  I am proposing a general
*class* of architectures for an AI-with-motivational-system.  I am not
saying that this is a specific instance (with all the details nailed
down) of that architecture, but an entire class, an approach.

However, as I explain in detail below, most of your criticisms are that
there MIGHT be instances of that architecture that do not work.


No.   I don't see why there will be any instances of your architecture
that do work (in the sense of providing guaranteeable Friendliness
under conditions of radical, intelligence-increasing
self-modification).

And you have not given any sort of rigorous argument that such
instances will exist

Just some very hand-wavy, intuitive suggestions, centering on the
notion that (to paraphrase) "because there are a lot of constraints, a
miracle happens"  ;-)

I don't find your intuitive suggestions foolish or anything, just
highly sketchy and unconvincing.

I would say the same about Eliezer's attempt to make a Friendly AI
architecture in his old, now-repudiated-by-him essay "Creating a
Friendly AI".  A lot in CFAI seemed plausible to me, and the intuitive
arguments were more fully fleshed out than yours in your email
(naturally, because it was an article, not an email) ... but in the
end I felt unconvinced, and Eliezer eventually came to agree with me
(though not on the best approach to fixing the problems)...


  In a radically self-improving AGI built according to your
  architecture, the set of constraints would constantly be increasing in
  number and complexity ... in a pattern based on stimuli from the
  environment as well as internal stimuli ... and it seems to me you
  have no way to guarantee based on the smaller **initial** set of
  constraints, that the eventual larger set of constraints is going to
  preserve Friendliness or any other criterion.

On the contrary, this is a system that grows by adding new ideas whose
motivational status must be consistent with ALL of the previous ones, and
the longer the system is allowed to develop, the deeper the new ideas
are constrained by the sum total of what has gone before.


This does not sound realistic.  Within realistic computational
constraints, I don't see how an AI system is going to verify that each
of its new ideas is consistent with all of its previous ideas.

This is a specific issue that has required attention within the
Novamente system.  In Novamente, each new idea is specifically NOT
required to be verified for consistency against all previous ideas
existing in the system, because this would make the process of
knowledge acquisition computationally intractable.  Rather, it is
checked for consistency against those other pieces of knowledge with
which it directly interacts.  If an inconsistency is noticed, in
real-time, during the course of thought, then it is resolved
(sometimes by a biased random decision, if there is not enough
evidence to choose between two inconsistent alternatives; or
sometimes, if the matter is important enough, by explicitly
maintaining two inconsistent perspectives in the system, with separate
labels, and an instruction to pay attention to resolving the
inconsistency as more evidence comes in.)
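
In toy code, the policy is roughly as follows (this is only a sketch of
the general idea, with invented names; it is not actual Novamente code):

import random

class LocalConsistencyKB:
    """Toy knowledge base: a new item is checked only against the items it
    directly interacts with (here, items sharing its topic), and conflicts
    are resolved locally rather than by global consistency checking."""

    def __init__(self):
        self.items = []       # each item: {"topic": str, "claim": str, "evidence": float > 0}
        self.flagged = []     # unresolved conflicts to revisit as evidence accumulates

    def neighbours(self, item):
        return [i for i in self.items if i["topic"] == item["topic"]]

    def assimilate(self, item, importance_threshold=0.8):
        for old in self.neighbours(item):
            if old["claim"] != item["claim"]:    # an inconsistency was noticed
                if max(old["evidence"], item["evidence"]) >= importance_threshold:
                    # Important enough: keep both perspectives, separately
                    # labelled, and flag the conflict for later attention.
                    self.items.append(item)
                    self.flagged.append((old, item))
                else:
                    # Not enough evidence to decide: biased random choice,
                    # weighted by the evidence behind each alternative.
                    keep = random.choices([old, item],
                                          weights=[old["evidence"], item["evidence"]])[0]
                    if keep is item:
                        self.items.remove(old)
                        self.items.append(item)
                return
        self.items.append(item)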

The kind of distributed system you are describing seems NOT to solve
the computational problem of verifying the consistency of each new
knowledge item with each other knowledge item.



Thus:  if the system has grown up and acquired a huge number of examples
and ideas about what constitutes good behavior according to its internal
system of values, then any new ideas about new values must, because of
the way the system is designed, prove themselves by being compared
against all of the old ones.


If each idea must be compared against all other ideas, then cognition
has order n^2 where n is the number of ideas.  This is not workable.
Some heuristic shortcuts must be used to decrease the number of
comparisons, and such heuristics introduce the possibility of error...


And I said "ridiculously small chance" advisedly:  if 10,000 previous
constraints apply to each new motivational idea, and if 9,900 of them
say 'Hey, this is inconsistent with what I think is a good thing to do',
then it doesn't have a snowball's chance in hell of getting accepted.
THIS is the deep potential well I keep referring to.


The problem, as I said, is posing a set of constraints that is both
loose enough to allow innovative new behaviors, and tight enough to
prevent the wrong behaviors...


I maintain that we can, during early experimental work, understand the
structure of the motivational system well enough to get it up to a
threshold of acceptably friendly behavior, and that beyond that point
its stability will be self-reinforcing, for the above reasons.


Well, I hope so ;-)

I don't rule out the possibility, but I don't feel you've argued for
it convincingly, 

Re: [singularity] Motivational Systems that are stable

2006-10-27 Thread Richard Loosemore


Curious.

A couple of days ago, I responded to demands that I produce arguments to 
justify the conclusion that there were ways to build a friendly AI that 
was extremely stable and trustworthy, but without having to give a 
mathematical proof of its friendliness.


Now, granted, the text was complex, technical, and not necessarily 
worded as best it could be.  But the background to this is that I am 
writing a long work on the foundations of cognitive science, and the 
ideas in that post were a condensed version of material that is spread 
out over several dense chapters in that book ... but even though that 
longer version is not ready, I finally gave in to the repeated (and 
sometimes shrill and abusive) demands that I produce at least some kind 
of summary of what is in those chapters.


But after all that complaining, I gave the first outline of an actual 
technique for guaranteeing Friendliness (not vague promises that "a 
rigorous mathematical proof is urgently needed, and I promise I am 
working on it", but an actual method that can be developed into a 
complete solution), and the response was ... nothing.


I presume this means everyone agrees with it, so this is a milestone of 
mutual accord in a hitherto divided community.


Progress!



Richard Loosemore.



Re: [singularity] Motivational Systems that are stable

2006-10-25 Thread Anna Taylor

The last I heard, computers are spied upon because of the language the
computer is generating.  Why would the government care about the guy
that picks up garbage?

Richard Loosemore wrote, Wed, Oct 25, 2006:

The word "trapdoor" is a reference to trapdoor algorithms that allow
computers to be spied upon.


If you feel guilty about something then you will feel that your
ethical values are being compromised.
Technology is without a doubt the age of the future.  If you have
posted, said or done something, chances are it will come back to haunt you.
The only way to change the algorithms is to change the thoughts.

Just my thoughts, let me know what you think.
Anna:)






On 10/25/06, Richard Loosemore [EMAIL PROTECTED] wrote:

Anna Taylor wrote:
 On Wed, Oct 25, 2006 at 10:11, R. Loosemore wrote:
 What I have in mind here is the objection (that I know
 some people will raise) that it might harbor some deep-seated animosity
 such as an association between human beings in general and something
 'bad' that happened to it when it was growing up ... we would easily be
 able to catch something like that if we had a trapdoor on the
 motivational system.

 I'm not clear what you meant, could you rephrase?
 I understood, "what I have in mind is a trapdoor of the motivational
 system" :)
 Do you think motivation is a key factor that generates
 singularity-level events?
 Am I understanding properly?

 Just curious
 Anna:)

Anna,

The word "trapdoor" is a reference to trapdoor algorithms that allow
computers to be spied upon:  I meant it in a similar sense, that the AI
would be built in such a way that we could (in the development stages)
spy on what was happening in the motivational system to find out whether
the AI was developing any nasty intentions.
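
In software terms, the kind of thing I mean is no more exotic than a
development-time inspection hook, along these lines (a toy sketch with
invented names, not a design):

class MotivationalSystem:
    """Toy motivational system with a development-time 'trapdoor': every
    motivation signal is also reported to an external monitor, so the
    designers can watch for nasty associations forming."""

    def __init__(self, monitor=None):
        self.activations = {}     # module name -> current activation strength
        self.monitor = monitor    # development-time observer; None once deployed

    def signal(self, module_name, strength, source_concepts):
        self.activations[module_name] = self.activations.get(module_name, 0.0) + strength
        if self.monitor is not None:
            # The trapdoor: expose the raw internal event to the designers.
            self.monitor.record(module_name, strength, source_concepts)


class DevelopmentMonitor:
    """Logs motivational events and warns about the kind of deep-seated
    animosity mentioned earlier (e.g. a strong negative association with
    human beings in general)."""

    def __init__(self):
        self.log = []

    def record(self, module_name, strength, source_concepts):
        self.log.append((module_name, strength, tuple(source_concepts)))
        if strength < -0.5 and "human beings" in source_concepts:
            print("warning: strong negative association involving humans:",
                  module_name, strength)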

The purpose of the essay was to establish that this alternative approach
to creating a friendly AI would be both viable and (potentially)
extremely stable.  It is a very different approach from the one currently
thought to be the only method, which is to prove properties of the AI's
goal system mathematically ... a task that many consider impossible.
By suggesting this alternative I am saying that mathematical proof may
be impossible, but guarantees of a very strong kind may well be possible.

As you probably know, many people (including me) are extremely concerned
that AI be developed safely.

Hope that helps,

Richard Loosemore



