Richard and Matt,

The exchange below is an interesting one.
For Richard I have this question: how is what you are proposing different from what could be done with Novamente? There, if one had hardcoded a set of top-level goals, all of the perceptual, cognitive, behavioral, and goal patterns developed by the system -- and the activation of such patterns -- would be molded not only by the probabilities of the "world" in which the system dealt, but also by how important each of those patterns has proven relative to the system's high-level goals. So in a Novamente system you would appear to have the types of biases you suggest, biases that would greatly influence each of the millions to trillions (depending on system size) of patterns in the "cloud of concepts" that would be formed, their links, and their activation patterns.

So, how is your system different? What am I missing?

Edward W. Porter
Porter & Associates
24 String Bridge S12
Exeter, NH 03833
(617) 494-1722
Fax (617) 494-1822
[EMAIL PROTECTED]

-----Original Message-----
From: Richard Loosemore [mailto:[EMAIL PROTECTED]
Sent: Monday, October 01, 2007 1:41 PM
To: [email protected]
Subject: Re: [agi] Religion-free technical content

Matt Mahoney wrote:
> Richard,
> Let me make sure I understand your proposal. You propose to program
> friendliness into the motivational structure of the AGI as tens of
> thousands of hand-coded soft constraints or rules. Presumably with so
> many rules, we should be able to cover every conceivable situation now
> or in the future where the AGI would have to make a moral decision.
> Among these rules: the AGI is not allowed to modify the function that
> computes its reward signal, nor is it allowed to create another AGI
> with a different function.
>
> You argue that the reward function becomes more stable after RSI. I
> presume this is because when there are a large number of AGIs, they
> will be able to observe any deviant behavior, then make a collective
> decision as to whether the deviant should be left alone, reprogrammed,
> or killed.
> This policing would be included in the reward function.
>
> Presumably the reward function is designed by a committee of
> upstanding citizens who have reached a consensus on what it means to
> be friendly in every possible scenario. Once designed, it can never
> be changed. Because if there were any mechanism by which all of the
> AGIs could be updated at once, then there is a single point of
> failure. This is not allowed. On the other hand, if the AGIs were
> updated one at a time (allowed only with human permission), then the
> resulting deviant behavior would be noticed by the other AGIs before
> they could be updated. So the reward function remains fixed.
>
> Is this correct?

Well, I am going to assume that Mark is wrong and that you are not trying to be sarcastic, but really do genuinely mean to pose these questions.

You have misunderstood the design at a very deep level, so none of the above would happen. The multiple constraints are not explicitly programmed into the system in the form of semantically interpretable statements (like Asimov's laws), nor would there be a simple "reward function", nor would there be a committee of experts who sat down and tried to write out a complete list of all the rules. These are all old-AI concepts (conventional, non-complex AI); they simply do not map onto the system at all.

The AGI has a motivational system that *biases* the cloud of concepts in one direction or another, to make the system have certain goals. The nature of this bias is that, during development, the concepts themselves all grew from simple primitives (so primitive that they are not even ideas, but just sources of influence on the concept-building process), and these simple primitives reach out through the entire web of adult concepts.
This is a difficult idea to grasp, I admit, but the consequence of that type of system design is that, for example, the general idea of "feeling empathy for the needs and aspirations of the entire human race" is not represented in the system as an explicit memory location that says "Rule number 71, as decided by the Committee of World AGI Ethics Experts, is that you must feel empathy for the entire human race". Instead, the thing that we externally describe as "empathy" is just a collective result of a massive number of learned concepts and their connections. This makes "empathy" a _systemic_ characteristic, intrinsic to the entire system, not a localizable rule.

The empathy feeling, to be sure, is controlled by roots that go back to the motivational system, but these roots would be built in such a way that tampering or malfunction would (a) not be able to happen without huge intervention, which would be easily noticed, and (b) not cause any catastrophic behavior even if it did go wrong, because the malfunctioning of the motivational system would render the entire system useless.

Notice that in a real human system, damage to the empathy component can possibly cause trouble, but that is precisely because we have other, dangerous components, like our aggression modules, which can take over. These would not be present, so an AGI would degrade gracefully if the empathy system (for some bizarre reason) were interfered with.

And to answer your general question: the empathy function would not be constrained to be fixed, because it would be dependent on the wishes of humanity. Or rather, the *nature* of the empathy function would stay the same, but the content (the expression of the empathy) would stay locked to the desires of humanity, in perpetuity.

Hope that answers the questions.
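[Editor's note: Richard's picture of a motivational primitive whose influence "reaches out through the entire web of adult concepts" can be made concrete with a toy sketch. This is the editor's own construction, not anything from Richard's actual design: a random concept graph, a single "source of influence" node, and an iterative diffusion of bias through the links. The point it illustrates is the systemic-vs-localizable distinction: no single learned concept carries the trait, lesioning one concept barely changes it, but corrupting the motivational root eliminates it entirely.]

```python
import random

random.seed(0)

N = 200          # number of learned concepts
NEIGHBORS = 8    # outgoing links per concept

# A random "cloud of concepts": each concept links to a few others.
links = {i: random.sample([j for j in range(N) if j != i], NEIGHBORS)
         for i in range(N)}

def diffuse_bias(source_strength, steps=20, damping=0.85):
    """Spread a motivational primitive's influence through the whole web.

    Node 0 plays the role of the primitive "source of influence"; every
    other concept ends up carrying a small share of the bias, so the
    trait is a property of the system, not of any single location."""
    bias = [0.0] * N
    bias[0] = source_strength
    for _ in range(steps):
        new = [0.0] * N
        new[0] = source_strength       # the root keeps feeding the web
        for i in range(N):
            for j in links[i]:
                new[j] += damping * bias[i] / NEIGHBORS
        bias = new
    return bias

bias = diffuse_bias(1.0)
total = sum(bias)

# Distributed: no single learned concept holds most of the trait.
assert max(bias[1:]) / total < 0.1

# Lesioning any one learned concept barely changes the systemic trait...
lesioned = bias[:]
lesioned[17] = 0.0
assert sum(lesioned) / total > 0.95

# ...but corrupting the motivational root removes it everywhere at once.
assert sum(diffuse_bias(0.0)) == 0.0
```

The lesion test is, of course, far too crude to prove anything about a real AGI; it only shows why a diffusely rooted trait is not a "Rule number 71" that a single edit could flip.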
Richard Loosemore

> --- Richard Loosemore <[EMAIL PROTECTED]> wrote:
>
>> Matt Mahoney wrote:
>>> --- Richard Loosemore <[EMAIL PROTECTED]> wrote:
>>>
>>>> Derek Zahn wrote:
>>>>> Richard Loosemore writes:
>>>>>
>>>>> > It is much less opaque.
>>>>> >
>>>>> > I have argued that this is the ONLY way that I know of to
>>>>> > ensure that AGI is done in a way that allows
>>>>> > safety/friendliness to be guaranteed.
>>>>> >
>>>>> > I will have more to say about that tomorrow, when I hope to
>>>>> > make an announcement.
>>>>>
>>>>> Cool. I'm sure I'm not the only one eager to see how you can
>>>>> guarantee (read: prove) such specific detailed things about the
>>>>> behaviors of a complex system.
>>>> Hmmm... do I detect some skepticism? ;-)
>>> I remain skeptical. Your argument applies to an AGI not modifying
>>> its own motivational system. It does not apply to an AGI making
>>> modified copies of itself. In fact you say:
>> Not correct, I am afraid: I specifically emphasize that the AGI is
>> allowed to modify its own motivational system. I don't know how you
>> got the opposite idea. (I haven't had time to review my text, so
>> apologies if it was my fault and I did accidentally give the wrong
>> impression ... but the whole point of this essay was to suggest a
>> way to guarantee friendliness under any circumstances, including
>> self-improvement.)
>>
>>>> Also, during the development of the first true AI, we would
>>>> monitor the connections going from the motivational system to the
>>>> thinking system. It would be easy to set up alarm bells if certain
>>>> kinds of thoughts started to take hold -- just do it by
>>>> association with certain key sets of concepts and keywords. While
>>>> we are designing a stable motivational system, we can watch
>>>> exactly what goes on, and keep tweaking until it gets to a point
>>>> where it is clearly not going to get out of the large potential
>>>> well.
>> I do not see how this illustrates your point above.
>>
>>
>>> You refer to the humans building the first AGI. Humans, being
>>> imperfect, might not get the algorithm for friendliness exactly
>>> right in the first iteration. So it will be up to the AGI to tweak
>>> the second copy a little more (according to the first AGI's
>>> interpretation of friendliness). And so on. So the goal drifts a
>>> little with each iteration. And we have no control over which way
>>> it drifts.
>> What an extraordinary statement to make!
>>
>> The purpose of the essay was to argue that with each iteration it
>> digs itself deeper into the same pattern and cannot drift out into
>> an unfriendly state.
>>
>> But you reply to this by just stating that the opposite is going to
>> be the case, without saying why. Which part of my argument did you
>> decide was wrong, that you could state the opposite conclusion?
>>
>>
>> Richard Loosemore
>>
>> -----
>> This list is sponsored by AGIRI: http://www.agiri.org/email
>> To unsubscribe or change your options, please go to:
>> http://v2.listbox.com/member/?&
>
> -- Matt Mahoney, [EMAIL PROTECTED]
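[Editor's note: earlier in the thread Richard suggests setting up "alarm bells if certain kinds of thoughts started to take hold ... by association with certain key sets of concepts and keywords." A minimal sketch of what such developer-side monitoring could look like follows; the watch-list names, the threshold, and the whole mechanism are the editor's hypothetical illustration, not part of Richard's design.]

```python
# Hypothetical watch-list: key sets of concepts whose joint activation
# should ring an alarm during development (labels are illustrative only).
WATCHED = {
    "self-modification": {"reward", "modify", "own", "goal"},
    "deception": {"conceal", "monitor", "disable"},
}

def alarms(active_concepts, threshold=0.75):
    """Return the watch-list entries whose concepts are mostly active.

    `active_concepts` is the set of concept labels currently active in
    the thinking system; an alarm fires when at least `threshold` of a
    watched key set is present at the same time."""
    active = set(active_concepts)
    return [name for name, keys in WATCHED.items()
            if len(keys & active) / len(keys) >= threshold]

print(alarms({"reward", "modify", "own", "plan"}))  # ['self-modification']
print(alarms({"conceal"}))                          # [] -- below threshold
```

A threshold over a key set, rather than a single keyword, is what makes this an alarm on a *kind of thought* taking hold instead of on any one concept firing.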
