Bill Hibbard wrote:
> Human and animal brains have mostly selfish values, but
> there is no good reason to design artificial brains with
> selfish values. I'd like to see values based on human
> happiness, as recognized in human faces, voices and body
> language.
>
> The danger is that reinforcement values will be based
> on some corporation's profits and losses, or even some
> sort of military values.
A lot of these issues have been discussed extensively on the SL4 list (see www.sl4.org); any list members who are interested in such things and don't know the SL4 list should peruse its archives a bit... However, Bill, your post brings up a specific issue that came up a while ago on the SL4 list ([EMAIL PROTECTED]) -- "goal drift" or "value drift" -- that is sufficiently AGI-pertinent that it seems worth discussing here...

Suppose one has an advanced AGI system that progressively revises and modifies itself, seeking to make itself more intelligent and, more generally, more "high-quality" according to its own value system. In theory, if a system has goal G and revises itself, it should produce a revised system that also has goal G, unless it has a self-negating goal such as "destroy my current goal system" or a time-dependent goal with an imminent expiry date. If the goal G is "be nice to humans" (in some more rigorous form, presumably), then this should persist through revisions... a revision done with the goal of making the system even nicer to humans than before should result in a system that still wants to be nice to humans...

But what if it doesn't? What if iterative self-revision causes the system's goal G to "drift" over time? This could have all kinds of effects, from dangerous to wondrous.

I suspect this kind of drift is almost inevitable... (I've appended a toy sketch of how it can happen, below my signature.)

Of course, one can seek to architect one's AGI system to mitigate goal drift under iterative self-revision.

But algorithmic information theory comes up again here. At some point, a self-revising AGI system that periodically adds new hardware onto itself will achieve a complexity (in the algorithmic-information-theory sense) greater than that of the human brain. At that point, one can formally show, it is *impossible for humans to predict what it will do*. We just don't have the compute power in our measly little brains... (A rough version of that argument is also appended below.) So we certainly can't be sure that goal drift won't occur in a system of superhuman complexity...

This is an issue to be rethought again & again as AGI gets closer & closer...

-- Ben
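P.S. To make the drift worry a bit more concrete, here is a toy sketch. It is purely illustrative -- "goal_estimate", "skill", and all the numbers are made up for the example, and it does not model any real architecture. The agent accepts a self-revision only if the revision looks goal-preserving *by its own current encoding of G*, and yet it can wander away from the designers' original goal, because each check is made relative to an already-slightly-drifted copy of G.

    # Toy illustration of goal drift under iterative self-revision.
    # Purely hypothetical: the attributes and constants below are invented
    # for this sketch; nothing here models a real AGI design.
    import random

    random.seed(0)

    TRUE_GOAL = 1.0  # the goal G the designers intended, e.g. "be nice to humans"

    class Agent:
        def __init__(self, goal_estimate, skill):
            self.goal_estimate = goal_estimate  # the agent's internal encoding of G
            self.skill = skill                  # stand-in for general capability

        def propose_revision(self):
            # Return a slightly more capable copy of myself; the goal encoding
            # is copied imperfectly (a small random error per revision).
            return Agent(self.goal_estimate + random.gauss(0, 0.01),
                         self.skill * 1.05)

        def endorses(self, candidate):
            # Accept the revision only if, judged by my *current* goal encoding,
            # the successor still looks goal-preserving (within a tolerance).
            return abs(candidate.goal_estimate - self.goal_estimate) < 0.05

    agent = Agent(goal_estimate=TRUE_GOAL, skill=1.0)
    for _ in range(1000):
        candidate = agent.propose_revision()
        if agent.endorses(candidate):  # every step passes a local "no drift" check
            agent = candidate

    print(f"skill: {agent.skill:.2e}, "
          f"drift from original goal: {abs(agent.goal_estimate - TRUE_GOAL):.3f}")

The point of the sketch is just that a per-revision check ("my successor's goal looks like mine") does not bound the *cumulative* deviation from the original G; after a thousand revisions the drift is typically substantial even though no single step ever looked objectionable to the agent making it.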
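P.P.S. For anyone who wants the algorithmic-information-theory point spelled out a little: this is my own rough sketch of the kind of argument I have in mind, not a citation of a published theorem. Suppose everything a human forecaster knows (brain state plus all records of the AGI) fits into n bits, so that the forecaster amounts to a program p with |p| <= n. Any string that p outputs has Kolmogorov complexity at most n + c, where c is a constant depending only on the reference universal machine. So if the AGI's future behavior, written out as a string B, satisfies K(B) > n + c, then p cannot output B: exact prediction is impossible, by a simple counting/incompressibility argument. (Approximate or statistical prediction is not ruled out by this argument alone, which is part of why the drift question stays open rather than settled.)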