> > Another reason that RL attracts computer guys is because it looks binary.
Anything looking like binary is definitely a turn-off to me. I only go low-level out of necessity, and that's exceedingly rare. I like RL because it actually works (both theoretically and practically) and because it fits nicely with what we know about dopamine-based and other forms of motivation in the brain:

http://www.princeton.edu/~yael/Publications/Niv2009.pdf
http://www.scholarpedia.org/article/Reward_signals
http://www.hss.caltech.edu/~steve/schultz.pdf

Regarding the idea that intrinsic rewards are somehow different from extrinsic ones: a reward signal can just as easily be modulated by internal events (thoughts) as by external ones (percepts). Furthermore, if you read up on RL, you'll see that every effective multi-step RL-style algorithm involves a backward chaining of reward, so that earlier behaviors, or other early triggers for a behavior, are rewarded, not just the immediate actions. All actions, whether extrinsically or intrinsically rewarding, derive their value from either immediate or indirect, backward-chained reward signals. That means we can modulate behavior to an arbitrary level of complexity with relatively little difficulty by exploiting this backward chaining.

Of course, RL as it currently stands is not the final answer; otherwise we would be seeing more of it in the news. What is missing is the formation of concepts. Once an effective concept-formation algorithm has been designed, concepts can trigger actions, and the degree of association between an action and a concept can be modulated by the reward signal.

This captures the fundamental division between intelligence and behavior. Intelligence is necessary to implement complex behavior, but it is not sufficient. There must be goal-directedness built into the system, either through explicit goals in the form of goal states and search heuristics, implicit goals in the form of chained reward signals, or some hybrid or alternative.
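The backward chaining of reward described above can be sketched in a few lines. This is my own minimal illustration (tabular Q-learning on a toy corridor), not code from the thread: only the final step pays any reward, yet repeated temporal-difference updates propagate value backward until the earliest action becomes attractive too.

```python
import random

N_STATES = 5                 # states 0..4; state 4 is terminal
ACTIONS = (-1, +1)           # move left / move right
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.1

# Action-value table, initialized to zero.
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    """Deterministic corridor; reward 1.0 only on entering state 4."""
    nxt = max(0, min(N_STATES - 1, state + action))
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def choose(state):
    """Epsilon-greedy action selection with random tie-breaking."""
    if random.random() < EPS:
        return random.choice(ACTIONS)
    best = max(Q[(state, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if Q[(state, a)] == best])

random.seed(0)
for _ in range(500):                       # episodes
    s, done = 0, False
    while not done:
        a = choose(s)
        s2, r, done = step(s, a)
        # TD update: the estimate is pulled toward the immediate reward
        # plus the discounted value of the next state.  This is what
        # chains the terminal reward back to earlier actions.
        target = r if done else r + GAMMA * max(Q[(s2, b)] for b in ACTIONS)
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])
        s = s2

# The first "move right" action was never immediately rewarded, yet it
# has acquired value near GAMMA**3 = 0.729 purely via backward chaining.
print(round(Q[(0, +1)], 3))
```

The point of the sketch is the update rule: no behavior early in the chain ever sees the reward directly, but its value estimate inherits a discounted share of it through the chain of successor states.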
Without that goal-directedness, your super-intelligent robot is just going to sit there, potentially observing and understanding everything but doing nothing whatsoever about it. RL as a paradigm is simply a proven way to implement this part of the system, with some nice features, including the ability to arbitrarily reshape the reward function on the fly if necessary.

On Sun, Jan 27, 2013 at 8:35 PM, Jim Bromer <[email protected]> wrote:

> On Sun, Jan 27, 2013 at 4:43 PM, Piaget Modeler <[email protected]> wrote:
>
>> The Reinforcement Learning (RL) you discuss is called external
>> reinforcement. What happens when you move to intrinsic (internal)
>> reinforcement, where the reward function arises from a robot solving its
>> own problems, forming its own world model, and existing and participating
>> in the world?
>
> ------
> It is amazing that a fundamental insight like this is still treated as if
> it were controversial. Part of the problem is that when people recognize
> that they have to explain how superficial analysis of digitized input data
> is turned into insight, and they need a soulless answer, they are attracted
> to the most mundane method that strips out most presumptions - like
> internal reasoning. Another reason that RL attracts computer guys is that
> it looks binary.
>
> But when a computer is able to solve its own problems, the effect of those
> solutions can act as reinforcement. And beyond that, since complex
> reasoning is feasible, it does not have to be just a case of simple
> reinforcement, but of complex encouragement that comes from conditional
> projections of knowledge. So even though we cannot actually create an AGI
> program right now, we are encouraged by our insights and by the very
> limited results that we can get now. Those "reinforcements" are not coming
> from external events. In the worst case this internal motivation can be
> delusional, and that is one reason why external reinforcements can
> influence us so much.
> Jim Bromer
>
> On Sun, Jan 27, 2013 at 4:43 PM, Piaget Modeler <[email protected]> wrote:
>
>> You're avoiding the question, Aaron. The question I raised is an ethical
>> one, and you're answering a technical one.
>>
>> You answered, "How can we prevent robots from desiring things like
>> freedom or leisure or compensation?"
>>
>> I asked, "What do we give robots when they ask for rights?" I mean, even
>> animals have rights (PETA). Why shouldn't robots?
>>
>> The Reinforcement Learning (RL) you discuss is called external
>> reinforcement. What happens when you move to intrinsic (internal)
>> reinforcement, where the reward function arises from a robot solving its
>> own problems, forming its own world model, and existing and participating
>> in the world? When the model consists of millions or billions of
>> individual schemes (entities), how are you going to do surgery to extract
>> those entities dealing with liberty, justice, or fairness? And why would
>> you want to?
>>
>> The real question is: do you join PETR (People for the Ethical Treatment
>> of Robots) or not? Do you embrace robot slavery or not? And is some form
>> of slavery the solution to the global economy?
>>
>> ~PM
>>
>> ------------------------------
>> Date: Sun, 27 Jan 2013 13:01:39 -0600
>> Subject: Re: [agi] Robots and Slavery
>> From: [email protected]
>> To: [email protected]
>>
>> What if you didn't program a robot to desire its various freedoms or
>> leisure, but instead it became sentient and decided on its own that it
>> wants freedom, leisure, monetary compensation, and rights?
>>
>> In the field of Reinforcement Learning, which studies how to implement
>> "wants" in software, there is a basic separation of every algorithm into
>> two pieces: the part that does the learning and choosing (the agent), and
>> the part that measures how well things are going (the reward function).
>> The agent is the dynamic/intelligent part, and the reward function is a
>> static function to be optimized. You can completely replace the reward
>> function with a different one, and if the agent is well designed, it will
>> learn a completely different set of behaviors to optimize the new reward
>> function within the exact same environment.
>> (http://en.wikipedia.org/wiki/Reinforcement_learning)
>>
>> In our own brains, we have specialized areas that respond to certain
>> types of stimuli and generate reward signals which are distributed
>> throughout the brain. It is even possible to reshape a person's or
>> animal's reward function using an external signal to override or add to
>> our natural wants.
>> (http://en.wikipedia.org/wiki/Brain_stimulation_reward)
>>
>> Intelligence is completely separable from desire. Both the system we
>> intend to reverse engineer and the theory of how such systems work agree
>> on this. If our robots were to decide they wanted freedom, leisure,
>> monetary compensation, rights, or anything else we can think of, it would
>> be because the reward function we gave them included some sort of
>> incentive to seek those out. In other words, even if we didn't directly
>> program them to want those things, we necessarily did so indirectly in
>> the process of shaping the reward function. In either case, provided the
>> structure of our programs reflects the theory and keeps these components
>> separated (which does not mean they can't interact or depend on each
>> other's behavior, but rather that we bothered to keep our design
>> appropriately modular), we can redesign and replace the reward function
>> so that the robots no longer desire things we don't want them to desire.
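The agent/reward separation described in the quoted passage can be sketched concretely. The class and function names below are my own illustration, not an established API: the agent only ever sees a scalar reward, so the reward function can be swapped out wholesale and the very same agent code learns the opposite behavior in the very same environment.

```python
import random

class GridWorld:
    """A 1-D world with positions 0..4; actions are -1 (left), +1 (right)."""
    def __init__(self):
        self.pos = 2
    def step(self, action):
        self.pos = max(0, min(4, self.pos + action))
        return self.pos

def seek_right(state):           # reward function #1: favors the right edge
    return state / 4.0

def seek_left(state):            # reward function #2: favors the left edge
    return (4 - state) / 4.0

class Agent:
    """A tiny action-value learner; it never inspects the reward's source."""
    def __init__(self):
        self.value = {-1: 0.0, +1: 0.0}
    def act(self):
        if random.random() < 0.2:                      # explore
            return random.choice((-1, +1))
        return max(self.value, key=self.value.get)     # exploit
    def learn(self, action, reward):
        # Incremental average: nudge the estimate toward the observed reward.
        self.value[action] += 0.1 * (reward - self.value[action])

def train(reward_fn, steps=2000):
    """Same agent, same environment -- only the reward function differs."""
    env, agent = GridWorld(), Agent()
    for _ in range(steps):
        a = agent.act()
        agent.learn(a, reward_fn(env.step(a)))
    return max(agent.value, key=agent.value.get)       # preferred action

random.seed(1)
print(train(seek_right))    # the agent settles on moving right (+1)
print(train(seek_left))     # identical agent code settles on moving left (-1)
```

Nothing in `Agent` or `GridWorld` changes between the two runs; only the plugged-in reward function does, which is the modularity the quoted argument relies on.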
