Which reward function are you referring to? Reward for what, specifically? When the cognitive system is pursuing multiple parallel goals at different levels of abstraction, which specific function are you controlling?
~PM

Date: Tue, 29 Jan 2013 14:58:30 -0600
Subject: Re: [agi] Robots and Slavery
From: [email protected]
To: [email protected]

People have formative years because it's in their genetic best interest to stop exploring new options and start exploiting known ones, given a limited lifetime and the need to reproduce reliably. We can't reset that explore/exploit trade-off in people (yet), but in machines there's no reason to make that control inaccessible to ourselves. It's a good thing machines aren't children.

In most RL algorithms, there are two key system parameters that modulate learning: the update rate for reward expectations, and the exploration rate. Raising these two values makes the system learn faster but make more mistakes; lowering them makes the system more stable but slower to learn. An analysis would have to be done to determine whether the costs and dangers of a system's behavioral aberrations under a misshapen reward function outweigh the costs and dangers of raising the learning and exploration rates while the system relearns the reward function after modification.

On Tue, Jan 29, 2013 at 2:39 PM, Piaget Modeler <[email protected]> wrote:

People procreate, and for a certain period of time they have influence over their creations (children). But then children grow up and take responsibility for their own lives, and we no longer have control. It's in those formative years that you have influence. Similarly, when you create a developmental AI, you have some period during its formative years to influence the later behavior of the cognitive system. But you don't have control, and you wouldn't expect to either. That's why rights are important.

Date: Tue, 29 Jan 2013 14:11:05 -0600
Subject: Re: [agi] Robots and Slavery
From: [email protected]
To: [email protected]

If we can build a system capable of determining the value of concepts automatically, we can build a system that can readjust those values automatically, too. If that's not feasible for the design, it's an unsafe design, and you shouldn't expect it to act as you intend. You wouldn't get in a car without a steering wheel, would you? Would you trust an even more powerful and dangerous machine to just do the right thing, with no controls? Let's not build any machines of this uncontrollable nature.
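
To make the learning-rate/exploration-rate trade-off described above concrete, here is a minimal sketch in Python. It is a hypothetical illustration, not code from the thread: the class name BanditLearner, the parameter values, and the reward-change scenario are all assumptions chosen only to show how raising alpha and epsilon speeds relearning after a reward function is modified, at the cost of more mistakes in the interim.

    # Minimal sketch (hypothetical, not from the thread) of the two knobs
    # discussed above: the reward-expectation update rate (alpha) and the
    # exploration rate (epsilon), in an epsilon-greedy bandit learner.
    import random

    class BanditLearner:
        def __init__(self, n_arms, alpha=0.1, epsilon=0.05):
            self.q = [0.0] * n_arms  # learned reward expectations per arm
            self.alpha = alpha       # update rate: how fast expectations move
            self.epsilon = epsilon   # exploration rate: how often we try non-greedy arms

        def act(self):
            if random.random() < self.epsilon:
                return random.randrange(len(self.q))                 # explore
            return max(range(len(self.q)), key=self.q.__getitem__)  # exploit

        def update(self, arm, reward):
            # Move this arm's expectation toward the observed reward.
            self.q[arm] += self.alpha * (reward - self.q[arm])

    def run(learner, reward_fn, steps):
        for _ in range(steps):
            arm = learner.act()
            learner.update(arm, reward_fn(arm))

    # Assumed scenario: the designer modifies the reward function after
    # 1000 steps, temporarily raises alpha/epsilon so the system relearns
    # quickly (but noisily), then lowers them again for stable behavior.
    learner = BanditLearner(n_arms=3)
    run(learner, lambda arm: 1.0 if arm == 0 else 0.0, steps=1000)
    learner.alpha, learner.epsilon = 0.5, 0.3   # raised: faster, noisier relearning
    run(learner, lambda arm: 1.0 if arm == 2 else 0.0, steps=1000)
    learner.alpha, learner.epsilon = 0.1, 0.05  # lowered: back to stable exploitation
    print(learner.q)  # expectations should now favor arm 2

The "analysis" the message calls for would compare the regret accumulated during such a noisy relearning phase against the harm done by continuing to act on the misshapen reward function.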

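The "steering wheel" point in the 14:11 message can also be sketched. The following is a hypothetical illustration, not a design from the thread: the class AdjustableReward and its methods are assumptions, meant only to show a reward module whose learned concept values stay externally readable and adjustable, rather than baked in and inaccessible.

    # Minimal sketch (hypothetical) of an externally controllable reward
    # function: learned concept values can always be inspected and overridden.
    class AdjustableReward:
        def __init__(self, concept_values):
            self._values = dict(concept_values)  # concept -> learned value weight

        def score(self, active_concepts):
            # Reward for a state: sum of the values of its active concepts.
            return sum(self._values.get(c, 0.0) for c in active_concepts)

        def inspect(self):
            # The control surface: designers can always see the current values...
            return dict(self._values)

        def readjust(self, concept, new_value):
            # ...and override any of them, so a misshapen value never becomes
            # permanent and inaccessible.
            self._values[concept] = new_value

    reward = AdjustableReward({"obey": 1.0, "explore": 0.2})
    print(reward.score({"obey", "explore"}))  # 1.2
    reward.readjust("explore", 0.8)           # the designer turns the wheel
    print(reward.score({"obey", "explore"}))  # 1.8

The design choice this illustrates is the one the message insists on: whatever mechanism learns the values automatically, the same values must remain reachable from outside the system.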