>
> Another reason that RL attracts computer guys is because it looks binary.


Anything looking like binary is definitely a turn-off to me. I only go
low-level out of necessity, and that's exceedingly rare. I like RL because
it actually works (both in theory and in practice) and because it fits
nicely with what we know about dopamine-based and other forms of motivation
in the brain.

http://www.princeton.edu/~yael/Publications/Niv2009.pdf
http://www.scholarpedia.org/article/Reward_signals
http://www.hss.caltech.edu/~steve/schultz.pdf

As for the idea that intrinsic rewards are somehow different from
extrinsic ones: a reward signal can be modulated just as easily by
internal events (thoughts) as by external ones (percepts). Furthermore, if
you read up on RL, you'll see that every effective multi-step RL-style
algorithm backward-chains reward, so that earlier behaviors and other
early triggers for a behavior are rewarded, not just the immediate
actions. All actions, whether extrinsically or intrinsically rewarding,
derive their value from either immediate or indirect, backward-chained
reward signals, which means that by taking advantage of this backward
chaining we can shape behavior to an arbitrary level of complexity with
relatively little difficulty.
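To make the backward chaining concrete, here is a minimal sketch of my
own (the states, constants, and names are illustrative, not from any
particular paper): tabular Q-learning on a five-state chain where only
the final transition pays reward. Over repeated episodes the bootstrapped
update propagates that single reward backward, so early actions on the
path acquire value even though they are never rewarded directly.

```python
import random

N_STATES = 5          # states 0..4; state 4 is terminal
ALPHA, GAMMA = 0.5, 0.9
ACTIONS = ["left", "right"]

# Q-values start at zero for every state-action pair.
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    """Deterministic chain: 'right' advances, 'left' retreats; reward
    is paid only on entering the terminal state."""
    nxt = min(state + 1, N_STATES - 1) if action == "right" else max(state - 1, 0)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward

random.seed(0)
for _ in range(500):
    s = 0
    while s != N_STATES - 1:
        a = random.choice(ACTIONS)   # pure exploration, for simplicity
        s2, r = step(s, a)
        # Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)):
        # this is where reward chains backward through earlier states.
        target = r + GAMMA * max(Q[(s2, b)] for b in ACTIONS)
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])
        s = s2

# The very first action of the path now carries value inherited
# (discounted by gamma at each step) from the distant reward.
print(Q[(0, "right")])
```

After training, Q(0, "right") sits near gamma cubed (about 0.73), even
though the agent was never rewarded at state 0; that inherited value is
the backward chaining described above.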

Of course, RL as it currently stands is not the final answer; otherwise we
would be seeing more of it in the news. What is missing is the formation of
concepts. Once an effective concept-formation algorithm has been designed,
concepts can trigger actions, and the degree of association between an
action and a concept can be modulated by the reward signal. This captures
the fundamental division between intelligence and behavior. Intelligence is
necessary to implement complex behavior, but it is not sufficient. There
must be goal-directedness built into the system, whether through explicit
goals (goal states and search heuristics), implicit goals (chained reward
signals), or some hybrid or alternative. Otherwise, your super-intelligent
robot is just going to sit there, potentially observing and understanding
everything but doing nothing whatsoever about it. RL as a paradigm is
simply a proven way to implement this part of the system, with some nice
features, including the ability to arbitrarily reshape the reward function
on the fly if necessary.
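The agent/reward-function separation and the reshaping I mention can be
sketched in a few lines (again my own toy example; the function names
`train`, `prefer_arm_0`, and `prefer_arm_1` are invented for
illustration): a generic epsilon-greedy bandit learner takes the reward
function as an ordinary swappable argument.

```python
import random

def train(reward_fn, n_arms=2, steps=2000, eps=0.1, seed=0):
    """Generic epsilon-greedy bandit agent; the reward function is a
    plain callable passed in, not baked into the agent."""
    rng = random.Random(seed)
    value = [0.0] * n_arms   # running mean reward per arm
    count = [0] * n_arms
    for _ in range(steps):
        if rng.random() < eps:
            arm = rng.randrange(n_arms)              # explore
        else:
            arm = max(range(n_arms), key=lambda a: value[a])  # exploit
        r = reward_fn(arm)
        count[arm] += 1
        value[arm] += (r - value[arm]) / count[arm]  # incremental mean
    return max(range(n_arms), key=lambda a: value[a])

# Two different reward functions for the very same agent code.
prefer_arm_0 = lambda arm: 1.0 if arm == 0 else 0.0
prefer_arm_1 = lambda arm: 1.0 if arm == 1 else 0.0

# Same agent, swapped reward function, opposite learned behavior.
print(train(prefer_arm_0), train(prefer_arm_1))
```

Swapping `prefer_arm_0` for `prefer_arm_1` flips the learned preference
without touching a line of the agent, which is exactly the modularity
point: desire lives in the reward function, not in the intelligence.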


On Sun, Jan 27, 2013 at 8:35 PM, Jim Bromer <[email protected]> wrote:

> On Sun, Jan 27, 2013 at 4:43 PM, Piaget Modeler <[email protected]> wrote:
> The Reinforcement Learning (RL) you discuss is called external
> reinforcement. What happens when you move
> to intrinsic (internal) reinforcement, where the reward function arises
> from a robot solving its own problems,
> forming its own world model, and existing and participating in the world?
> ------
> It is amazing that a fundamental insight like this is still treated as if
> it were controversial. Part of the problem is that when people recognize
> that they have to explain how superficial analysis of digitized input
> data is turned into insight, and they need a soulless answer, they
> are attracted to the most mundane method that strips out most presumptions
> - like internal reasoning.  Another reason that RL attracts computer guys
> is because it looks binary.
>
> But when a computer is able to solve its own problems, the effect of those
> solutions can act as reinforcement. Even beyond that, since complex
> reasoning is feasible, it does not have to be just a case of simple
> reinforcement, but of complex encouragement that comes from conditional
> projections of knowledge.  So even though we cannot actually create an AGI
> program right now, we are encouraged by our insights and the very limited
> results that we can get now.  Those "reinforcements" are not coming from
> external events.  In the worst case this internal motivation can be
> delusional, and that is one reason why external reinforcements can
> influence us so much.
>
> Jim Bromer
>
>
>
>
> On Sun, Jan 27, 2013 at 4:43 PM, Piaget Modeler <[email protected]> wrote:
>
>>
>> You're avoiding the question, Aaron. The question I raised is an ethical
>> one, and you're answering a technical one.
>>
>> You answered, "How can we prevent robots from desiring things like
>> freedom or leisure or compensation?"
>>
>> I asked "What do we give robots when they ask for rights?" I mean, even
>> animals have rights (PETA).
>> Why shouldn't robots?
>>
>> The Reinforcement Learning (RL) you discuss is called external
>> reinforcement. What happens when you move
>> to intrinsic (internal) reinforcement, where the reward function arises
>> from a robot solving its own problems,
>> forming its own world model, and existing and participating in the world?
>> When the model consists of millions
>> or billions of individual schemes (entities), how are you going to do
>> surgery to extract those entities dealing
>> with liberty, justice, or fairness? And why would you want to?
>>
>> The real question is do you join (PETR - People for the Ethical Treatment
>> of Robots) or not?
>> Do you embrace robot slavery or not? And is some form of slavery the
>> solution to global economy?
>>
>> ~PM
>>
>> ------------------------------
>> Date: Sun, 27 Jan 2013 13:01:39 -0600
>> Subject: Re: [agi] Robots and Slavery
>> From: [email protected]
>> To: [email protected]
>>
>> What if you didn't program a robot to desire various freedoms or
>> leisure,
>> but instead it became sentient, and decided on its own that it wants
>> freedom, leisure, monetary compensation, and rights?
>>
>>
>> In the field of Reinforcement Learning, which studies how to implement
>> "wants" in software, there is a basic separation of every algorithm into
>> two pieces: the part that does the learning & choosing (the agent), and the
>> part that measures how well things are going (the reward function). The
>> agent is the dynamic/intelligent part, and the reward function is a static
>> function to be optimized. You can completely replace the reward function
>> with a different one, and if the agent is well designed, it will learn a
>> completely different set of behaviors to optimize the new reward function
>> within the exact same environment. (
>> http://en.wikipedia.org/wiki/Reinforcement_learning)
>>
>> In our own brains, we have specialized areas that respond to certain
>> types of stimuli and generate reward signals which are distributed
>> throughout the brain. It is even possible to reshape a person's or animal's
>> reward function using an external signal to override or add to our natural
>> wants. (http://en.wikipedia.org/wiki/Brain_stimulation_reward)
>>
>> Intelligence is completely separable from desire. Both the system we
>> intend to reverse engineer and the theory about how such systems work
>> agree. If our robots were to decide they wanted freedom, leisure, monetary
>> compensation, rights, or anything else we can think of, it would be because
>> the reward function we gave them included some sort of incentive to seek
>> those out. In other words, even if we didn't directly program them to want
>> those things, we necessarily did so indirectly in the process of shaping
>> the reward function. In either case, provided the structure of our programs
>> reflects the theory and keeps these components separated (which does not mean
>> they can't interact or depend on each other's behavior, but rather means we
>> bothered to keep our design appropriately modular), we can redesign and
>> replace the reward function so that the robots no longer desire things we
>> don't want them to desire.
>>



-------------------------------------------
AGI
Archives: https://www.listbox.com/member/archive/303/=now
RSS Feed: https://www.listbox.com/member/archive/rss/303/21088071-f452e424
Modify Your Subscription: 
https://www.listbox.com/member/?member_id=21088071&id_secret=21088071-58d57657
Powered by Listbox: http://www.listbox.com
