Re: [agi] RL Does Not Fully Explain Inner Direction

Aaron Hosford Wed, 30 Jan 2013 11:22:51 -0800

>
> there are also other methods of reinforcing behavior.  Like rehearsal,
> preparation and practice.  You do not need external reinforcements to
> modify those kinds of behavior that can be modified by these other kinds of
> behaviors. I think many Behaviorists would have argued that the
> internalized reward systems that weren't unconditioned behaviors were
> acquired through external reinforcement, but another way to see it is that
> external rewards only contributed to shaping the goals.

I am not a behaviorist. I won't argue with you that there is more to the
story than mere stimulus/response/reward-based update. We are more than
enormous lookup tables for expected rewards for stimulus/response pairs.
However, just because something isn't the complete answer doesn't mean it
isn't a vital part of it. Reward-based behavioral preference is fundamental
to intelligent behavior, and RL provides the theory and conceptual
framework for implementing that, even if current RL-based algorithms are
insufficient to the task.

Consider a system where concept activations predict both rewards and other
future concept activations. Then, by chaining these predictions together
(an interleaving of proposed actions & expected consequences), a plan can
be built to achieve a particular goal independently of whether it would be
rewarding to do so, and the value of the plan can be determined by stepping
through it and averaging the reward predictions.
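
As a rough illustration of the paragraph above, here is a minimal Python sketch of valuing a plan by chaining predictions and averaging the predicted rewards. The concept names, action names, reward numbers, and the helpers roll_out and plan_value are all hypothetical, chosen only for this example. It assumes deterministic predictions stored in plain dictionaries.

```python
from typing import Dict, List, Tuple

# Hypothetical learned predictions (illustrative values only).
# next_concept[(concept, action)] -> concept expected to activate next
next_concept: Dict[Tuple[str, str], str] = {
    ("hungry", "walk_to_kitchen"): "in_kitchen",
    ("in_kitchen", "cook"): "meal_ready",
    ("meal_ready", "eat"): "fed",
}
# predicted_reward[concept] -> reward expected while that concept is active
predicted_reward: Dict[str, float] = {
    "hungry": -1.0,
    "in_kitchen": 0.0,
    "meal_ready": 1.0,
    "fed": 4.0,
}


def roll_out(start: str, actions: List[str]) -> List[str]:
    """Chain predictions: each proposed action yields an expected consequence."""
    concepts = [start]
    for action in actions:
        concepts.append(next_concept[(concepts[-1], action)])
    return concepts


def plan_value(start: str, actions: List[str]) -> float:
    """Average the reward predictions over the concepts the plan reaches."""
    reached = roll_out(start, actions)[1:]  # skip the starting concept
    if not reached:
        return 0.0
    return sum(predicted_reward[c] for c in reached) / len(reached)


plan = ["walk_to_kitchen", "cook", "eat"]
trajectory = roll_out("hungry", plan)
value = plan_value("hungry", plan)
print(trajectory, value)
```

Note that the plan is built from the concept predictions alone; the reward predictions are consulted only afterwards, to decide whether the plan is worth carrying out.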

If someone chooses to rehearse or practice something, they can be motivated
by rewards expected for the goal concept (proficiency at behavior X) rather
than by the additional rewards (or, more likely, the counterbalanced costs)
of actually performing the behavior over and over. Proficiency comes about
through the successive improvement of predictions via testing against
experience. Furthermore, the reward for proficiency or other concepts can
be triggered merely by the activation of the concept in an appropriate
setting (i.e. one evaluated as really happening rather than imagined), not
by an external stimulus generated by the behavior. The result is an
intrinsically rewarding behavior.

> external rewards can be made more complicated then simply shaping a string
> of behaviors including any that were incidental and not instrumental in
> producing the behavior.  This means that the accumulated reward for a kind
> of behavior is not merely the Bayesian evaluation of external rewards for
> that behavior.  For example, since internally directed rewards can be
> promoted (by the individual) a person can combine different behaviors by
> just considering the possibility that the behaviors (or ideas) could be
> combined and then through rehearsing, exploring and drawing conclusions new
> behaviors could be reinforced without external rewards.

I'm not sure what you mean by promoting internally directed rewards.

Credit assignment is a problem that must be dealt with for any
goal-directed system. In order to generate an efficient plan based on past
experience, we must be able to identify which acts or behaviors actually
led to the goal in the past, and which were incidental. That's the purpose
of the backwards chaining I mentioned before.
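
One simple way to picture this kind of credit assignment is the sketch below. It is only a stand-in, assuming a Monte Carlo style update in which the discounted return is propagated backwards through each episode. The action names, the episodes, and the function backward_chain are hypothetical. An act that also shows up in unrewarded episodes ("whistle") ends up with less credit than one that reliably precedes the goal ("pick_up_key").

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple

Episode = List[Tuple[str, float]]  # (action taken, immediate reward received)


def backward_chain(episode: Episode, values: DefaultDict[str, float],
                   gamma: float = 0.9, alpha: float = 0.5) -> None:
    """Walk the episode backwards, passing discounted reward to earlier acts."""
    ret = 0.0
    for action, reward in reversed(episode):
        ret = reward + gamma * ret                      # discounted return so far
        values[action] += alpha * (ret - values[action])  # move estimate toward it


values: DefaultDict[str, float] = defaultdict(float)

# Hypothetical experience: whistling is incidental, the key is instrumental.
episodes: List[Episode] = [
    [("pick_up_key", 0.0), ("whistle", 0.0), ("open_door", 1.0)],
    [("whistle", 0.0), ("walk_away", 0.0)],
    [("pick_up_key", 0.0), ("open_door", 1.0)],
]
for ep in episodes:
    backward_chain(ep, values)

print(dict(values))
```

With more varied experience the gap widens, which is what lets a planner prefer the instrumental acts and drop the incidental ones.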

If reward prediction is tied to concept activation, concept activations are
predicted by their predecessors, and dynamic search is performed to
identify plans of action based on both types of prediction, it's possible
to take into account new information connecting behaviors that have never
been tried together and come up with a fairly accurate reward expectation
for that combination nonetheless. For example, if I learn that (1) throwing
things in the direction of a person or animal tends to scare them and (2) a
particular type of animal likes to snack on people, then later I can
dynamically put those pieces of information together with (3) a hardwired
highly negative reward associated with the concept of getting eaten and (4)
the aforementioned unpleasant type of animal looking hungrily at me, to
generate a plan for throwing something at the animal in the hopes of not
getting eaten, even though I have never tried that particular action before.
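
The example above can be sketched as a small search over chained predictions. Everything named here is hypothetical: the concept and action labels, the -100 reward, and the functions evaluate and search exist only for illustration. The sketch assumes deterministic predictions and an exhaustive search over short action sequences.

```python
from itertools import product
from typing import Dict, Sequence, Tuple

ACTIONS = ["throw_object", "do_nothing"]

# Separately learned pieces of knowledge, stored as concept predictions.
predicts: Dict[Tuple[str, str], str] = {
    ("animal_eyeing_me", "throw_object"): "animal_scared",  # from fact (1)
    ("animal_eyeing_me", "do_nothing"): "animal_attacks",   # from facts (2), (4)
    ("animal_attacks", "throw_object"): "eaten",
    ("animal_attacks", "do_nothing"): "eaten",
    ("animal_scared", "throw_object"): "safe",
    ("animal_scared", "do_nothing"): "safe",
}
# Fact (3): a hardwired, highly negative reward. Other concepts default to 0.
reward: Dict[str, float] = {"eaten": -100.0}


def evaluate(start: str, actions: Sequence[str]) -> float:
    """Step through a candidate plan and average the predicted rewards."""
    concept, rewards = start, []
    for action in actions:
        nxt = predicts.get((concept, action))
        if nxt is None:  # no prediction available, stop the roll-out here
            break
        concept = nxt
        rewards.append(reward.get(concept, 0.0))
    return sum(rewards) / len(rewards) if rewards else 0.0


def search(start: str, depth: int = 2) -> Tuple[str, ...]:
    """Try every action sequence of the given length and keep the best one."""
    return max(product(ACTIONS, repeat=depth),
               key=lambda plan: evaluate(start, plan))


best = search("animal_eyeing_me")
print(best, evaluate("animal_eyeing_me", best))
```

No entry in the tables records an actual attempt at throwing something at this animal. The preference for doing so falls out of combining the separate predictions at search time.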

On Wed, Jan 30, 2013 at 7:49 AM, Jim Bromer <[email protected]> wrote:

> The point is not that such things could not be implemented by RL, it is
> that there are also other methods of reinforcing behavior.  Like rehearsal,
> preparation and practice.  You do not need external reinforcements to
> modify those kinds of behavior that can be modified by these other kinds of
> behaviors. I think many Behaviorists would have argued that the
> internalized reward systems that weren't unconditioned behaviors were
> acquired through external reinforcement, but another way to see it is that
> external rewards only contributed to shaping the goals.  And the other
> point that I was making is that external rewards can be made more
> complicated then simply shaping a string of behaviors including any that
> were incidental and not instrumental in producing the behavior.  This means
> that the accumulated reward for a kind of behavior is not merely the
> Bayesian evaluation of external rewards for that behavior.  For example,
> since internally directed rewards can be promoted (by the individual) a
> person can combine different behaviors by just considering the possibility
> that the behaviors (or ideas) could be combined and then through
> rehearsing, exploring and drawing conclusions new behaviors could be
> reinforced without external rewards.
> Jim Bromer
>
> On Tue, Jan 29, 2013 at 11:58 AM, Aaron Hosford <[email protected]> wrote:
>
>> Once again, I never said RL was all that was needed. I included concept
>> formation (a.k.a. cognitive structure formation) as a requirement. Cognitive
>> structures provide the understanding, RL or other goal-directed mechanisms
>> provide the will. I fail to see what about regulation and compensation is
>> not implemented by RL, aside from the formation of the concepts/cognitive
>> structures necessary to distinguish the appropriate circumstances for an
>> action to be performed, which I already acknowledged.
>>
>>
>>
>>
>> On Tue, Jan 29, 2013 at 10:44 AM, Piaget Modeler <
>> [email protected]> wrote:
>>
>>>  You need more than just reinforcement learning.  You need "regulation"
>>> and "compensation" psychological terms Piaget used.
>>>
>>> Regulation is the correction of failed behaviors or reinforcement of
>>> successful behaviors.
>>> Compensation is the inversion of failed behaviors or the elimination of
>>> undesirable side effects.
>>>
>>> Both regulation and compensation should be intrinsic in the cognitive
>>> system, and in my view, should build new cognitive structures
>>> tightly integrated into existing and new behaviors.   This is way more
>>> than Reinforcement Learning.
>>>
>>> ~PM
>>>
>>> ------------------------------
>>> Date: Tue, 29 Jan 2013 08:54:39 -0500
>>> Subject: [agi] RL Does Not Fully Explain Inner Direction
>>> From: [email protected]
>>> To: [email protected]
>>>
>>>
>>> On Mon, Jan 28, 2013 at 6:21 PM, Aaron Hosford <[email protected]> wrote:
>>>
>>> In regards to the idea that intrinsic rewards are somehow different from
>>> extrinsic ones, a reward signal can just as easily be modulated by internal
>>> events (thoughts) as external ones (percepts). Furthermore, if you read up
>>> on RL, you'll see that in all effective multi-step RL-style algorithms,
>>> there is a backward chaining of reward, so that previous behaviors or other
>>> early triggers for a behavior are rewarded, not just the immediate actions.
>>> All actions, whether extrinsically or intrinsically rewarding, derive their
>>> value from either immediate or indirect/backward-chained reward signals,
>>> which means we can modulate behavior arbitrarily to any level of complexity
>>> with relatively minimal difficulty by taking advantage of this backward
>>> chaining.
>>>
>>> Well the fact that backwards chaining of the actions leading up to a
>>> rewarded behavior is an interesting point. And while anyone with a little
>>> imagination could come up with a creative means to develop a way to use RL
>>> to reinforce complex behaviors based on parts of a behavior string that is
>>> reinforced this is not explained by the backward-chained reward signals
>>> that you mentioned.
>>>  But looking beyond that the claim that any internal motivation could
>>> be explained by external reinforcement is unnecessarily complicated because
>>> it is dependent on external rewards which would demand that things like the
>>> massive levels of complexity of infinitesimal past rewards could explain
>>> inner direction. This is the same problem as insisting that Bayesian
>>> Reasoning along with some priors are all that is necessary to explain human
>>> intelligence. Sorry but it just does not work - unless you change the
>>> presumptions of what is meant by Reinforcement Learning or Bayesian
>>> Reasoning. (Which is ok, I am just saying...)
>>>  Jim Bromer
>>>    *AGI* | Archives <https://www.listbox.com/member/archive/303/=now>
>>> <https://www.listbox.com/member/archive/rss/303/19999924-5cfde295> |
>>> Modify <https://www.listbox.com/member/?&;> Your Subscription
>>> <http://www.listbox.com>
>>>
>>
>>
>
>

