Re: [agi] RL Does Not Fully Explain Inner Direction

Aaron Hosford Wed, 30 Jan 2013 16:03:55 -0800

I see no reason to disagree with that, but I would place a restriction on
it: We cannot make an unpleasant concept into a pleasant one without
finding some outcome of that unpleasant concept that leads to another
concept sufficiently pleasant to overcome its distastefulness. (The dual
of this statement is also true: a pleasant concept becomes unpleasant only
through association with a sufficiently unpleasant one.)


So I can convince myself that not procrastinating is a good thing, despite
the fact that procrastination feels good at the moment, because I can
logically connect refraining to higher productivity. But I cannot *ever*
convince myself that murder is a good thing, because I cannot logically
connect it to any concept with sufficiently positive expectation to
overcome the displeasure that I would feel for having done such a thing, no
matter how I try to rationalize it. (This may not be true for a person
whose moral judgment corresponds to a shallower gradient in their reward
function.)
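
To make the restriction concrete, here is a minimal Python sketch. The
function name, the concept names, the numbers, and the one-step lookahead
rule are all assumptions invented for illustration; this is not a claim
about how the brain (or any existing RL library) computes value.

```python
# Hypothetical illustration: a concept's effective value is its intrinsic
# (immediate) reward expectation plus the probability-weighted value of the
# outcomes it is believed to lead to. All numbers are made up.

def effective_value(intrinsic, outcomes):
    """Return intrinsic value plus the expected value of linked outcomes.

    intrinsic: immediate reward expectation of the concept itself.
    outcomes:  list of (probability, value) pairs for linked concepts.
    """
    return intrinsic + sum(p * v for p, v in outcomes)

# Refraining from procrastination feels mildly bad right now (-1.0), but it
# is linked to higher productivity (+5.0) with fairly high confidence.
refrain = effective_value(-1.0, [(0.8, 5.0)])

# A strongly aversive concept (-100.0): none of the linked outcomes in this
# toy model is positive enough to outweigh it.
aversive = effective_value(-100.0, [(0.9, 10.0), (0.5, 20.0)])

# The same associations with a much shallower intrinsic aversion (-15.0).
shallow = effective_value(-15.0, [(0.9, 10.0), (0.5, 20.0)])

print(refrain > 0)   # association is strong enough to flip the sign
print(aversive > 0)  # association is not strong enough
print(shallow > 0)   # a shallower gradient lets the same links flip it
```

The third case is the caveat in the parenthetical above: with a shallower
reward gradient, the same rationalizations would be enough.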

The previous statement implies that there is necessarily an awareness of
expected reward for some behaviors or concepts despite never having
actually tried them. I think the brain has certain hardwired modules for
assigning expected reward to certain concepts/behaviors a priori, including
many associated with morality, sexual behavior, and personal harm/danger.
These hardwired reward expectations typically have relatively extreme
values and affect behavior powerfully by instilling strong compulsions or
aversions, depending on their polarity. Any system we build which is
sufficiently embodied to significantly threaten the life of a human being
will need such modules built into it, as well, to prevent catastrophe.
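
A minimal sketch of what such a module might look like, in Python. The
class name, the concept names, the magnitude -1000.0, and the rule that
learning never overwrites a hardwired value are all assumptions made for
illustration, not a description of any existing system.

```python
# Hypothetical sketch: learned reward expectations combined with hardwired,
# a priori ones that have extreme values and cannot be unlearned.

HARDWIRED = {
    "getting_eaten": -1000.0,   # assumed extreme aversion
    "harming_human": -1000.0,   # assumed extreme aversion
}

class RewardModel:
    def __init__(self, hardwired, learning_rate=0.1):
        self.hardwired = dict(hardwired)
        self.learned = {}
        self.learning_rate = learning_rate

    def expected_reward(self, concept):
        # Hardwired expectations are available without any experience.
        if concept in self.hardwired:
            return self.hardwired[concept]
        # Concepts never encountered default to a neutral expectation.
        return self.learned.get(concept, 0.0)

    def update(self, concept, observed_reward):
        # Design assumption: experience never overwrites a hardwired value.
        if concept in self.hardwired:
            return
        old = self.learned.get(concept, 0.0)
        self.learned[concept] = old + self.learning_rate * (observed_reward - old)

    def choose(self, options):
        # options maps an action name to the concept it is predicted to
        # activate; pick the action with the highest expected reward.
        return max(options, key=lambda action: self.expected_reward(options[action]))

model = RewardModel(HARDWIRED)
model.update("harming_human", 50.0)  # ignored: the aversion stays in place
choice = model.choose({"push_past_person": "harming_human", "wait": "idle"})
print(choice)
```

The point of the design is only that the aversion exists before the
behavior has ever been tried, and that no amount of later reward can
erode it.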



On Wed, Jan 30, 2013 at 5:13 PM, Jim Bromer <[email protected]> wrote:

> Aaron,
> I did not misunderstand you. I am just saying...
>
> I see the fundamental problem as one of interpreting observable
> (superficial) occurrences as understanding.  Is RL the fundamental agent or
> an essential agent of that process? I don't think it is.
>
> The observation and consideration of (relatively) complicated data-events
> is often rewarding.  The thing that I imagine the behaviorists would say (I
> am not saying you are a behaviorist) is that the ability to appreciate
> complicated events is premised on conditioned responses of the past.  I
> was saying, no, not necessarily.  If it is reasonable to believe that we can
> combine behaviors (or attitudes and thoughts) and speculate on that
> combination, then that shows that our behavior is not completely dependent
> on unconditioned and conditioned responses.  Then I am saying that this
> goes beyond the possibility that we can find some novel action rewarding
> even though we never have been rewarded for it.  If you accept the idea
> that we can combine memories or thoughts about different kinds of events,
> then even if you say that we are fusing the expectations of the different
> kinds of events, you still have acknowledged that we can shape our
> behaviors in novel and unexpected ways without additional external
> rewards.  Now if this can occur through a fusion of recall and
> expectation, then why not admit that with another modifying
> thought-behavior we could further shape our behavior by simply thinking
> about it in the right way? And through practice and using writing or other
> memory aids we could make this a process capable of rendering insights of
> greater complexity.
> Jim Bromer
>
>
> On Wed, Jan 30, 2013 at 4:40 PM, Aaron Hosford <[email protected]>wrote:
>
>>> the theory that "Reinforcement Learning" can be used to
>>> derive "understanding" a little dubious
>>
>>
>> I was actually saying that concepts provide understanding, and RL
>> provides motivational direction (preference) to behavior. RL certainly
>> cannot produce understanding, only value judgments for things that have
>> already been understood. I think we have been talking past each other on
>> this point, without realizing we were already in agreement.
>>
>>> We can derive unexpected "rewards" from the observation or consideration
>>> of (relatively) complex events. This cannot be all due to simplistic
>>> external reinforcements or else it would be easily demonstrated with
>>> computers.  This kind of reasoning suggests that concepts themselves can
>>> reinforce behaviors.
>>
>>
>> Can you give an example of an unexpected reward derived from the
>> observation of complex events?
>>
>> I don't think that concepts reinforce behaviors. I think they play a role,
>> however. I could see that the activation of a concept with high expected
>> reward could transfer its expected benefits to whatever behaviors would
>> activate it, such as practicing to achieve competency in a skill, but this
>> is just the backwards chaining of rewards I already put forth, as applied
>> to complex concept interactions. The important shift from the limited
>> capabilities of an ordinary RL system to the robust capabilities of the AGI
>> system I'm envisioning is dependent not on additional mechanisms besides RL
>> for credit assignment, but rather on additional mechanisms for concept
>> formation and interconnection. Without effective concepts in place to act
>> on, RL is going to look rather unimpressive, as it does in its current
>> state.
>>
>>
>>
>> On Wed, Jan 30, 2013 at 2:23 PM, Jim Bromer <[email protected]> wrote:
>>
>>> On Wed, Jan 30, 2013 at 2:22 PM, Aaron Hosford <[email protected]>wrote:
>>> If reward prediction is tied to concept activation, concept activations
>>> are predicted by their predecessors, and dynamic search is performed to
>>> identify plans of action based on both types of prediction, it's possible
>>> to take into account new information connecting behaviors that have never
>>> been tried together and come up with a fairly accurate reward expectation
>>> for that combination nonetheless. For example, if I learn that (1) throwing
>>> things in the direction of a person or animal tends to scare them and (2) a
>>> particular type of animal likes to snack on people, then later I can
>>> dynamically put those pieces of information together with (3) a hardwired
>>> highly negative reward associated with the concept of getting eaten and (4)
>>> the aforementioned unpleasant type of animal looking hungrily at me, to
>>> generate a plan for throwing something at the animal in the hopes of not
>>> getting eaten, even though I have never tried that particular action before.
>>> ----------------------------------------
>>>
>>>
>>> Exactly. And you can further elaborate on this imagined combination.
>>> This is what I meant by the statement, "since internally directed rewards
>>> can be promoted (by the individual) a person can combine different
>>> behaviors by just considering the possibility that the behaviors (or ideas)
>>> could be combined and then through rehearsing, exploring and drawing
>>> conclusions new behaviors could be reinforced without external rewards."
>>>
>>> If you wanted to, you could take your imaginary combination of
>>> behaviors and try to better simulate them to see what happens.  But the
>>> idea that comprehending a response to a simulated event is the same as an
>>> elementary reward is a stretch, since it has turned out to be so difficult
>>> to get a computer program to comprehend something.  I have never said that
>>> you should not use RL (or Bayesian learning). I am saying that the
>>> philosophical problems presented by the difficulty of getting computers
>>> to "understand" something (to look at one case) make the theory
>>> that "Reinforcement Learning" can be used to derive "understanding"
>>> a little dubious.  We can derive unexpected "rewards" from the observation
>>> or consideration of (relatively) complex events. This cannot be all due to
>>> simplistic external reinforcements or else it would be easily demonstrated
>>> with computers.  This kind of reasoning suggests that concepts themselves
>>> can reinforce behaviors.
>>>
>>> Jim Bromer
>>>
>>> On Wed, Jan 30, 2013 at 2:22 PM, Aaron Hosford <[email protected]>wrote:
>>>
>>>>> there are also other methods of reinforcing behavior.  Like rehearsal,
>>>>> preparation and practice.  You do not need external reinforcements to
>>>>> modify those kinds of behavior that can be modified by these other kinds
>>>>> of behaviors. I think many Behaviorists would have argued that the
>>>>> internalized reward systems that weren't unconditioned behaviors were
>>>>> acquired through external reinforcement, but another way to see it is that
>>>>> external rewards only contributed to shaping the goals.
>>>>
>>>>
>>>> I am not a behaviorist. I won't argue with you that there is more to
>>>> the story than mere stimulus/response/reward-based update. We are more than
>>>> enormous lookup tables for expected rewards for stimulus/response pairs.
>>>> However, just because something isn't the complete answer doesn't mean it
>>>> isn't a vital part of it. Reward-based behavioral preference is fundamental
>>>> to intelligent behavior, and RL provides the theory and conceptual
>>>> framework for implementing that, even if current RL-based algorithms are
>>>> insufficient to the task.
>>>>
>>>> Consider a system where concept activations predict both rewards and
>>>> other future concept activations. Then, by chaining these predictions
>>>> together (an interleaving of proposed actions & expected consequences), a
>>>> plan can be built to achieve a particular goal independently of whether it
>>>> would be rewarding to do so, and the value of the plan can be determined by
>>>> stepping through it and averaging the reward predictions.
>>>>
>>>> If someone chooses to rehearse or practice something, they can be
>>>> motivated by rewards expected for the goal concept (proficiency at behavior
>>>> X) rather than by the additional rewards (or, more likely, the
>>>> counterbalanced costs) of actually performing the behavior over and over.
>>>> Proficiency comes about through the successive improvement of predictions
>>>> via testing against experience. Furthermore, the reward for proficiency or
>>>> other concepts can be triggered merely by the activation of the concept in
>>>> an appropriate setting (i.e. one evaluated as really happening vs.
>>>> imagined), rather than an external stimulus generated by the behavior,
>>>> resulting in an intrinsically rewarding behavior.
>>>>
>>>>
>>>>> external rewards can be made more complicated than simply shaping a
>>>>> string of behaviors including any that were incidental and not
>>>>> instrumental in producing the behavior.  This means that the accumulated
>>>>> reward for a kind of behavior is not merely the Bayesian evaluation of
>>>>> external rewards for that behavior.  For example, since internally
>>>>> directed rewards can be promoted (by the individual) a person can combine
>>>>> different behaviors by just considering the possibility that the
>>>>> behaviors (or ideas) could be combined and then through rehearsing,
>>>>> exploring and drawing conclusions new behaviors could be reinforced
>>>>> without external rewards.
>>>>
>>>>
>>>> I'm not sure what you mean by promoting internally directed rewards.
>>>>
>>>> Credit assignment is a problem that must be dealt with for any
>>>> goal-directed system. In order to generate an efficient plan based on past
>>>> experience, we must be able to identify which acts or behaviors actually
>>>> led to the goal in the past, and which were incidental. That's the purpose
>>>> of the backwards chaining I mentioned before.
>>>>
>>>> If reward prediction is tied to concept activation, concept activations
>>>> are predicted by their predecessors, and dynamic search is performed to
>>>> identify plans of action based on both types of prediction, it's possible
>>>> to take into account new information connecting behaviors that have never
>>>> been tried together and come up with a fairly accurate reward expectation
>>>> for that combination nonetheless. For example, if I learn that (1) throwing
>>>> things in the direction of a person or animal tends to scare them and (2) a
>>>> particular type of animal likes to snack on people, then later I can
>>>> dynamically put those pieces of information together with (3) a hardwired
>>>> highly negative reward associated with the concept of getting eaten and (4)
>>>> the aforementioned unpleasant type of animal looking hungrily at me, to
>>>> generate a plan for throwing something at the animal in the hopes of not
>>>> getting eaten, even though I have never tried that particular action 
>>>> before.
>>>>
>>>>
>>>> On Wed, Jan 30, 2013 at 7:49 AM, Jim Bromer <[email protected]>wrote:
>>>>
>>>>> The point is not that such things could not be implemented by RL, it
>>>>> is that there are also other methods of reinforcing behavior.  Like
>>>>> rehearsal, preparation and practice.  You do not need external
>>>>> reinforcements to modify those kinds of behavior that can be modified by
>>>>> these other kinds of behaviors. I think many Behaviorists would have
>>>>> argued that the internalized reward systems that weren't unconditioned
>>>>> behaviors were acquired through external reinforcement, but another way
>>>>> to see it is that external rewards only contributed to shaping the goals.
>>>>> And the other point that I was making is that external rewards can be
>>>>> made more complicated than simply shaping a string of behaviors including
>>>>> any that were incidental and not instrumental in producing the behavior.
>>>>> This means that the accumulated reward for a kind of behavior is not
>>>>> merely the Bayesian evaluation of external rewards for that behavior.
>>>>> For example, since internally directed rewards can be promoted (by the
>>>>> individual) a person can combine different behaviors by just considering
>>>>> the possibility that the behaviors (or ideas) could be combined and then
>>>>> through rehearsing, exploring and drawing conclusions new behaviors could
>>>>> be reinforced without external rewards.
>>>>>  Jim Bromer
>>>>>
>>>>> On Tue, Jan 29, 2013 at 11:58 AM, Aaron Hosford 
>>>>> <[email protected]>wrote:
>>>>>
>>>>>> Once again, I never said RL was all that was needed. I included
>>>>>> concept formation (a.k.a. cognitive structure formation) as a requirement.
>>>>>> Cognitive structures provide the understanding; RL or other goal-directed
>>>>>> mechanisms provide the will. I fail to see what about regulation and
>>>>>> compensation is not implemented by RL, aside from the formation of the
>>>>>> concepts/cognitive structures necessary to distinguish the appropriate
>>>>>> circumstances for an action to be performed, which I already
>>>>>> acknowledged.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Jan 29, 2013 at 10:44 AM, Piaget Modeler <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>>  You need more than just reinforcement learning.  You need
>>>>>>> "regulation" and "compensation", psychological terms Piaget used.
>>>>>>>
>>>>>>> Regulation is the correction of failed behaviors or reinforcement of
>>>>>>> successful behaviors.
>>>>>>> Compensation is the inversion of failed behaviors or the elimination
>>>>>>> of undesirable side effects.
>>>>>>>
>>>>>>> Both regulation and compensation should be intrinsic in the
>>>>>>> cognitive system, and in my view, should build new cognitive structures
>>>>>>> tightly integrated into existing and new behaviors.   This is way
>>>>>>> more than Reinforcement Learning.
>>>>>>>
>>>>>>> ~PM
>>>>>>>
>>>>>>> ------------------------------
>>>>>>> Date: Tue, 29 Jan 2013 08:54:39 -0500
>>>>>>> Subject: [agi] RL Does Not Fully Explain Inner Direction
>>>>>>> From: [email protected]
>>>>>>> To: [email protected]
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Jan 28, 2013 at 6:21 PM, Aaron Hosford 
>>>>>>> <[email protected]>wrote:
>>>>>>>
>>>>>>> In regards to the idea that intrinsic rewards are somehow different
>>>>>>> from extrinsic ones, a reward signal can just as easily be modulated by
>>>>>>> internal events (thoughts) as external ones (percepts). Furthermore, if
>>>>>>> you read up on RL, you'll see that in all effective multi-step RL-style
>>>>>>> algorithms, there is a backward chaining of reward, so that previous
>>>>>>> behaviors or other early triggers for a behavior are rewarded, not just
>>>>>>> the immediate actions. All actions, whether extrinsically or
>>>>>>> intrinsically rewarding, derive their value from either immediate or
>>>>>>> indirect/backward-chained reward signals, which means we can modulate
>>>>>>> behavior arbitrarily to any level of complexity with relatively minimal
>>>>>>> difficulty by taking advantage of this backward chaining.
>>>>>>>
>>>>>>> Well, the backward chaining of reward to the actions leading up to a
>>>>>>> rewarded behavior is an interesting point. And while anyone with a
>>>>>>> little imagination could come up with a creative way to use RL to
>>>>>>> reinforce complex behaviors based on parts of a behavior string that is
>>>>>>> reinforced, this is not explained by the backward-chained reward signals
>>>>>>> that you mentioned.
>>>>>>>  But looking beyond that, the claim that any internal motivation
>>>>>>> could be explained by external reinforcement is unnecessarily
>>>>>>> complicated, because it depends on external rewards, which would demand
>>>>>>> that things like the massive levels of complexity of infinitesimal past
>>>>>>> rewards could explain inner direction. This is the same problem as
>>>>>>> insisting that Bayesian Reasoning along with some priors is all that is
>>>>>>> necessary to explain human intelligence. Sorry, but it just does not
>>>>>>> work - unless you change the presumptions of what is meant by
>>>>>>> Reinforcement Learning or Bayesian Reasoning. (Which is ok, I am just
>>>>>>> saying...)
>>>>>>>  Jim Bromer
>>>>>>>    *AGI* | Archives<https://www.listbox.com/member/archive/303/=now>
>>>>>>> <https://www.listbox.com/member/archive/rss/303/19999924-5cfde295>|
>>>>>>> Modify <https://www.listbox.com/member/?&;> Your Subscription
>>>>>>> <http://www.listbox.com>



