Another point. I'm probably repeating the obvious, but perhaps this will be
useful to some.

On the one hand, an agent could not game a Legg-like intelligence metric
by altering the utility function, even an internal one, since the metric is
based on the function before any such change.
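
(For reference, the measure under discussion is Legg and Hutter's universal
intelligence, which sums the agent's expected value over all computable
environments, weighted by their Kolmogorov complexity:

```latex
\Upsilon(\pi) = \sum_{\mu \in E} 2^{-K(\mu)} \, V^{\pi}_{\mu}
```

Both the complexity weighting and the value function V are fixed on the
evaluator's side, which is why the agent cannot change them from the inside.)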

On the other hand, since an internally calculated utility function would
necessarily be a function of observations, rather than of the actual world
state, it could be successfully gamed by altering observations.

This latter objection does not apply to functions which are externally
calculated, whether known or unknown.
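
To make the contrast concrete, here is a toy sketch (the names and numbers
are illustrative only, not part of Legg's formalism): the same utility
function is applied externally to the true world state and internally to
whatever the agent observes, so tampering with the observation channel
inflates the internal score but leaves the external one untouched.

```python
# Toy illustration (hypothetical setup, not Legg's actual formalism).
# The same utility function is evaluated twice: externally on the true
# world state, and internally on whatever the agent observes.

def utility(x):
    return x  # a stand-in utility: "more is better"

world_state = 3             # the actual state of the environment
faithful_obs = world_state  # an honest observation channel
hacked_obs = 100            # the agent tampers with its own sensors

internal_score = utility(hacked_obs)   # gamed upward by the tampering
external_score = utility(world_state)  # unaffected by the tampering

print(internal_score, external_score)
```

The point is simply that the external score is computed from world_state,
which the agent's sensor tampering never touches.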

Joshua



On Fri, Jul 2, 2010 at 7:23 PM, Joshua Fox <[email protected]> wrote:

> I found the answer as given by Legg, *Machine Superintelligence*, p. 72,
> copied below. A reward function is used to bypass potential difficulty in
> communicating a utility function to the agent.
>
> Joshua
>
> The existence of a goal raises the problem of how the agent knows what the
> goal is. One possibility would be for the goal to be known in advance and
> for this knowledge to be built into the agent. The problem with this is that
> it limits each agent to just one goal. We need to allow agents that are more
> flexible; specifically, we need to be able to inform the agent of what the
> goal is. For humans this is easily done using language. In general, however,
> the possession of a sufficiently high level of language is too strong an
> assumption to make about the agent. Indeed, even for something as
> intelligent as a dog or a cat, direct explanation is not very effective.
>
> Fortunately there is another possibility which is, in some sense, a blend of
> the above two. We define an additional communication channel with the
> simplest possible semantics: a signal that indicates how good the agent’s
> current situation is. We will call this signal the reward. The agent simply
> has to maximise the amount of reward it receives, which is a function of the
> goal. In a complex setting the agent might be rewarded for winning a game or
> solving a puzzle. If the agent is to succeed in its environment, that is,
> receive a lot of reward, it must learn about the structure of the
> environment and in particular what it needs to do in order to get reward.
>
>
>
>
> On Mon, Jun 28, 2010 at 1:32 AM, Ben Goertzel <[email protected]> wrote:
>
>> You can always build the utility function into the assumed universal
>> Turing machine underlying the definition of algorithmic information...
>>
>> I guess this will improve learning rate by some additive constant, in the
>> long run ;)
>>
>> ben
>>
>> On Sun, Jun 27, 2010 at 4:22 PM, Joshua Fox <[email protected]> wrote:
>>
>>> This has probably been discussed at length, so I will appreciate a
>>> reference on this:
>>>
>>> Why does Legg's definition of intelligence (following on Hutter's AIXI
>>> and related work) involve a reward function rather than a utility
>>> function? For this purpose, reward is a function of the world
>>> state/history which is unknown to the agent, while a utility function is
>>> known to the agent.
>>>
>>> Even if we replace the former with the latter, we can still have a
>>> definition of intelligence that integrates optimization capacity over all
>>> possible utility functions.
>>>
>>> What is the real significance of the difference between the two types of
>>> functions here?
>>>
>>> Joshua
>>>
>>
>>
>>
>> --
>> Ben Goertzel, PhD
>> CEO, Novamente LLC and Biomind LLC
>> CTO, Genescient Corp
>> Vice Chairman, Humanity+
>> Advisor, Singularity University and Singularity Institute
>> External Research Professor, Xiamen University, China
>> [email protected]
>>
>> “When nothing seems to help, I go look at a stonecutter hammering away at
>> his rock, perhaps a hundred times without as much as a crack showing in it.
>> Yet at the hundred and first blow it will split in two, and I know it was
>> not that blow that did it, but all that had gone before.”
>>
>>
>
>



-------------------------------------------
agi
Archives: https://www.listbox.com/member/archive/303/=now
RSS Feed: https://www.listbox.com/member/archive/rss/303/