I think I'm beginning to see my confusion which stems from maybe more abstraction than I can process. Thanks Bill for your patience. If I may...

1. Hutter is credited with the Universal AI concept involving maximizing reward. But, how does one determine what is "rewarding" and to what extent one reward is better than another?

2. I see the idea that one uses observation to populate the environment. (ch 3 pg 22) But observation is tied closely to this idea of reward, and that is the "ethical" dilemma - who and what defines reward?

3. In chapter two the environment is a given. There is a hint that the environment is not simple... "In a practical AI system, like a self-driving car, observations and actions are complex computer data structures, and the probability distribution ρ (h) is expressed in a massive database and millions of lines of code." Yet, we expect an AI to observe and learn this environment by its own observation? (I see that chapter 5 deals with this...)

Let me throw up the white flag - I can see that chapter 5 is one that I need to study more closely to get closer to the ethical questions I have.

I am curious, though, about how one determines what constitutes "reward." Part of my interest in watching AI is because I want to know what the smartest "unit" in the world discovers, or considers, to be the ultimate reward / value.

Thanks again for patience and the cycles spent.
Stan


On 11/14/2014 02:23 PM, Bill Hibbard via AGI wrote:
On Fri, 14 Nov 2014, Stanley Nilsen via AGI wrote:
Okay, I'll see if I can grasp 2.3 and 2.4.
Perhaps you can lessen my pain by telling me which equations address a risk factor?

Sorry to hear this is causing you pain.

Risk is in equation (2.4):

v(ha) = \sum_{o \in O} \rho(o | ha) v(hao)

In English, this says that the value of an action
a after history h, denoted v(ha), is von Neumann
and Morgenstern's lottery of possible outcomes
from that action. The possible outcomes are the
hao, for different observations o \in O. Each
outcome hao has value v(hao) and probability
\rho(o | ha).

Risk comes in because some outcomes may have very
low value v(hao). Those values are multiplied by
the probability of the outcome, denoted
\rho(o | ha). The sum adds up the good outcomes
(high v(hao)) and the bad outcomes (low v(hao)),
multiplied by their probabilities, to get an
expected value v(ha) of the action a.

So the sum is balancing risk (low values v(hao))
against reward (high values v(hao)). Then
equations (2.3) and (2.5) choose the action that
maximizes expected value.
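The lottery in (2.4) and the maximization in (2.3)/(2.5) can be sketched in a few lines of code. This is only an illustration of the idea, not anything from the book: the histories, probabilities rho(o | ha), and values v(hao) below are made-up toy numbers, with histories written as concatenated strings.

```python
def action_value(rho, v, h, a):
    """Equation (2.4): v(ha) = sum over o of rho(o | ha) * v(hao)."""
    return sum(p * v[h + a + o] for o, p in rho[h + a].items())

def best_action(rho, v, h, actions):
    """Equations (2.3)/(2.5): choose the action maximizing expected value."""
    return max(actions, key=lambda a: action_value(rho, v, h, a))

# Toy example (assumed numbers): after history "h", action "a1" has a
# small chance of a very low-value outcome (the risk Bill describes),
# while "a2" is safe but modest.
rho = {
    "ha1": {"good": 0.9, "bad": 0.1},
    "ha2": {"good": 0.5, "bad": 0.5},
}
v = {
    "ha1good": 10.0, "ha1bad": -100.0,  # rare but disastrous outcome
    "ha2good": 4.0,  "ha2bad": 2.0,
}

print(action_value(rho, v, "h", "a1"))  # 0.9*10 + 0.1*(-100), about -1.0
print(action_value(rho, v, "h", "a2"))  # 0.5*4 + 0.5*2 = 3.0
print(best_action(rho, v, "h", ["a1", "a2"]))  # picks "a2"
```

The sum in action_value is exactly where risk enters: the -100 outcome drags a1's expected value below a2's, so the maximization avoids the risky action even though a1's best case is higher.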

Cheers,
Bill


-------------------------------------------
AGI
Archives: https://www.listbox.com/member/archive/303/=now
RSS Feed: https://www.listbox.com/member/archive/rss/303/9320387-ea529a81
Modify Your Subscription: https://www.listbox.com/member/?&;
Powered by Listbox: http://www.listbox.com




