Mike, that's the point. Initially we have only a few values, like,
say, -1 for death. RL derives the remaining values. Example: we
don't have an instinct assigning any value to shooting ourselves in the
head. But when we see what happens when someone does it (death), the
value is generated, and it will be almost -1 (surely other things are
at work in this example: there must be a way to perceive that the
outcomes of others' actions may apply to our own).
A real example (though not from real life): TD-Gammon is a program that
learned to play backgammon by playing only against itself. The only
information it had was the victory conditions. At the time, it was the
best backgammon AI (now, with some improvements, its performance is
comparable to that of the three best human players in the world).
I suggest you read about the Monte Carlo method and Temporal
Differences to understand this better.
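To make the Temporal Differences idea concrete, here is a minimal tabular
TD(0) sketch of my own (an illustration, not TD-Gammon itself). The only
hand-set value is the -1 attached to the terminal "death" state of a short
chain; the TD update propagates that value backwards, so the earlier
states end up valued near -1 even though nothing was ever assigned to them
directly:

```python
# Minimal tabular TD(0) sketch (illustrative; all names are my own).
# States 0..4 form a chain; entering state 4 ("death") yields reward -1.
# No value is hand-assigned to states 0-3: TD backups propagate the
# terminal -1 backwards, so earlier states converge toward -1 as well.

N_STATES = 5          # state 4 is terminal
ALPHA = 0.1           # learning rate
GAMMA = 1.0           # no discounting on this short chain
V = [0.0] * N_STATES  # value estimates all start unknown (0)

for episode in range(2000):
    s = 0
    while s != 4:
        s_next = s + 1  # deterministic walk toward the terminal state
        r = -1.0 if s_next == 4 else 0.0
        # TD(0): move V[s] toward r + gamma * V[s_next]
        target = r + (0.0 if s_next == 4 else GAMMA * V[s_next])
        V[s] += ALPHA * (target - V[s])
        s = s_next

print([round(v, 2) for v in V])  # states 0-3 approach -1.0
```

After enough episodes the estimates for states 0-3 sit at about -1, which
is the "generated value" I described above: derived entirely from the one
basic value at the end of the chain.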

On 6/23/07, Mike Tintner <[EMAIL PROTECTED]> wrote:
I don't quite understand this. We continually decide between different
actions - and, I would argue, very crudely in line with Expected Utility
Theory -  we do this by evaluating the options. In that sense, there is
definitely a need to assign values to our every potential action.  But my
point is: we do it extremely crudely - more crudely than I'm aware any
program does.


----- Original Message -----
From: "Rafael C.P."
> In RL there's no need to assign values to everything; they are derived
> from the basic values (i.e. instincts: life, reproduction, food, etc.).
> If an individual's organism produces good sensations for ice cream,
> it's producing good reinforcement value for ice cream. The organism
> can have different values for different foods based on its value for
> life. For example, in general, poison tastes bitter, and bitter, for
> most people, is bad. Sugar is energy, and sweet things are, in
> general, good. The same goes for fat. For the sexual positions, what
> counts are the sensations, as with food. And what's not in our
> instincts is derived from experience (things near in time and space
> contribute to the value). This is also how psychological traumas are
> created (a side effect).
>
> On 6/23/07, Mike Tintner <[EMAIL PROTECTED]> wrote:
>>
>> Rafael C.P.: > "Reinforcement learning is a simple theory that *only*
>> solves problems for which we can design value functions."
>> > In other words... almost anything in real life...
>> >
>> What about if the values are EXTREMELY crude and fluctuating - like the
>> value to you of Mars ice cream vs Ben & Jerry's Phish Food or whatever,
>> and the value of this sexual position vs that one?
>>
>> That is, after all, one of the primary functions of emotions - to serve
>> as extremely crude and fluctuating evaluations of different actions -
>> comparisons that are often so crude as to be pre-mathematical. "How much
>> do you like that ice cream?" "Well, I like it 'a lot'." "And that one?"
>> "Well, a lot too. But maybe 'a bit more'."
>>
>> Could reinforcement learning still embrace such crudities - or would you
>> need a totally different kind of programming?
>>
>>
>> -----
>> This list is sponsored by AGIRI: http://www.agiri.org/email
>> To unsubscribe or change your options, please go to:
>> http://v2.listbox.com/member/?&;
>>
>
>
> --
> =========
> Rafael C.P.
> =========
>
>





--
=========
Rafael C.P.
=========

