Mike, that's the point. Initially we have only a few values - say, -1 for death. RL derives the remaining values. Example: we have no instinct assigning any value to shooting ourselves in the head. But when we see what happens when someone does it (death), the value is generated, and it will be almost -1 (admittedly, other mechanisms help in this example: there must be a way to perceive that the results of another's actions may also apply to our own). A real example (though not from real life): TD-Gammon is a program that learned to play backgammon by playing only against itself. The only information it was given was the victory conditions. At the time, it was the best backgammon AI (now, with some improvements, its performance is comparable to that of the top 3 human players in the world). I suggest you read about the Monte Carlo method and Temporal Differences to understand this better.
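To make the value-propagation idea concrete, here is a minimal TD(0) sketch (my own toy example, not TD-Gammon; all names and numbers are illustrative): a five-state random walk where only the terminal outcomes carry innate value (+1 for "winning" off the right edge, -1 for "death" off the left). Every interior state starts at 0 and acquires its value purely from experience, as described above.

```python
import random

N = 5            # interior states 0..4 (toy problem, not TD-Gammon)
ALPHA = 0.1      # learning rate
V = [0.0] * N    # learned state values, all initially zero

def episode():
    """One random walk from the middle until a terminal edge is hit."""
    s = N // 2
    while True:
        s2 = s + random.choice([-1, 1])
        if s2 < 0:                       # left edge: "death", innate value -1
            reward, v_next, done = -1.0, 0.0, True
        elif s2 >= N:                    # right edge: "win", innate value +1
            reward, v_next, done = 1.0, 0.0, True
        else:                            # interior: no innate value at all
            reward, v_next, done = 0.0, V[s2], False
        # TD(0) update: nudge V[s] toward reward + value of the next state,
        # so the terminal values gradually propagate back through the chain
        V[s] += ALPHA * (reward + v_next - V[s])
        if done:
            return
        s = s2

random.seed(0)
for _ in range(5000):
    episode()

print([round(v, 2) for v in V])
```

After a few thousand episodes, the states near the deadly edge end up with clearly negative values and those near the winning edge with clearly positive ones, even though no interior state was ever assigned a value by hand.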
On 6/23/07, Mike Tintner <[EMAIL PROTECTED]> wrote:
I don't quite understand this. We continually decide between different actions - and, I would argue, very crudely in line with Expected Utility Theory - we do this by evaluating the options. In that sense, there is definitely a need to assign values to our every potential action. But my point is: we do it extremely crudely - more crudely than I'm aware any program does.

----- Original Message -----
From: "Rafael C.P."

> In RL there's no need to assign values to everything; they are derived
> from the basic values (i.e. instincts: life, reproduction, food, etc.).
> If an individual's organism produces good sensations for ice cream,
> it's producing good reinforcement value for ice cream. The organism
> can have different values for different foods based on its value for
> life. For example, in general, poison tastes bitter, and bitter, for
> most people, is bad. Sugar is energy, and sweet things are, in
> general, good. The same goes for fat. For the sexual positions, what
> counts are the sensations, as with food. And what's not in our
> instincts is derived from experience (things near in time and space
> contribute to the value). This is also how psychological traumas are
> created (a side effect).
>
> On 6/23/07, Mike Tintner <[EMAIL PROTECTED]> wrote:
>>
>> Rafael C.P.: "Reinforcement learning is a simple theory that *only*
>> solves problems for which we can design value functions."
>> "In other words... almost anything in real life..."
>>
>> What about if the values are EXTREMELY crude and fluctuating - like the
>> value to you of Mars ice cream vs Ben & Jerry's Phish Food or whatever,
>> and the value of this sexual position vs that one?
>>
>> That is, after all, one of the primary functions of emotions - to serve
>> as extremely crude and fluctuating evaluations of different actions -
>> comparisons that are often so crude as to be pre-mathematical. "How much
>> do you like that ice cream?" "Well, I like it 'a lot'." "And that one?"
>> "Well, a lot too. But maybe 'a bit more'."
>>
>> Could reinforcement learning still embrace such crudities - or would you
>> need a totally different kind of programming?
--
=========
Rafael C.P.
=========

-----
This list is sponsored by AGIRI: http://www.agiri.org/email
To unsubscribe or change your options, please go to:
http://v2.listbox.com/member/?member_id=231415&user_secret=e9e40a7e
