1041

State value function


V_Pi(s) = E_Pi[G_t|S_t = s]

The value of state s under policy Pi is the expected return G_t if the
agent starts at state s and follows Pi from there.

The equation doesn't seem very helpful.

Second description of equation:
For each state, the state-value function outputs the expected return,
if the agent starts in that state, and then follows the policy forever
after.

A graphic is shown of a mouse in a tiny maze finding cheese. Each
square has a number representing the negative number of steps needed
to reach the cheese. This number is the square's value.

At this point, it is pretty easy to imagine a function recursively
updating the values of every square in such a maze until they
stabilise, to arrive at the picture.
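
A rough sketch of that idea, to check my own understanding. The maze
layout, the name maze_values, and the -1 reward per step are all my
own assumptions, not taken from the course graphic. It also assumes the
mouse always takes the best move, and that every open square can reach
the cheese.

```python
# Hypothetical 3x3 maze: '#' is a wall, 'C' is the cheese, '.' is open floor.
MAZE = [
    "..#",
    ".#C",
    "...",
]


def maze_values(maze):
    """Repeatedly update each square's value until nothing changes.

    Assumes every step costs -1 and the cheese square is worth 0, so a
    square's value ends up as minus the number of steps to the cheese.
    """
    rows, cols = len(maze), len(maze[0])
    cells = [(r, c) for r in range(rows) for c in range(cols)
             if maze[r][c] != "#"]
    values = {cell: 0 for cell in cells}  # initial guess: everything is 0

    # Cap the number of sweeps so the loop always terminates.
    for _ in range(len(cells)):
        changed = False
        for (r, c) in cells:
            if maze[r][c] == "C":
                continue  # the cheese square stays at 0
            neighbours = [(r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)]
            reachable = [values[n] for n in neighbours if n in values]
            if not reachable:
                continue  # isolated square, nothing to update from
            # One step costs -1, then we land on the best neighbour.
            best = -1 + max(reachable)
            if best != values[(r, c)]:
                values[(r, c)] = best
                changed = True
        if not changed:
            break  # values have stabilised
    return values


values = maze_values(MAZE)
for r, row in enumerate(MAZE):
    print(" ".join("  #" if ch == "#" else f"{values[(r, c)]:3d}"
                   for c, ch in enumerate(row)))
```

The square next to the cheese settles at -1 first, and the correct
values then spread outward one step per sweep.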