dbsxdbsx opened a new issue #13397: Gluon: name issue about example "Actor Critic"
URL: https://github.com/apache/incubator-mxnet/issues/13397
 
 
   I am referring to the [Gluon example: actor critic](https://github.com/apache/incubator-mxnet/blob/master/example/gluon/actor_critic/actor_critic.py).
   
   According to the code in `actor_critic.py`, the return for each state is calculated as:
   ```
           # reverse accumulate and normalize rewards
           running_reward = running_reward * 0.99 + t * 0.01
           R = 0
           for i in range(len(rewards)-1, -1, -1):
               R = rewards[i] + args.gamma * R
               rewards[i] = R
   ```
   This is a Monte Carlo method without bootstrapping, so I think the example should be named `REINFORCE with Baseline` rather than `Actor Critic`. As stated in Section 13.5 of the book [Reinforcement Learning: An Introduction](http://incompleteideas.net/book/bookdraft2017nov5.pdf):
   ```
   Although the REINFORCE-with-baseline method learns both a policy and a state-value function, we
   do not consider it to be an actor–critic method because its state-value function is used only as a
   baseline, not as a critic. That is, it is not used for bootstrapping (updating the value estimate for
   a state from the estimated values of subsequent states), but only as a baseline for the state whose
   estimate is being updated.
   ```
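
To make the distinction concrete, here is a minimal sketch in plain Python. It is not taken from `actor_critic.py`: the function names, the `values` list (standing in for a critic's estimates V(s_t)), and the numbers are assumptions chosen for illustration. The first function reproduces the backward accumulation from the excerpt above. The second shows what a one-step bootstrapped target could look like, assuming the value after the final step is 0.

```python
def monte_carlo_returns(rewards, gamma):
    """Discounted return G_t for every step, accumulated backwards
    over the whole episode (no value estimates are used)."""
    returns = [0.0] * len(rewards)
    R = 0.0
    for i in range(len(rewards) - 1, -1, -1):
        R = rewards[i] + gamma * R
        returns[i] = R
    return returns


def one_step_td_targets(rewards, values, gamma):
    """Bootstrapped targets r_t + gamma * V(s_{t+1}).
    Assumes the episode terminates, so the value after the last step is 0."""
    targets = []
    for t, r in enumerate(rewards):
        next_value = values[t + 1] if t + 1 < len(values) else 0.0
        targets.append(r + gamma * next_value)
    return targets


# Hypothetical episode: three steps, reward 1 each.
rewards = [1.0, 1.0, 1.0]
values = [0.5, 0.5, 0.5]  # hypothetical critic estimates V(s_t)
gamma = 0.5

mc = monte_carlo_returns(rewards, gamma)
td = one_step_td_targets(rewards, values, gamma)
print(mc)  # [1.75, 1.5, 1.0]
print(td)  # [1.25, 1.25, 1.0]
```

In the Monte Carlo case the value estimates can only be subtracted afterwards as a baseline. In the bootstrapped case they enter the target itself, which is the property the quoted passage uses to define a critic.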
   I also found that PyTorch has the same issue in its examples. Anyway, it is just a naming problem. If most people think this should also be treated as `Actor Critic`, then never mind~
   
