dbsxdbsx opened a new issue #13397: Gluon: name issue about example "Actor Critic"
URL: https://github.com/apache/incubator-mxnet/issues/13397

I am referring to the [Gluon example: actor critic](https://github.com/apache/incubator-mxnet/blob/master/example/gluon/actor_critic/actor_critic.py). According to the code in `actor_critic.py`, the true return of each state is calculated as:

```
# reverse accumulate and normalize rewards
running_reward = running_reward * 0.99 + t * 0.01
R = 0
for i in range(len(rewards)-1, -1, -1):
    R = rewards[i] + args.gamma * R
    rewards[i] = R
```

which is a Monte Carlo method without bootstrapping. So I think the example should be named `REINFORCE with Baseline` rather than `Actor Critic`. As stated in Section 13.5 of the book [Reinforcement Learning: An Introduction](http://incompleteideas.net/book/bookdraft2017nov5.pdf):

```
Although the REINFORCE-with-baseline method learns both a policy and a state-value function, we do not consider it to be an actor–critic method because its state-value function is used only as a baseline, not as a critic. That is, it is not used for bootstrapping (updating the value estimate for a state from the estimated values of subsequent states), but only as a baseline for the state whose estimate is being updated.
```

I also found that PyTorch has the same issue in its examples. Anyway, it is just a naming problem. If most people think this should still be treated as `Actor Critic`, then never mind~
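To make the distinction concrete, here is a minimal sketch in plain Python. It is not taken from the MXNet example. The helper names `monte_carlo_returns` and `one_step_td_targets`, the toy rewards, and the critic values are all hypothetical. The first function mirrors the reverse accumulation quoted from the example. The second shows what a bootstrapped one-step target would look like, assuming the value after the final step is 0.

```python
def monte_carlo_returns(rewards, gamma):
    """Discounted returns G_t, accumulated backward from the end of the episode.

    No value estimate is used here, so there is no bootstrapping.
    """
    returns = [0.0] * len(rewards)
    R = 0.0
    for i in range(len(rewards) - 1, -1, -1):
        R = rewards[i] + gamma * R
        returns[i] = R
    return returns


def one_step_td_targets(rewards, values, gamma):
    """Bootstrapped targets r_t + gamma * V(s_{t+1}).

    Assumes the episode terminates after the last reward, so the value
    following the final step is taken to be 0.
    """
    targets = []
    for t in range(len(rewards)):
        next_value = values[t + 1] if t + 1 < len(values) else 0.0
        targets.append(rewards[t] + gamma * next_value)
    return targets


# Toy episode (hypothetical numbers, chosen to be exact in floating point).
rewards = [1.0, 1.0, 1.0]
values = [0.5, 0.5, 0.5]  # hypothetical critic estimates V(s_0), V(s_1), V(s_2)
gamma = 0.5

mc = monte_carlo_returns(rewards, gamma)         # [1.75, 1.5, 1.0]
td = one_step_td_targets(rewards, values, gamma)  # [1.25, 1.25, 1.0]

# In both cases the policy-gradient weight is target minus V(s_t); only the
# target differs. Subtracting V from `mc` is the baseline use of the value
# function; building the target from V(s_{t+1}) as in `td` is the critic use.
mc_advantages = [g - v for g, v in zip(mc, values)]
td_advantages = [g - v for g, v in zip(td, values)]
```

With the first target the value function only reduces variance, which is the REINFORCE-with-baseline case described in the quoted passage. With the second it also shapes the target itself, which is what the book calls a critic.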
