yifeim edited a comment on issue #10563: Suboptimal performance implementing PPO with Adam Optimizer
URL: https://github.com/apache/incubator-mxnet/issues/10563#issuecomment-424978630

The PPO paper primarily depended on SGD and used Adam only as an alternative for better performance. Given the online nature of the problem, I would be surprised if SGD made a fundamental difference.

Also, while the KL term stabilizes the objective and is good to have, PPO may be too conservative if there is no explicit exploration. Weight divergence is expected in the end: any optimal policy must be deterministic, i.e. it saturates (except in adversarial bandits).

There were some reproducibility discussions around PPO and TRPO. You may want to try a few more seeds on the original baseline as well.

My 2 cents.
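To make the saturation point concrete, here is a minimal sketch, assuming a toy two-armed bandit with fixed rewards and a softmax policy trained by exact policy-gradient ascent. It is not PPO and not the code from this issue; `run_bandit`, the reward values, and the learning rate are all hypothetical choices for illustration. Without a KL or entropy term, the gap between the logits keeps growing as the policy approaches a deterministic choice.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def run_bandit(rewards, steps, lr=1.0):
    """Exact gradient ascent on expected reward for a softmax policy.

    Hypothetical toy setup: no sampling noise, no KL penalty, no entropy bonus.
    Returns a list of (logit gap, probability of the most likely arm) per step.
    """
    logits = [0.0] * len(rewards)
    history = []
    for _ in range(steps):
        probs = softmax(logits)
        expected = sum(p * r for p, r in zip(probs, rewards))
        # For a softmax policy: d E[r] / d logit_i = p_i * (r_i - E[r])
        logits = [
            z + lr * p * (r - expected)
            for z, p, r in zip(logits, probs, rewards)
        ]
        history.append((max(logits) - min(logits), max(softmax(logits))))
    return history


history = run_bandit(rewards=[1.0, 0.0], steps=10000)
for step in (10, 100, 1000, 10000):
    gap, p_best = history[step - 1]
    print(f"step {step:>5}: logit gap = {gap:.3f}, p(best arm) = {p_best:.4f}")
```

In this toy setting the logit gap never stops increasing, although it grows more and more slowly as the policy saturates. A KL or entropy term would bound that growth, at the cost of the conservatism mentioned above.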
