FoConrad commented on issue #10563: Suboptimal performance implementing PPO with Adam Optimizer URL: https://github.com/apache/incubator-mxnet/issues/10563#issuecomment-425615751

The way I debugged the implementation is similar to the code I posted above: I ran the OpenAI baselines code alongside my implementation of PPO, made sure both were initialized identically, and stepped through, comparing the weights and gradients. I immediately found that my value function was incorrect. Also, double-check your initialization to begin with; in my experience PPO can be very sensitive to the weight initialization. Hope this helps!
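A minimal sketch of this kind of lock-step comparison, assuming you have dumped the parameters from both runs into dicts of NumPy arrays keyed by name (the parameter names and the `compare_params` helper here are illustrative, not part of the baselines or MXNet API):

```python
import numpy as np

def compare_params(ref_params, test_params, atol=1e-6):
    """Compare two dicts of parameter arrays captured after the same
    update step. Returns the names of parameters that are missing or
    diverge beyond `atol`, so you can spot the first layer that drifts."""
    mismatched = []
    for name in sorted(ref_params):
        if name not in test_params:
            mismatched.append(name)  # parameter missing from the test run
            continue
        if not np.allclose(ref_params[name], test_params[name], atol=atol):
            mismatched.append(name)  # values drifted apart
    return mismatched

# Toy example: policy weights match, value-function weights diverge.
ref = {"policy/w": np.ones((4, 2)), "vf/w": np.ones((4, 1))}
mine = {"policy/w": np.ones((4, 2)), "vf/w": np.ones((4, 1)) * 1.5}
print(compare_params(ref, mine))  # -> ['vf/w']
```

Running this after every optimizer step (rather than only at the end of training) makes it much easier to catch the exact update where the two implementations first disagree.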
