The way I debugged my implementation was similar to the code I posted above: I 
ran the OpenAI baselines code alongside my implementation of PPO, made sure 
both were initialized identically, and stepped through, comparing the weights 
and gradients at each step.
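
The comparison step can be sketched roughly like this (a minimal sketch, not the code I actually used — the parameter-dict layout and the helper name `compare_params` are assumptions; in practice you would pull the arrays out of each framework's model after every update):

```python
import numpy as np

def compare_params(params_a, params_b, atol=1e-6):
    """Return the names of parameters that differ between two implementations.

    params_a / params_b: dicts mapping parameter name -> np.ndarray.
    Run this on the weights (and again on the gradients) after each
    optimizer step; the first name it reports tells you where the two
    implementations diverge.
    """
    mismatches = []
    for name in params_a:
        if not np.allclose(params_a[name], params_b[name], atol=atol):
            mismatches.append(name)
    return mismatches
```

The point is just to bisect: as soon as one layer's weights or gradients drift apart, you know the bug sits in (or before) that layer's computation.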

I immediately found that my value function was incorrect.

Also, double-check your initialization to begin with; in my experience, PPO 
can be very sensitive to the weight initialization.
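
For reference, the baselines PPO code uses an orthogonal initialization for its layers. A minimal NumPy sketch of that scheme (the exact gain values are assumptions — match them to whatever the reference code you compare against actually uses):

```python
import numpy as np

def orthogonal_init(shape, gain=np.sqrt(2)):
    """Orthogonal weight initialization via SVD of a Gaussian matrix.

    gain=sqrt(2) is a common choice for hidden layers with ReLU/tanh-style
    activations; policy/value output layers typically use a much smaller
    gain. Treat these values as assumptions, not a prescription.
    """
    a = np.random.standard_normal(shape)
    u, _, v = np.linalg.svd(a, full_matrices=False)
    q = u if u.shape == shape else v
    return gain * q.reshape(shape)
```

If the two implementations use different initializers, the weight/gradient comparison above will report mismatches from step zero, so it is worth aligning this first.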

Hope this helps!

[ Full content available at: 
https://github.com/apache/incubator-mxnet/issues/10563 ]