FoConrad commented on issue #10563: Suboptimal performance implementing PPO with Adam Optimizer
URL: 
https://github.com/apache/incubator-mxnet/issues/10563#issuecomment-425206192
 
 
   It turns out the primary performance problem was resolved shortly after making this post, but I was still stuck tracking down the cause of the weight divergence. Digging into both implementations of Adam, it seemed that, at least algebraically, both were computing the same thing (and, in the example above, all hyper-parameters were set identically).
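   For reference, this is the textbook Adam update (Kingma & Ba) that both implementations reduce to algebraically — a minimal scalar sketch, not either framework's actual code; the function name and variable names here are my own:

```python
import math

def adam_step(w, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update in the standard Kingma & Ba form.

    Both MXNet and TF compute something algebraically equivalent to
    this, but may group the arithmetic differently (e.g. folding the
    bias corrections into the learning rate).
    """
    m = beta1 * m + (1 - beta1) * g       # first-moment estimate
    v = beta2 * v + (1 - beta2) * g * g   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)          # bias correction
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v

# One step on a scalar weight with a positive gradient moves w down.
w, m, v = adam_step(1.0, 0.5, 0.0, 0.0, t=1)
```

Any reordering of these operations that is exact in real arithmetic can still round differently in floating point, which is relevant to the divergence discussed below.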
   
   My best guess for the weight divergence is simply the floating-point order of operations in which things are calculated. Once the weights diverge (and they diverge between MXNet and TF even after a single tanh) by a large enough amount, they will continue to diverge, since the gradients of each will then differ.
   
   This is not a very satisfying answer, but it seems to be the case.
