anirudhacharya commented on issue #14568: NAG Optimizer with multi-precision 
support
URL: https://github.com/apache/incubator-mxnet/pull/14568#issuecomment-481925196
 
 
   @lupesko 
   Nesterov Accelerated Gradient or NAG is an improvement over SGD with 
Momentum optimizer. 
   
   As we know SGD with Momentum helps to accelerate the optimizers descent to 
the desired minima by reducing the oscillations in the parameter updates by 
adding a fraction of the update of the past time step to the update of the 
current time step. This enables faster convergence of the model. 
   
   But this can also cause the optimizer to overshoot the global minima due to 
larger parameter updates. NAG optimizer fixes this by changing the update rules 
to decelerate the optimizer as it nears the global minima. This has shown 
better performance while training RNNs as described here - 
https://arxiv.org/abs/1212.0901. The following diagram will illustrate the 
difference -
   
   ![](https://i.stack.imgur.com/YrpGA.png)
   (image source is a stack exchange thread)
   
   This PR also adds multi-precision support to the NAG optimizer, which is 
very useful while training in fp16 because multi-precision optimizers keep a 
copy of the weights in fp32 but performs backward pass and parameter updates in 
fp16. This technique prevents any loss in accuracy of the model while giving us 
significant benefits in improved memory and time taken for training. For more 
details on mixed precision training, please see here - 
https://devblogs.nvidia.com/mixed-precision-training-deep-neural-networks/

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

Reply via email to