liuzh91 commented on issue #16900: introduce  gradient update handler to the  
base estimator
URL: https://github.com/apache/incubator-mxnet/pull/16900#issuecomment-558516425
 
 
   > Thank you for the improvement! 2 concerns.
   > Also could you point an example which require custom gradient handler? 
(gradient clipping or aggregation)
   
   Thank you for the review. 
   
   For the gradient update example, one use case for gradient accumulation 
arises when training a transformer 
(https://github.com/dmlc/gluon-nlp/blob/master/scripts/machine_translation/train_transformer.py#L320).
 Because the transformer network has so many parameters, we can only compute 
gradients for a small batch of examples at a time. In this case, the gradients 
are accumulated over several small batches and applied to the weight parameters 
periodically. 
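
To make the accumulation pattern concrete, here is a minimal framework-free sketch. It is not the estimator API proposed in this PR and not the code from the linked script; the names `grad_mse` and `train_with_accumulation`, the one-parameter model, and the squared-error loss are all assumptions chosen for illustration. It sums gradients over several small batches and applies a single averaged update once every `accumulate` batches.

```python
def grad_mse(w, xs, ys):
    """Sum of per-example gradients of 0.5 * (w*x - y)**2 with respect to w."""
    return sum((w * x - y) * x for x, y in zip(xs, ys))


def train_with_accumulation(w, data, micro_batch_size, accumulate, lr):
    """Hypothetical helper: one weight update per `accumulate` micro-batches."""
    acc_grad = 0.0  # running sum of gradients since the last update
    seen = 0        # number of examples that contributed to acc_grad
    for step, start in enumerate(range(0, len(data), micro_batch_size), 1):
        batch = data[start:start + micro_batch_size]
        xs = [x for x, _ in batch]
        ys = [y for _, y in batch]
        acc_grad += grad_mse(w, xs, ys)  # accumulate, do not update yet
        seen += len(batch)
        if step % accumulate == 0:
            w -= lr * acc_grad / seen    # periodic update with the mean gradient
            acc_grad, seen = 0.0, 0      # reset the accumulator
    # Note: trailing micro-batches that do not fill a full group are dropped here.
    return w


# Usage: four micro-batches of 2 examples behave like one batch of 8.
data = [(float(x), 2.0 * x) for x in range(1, 9)]
w_small = train_with_accumulation(0.0, data, micro_batch_size=2, accumulate=4, lr=0.01)
w_large = train_with_accumulation(0.0, data, micro_batch_size=8, accumulate=1, lr=0.01)
print(w_small, w_large)
```

Because the weights are only changed at the end of each group, every micro-batch in a group is evaluated at the same weights, so the accumulated update matches what a single large batch would produce. A custom gradient handler would sit at the "accumulate" and "periodic update" steps of this loop.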

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
