liuzh91 edited a comment on issue #16900: introduce  gradient update handler to 
the  base estimator
URL: https://github.com/apache/incubator-mxnet/pull/16900#issuecomment-558516425
 
 
   > Thank you for the improvement! 2 concerns.
   > Also could you point an example which require custom gradient handler? 
(gradient clipping or aggregation)
   
   Thank you for the review. 
   
   For the gradient update example, one use case for gradient accumulation 
is training a transformer 
(https://github.com/dmlc/gluon-nlp/blob/master/scripts/machine_translation/train_transformer.py#L320).
 Because the transformer network has so many parameters, we can only 
compute gradients for a small batch of data examples in each iteration. In 
this case, the gradients are accumulated over several iterations and applied 
to the weight parameters periodically. 
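
   As a rough illustration of the idea (not the estimator API and not the 
code in the linked script), here is a minimal pure-Python sketch of gradient 
accumulation. The 1-D model, `grad_mse`, and `train_accumulated` are 
hypothetical names made up for this example. In Gluon the same effect is 
typically obtained by setting `grad_req` to `'add'` on the parameters and 
zeroing the gradients after each `trainer.step`.

```python
def grad_mse(w, batch):
    """Gradient of the mean squared error for the toy model y = w * x."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def train_accumulated(w, data, micro_batch_size, accumulate, lr):
    """SGD that applies one weight update per `accumulate` micro-batches."""
    acc_grad = 0.0  # running sum of per-example gradients
    seen = 0        # number of examples contributing to acc_grad
    for i, start in enumerate(range(0, len(data), micro_batch_size)):
        batch = data[start:start + micro_batch_size]
        # Accumulate instead of updating: only a small batch is processed now.
        acc_grad += grad_mse(w, batch) * len(batch)
        seen += len(batch)
        if (i + 1) % accumulate == 0:
            # Periodic update with the gradient averaged over all seen examples.
            w -= lr * acc_grad / seen
            acc_grad, seen = 0.0, 0
    if seen:
        # Flush gradients left over from an incomplete accumulation window.
        w -= lr * acc_grad / seen
    return w


# Toy data for y = 3x; four micro-batches of 2 stand in for one batch of 8.
data = [(float(x), 3.0 * x) for x in range(1, 9)]
w_accumulated = train_accumulated(0.0, data, micro_batch_size=2, accumulate=4, lr=0.001)
w_full_batch = 0.0 - 0.001 * grad_mse(0.0, data)
```

   Under these assumptions, one accumulated update over four micro-batches 
matches a single update computed on the full batch of eight examples, which 
is why the weights only need to be touched periodically.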

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
