liuzh91 edited a comment on issue #16900: introduce gradient update handler to the base estimator
URL: https://github.com/apache/incubator-mxnet/pull/16900#issuecomment-558516425

> Thank you for the improvement! 2 concerns.
> Also could you point an example which require custom gradient handler? (gradient clipping or aggregation)

Thank you for the review. For the gradient update example, one use case for gradient accumulation arises when training a transformer (https://github.com/dmlc/gluon-nlp/blob/master/scripts/machine_translation/train_transformer.py#L320). Because the transformer network has so many parameters, we can only compute gradients for a small batch of examples in each iteration. In this case, the gradients are accumulated across iterations and applied to the weight parameters only periodically.
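
To make the accumulation pattern concrete, here is a minimal sketch in plain Python. It does not use MXNet or the estimator API from this PR; the one-parameter model and the names `grad_sum`, `train_accumulated`, and `train_large_batch` are assumptions made up for illustration. It only shows that summing gradients over several small batches and updating once gives the same step as a single large batch.

```python
# Plain-Python sketch of gradient accumulation (hypothetical names, no MXNet).
# Model: y_hat = w * x, loss = squared error, averaged over the examples seen.

def grad_sum(w, batch):
    """Sum of per-example gradients d/dw (w*x - y)^2 over a batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch)


def train_accumulated(data, micro_batch_size, accumulate, lr, w=0.0):
    """Accumulate gradients and update w once every `accumulate` micro-batches."""
    acc_grad = 0.0  # running sum of gradients since the last update
    seen = 0        # number of examples that contributed to acc_grad
    for i in range(0, len(data), micro_batch_size):
        micro = data[i:i + micro_batch_size]
        acc_grad += grad_sum(w, micro)  # w is unchanged while accumulating
        seen += len(micro)
        step = i // micro_batch_size + 1
        if step % accumulate == 0:
            w -= lr * acc_grad / seen   # periodic weight update
            acc_grad, seen = 0.0, 0     # reset the accumulator
    # Gradients left over from an incomplete final group are dropped here.
    return w


def train_large_batch(data, batch_size, lr, w=0.0):
    """Reference: one update per full batch, no accumulation."""
    for i in range(0, len(data), batch_size):
        batch = data[i:i + batch_size]
        w -= lr * grad_sum(w, batch) / len(batch)
    return w


# Eight examples generated from y = 3 * x.
data = [(float(x), 3.0 * x) for x in range(1, 9)]
w_acc = train_accumulated(data, micro_batch_size=2, accumulate=4, lr=0.01)
w_big = train_large_batch(data, batch_size=8, lr=0.01)
```

With four micro-batches of two examples each, `w_acc` matches `w_big`, which uses a single batch of eight. A custom gradient update handler would be the place to decide when the periodic update and the accumulator reset happen.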
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

With regards,
Apache Git Services
