yukang2017 opened a new issue #11683: time cost in grad clipping
URL: https://github.com/apache/incubator-mxnet/issues/11683

I compared the speed of gradient clipping between PyTorch (0.3.1) and mxnet-gluon (1.2.0) on a Titan X. PyTorch is nearly 10 times faster than mxnet-gluon. (The network I use is [NASNet](https://arxiv.org/abs/1707.07012).)

mxnet:
```python
grads = [i.grad(ctx) for i in model.collect_params().values() if i._grad is not None]
gluon.utils.clip_global_norm(grads, args.grad_clip)
```

pytorch:
```python
nn.utils.clip_grad_norm(model.parameters(), args.grad_clip)
```

However, when I instead test the `clip_gradient` option of `mxnet.gluon.Trainer` against the equivalent in `torch.optim`, following thomelane's reply in [this issue](https://github.com/apache/incubator-mxnet/issues/11508#issuecomment-404675656), the time cost is similar:

```python
mxnet.gluon.Trainer(net.collect_params(), optimizer='sgd',
                    optimizer_params={'learning_rate': 0.1, 'clip_gradient': 5},
                    kvstore='device')
```
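For context on what is being timed: both `gluon.utils.clip_global_norm` and `torch.nn.utils.clip_grad_norm` compute a single L2 norm over *all* gradients and rescale every gradient when that norm exceeds the threshold, whereas `clip_gradient` in the optimizer clips each gradient element independently. A minimal pure-Python sketch of the global-norm variant (a reference illustration only, not the framework implementations; `clip_global_norm_ref` is a hypothetical name):

```python
import math

def clip_global_norm_ref(grads, max_norm):
    # grads: list of flat lists of floats (one per parameter).
    # Compute the L2 norm over ALL gradient values combined.
    total_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    # Only rescale when the global norm exceeds the threshold.
    if total_norm > max_norm:
        scale = max_norm / total_norm
        for grad in grads:
            for i in range(len(grad)):
                grad[i] *= scale  # clip in place
    return total_norm  # pre-clip norm, mirroring what both APIs return
```

Because this reduction touches every gradient array before any of them can be scaled, it forces a synchronization point, which is one plausible reason its cost differs between frameworks while per-element `clip_gradient` (fused into the optimizer update) does not.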