szha opened a new issue #12859: proper support for multi-gpu training in clip_global_norm
URL: https://github.com/apache/incubator-mxnet/issues/12859

Currently, multi-GPU training is not well supported in clip_global_norm. The most straightforward way of using it (i.e. `clip_global_norm([p.grad(ctx) for p in params for ctx in contexts])`) is incorrect, because it treats the NDArrays that hold the same parameter's gradient on different contexts as different, independent parameters. IMO clip_global_norm should take parameters instead of NDArrays, so that the information about multiple gradient copies on different contexts is preserved.
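To illustrate the difference, here is a minimal pure-Python sketch, not MXNet code. It assumes that the per-context gradient copies of a parameter are summed before the update; whether sum or mean is appropriate depends on the trainer setup. The names `naive_global_norm`, `aggregated_global_norm`, and `clip_by_aggregated_norm` are hypothetical, and plain lists of floats stand in for NDArrays.

```python
import math


def naive_global_norm(grads_per_param):
    """Norm over every copy, as if each copy were an independent parameter."""
    return math.sqrt(sum(x * x
                         for copies in grads_per_param.values()
                         for g in copies
                         for x in g))


def aggregated_global_norm(grads_per_param):
    """Norm of the per-parameter gradients after summing the copies.

    Assumption: copies on different contexts are summed before the update.
    """
    total = 0.0
    for copies in grads_per_param.values():
        summed = [sum(vals) for vals in zip(*copies)]
        total += sum(x * x for x in summed)
    return math.sqrt(total)


def clip_by_aggregated_norm(grads_per_param, max_norm):
    """Rescale all copies in place by one shared factor; return the old norm."""
    norm = aggregated_global_norm(grads_per_param)
    scale = min(1.0, max_norm / (norm + 1e-8))
    if scale < 1.0:
        for copies in grads_per_param.values():
            for g in copies:
                for i in range(len(g)):
                    g[i] *= scale
    return norm


# Two hypothetical parameters, each with gradient copies on two contexts.
grads = {
    "weight": [[3.0, 0.0], [3.0, 0.0]],
    "bias": [[4.0], [4.0]],
}
naive = naive_global_norm(grads)            # treats the 4 copies as 4 parameters
aggregated = aggregated_global_norm(grads)  # groups the copies by parameter
old_norm = clip_by_aggregated_norm(grads, max_norm=5.0)
```

Because the copies are grouped by parameter, the two norms differ on the same data, which is the information a flat list of NDArrays loses.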
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] With regards, Apache Git Services
