shesung opened a new issue #16297: incorrect grad of gluon.nn.BatchNorm when scale=False
URL: https://github.com/apache/incubator-mxnet/issues/16297

When using `gluon.nn.BatchNorm(scale=False)` on GPU, the computed gradient for `beta` is not correct: it appears to be accumulated across iterations rather than overwritten. With `scale=True`, or when running on CPU, the gradient is correct. This problem can make a network hard to converge during training.

## Environment info (Required)

- CentOS Linux release 7.2.1511 (Core)
- GTX 1080Ti, Driver Version: 384.69
- CUDA Version 9.0.176
- installed with pip: numpy 1.17.2, mxnet-cu90 1.5.0

## Code

In this example, the gradient of `beta` should be `[1, 1, 1]` at every iteration.

```python
import mxnet as mx
from mxnet import gluon, autograd

ctx = mx.gpu()
x = mx.nd.ones((1, 3, 1, 1), ctx=ctx)

# BatchNorm without the learnable scale (gamma); only beta is trained.
net = gluon.nn.BatchNorm(scale=False, epsilon=2e-5, momentum=0.0)
net.initialize(ctx=ctx)
trainer = gluon.Trainer(params=net.collect_params(),
                        optimizer='sgd',
                        optimizer_params={'learning_rate': 0.01,
                                          'wd': 0.0005,
                                          'momentum': 0.9})
net.hybridize()

for i in range(10):
    with autograd.record():
        out = net(x)
    out.backward()
    trainer.step(x.shape[0])
    # Print the gradient of beta after each update; it should stay [1, 1, 1].
    for name, param in net.collect_params().items():
        if 'beta' in name:
            print(name, param.grad(ctx).asnumpy())
```

output:

```
batchnorm0_beta [1. 1. 1.]
batchnorm0_beta [2. 2. 2.]
batchnorm0_beta [3. 3. 3.]
batchnorm0_beta [4. 4. 4.]
batchnorm0_beta [5. 5. 5.]
batchnorm0_beta [6. 6. 6.]
batchnorm0_beta [7. 7. 7.]
batchnorm0_beta [8. 8. 8.]
batchnorm0_beta [9. 9. 9.]
batchnorm0_beta [10. 10. 10.]
```
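Gluon writes gradients rather than accumulating them by default (`grad_req='write'`), so the growing values above point to a bug in the GPU `BatchNorm` backward pass when `scale=False`. Until it is fixed, one possible workaround (a sketch, not verified against this exact setup) is to clear the gradient buffers explicitly at the start of every iteration with `ParameterDict.zero_grad()`:

```python
# Same setup as the repro above; only the training loop changes.
for i in range(10):
    # Reset all gradient buffers to zero so that, even if the backward
    # pass adds into them instead of overwriting, each iteration starts
    # from a clean state.
    net.collect_params().zero_grad()
    with autograd.record():
        out = net(x)
    out.backward()
    trainer.step(x.shape[0])
```

`zero_grad()` is a no-op for parameters without a gradient buffer (such as `gamma` here, whose `grad_req` is `'null'` when `scale=False`), so calling it on the whole `ParameterDict` should be safe.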
