x110 opened a new issue #12899: Gradient of BatchNorm layer
URL: https://github.com/apache/incubator-mxnet/issues/12899

I have a network consisting only of a BatchNorm layer. The gradient I get for `batchnorm0_gamma` after running a backward pass is different from the one I computed manually. Please advise.

```
import mxnet as mx
import numpy as np

X = mx.nd.array([[0.18527887], [-1.23678724]])
Y = mx.nd.array([[2.57767984], [-1.55019435]])

# define network
source = mx.sym.Variable("data")
target = mx.sym.Variable("softmax_label")
network = mx.sym.BatchNorm(source)
network = mx.sym.LinearRegressionOutput(network, target)

input_shapes = {'data': (2, 1), 'softmax_label': (2, 1)}
exe = network.simple_bind(ctx=mx.cpu(), **input_shapes)
arg_arrays = dict(zip(network.list_arguments(), exe.arg_arrays))
x = arg_arrays['data']
t = arg_arrays['softmax_label']

# forward pass
x[:] = X
t[:] = Y
y = exe.forward(is_train=True)

# backward pass
exe.backward()
exe.grad_dict['batchnorm0_beta'], exe.grad_dict['batchnorm0_gamma']
```

The output I get is:

```
([-1.0274856] <NDArray 1 @cpu(0)>, [0.] <NDArray 1 @cpu(0)>)
```

When calculating the gradient manually:

```
xi = x.asnumpy()
a = np.mean(xi)
b = np.var(xi)
xn = (xi - a) / np.sqrt(b + 1e-5)
beta, alpha = exe.arg_dict['batchnorm0_beta'].asnumpy(), exe.arg_dict['batchnorm0_gamma'].asnumpy()
ynorm = alpha * xn + beta

# backward pass computed manually
2 * np.mean(ynorm - t.asnumpy()), 2 * np.mean((ynorm - t.asnumpy()) * xn)
```

the output I get is:

```
(-1.0274856090545654, -2.127872943878174)
```

The first gradient is the same, but the second is not.
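For reference, the manual calculation above can be reproduced standalone in NumPy (values hard-coded from the snippet; `gamma = 1` and `beta = 0` are the freshly-initialized parameter values, and `eps = 1e-5` follows the manual calculation rather than MXNet's BatchNorm default of `1e-3`). Note also that, as far as I can tell, `mx.sym.BatchNorm` has `fix_gamma=True` by default, which keeps gamma fixed at 1 and reports a zero gradient for it — that may be why the executor returns `[0.]`:

```python
import numpy as np

# Inputs copied from the issue.
X = np.array([[0.18527887], [-1.23678724]])
Y = np.array([[2.57767984], [-1.55019435]])

# BatchNorm forward with freshly-initialized parameters
# (gamma = 1, beta = 0; eps as in the manual calculation).
gamma, beta, eps = 1.0, 0.0, 1e-5
mean = X.mean()
var = X.var()
xn = (X - mean) / np.sqrt(var + eps)   # normalized input
y = gamma * xn + beta

# LinearRegressionOutput corresponds to a mean-squared-error loss over the
# batch; since y = gamma * xn + beta, the chain rule gives the two
# expressions below for the parameter gradients.
diff = y - Y
grad_beta = 2 * diff.mean()            # d loss / d beta
grad_gamma = 2 * (diff * xn).mean()    # d loss / d gamma

print(grad_beta, grad_gamma)  # ~ -1.02749 and ~ -2.12787, as in the manual result
```

This reproduces the manually computed pair `(-1.0274856, -2.1278729)`, so the manual derivation itself looks consistent.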
