## Description
A batch norm implemented from scratch with autograd gives a very different input gradient from `mx.nd.BatchNorm`, even though the forward results are OK.
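For reference, the from-scratch implementation below computes the standard per-channel batch normalization, with the statistics taken over the $N$, $H$, $W$ axes ($m = N \cdot H \cdot W$ elements per channel):

$$\mu = \frac{1}{m}\sum_i x_i, \qquad \sigma^2 = \frac{1}{m}\sum_i (x_i - \mu)^2, \qquad \hat{x}_i = \frac{x_i - \mu}{\sqrt{\sigma^2 + \epsilon}}, \qquad y_i = \gamma \hat{x}_i + \beta,$$

and the standard analytic gradient with respect to the input is

$$\frac{\partial L}{\partial x_i} = \frac{\gamma}{\sqrt{\sigma^2 + \epsilon}} \left( \frac{\partial L}{\partial y_i} - \frac{1}{m}\sum_j \frac{\partial L}{\partial y_j} - \hat{x}_i \cdot \frac{1}{m}\sum_j \frac{\partial L}{\partial y_j}\,\hat{x}_j \right),$$

which both the autograd version and `mx.nd.BatchNorm` should agree with up to floating-point error.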
## Environment info (Required)
macOS 10.13, with the mxnet 1.2.1 CPU build installed from pip.
Package used (Python/R/Scala/Julia):
Python
## Error Message:
No error message is produced; the two implementations simply return very different gradients.
## Minimum reproducible example
```python
import mxnet as mx


def batch_norm_nd(x, gamma, beta, eps=1e-5):
    # Per-channel batch norm: statistics over the (N, H, W) axes.
    mean = mx.nd.mean(x, axis=(0, 2, 3), keepdims=True)
    var = mx.nd.mean((x - mean) ** 2, axis=(0, 2, 3), keepdims=True)
    x_hat = (x - mean) / mx.nd.sqrt(var + eps)
    return x_hat * gamma + beta


if __name__ == "__main__":
    x = mx.nd.random_uniform(low=1, high=2, shape=(2, 16, 4, 4))
    gamma = mx.nd.ones(shape=(1, 16, 1, 1))
    beta = mx.nd.zeros(shape=(1, 16, 1, 1))
    mmean = mx.nd.zeros(shape=(1, 16, 1, 1))
    mvar = mx.nd.zeros(shape=(1, 16, 1, 1))
    x.attach_grad()
    gamma.attach_grad()
    beta.attach_grad()

    # Built-in operator.
    with mx.autograd.record(train_mode=True):
        y = mx.nd.BatchNorm(x, gamma, beta, mmean, mvar, fix_gamma=False,
                            use_global_stats=False)
    y.backward(mx.nd.ones_like(y))
    y2 = y.copy()
    x2_grad = x.grad.copy()

    # From-scratch version, differentiated by autograd.
    with mx.autograd.record(train_mode=True):
        y = batch_norm_nd(x, gamma, beta)
    y.backward(mx.nd.ones_like(y))
    y1 = y.copy()
    x1_grad = x.grad.copy()

    # Element-wise ratios of the two results for batch 0, channel 1.
    print((y2 / y1)[0, 1])
    print((x2_grad / x1_grad)[0, 1])
```
Results (element-wise ratios `y2 / y1` and `x2_grad / x1_grad` for batch 0, channel 1):
```
[[0.99354386 0.9935453 0.993546 0.9935485 ]
[0.99354345 0.9935435 0.993581 0.9935487 ]
[0.9935372 0.99354607 0.9935438 0.9935436 ]
[0.9935449 0.9935456 0.993545 0.9935423 ]]
<NDArray 4x4 @cpu(0)>
[[-3.6692393 -3.6692448 -3.669247 -3.669256 ]
[-3.6692376 -3.6692383 -3.6693766 -3.6692567]
[-3.6692145 -3.6692476 -3.669239 -3.6692383]
[-3.669243 -3.6692457 -3.6692433 -3.6692333]]
<NDArray 4x4 @cpu(0)>
```
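Since the head gradient passed to `backward` is all ones, the analytic input gradient above reduces to $-\frac{\gamma}{\sqrt{\sigma^2 + \epsilon}} \hat{x}_i \cdot \frac{1}{m}\sum_j \hat{x}_j$, which is exactly zero because $\sum_j \hat{x}_j = 0$ within each channel, so both backward results are expected to be tiny and their element-wise ratio is easy to misread. As a minimal diagnostic sketch (assuming it is appended to the end of the script above, so `x1_grad` and `x2_grad` are still in scope), printing absolute magnitudes and the absolute difference may be more informative:

```python
# Diagnostic sketch: compare the two input gradients by magnitude rather than
# by element-wise ratio. Assumes x1_grad (from batch_norm_nd) and x2_grad
# (from mx.nd.BatchNorm) are still in scope from the script above.
import mxnet as mx

print("max |grad from mx.nd.BatchNorm|:",
      mx.nd.max(mx.nd.abs(x2_grad)).asscalar())
print("max |grad from batch_norm_nd|  :",
      mx.nd.max(mx.nd.abs(x1_grad)).asscalar())
print("max |difference|               :",
      mx.nd.max(mx.nd.abs(x2_grad - x1_grad)).asscalar())
```

It may also be worth passing the same `eps` to both implementations when comparing forward outputs: the documented default for `mx.nd.BatchNorm` is `eps=1e-3`, while `batch_norm_nd` defaults to `1e-5`, which alone would push the `y2 / y1` ratios slightly below 1.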
## Steps to reproduce
1. Run the script from the minimum reproducible example above with Python (mxnet 1.2.1, CPU).
2. Observe the printed ratios of the two forward outputs and of the two input gradients.