gzuchow commented on issue #19065:
URL:
https://github.com/apache/incubator-mxnet/issues/19065#issuecomment-709995786
Thanks @access2rohit for reporting this issue.
I have tried to run your script to reproduce the issue, with the
USE_INT64_TENSOR_SIZE build flag enabled.
With these changes you can run it:
```
import mxnet as mx
from mxnet import np, npx

Size0 = 2
Size1 = 1000000000  # large-tensor path, needs USE_INT64_TENSOR_SIZE

A = np.ones((Size0, Size1))
gamma = np.ones(Size1)
beta = np.zeros(Size1)
mov_mean = np.ones(Size1)
mov_var = np.ones(Size1)

A.attach_grad()
with mx.autograd.record():
    B = npx.batch_norm(A, gamma, beta, mov_mean, mov_var, axis=1)
print("output={}".format(B))
B.backward()
print("gradient={}".format(A.grad))
```
With these tweaks to the code there are no ```malloc_consolidate``` errors.
oneDNN requires additional memory to split the calculation between threads.
It is equal to the number of threads multiplied by the size of the input
tensor along ```axis```; in the case above that is ```Size1``` * "number of
threads". So you can decrease memory consumption by using fewer threads, by
setting ```OMP_NUM_THREADS``` (a short sketch follows).
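
For reference, here is a minimal sketch (not part of the original script) of
capping the thread count and estimating the extra workspace from the formula
above. The float32 element size and setting ```OMP_NUM_THREADS``` through
```os.environ``` before importing mxnet are assumptions on my side.
```
import os

# Cap the OpenMP thread pool; this should be set before MXNet is imported,
# otherwise the OpenMP runtime may already have picked its thread count.
os.environ["OMP_NUM_THREADS"] = "4"

# Rough estimate of the extra oneDNN workspace, using the
# "number of threads * size of the input along `axis`" rule above.
num_threads = int(os.environ["OMP_NUM_THREADS"])
size_along_axis = 1000000000   # Size1 from the script above
bytes_per_elem = 4             # assumes the default float32 dtype

extra_bytes = num_threads * size_along_axis * bytes_per_elem
print("estimated extra workspace: {:.1f} GiB".format(extra_bytes / 2**30))

# import mxnet as mx   # import MXNet only after OMP_NUM_THREADS is set
```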