gzuchow commented on issue #19065:
URL: https://github.com/apache/incubator-mxnet/issues/19065#issuecomment-709995786


   Thanks @access2rohit  for reporting this issue.
   
   I have tried to run your script to reproduce the issue, with the USE_INT64_TENSOR_SIZE flag enabled.
   
   With these changes you can run it:
   
   ```
   import mxnet as mx
   from mxnet import np, npx

   Size0 = 2
   Size1 = 1000000000

   # Input tensor of shape (2, 1e9) plus per-channel parameters of length Size1
   A = np.ones((Size0, Size1))
   gamma = np.ones((Size1,))
   beta = np.zeros((Size1,))
   mov_mean = np.ones((Size1,))
   mov_var = np.ones((Size1,))

   # Attach a gradient buffer so backward() populates A.grad
   A.attach_grad()
   with mx.autograd.record():
       B = npx.batch_norm(A, gamma, beta, mov_mean, mov_var, axis=1)
   print("output={}".format(B))

   B.backward()
   print("gradient={}".format(A.grad))
   ```
   
   With these tweaks to the code there are no ```malloc_consolidate``` errors. oneDNN requires additional memory to split the calculation between threads; it is equal to the number of threads times the size of the input tensor along "axis". In the case above that is ```Size1``` * number of threads, so you can reduce memory consumption by using fewer threads, e.g. by setting ```OMP_NUM_THREADS```.
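   
   As a minimal sketch of the above, one way to cap the thread count from Python and estimate the extra scratch memory is shown below. The thread count of 4 and the float32 (4 bytes per element) assumption are only illustrations, not values taken from this issue; ```OMP_NUM_THREADS``` has to be set before MXNet is imported, because the OpenMP runtime reads it at initialization.
   
   ```
   import os

   # Assumption: cap OpenMP at 4 threads (illustrative value, not a recommendation).
   # Must be set before mxnet is imported, since the OpenMP runtime reads it
   # when it initializes.
   os.environ["OMP_NUM_THREADS"] = "4"

   import mxnet as mx

   # Rough estimate of the extra oneDNN scratch memory described above:
   # number of threads * elements along "axis" * bytes per element
   # (assuming float32 inputs, i.e. 4 bytes per element).
   Size1 = 1000000000
   threads = int(os.environ["OMP_NUM_THREADS"])
   print("approx. extra memory: {:.1f} GB".format(threads * Size1 * 4 / 1e9))

   # ... the reproduction script from the comment above would follow here ...
   ```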

