sxjscience opened a new issue #18102:
URL: https://github.com/apache/incubator-mxnet/issues/18102


   The gradient of einsum is not reliable. The following is just one example; 
there are actually **multiple scenarios** in which the gradient is wrong. The 
operator has both performance issues, as reported in 
https://github.com/apache/incubator-mxnet/issues/18043, and numerical problems.
   
   We should recommend that users **not use einsum in MXNet** until these 
issues are fixed.
   
   ```python
   
   import numpy as np
   import mxnet as mx
   from numpy.testing import assert_allclose
   mx.npx.set_np()
   
   ctx = mx.cpu()
   
   A = mx.np.random.normal(0, 1, (1, 1, 5, 3), ctx=ctx)
   B = mx.np.random.normal(0, 1, (1, 1, 3, 2), ctx=ctx)
   out_grad = mx.np.random.normal(0, 1, (1, 1, 5, 2), ctx=ctx)
   
   A.attach_grad()
   B.attach_grad()
   
   # Record the batched contraction with einsum and run the backward pass
   with mx.autograd.record():
       out = mx.np.einsum('bnij,bnjc->bnic', A, B)
       out.backward(out_grad)
   
   # Ground truth computed with NumPy on the single (0, 0) slice:
   # forward output, dL/dA = out_grad . B^T, and dL/dB = A^T . out_grad
   out_gt = A.asnumpy()[0, 0].dot(B.asnumpy()[0, 0])
   A_gt_grad = out_grad.asnumpy()[0, 0].dot(B.asnumpy()[0, 0].T)
   B_gt_grad = A.asnumpy()[0, 0].T.dot(out_grad.asnumpy()[0, 0])
   # Gradients produced by the einsum backward pass
   A_einsum_grad = A.grad.asnumpy()
   B_einsum_grad = B.grad.asnumpy()
   
   # Reset the gradients and compute the same contraction with matmul
   A.grad[:] = 0
   B.grad[:] = 0
   with mx.autograd.record():
       out = mx.np.matmul(A, B)
       out.backward(out_grad)
   A_matmul_grad = A.grad.asnumpy()
   B_matmul_grad = B.grad.asnumpy()
   
   
   # The matmul gradients match the NumPy ground truth ...
   assert_allclose(A_gt_grad, A_matmul_grad[0, 0], 1E-5, 1E-5)
   assert_allclose(B_gt_grad, B_matmul_grad[0, 0], 1E-5, 1E-5)
   # ... but the einsum gradients do not, so the two asserts below fail
   assert_allclose(A_gt_grad, A_einsum_grad[0, 0], 1E-5, 1E-5)
   assert_allclose(B_gt_grad, B_einsum_grad[0, 0], 1E-5, 1E-5)
   
   ```
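   
   For reference, the expected gradients of this contraction can also be written as einsum expressions and evaluated with plain NumPy, independently of MXNet. The sketch below is a minimal cross-check under the same shapes as the reproduction above; the names (`A_np`, `A_grad_ref`, ...) are only illustrative.
   
   ```python
   import numpy as np
   from numpy.testing import assert_allclose
   
   # Analytic gradients of out[b, n, i, c] = sum_j A[b, n, i, j] * B[b, n, j, c],
   # expressed as einsum contractions and evaluated with NumPy only.
   rng = np.random.RandomState(0)
   A_np = rng.normal(0, 1, (1, 1, 5, 3))
   B_np = rng.normal(0, 1, (1, 1, 3, 2))
   out_grad_np = rng.normal(0, 1, (1, 1, 5, 2))
   
   # dL/dA[b, n, i, j] = sum_c out_grad[b, n, i, c] * B[b, n, j, c]
   A_grad_ref = np.einsum('bnic,bnjc->bnij', out_grad_np, B_np)
   # dL/dB[b, n, j, c] = sum_i A[b, n, i, j] * out_grad[b, n, i, c]
   B_grad_ref = np.einsum('bnij,bnic->bnjc', A_np, out_grad_np)
   
   # On the (0, 0) slice these reduce to the matmul-style formulas used above
   assert_allclose(A_grad_ref[0, 0], out_grad_np[0, 0].dot(B_np[0, 0].T), 1E-5, 1E-5)
   assert_allclose(B_grad_ref[0, 0], A_np[0, 0].T.dot(out_grad_np[0, 0]), 1E-5, 1E-5)
   ```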

