anirudh2290 opened a new issue #11568: Issues with spatial transformer op when cudnn disabled
URL: https://github.com/apache/incubator-mxnet/issues/11568
 
 
   ## Description
   As part of PR #11470, it was found that the spatial transformer op fails its tests when cuDNN is disabled.
   To reproduce, try either of the two scripts below:
   
   Script 1:
   ```
   import numpy as np
   import mxnet as mx
   from mxnet.test_utils import assert_almost_equal, default_context
   
   np.set_printoptions(threshold=np.nan)
   num_filter = 2  # conv of loc net
   kernel = (3, 3)  # conv of loc net
   num_hidden = 6  # fc of loc net
   for n in [1, 2, 3, 4]:
       for c in [1, 2, 3, 4]:
           for h in [5, 9, 13, 17]:  # for convenience test, the third and fourth input dims should be 4x + 1
               for w in [5, 9, 13, 17]:
                   data_shape = (n, c, h, w)
                   target_shape = (int((data_shape[2]+1)/2), int((data_shape[3]+1)/2))
                   data = mx.sym.Variable(name="data")
                   loc = mx.sym.Convolution(data=data, kernel=kernel, pad=(1, 1), num_filter=num_filter, name="loc_conv")
                   loc = mx.sym.Flatten(data=loc)
                   loc = mx.sym.FullyConnected(data=loc, num_hidden=num_hidden, name="loc_fc")
                   stn = mx.sym.SpatialTransformer(data=data, loc=loc, target_shape=target_shape,
                                                   transform_type="affine", sampler_type="bilinear")
                   arg_names = stn.list_arguments()
                   arg_shapes, out_shapes, _ = stn.infer_shape(data=data_shape)
                   # check shape
                   assert out_shapes[0] == (data_shape[0], data_shape[1], target_shape[0], target_shape[1])
                   #dev = default_context()
                   dev = mx.gpu(0)
                   args = {}
                   args['data'] = mx.random.normal(0, 1, data_shape, ctx=mx.cpu()).copyto(dev)
                   args['loc_conv_weight'] = mx.nd.zeros((num_filter, data_shape[1], kernel[0], kernel[1]), ctx=dev)
                   args['loc_conv_bias'] = mx.nd.zeros((num_filter,), ctx=dev)
                   args['loc_fc_weight'] = mx.nd.zeros((6, num_filter*data_shape[2]*data_shape[3]), ctx=dev)
                   args['loc_fc_bias'] = mx.nd.array([0.5, 0, 0, 0, 0.5, 0], ctx=dev)
                   grad_grad = [mx.nd.zeros(shape, ctx=dev) for shape in arg_shapes]
                   exe = stn.bind(dev, args=args, args_grad=grad_grad)
                   exe.forward(is_train=True)
                   out = exe.outputs[0].asnumpy()
                   # check forward
                   assert_almost_equal(out, args['data'].asnumpy()[:, :, h//4:h-h//4, w//4:w-w//4], rtol=1e-2, atol=1e-4)
                   out_grad = mx.nd.ones(out.shape, ctx=dev)
                   exe.backward([out_grad])
                   # check backward
                   assert_almost_equal(out_grad.asnumpy(), grad_grad[0].asnumpy()[:, :, h//4:h-h//4, w//4:w-w//4], rtol=1e-2, atol=1e-4)
   ```
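
For context on what Script 1 asserts: the loc net weights are all zero, so the affine parameters come entirely from `loc_fc_bias = [0.5, 0, 0, 0, 0.5, 0]`, a 0.5x scaling about the image center. With `h` and `w` of the form 4x + 1, the sampled points should land exactly on input pixels, which is why the output is compared against the center crop `[h//4:h-h//4, w//4:w-w//4]`. Below is a rough pure-Python sketch of that expectation for a single 2-D image. The function name `affine_bilinear_2d` is hypothetical, and the normalized [-1, 1] grid with zero padding is my assumption about the op's convention, not code taken from MXNet.

```python
import math


def affine_bilinear_2d(img, theta, target_shape):
    """Sample one 2-D image (a list of rows) through a 2x3 affine matrix.

    `theta` holds the 6 affine parameters in row-major order. Assumes a
    normalized [-1, 1] target grid and zero padding outside the input.
    """
    h, w = len(img), len(img[0])
    th, tw = target_shape

    def pixel(y, x):
        # Taps that fall outside the input contribute zero.
        return img[y][x] if 0 <= y < h and 0 <= x < w else 0.0

    out = []
    for i in range(th):
        yt = -1.0 + 2.0 * i / (th - 1) if th > 1 else 0.0
        row = []
        for j in range(tw):
            xt = -1.0 + 2.0 * j / (tw - 1) if tw > 1 else 0.0
            # Affine map from target grid to normalized source coordinates.
            xs = theta[0] * xt + theta[1] * yt + theta[2]
            ys = theta[3] * xt + theta[4] * yt + theta[5]
            # Normalized coordinates -> input pixel coordinates.
            x = (xs + 1.0) * (w - 1) / 2.0
            y = (ys + 1.0) * (h - 1) / 2.0
            x0, y0 = math.floor(x), math.floor(y)
            wx, wy = x - x0, y - y0
            top = (1 - wx) * pixel(y0, x0) + wx * pixel(y0, x0 + 1)
            bottom = (1 - wx) * pixel(y0 + 1, x0) + wx * pixel(y0 + 1, x0 + 1)
            row.append((1 - wy) * top + wy * bottom)
        out.append(row)
    return out


# Same relationship Script 1 checks, on one 5x9 image.
h, w = 5, 9
img = [[float(r * w + c) for c in range(w)] for r in range(h)]
out = affine_bilinear_2d(img, [0.5, 0, 0, 0, 0.5, 0], ((h + 1) // 2, (w + 1) // 2))
crop = [row[w // 4:w - w // 4] for row in img[h // 4:h - h // 4]]
assert all(abs(a - b) < 1e-9 for ro, rc in zip(out, crop) for a, b in zip(ro, rc))
```

Under these assumptions, every sample has integer pixel coordinates for these sizes, so the forward output equals the crop and the data gradient over the cropped region should be all ones.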
   
   Result:
   
   ```
   AssertionError:
   Items are not equal:
   Error 9999.758789 exceeds tolerance rtol=0.010000, atol=0.000100.  Location of maximum error:(0, 0, 0, 0), a=1.000000, b=0.000000
    a: array([[[[1., 1., 1., ..., 1., 1., 1.],
            [1., 1., 1., ..., 1., 1., 1.],
            [1., 1., 1., ..., 1., 1., 1.]]]], dtype=float32)
    b: array([[[[0.00000024, 0.99999976, 1.        , ..., 1.        ,
             1.        , 1.        ],
            [0.00000024, 0.99999976, 1.        , ..., 1.        ,...
   ```
   
   Script 2:
   
   ```
   import mxnet as mx
   import numpy as np
   from mxnet.test_utils import check_consistency
   
   data = mx.sym.Variable('data')
   loc = mx.sym.Flatten(data)
   loc = mx.sym.FullyConnected(data=loc, num_hidden=10)
   loc = mx.sym.Activation(data=loc, act_type='relu')
   loc = mx.sym.FullyConnected(data=loc, num_hidden=6)
   sym = mx.sym.SpatialTransformer(data=data, loc=loc, target_shape=(10, 10),
                                   transform_type="affine", sampler_type="bilinear")
   ctx_list = [{'ctx': mx.gpu(0), 'data': (1, 5, 10, 10), 'type_dict': {'data': np.float64}},
               {'ctx': mx.cpu(0), 'data': (1, 5, 10, 10), 'type_dict': {'data': np.float64}}]
   check_consistency(sym, ctx_list)
   check_consistency(sym, ctx_list, grad_req="add")
   ```
   Result:
   
   ```
   Traceback (most recent call last):
     File "test_spatial_transformer.py", line 14, in <module>
       check_consistency(sym, ctx_list)
     File "/home/ubuntu/sparse_support/mxnet/python/mxnet/test_utils.py", line 1356, in check_consistency
       gtarr = gt[name].astype(dtypes[i]).asnumpy()
     File "/home/ubuntu/sparse_support/mxnet/python/mxnet/ndarray/ndarray.py", line 1910, in asnumpy
       ctypes.c_size_t(data.size)))
     File "/home/ubuntu/sparse_support/mxnet/python/mxnet/base.py", line 210, in check_call
       raise MXNetError(py_str(_LIB.MXGetLastError()))
   mxnet.base.MXNetError: [21:50:56] /home/ubuntu/sparse_support/mxnet/3rdparty/mshadow/mshadow/././././cuda/tensor_gpu-inl.cuh:167: Check failed: err == cudaSuccess (7 vs. 0) Name: MapRedKeepLowestKernel ErrStr:too many resources requested for launch
   
   Stack trace returned 10 entries:
   [bt] (0) /home/ubuntu/sparse_support/mxnet/python/mxnet/../../build/libmxnet.so(dmlc::StackTrace[abi:cxx11]()+0x54) [0x7feab9a7b97d]
   [bt] (1) /home/ubuntu/sparse_support/mxnet/python/mxnet/../../build/libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x2a) [0x7feab9a7bc64]
   [bt] (2) /home/ubuntu/sparse_support/mxnet/python/mxnet/../../build/libmxnet.so(void mshadow::cuda::MapReduceKeepLowest<mshadow::sv::saveto, mshadow::red::sum, mshadow::Tensor<mshadow::gpu, 1, double>, mshadow::Tensor<mshadow::gpu, 2, double>, double>(mshadow::expr::Plan<mshadow::Tensor<mshadow::gpu, 1, double>, double>, mshadow::expr::Plan<mshadow::Tensor<mshadow::gpu, 2, double>, double> const&, double, mshadow::Shape<2>, CUstream_st*)+0x2ca) [0x7feaba0b9007]
   [bt] (3) /home/ubuntu/sparse_support/mxnet/python/mxnet/../../build/libmxnet.so(void mshadow::MapReduceKeepLowest<mshadow::sv::saveto, mshadow::red::sum, mshadow::Tensor<mshadow::gpu, 1, double>, double, mshadow::Tensor<mshadow::gpu, 2, double>, 0>(mshadow::TRValue<mshadow::Tensor<mshadow::gpu, 1, double>, mshadow::gpu, 1, double>*, mshadow::expr::Exp<mshadow::Tensor<mshadow::gpu, 2, double>, double, 0> const&, double)+0x39b) [0x7feaba0b8249]
   [bt] (4) /home/ubuntu/sparse_support/mxnet/python/mxnet/../../build/libmxnet.so(mshadow::expr::ExpComplexEngine<mshadow::sv::saveto, mshadow::Tensor<mshadow::gpu, 1, double>, mshadow::expr::ReduceTo1DExp<mshadow::Tensor<mshadow::gpu, 2, double>, double, mshadow::red::sum, 1>, double>::Eval(mshadow::Tensor<mshadow::gpu, 1, double>*, mshadow::expr::ReduceTo1DExp<mshadow::Tensor<mshadow::gpu, 2, double>, double, mshadow::red::sum, 1> const&)+0x37) [0x7feaba0b729b]
   [bt] (5) /home/ubuntu/sparse_support/mxnet/python/mxnet/../../build/libmxnet.so(void mshadow::expr::ExpEngine<mshadow::sv::saveto, mshadow::Tensor<mshadow::gpu, 1, double>, double>::Eval<mshadow::expr::ReduceTo1DExp<mshadow::Tensor<mshadow::gpu, 2, double>, double, mshadow::red::sum, 1> >(mshadow::Tensor<mshadow::gpu, 1, double>*, mshadow::expr::Exp<mshadow::expr::ReduceTo1DExp<mshadow::Tensor<mshadow::gpu, 2, double>, double, mshadow::red::sum, 1>, double, 7> const&)+0x37) [0x7feaba0b5a1c]
   [bt] (6) /home/ubuntu/sparse_support/mxnet/python/mxnet/../../build/libmxnet.so(mshadow::Tensor<mshadow::gpu, 1, double>& mshadow::expr::RValueExp<mshadow::Tensor<mshadow::gpu, 1, double>, double>::__assign<mshadow::expr::ReduceTo1DExp<mshadow::Tensor<mshadow::gpu, 2, double>, double, mshadow::red::sum, 1>, 7>(mshadow::expr::Exp<mshadow::expr::ReduceTo1DExp<mshadow::Tensor<mshadow::gpu, 2, double>, double, mshadow::red::sum, 1>, double, 7> const&)+0x37) [0x7feaba0b4d49]
   [bt] (7) /home/ubuntu/sparse_support/mxnet/python/mxnet/../../build/libmxnet.so(mshadow::Tensor<mshadow::gpu, 1, double>& mshadow::Tensor<mshadow::gpu, 1, double>::operator=<mshadow::expr::ReduceTo1DExp<mshadow::Tensor<mshadow::gpu, 2, double>, double, mshadow::red::sum, 1>, 7>(mshadow::expr::Exp<mshadow::expr::ReduceTo1DExp<mshadow::Tensor<mshadow::gpu, 2, double>, double, mshadow::red::sum, 1>, double, 7> const&)+0x23) [0x7feaba0b465b]
   [bt] (8) /home/ubuntu/sparse_support/mxnet/python/mxnet/../../build/libmxnet.so(void mxnet::op::FCBackward<mshadow::gpu, double>(mxnet::OpContext const&, mxnet::op::FullyConnectedParam const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&)+0xafd) [0x7feaba0b2f99]
   [bt] (9) /home/ubuntu/sparse_support/mxnet/python/mxnet/../../build/libmxnet.so(void mxnet::op::FullyConnectedGradCompute<mshadow::gpu>(nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&)+0x4b0) [0x7feaba0ad474]
   
   ```
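
Both failures above depend on the build having cuDNN disabled, so it may be worth confirming the local build configuration before running either script. The snippet below is only a sketch: it assumes an MXNet release that ships the `mxnet.runtime` module (later releases do; the build used in this report may not), and the helper name `cudnn_enabled` is hypothetical. It returns `None` when the feature query is unavailable.

```python
def cudnn_enabled():
    """Return True/False if the installed MXNet reports its cuDNN status, else None.

    Assumes `mxnet.runtime.Features` exists in the installed release;
    older builds (or machines without MXNet) fall through to None.
    """
    try:
        from mxnet.runtime import Features
    except ImportError:
        return None
    return Features().is_enabled("CUDNN")


status = cudnn_enabled()
if status is None:
    print("cuDNN status unknown: mxnet.runtime is not available in this environment")
else:
    print("cuDNN enabled:", status)
```

If this prints `cuDNN enabled: False` on a GPU build, the scripts above should exercise the non-cuDNN code path described in this issue.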
   
   
