TaoLv opened a new issue #13421: MKL-DNN deconvolution runs into crash
URL: https://github.com/apache/incubator-mxnet/issues/13421
 
 
   
## Description
MKL-DNN deconvolution may crash starting from commit 91c536d2b3fe14fb84dff568a4a2ea240ea5ab31. I think this should be fixed before the 1.4.0 code freeze.
   
## Environment info (Required)

Package used (Python/R/Scala/Julia): Python

## Build info (Required if built from source)

Compiler (gcc/clang/mingw/visual studio): gcc 4.8.5

MXNet commit hash: 91c536d2b3fe14fb84dff568a4a2ea240ea5ab31

Build config: `make -j20 USE_OPENCV=1 USE_MKLDNN=1 USE_BLAS=mkl USE_PROFILER=1 DEBUG=1`
   
## Error Message
$ python deconv.py
Traceback (most recent call last):
  File "deconv.py", line 27, in <module>
    t = o.asnumpy()
  File "/home/lvtao/Workspace/mxnet-official/python/mxnet/ndarray/ndarray.py", line 1980, in asnumpy
    ctypes.c_size_t(data.size)))
  File "/home/lvtao/Workspace/mxnet-official/python/mxnet/base.py", line 252, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [22:32:01] src/operator/nn/mkldnn/mkldnn_deconvolution.cc:270: Check failed: weight_mem->get_primitive_desc() == fwd_pd.weights_primitive_desc()

Stack trace returned 10 entries:
[bt] (0) /home/lvtao/Workspace/mxnet-official/python/mxnet/../../lib/libmxnet.so(dmlc::StackTrace()+0x42) [0x7ff5d971eb75]
[bt] (1) /home/lvtao/Workspace/mxnet-official/python/mxnet/../../lib/libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x1b) [0x7ff5d971edf3]
[bt] (2) /home/lvtao/Workspace/mxnet-official/python/mxnet/../../lib/libmxnet.so(mxnet::op::MKLDNNDeconvForward::SetDataHandle(mxnet::op::DeconvolutionParam const&, mxnet::OpContext const&, mxnet::NDArray const&, mxnet::NDArray const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&)+0x246) [0x7ff5d97d3196]
[bt] (3) /home/lvtao/Workspace/mxnet-official/python/mxnet/../../lib/libmxnet.so(mxnet::op::MKLDNNDeconvolutionForward(nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&)+0x255) [0x7ff5d97d3b44]
[bt] (4) /home/lvtao/Workspace/mxnet-official/python/mxnet/../../lib/libmxnet.so(+0x3a3e76a) [0x7ff5dbbbc76a]
[bt] (5) /home/lvtao/Workspace/mxnet-official/python/mxnet/../../lib/libmxnet.so(std::_Function_handler<void (nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&), void (*)(nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&)>::_M_invoke(std::_Any_data const&, nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&)+0x91) [0x7ff5d97f0e72]
[bt] (6) /home/lvtao/Workspace/mxnet-official/python/mxnet/../../lib/libmxnet.so(std::function<void (nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&)>::operator()(nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&) const+0xa6) [0x7ff5dbee1488]
[bt] (7) /home/lvtao/Workspace/mxnet-official/python/mxnet/../../lib/libmxnet.so(mxnet::exec::FComputeExExecutor::Run(mxnet::RunContext, bool)+0x185) [0x7ff5dc6b87b7]
[bt] (8) /home/lvtao/Workspace/mxnet-official/python/mxnet/../../lib/libmxnet.so(+0x4513278) [0x7ff5dc691278]
[bt] (9) /home/lvtao/Workspace/mxnet-official/python/mxnet/../../lib/libmxnet.so(+0x4516e0f) [0x7ff5dc694e0f]
   
## Minimum reproducible example
```
import mxnet as mx
import numpy as np

np.random.seed(12345)

# Deconvolution parameters that trigger the crash (stride defaults to 1).
num_filter = 256
num_group = 1
kernel = (3, 3)
pad = (1, 1)
shape = (1, 256, 200, 233)  # NCHW input shape

x = mx.sym.Variable('x')
w = mx.sym.Variable('w')

y = mx.sym.Deconvolution(data=x, weight=w, num_filter=num_filter,
                         num_group=num_group, kernel=kernel, no_bias=True, pad=pad)
# Only the input shape is given; simple_bind infers the weight shape.
exe = y.simple_bind(ctx=mx.cpu(), x=shape, grad_req='null')

# arg_arrays follows y.list_arguments(), i.e. ['x', 'w'].
exe.arg_arrays[0][:] = np.random.normal(size=exe.arg_arrays[0].shape)
exe.arg_arrays[1][:] = np.random.normal(size=exe.arg_arrays[1].shape)

# Run inference repeatedly. asnumpy() waits for the result, so errors from
# the asynchronous engine are raised there.
for i in range(10):
    exe.forward(is_train=False)
    o = exe.outputs[0]
    t = o.asnumpy()
```
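
The failing check compares the weight memory's primitive descriptor with the one the forward primitive expects, so it can help to know the shapes involved. The sketch below computes them in plain Python, assuming MXNet's Deconvolution shape inference rules: a weight laid out as `(in_channels, num_filter // num_group, kh, kw)` and a per-axis output size of `(in - 1) * stride - 2 * pad + dilate * (kernel - 1) + 1 + adj`. The helper names `deconv_output_hw` and `deconv_weight_shape` are hypothetical and not part of MXNet.

```python
def deconv_output_hw(in_hw, kernel, stride=(1, 1), pad=(0, 0),
                     dilate=(1, 1), adj=(0, 0)):
    """Spatial output size of a 2-D deconvolution (transposed convolution).

    Per axis (assumed to match MXNet's shape inference):
        out = (in - 1) * stride - 2 * pad + dilate * (kernel - 1) + 1 + adj
    """
    return tuple(
        (i - 1) * s - 2 * p + d * (k - 1) + 1 + a
        for i, k, s, p, d, a in zip(in_hw, kernel, stride, pad, dilate, adj)
    )


def deconv_weight_shape(in_channels, num_filter, kernel, num_group=1):
    """Assumed Deconvolution weight layout:
    (in_channels, num_filter // num_group, kh, kw)."""
    return (in_channels, num_filter // num_group) + tuple(kernel)


# Parameters from the reproducible example (hypothetical helper usage).
shape = (1, 256, 200, 233)
print(deconv_weight_shape(shape[1], 256, (3, 3)))       # (256, 256, 3, 3)
print(deconv_output_hw(shape[2:], (3, 3), pad=(1, 1)))  # (200, 233)
```

For the repro's parameters this gives a `(256, 256, 3, 3)` weight and an output with the same 200 by 233 spatial size as the input.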
   
## Steps to reproduce
`python deconv.py`
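
The report does not say whether the check fails on the first `forward` call or only after several iterations of the loop. A minimal, MXNet-independent sketch for narrowing this down: call a step function repeatedly and return the index of the first call that raises. `first_failing_iteration` is a hypothetical helper, not an MXNet API.

```python
def first_failing_iteration(step, n):
    """Call step() up to n times.

    Returns (index, exception) for the first call that raises, or None if
    all n calls succeed. Hypothetical triage helper, not an MXNet API.
    """
    for i in range(n):
        try:
            step()
        except Exception as exc:  # e.g. mxnet.base.MXNetError from asnumpy()
            return i, exc
    return None


if __name__ == "__main__":
    # Self-check with a stand-in step that fails on its third call.
    calls = []

    def flaky_step():
        calls.append(1)
        if len(calls) == 3:
            raise RuntimeError("simulated failure")

    result = first_failing_iteration(flaky_step, 10)
    print(result[0], result[1])  # 2 simulated failure
```

With the repro's executor, the step could be `lambda: (exe.forward(is_train=False), exe.outputs[0].asnumpy())`. Calling `asnumpy()` inside the step matters because, as the traceback shows, the error is raised there rather than from `forward`.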
   