TaoLv opened a new issue #13421: MKL-DNN deconvolution runs into crash
URL: https://github.com/apache/incubator-mxnet/issues/13421

## Description

MKL-DNN deconvolution may crash since commit 91c536d2b3fe14fb84dff568a4a2ea240ea5ab31. I think this should be fixed before the 1.4.0 code freeze.

## Environment info (Required)

Package used (Python/R/Scala/Julia): Python

## Build info (Required if built from source)

Compiler (gcc/clang/mingw/visual studio): gcc 4.8.5

MXNet commit hash: 91c536d2b3fe14fb84dff568a4a2ea240ea5ab31

Build command:
```
make -j20 USE_OPENCV=1 USE_MKLDNN=1 USE_BLAS=mkl USE_PROFILER=1 DEBUG=1
```

## Error Message:

```
$ python deconv.py
Traceback (most recent call last):
  File "deconv.py", line 27, in <module>
    t = o.asnumpy()
  File "/home/lvtao/Workspace/mxnet-official/python/mxnet/ndarray/ndarray.py", line 1980, in asnumpy
    ctypes.c_size_t(data.size)))
  File "/home/lvtao/Workspace/mxnet-official/python/mxnet/base.py", line 252, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [22:32:01] src/operator/nn/mkldnn/mkldnn_deconvolution.cc:270: Check failed: weight_mem->get_primitive_desc() == fwd_pd.weights_primitive_desc()

Stack trace returned 10 entries:
[bt] (0) /home/lvtao/Workspace/mxnet-official/python/mxnet/../../lib/libmxnet.so(dmlc::StackTrace()+0x42) [0x7ff5d971eb75]
[bt] (1) /home/lvtao/Workspace/mxnet-official/python/mxnet/../../lib/libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x1b) [0x7ff5d971edf3]
[bt] (2) /home/lvtao/Workspace/mxnet-official/python/mxnet/../../lib/libmxnet.so(mxnet::op::MKLDNNDeconvForward::SetDataHandle(mxnet::op::DeconvolutionParam const&, mxnet::OpContext const&, mxnet::NDArray const&, mxnet::NDArray const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&)+0x246) [0x7ff5d97d3196]
[bt] (3) /home/lvtao/Workspace/mxnet-official/python/mxnet/../../lib/libmxnet.so(mxnet::op::MKLDNNDeconvolutionForward(nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&)+0x255) [0x7ff5d97d3b44]
[bt] (4) /home/lvtao/Workspace/mxnet-official/python/mxnet/../../lib/libmxnet.so(+0x3a3e76a) [0x7ff5dbbbc76a]
[bt] (5) /home/lvtao/Workspace/mxnet-official/python/mxnet/../../lib/libmxnet.so(std::_Function_handler<void (nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&), void (*)(nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&)>::_M_invoke(std::_Any_data const&, nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&)+0x91) [0x7ff5d97f0e72]
[bt] (6) /home/lvtao/Workspace/mxnet-official/python/mxnet/../../lib/libmxnet.so(std::function<void (nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&)>::operator()(nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&) const+0xa6) [0x7ff5dbee1488]
[bt] (7) /home/lvtao/Workspace/mxnet-official/python/mxnet/../../lib/libmxnet.so(mxnet::exec::FComputeExExecutor::Run(mxnet::RunContext, bool)+0x185) [0x7ff5dc6b87b7]
[bt] (8) /home/lvtao/Workspace/mxnet-official/python/mxnet/../../lib/libmxnet.so(+0x4513278) [0x7ff5dc691278]
[bt] (9) /home/lvtao/Workspace/mxnet-official/python/mxnet/../../lib/libmxnet.so(+0x4516e0f) [0x7ff5dc694e0f]
```

## Minimum reproducible example

```
import mxnet as mx
import numpy as np
from mxnet import Context

np.random.seed(12345)

num_filter = 256
num_group = 1
kernel = (3, 3)
pad = (1, 1)
shape = (1, 256, 200, 233)

x = mx.sym.Variable('x')
w = mx.sym.Variable('w')
y = mx.sym.Deconvolution(data=x, weight=w, num_filter=num_filter,
                         num_group=num_group, kernel=kernel, no_bias=True, pad=pad)

exe = y.simple_bind(ctx=mx.cpu(), x=shape, grad_req='null')
exe.arg_arrays[0][:] = np.random.normal(size=exe.arg_arrays[0].shape)
exe.arg_arrays[1][:] = np.random.normal(size=exe.arg_arrays[1].shape)

for i in range(10):
    exe.forward(is_train=False)

o = exe.outputs[0]
t = o.asnumpy()
```

## Steps to reproduce

```
python deconv.py
```
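As a sanity check on the repro script, the output shape it should produce can be derived from the standard transposed-convolution size formula, `out = (in - 1) * stride - 2 * pad + kernel`. The sketch below is independent of MXNet and only restates the parameters used above (`deconv_out_dim` is a helper written for this illustration, not an MXNet API):

```python
def deconv_out_dim(in_dim, kernel, stride=1, pad=0, adj=0):
    """Output spatial size of a transposed convolution (no dilation)."""
    return (in_dim - 1) * stride - 2 * pad + kernel + adj

# Parameters from the repro: kernel=(3, 3), pad=(1, 1), stride defaults to (1, 1).
shape = (1, 256, 200, 233)
out_h = deconv_out_dim(shape[2], 3, pad=1)  # 200
out_w = deconv_out_dim(shape[3], 3, pad=1)  # 233

# With num_filter=256 output channels, the result keeps the input's spatial size:
print((shape[0], 256, out_h, out_w))  # (1, 256, 200, 233)
```

Since kernel 3 with pad 1 and stride 1 is size-preserving, the failed check is about the weight memory layout (primitive descriptor) rather than any shape mismatch.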
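Until the primitive-descriptor mismatch is fixed, a possible workaround (not verified against this exact build) is to disable the MKL-DNN backend via MXNet's documented `MXNET_MKLDNN_ENABLED` environment variable so the operator falls back to the native CPU implementation:

```python
import os

# Must be set before `import mxnet` to take effect. This is a workaround
# sketch that sidesteps the MKL-DNN path; it does not fix the underlying bug.
os.environ["MXNET_MKLDNN_ENABLED"] = "0"

# import mxnet as mx  # import only after setting the variable
```

Equivalently, run the repro as `MXNET_MKLDNN_ENABLED=0 python deconv.py` from the shell.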
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: [email protected]

With regards,
Apache Git Services
