dwSun opened a new issue #10809: Check failed: format != mkl_mem_->GetFormat() (5 vs. 5)
URL: https://github.com/apache/incubator-mxnet/issues/10809

## Description
The program crashed while training a model.

Using code from [this tutorial](http://mxnet.incubator.apache.org/tutorials/gluon/datasets.html), I tried to train my own model with MobileNetV2, but it crashed with mxnet-mkl-1.2.0b20180503 from PyPI. With mxnet-mkl-1.1.0 from PyPI, the same code works.

Batch sizes 32 and 16 reproduce this error; others, such as 8 or 32, seem not to. A smaller network does not reproduce this error.

I am not sure whether this error is related to PR #10317. It may also be the same error as issue #10807.

## Environment info (Required)
This is the code: [crash.zip](https://github.com/apache/incubator-mxnet/files/1973878/crash.zip)

Run with:
```sh
python3 fashion.py
```

Package used (Python/R/Scala/Julia):
```
% pip3 list
Package         Version
--------------- --------------
certifi         2018.4.16
chardet         3.0.4
graphviz        0.8.3
idna            2.6
mxnet-mkl       1.2.0b20180503
numpy           1.14.3
pandas          0.22.0
pip             10.0.1
pkg-resources   0.0.0
python-dateutil 2.7.2
pytz            2018.4
requests        2.18.4
setuptools      39.1.0
six             1.11.0
urllib3         1.22
wheel           0.31.0
```

## Error Message:
```
% python3 fashion.py
[17:28:49] src/operator/nn/mkldnn/mkldnn_base.cc:60: Allocate 57344 bytes with malloc directly
[17:28:49] src/operator/nn/mkldnn/mkldnn_base.cc:60: Allocate 4096 bytes with malloc directly
[17:28:49] src/operator/nn/mkldnn/mkldnn_base.cc:60: Allocate 172032 bytes with malloc directly
[17:28:50] src/operator/nn/mkldnn/mkldnn_base.cc:60: Allocate 57344 bytes with malloc directly
[17:28:50] src/operator/nn/mkldnn/mkldnn_base.cc:60: Allocate 4096 bytes with malloc directly
[17:28:50] src/operator/nn/mkldnn/mkldnn_base.cc:60: Allocate 172032 bytes with malloc directly
Epoch 0, training loss: 2.55, validation loss: 2.31
[17:28:50] src/operator/nn/mkldnn/mkldnn_base.cc:60: Allocate 57344 bytes with malloc directly
[17:28:50] src/operator/nn/mkldnn/mkldnn_base.cc:60: Allocate 172032 bytes with malloc directly
[17:28:50] src/operator/nn/mkldnn/mkldnn_base.cc:60: Allocate 1638400 bytes with malloc directly
[17:28:50] src/operator/nn/mkldnn/mkldnn_base.cc:60: Allocate 1638400 bytes with malloc directly
[17:28:50] src/operator/nn/mkldnn/mkldnn_base.cc:60: Allocate 57344 bytes with malloc directly
[17:28:50] src/operator/nn/mkldnn/mkldnn_base.cc:60: Allocate 4096 bytes with malloc directly
[17:28:50] src/operator/nn/mkldnn/mkldnn_base.cc:60: Allocate 172032 bytes with malloc directly
Epoch 1, training loss: 2.56, validation loss: 2.35
[17:28:50] src/operator/nn/mkldnn/mkldnn_base.cc:60: Allocate 57344 bytes with malloc directly
[17:28:50] src/operator/nn/mkldnn/mkldnn_base.cc:60: Allocate 172032 bytes with malloc directly
[17:28:50] src/operator/nn/mkldnn/mkldnn_base.cc:60: Allocate 1638400 bytes with malloc directly
[17:28:50] src/operator/nn/mkldnn/mkldnn_base.cc:60: Allocate 1638400 bytes with malloc directly
[17:28:51] src/operator/nn/mkldnn/mkldnn_base.cc:60: Allocate 57344 bytes with malloc directly
[17:28:51] src/operator/nn/mkldnn/mkldnn_base.cc:60: Allocate 4096 bytes with malloc directly
[17:28:51] src/operator/nn/mkldnn/mkldnn_base.cc:60: Allocate 172032 bytes with malloc directly
Traceback (most recent call last):
  File "fashion.py", line 71, in <module>
    valid_loss = cumulative_valid_loss.asscalar()/valid_samples
  File "/home/david/.virtualenvs/mkl-dnn/local/lib/python3.6/site-packages/mxnet/ndarray/ndarray.py", line 1894, in asscalar
    return self.asnumpy()[0]
  File "/home/david/.virtualenvs/mkl-dnn/local/lib/python3.6/site-packages/mxnet/ndarray/ndarray.py", line 1876, in asnumpy
    ctypes.c_size_t(data.size)))
  File "/home/david/.virtualenvs/mkl-dnn/local/lib/python3.6/site-packages/mxnet/base.py", line 149, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [17:28:51] src/ndarray/ndarray.cc:351: Check failed: format != mkl_mem_->GetFormat() (5 vs. 5)

Stack trace returned 10 entries:
[bt] (0) /home/david/.virtualenvs/mkl-dnn/local/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x17009d) [0x7fba25e2f09d]
[bt] (1) /home/david/.virtualenvs/mkl-dnn/local/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x170468) [0x7fba25e2f468]
[bt] (2) /home/david/.virtualenvs/mkl-dnn/local/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x2a4a1b8) [0x7fba287091b8]
[bt] (3) /home/david/.virtualenvs/mkl-dnn/local/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x2a4a29e) [0x7fba2870929e]
[bt] (4) /home/david/.virtualenvs/mkl-dnn/local/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x2899644) [0x7fba28558644]
[bt] (5) /home/david/.virtualenvs/mkl-dnn/local/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x289d151) [0x7fba2855c151]
[bt] (6) /home/david/.virtualenvs/mkl-dnn/local/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x2899d0b) [0x7fba28558d0b]
[bt] (7) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xbbc90) [0x7fba1ba04c90]
[bt] (8) /lib/x86_64-linux-gnu/libpthread.so.0(+0x75aa) [0x7fba37df35aa]
[bt] (9) /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f) [0x7fba36f3ecbf]
```
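To compare the MKL-DNN allocation pattern across epochs (for example, between a batch size that crashes and one that does not), the `Allocate N bytes with malloc directly` messages in the log can be tallied per epoch. The following is a minimal sketch that assumes the console output has been saved to a text file. The function name `summarize_allocations` is hypothetical and is not part of MXNet. The sketch only relies on the two line shapes visible in the log: the `mkldnn_base.cc` allocation message and the `Epoch N, ...` summary line.

```python
import re
from collections import Counter

# Matches allocation messages such as:
# "[17:28:49] src/operator/nn/mkldnn/mkldnn_base.cc:60: Allocate 57344 bytes with malloc directly"
ALLOC_RE = re.compile(r"mkldnn_base\.cc:\d+: Allocate (\d+) bytes with malloc directly")
# Matches the per-epoch summary printed by the training script, e.g. "Epoch 0, training loss: ..."
EPOCH_RE = re.compile(r"^Epoch (\d+),")


def summarize_allocations(lines):
    """Tally MKL-DNN direct-malloc sizes per epoch (hypothetical helper).

    phases[k] collects the allocations logged while epoch k was running,
    i.e. before its "Epoch k, ..." summary line. If the log ends in a crash,
    the last entry covers the epoch that failed.
    Each entry is a Counter mapping allocation size in bytes -> count.
    """
    phases = [Counter()]
    for line in lines:
        line = line.strip()
        if EPOCH_RE.match(line):
            # An epoch summary closes the current phase; start a new one.
            phases.append(Counter())
            continue
        match = ALLOC_RE.search(line)
        if match:
            phases[-1][int(match.group(1))] += 1
    return phases


# Usage on a short excerpt of the log above; for a real run, pass open("log.txt").
sample = """\
[17:28:49] src/operator/nn/mkldnn/mkldnn_base.cc:60: Allocate 57344 bytes with malloc directly
[17:28:49] src/operator/nn/mkldnn/mkldnn_base.cc:60: Allocate 4096 bytes with malloc directly
Epoch 0, training loss: 2.55, validation loss: 2.31
[17:28:50] src/operator/nn/mkldnn/mkldnn_base.cc:60: Allocate 1638400 bytes with malloc directly
[17:28:50] src/operator/nn/mkldnn/mkldnn_base.cc:60: Allocate 1638400 bytes with malloc directly
""".splitlines()

for epoch, sizes in enumerate(summarize_allocations(sample)):
    print(epoch, dict(sizes))
```

Running the sketch once on a log from a crashing batch size and once on a log from a working one gives a quick diff of which buffer sizes differ. The comparison only covers what MXNet chose to print. It says nothing about allocations that did not go through this logging path.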
