mseth10 opened a new issue #20766: URL: https://github.com/apache/incubator-mxnet/issues/20766
## Description
When building MXNet for AArch64 with MKL-DNN with ACL enabled, the build succeeds, but the resulting binary fails some tests, e.g. `test_deconv` in `test_gluon.py`. Here's the pipeline running the build: https://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/restricted-mxnet-cd%2Fsethman-test-cd-release-job/detail/sethman-test-cd-release-job/131/pipeline/82

### Error Message
```
test_gluon.test_deconv ... python3: ../3rdparty/mkldnn/src/common/primitive.hpp:220: const T* dnnl::impl::resource_mapper_t::get(dnnl::impl::resource_mapper_t::key_t*) const [with T = dnnl::impl::cpu::aarch64::acl_indirect_gemm_resource_t; dnnl::impl::resource_mapper_t::key_t = const dnnl::impl::primitive_t]: Assertion `primitive_to_resource_.count(p)' failed.
```
Here's the backtrace:
```
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1 0x0000fffff7e7ed54 in __GI_abort () at abort.c:79
#2 0x0000fffff7e8b61c in __assert_fail_base (fmt=0xfffff7f84c48 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0xffff55f88aa8 "primitive_to_resource_.count(p)", file=file@entry=0xffff55f88a78 "../3rdparty/mkldnn/src/common/primitive.hpp", line=line@entry=220, function=function@entry=0xffff55f88990 "const T* dnnl::impl::resource_mapper_t::get(dnnl::impl::resource_mapper_t::key_t*) const [with T = dnnl::impl::cpu::aarch64::acl_indirect_gemm_resource_t; dnnl::impl::resource_mapper_t::key_t = const "...) at assert.c:92
#3 0x0000fffff7e8b684 in __GI___assert_fail (assertion=0xffff55f88aa8 "primitive_to_resource_.count(p)", file=0xffff55f88a78 "../3rdparty/mkldnn/src/common/primitive.hpp", line=220, function=0xffff55f88990 "const T* dnnl::impl::resource_mapper_t::get(dnnl::impl::resource_mapper_t::key_t*) const [with T = dnnl::impl::cpu::aarch64::acl_indirect_gemm_resource_t; dnnl::impl::resource_mapper_t::key_t = const "...) at assert.c:101
#4 0x0000ffff55459e10 in ??
() from /home/ubuntu/.local/lib/python3.8/site-packages/mxnet/libmxnet.so
#5 0x0000ffff554599a4 in ?? () from /home/ubuntu/.local/lib/python3.8/site-packages/mxnet/libmxnet.so
#6 0x0000ffff552f9850 in ?? () from /home/ubuntu/.local/lib/python3.8/site-packages/mxnet/libmxnet.so
#7 0x0000ffff55423734 in ?? () from /home/ubuntu/.local/lib/python3.8/site-packages/mxnet/libmxnet.so
#8 0x0000ffff5473a884 in ?? () from /home/ubuntu/.local/lib/python3.8/site-packages/mxnet/libmxnet.so
#9 0x0000ffff5475e5f4 in ?? () from /home/ubuntu/.local/lib/python3.8/site-packages/mxnet/libmxnet.so
#10 0x0000ffff54739ef0 in ?? () from /home/ubuntu/.local/lib/python3.8/site-packages/mxnet/libmxnet.so
#11 0x0000ffff5473a128 in ?? () from /home/ubuntu/.local/lib/python3.8/site-packages/mxnet/libmxnet.so
#12 0x0000ffff4b8c0414 in dnnl::primitive::execute(dnnl::stream const&, std::unordered_map<int, dnnl::memory, std::hash<int>, std::equal_to<int>, std::allocator<std::pair<int const, dnnl::memory> > > const&) const () from /home/ubuntu/.local/lib/python3.8/site-packages/mxnet/libmxnet.so
#13 0x0000ffff4b8c05d4 in mxnet::MKLDNNStream::Submit(bool) () from /home/ubuntu/.local/lib/python3.8/site-packages/mxnet/libmxnet.so
#14 0x0000ffff4ca53908 in mxnet::op::MKLDNNDeconvBwd::Execute(unsigned int, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, mxnet::op::MKLDNNDeconvBwd::ReadTensors const&, mxnet::op::MKLDNNDeconvBwd::WriteTensors const&) const () from /home/ubuntu/.local/lib/python3.8/site-packages/mxnet/libmxnet.so
#15 0x0000ffff4ca52fa0 in mxnet::op::MKLDNNDeconvolutionBackward(nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&) () from /home/ubuntu/.local/lib/python3.8/site-packages/mxnet/libmxnet.so
#16 0x0000ffff4bbdcfec in void std::__invoke_impl<void, void
(*&)(nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&), nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&>(std::__invoke_other, void (*&)(nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&), nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&) () from /home/ubuntu/.local/lib/python3.8/site-packages/mxnet/libmxnet.so
#17 0x0000ffff4bbd8dac in std::enable_if<std::__and_<std::is_void<void>, std::__is_invocable<void (*&)(nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&), nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&> >::value, void>::type std::__invoke_r<void, void (*&)(nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> >
const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&), nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&>(void (*&)(nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&), nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&) () from /home/ubuntu/.local/lib/python3.8/site-packages/mxnet/libmxnet.so
#18 0x0000ffff4bbd38c4 in std::_Function_handler<void (nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&), void (*)(nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&)>::_M_invoke(std::_Any_data const&, nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&) () from /home/ubuntu/.local/lib/python3.8/site-packages/mxnet/libmxnet.so
#19 0x0000ffff4b988848 in
std::function<void (nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&)>::operator()(nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&) const () from /home/ubuntu/.local/lib/python3.8/site-packages/mxnet/libmxnet.so
#20 0x0000ffff4ca2c784 in mxnet::MKLDNNRun(std::function<void (nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&)>, nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&) () from /home/ubuntu/.local/lib/python3.8/site-packages/mxnet/libmxnet.so
#21 0x0000ffff4c6205ec in ??
() from /home/ubuntu/.local/lib/python3.8/site-packages/mxnet/libmxnet.so
#22 0x0000ffff4bbdcfec in void std::__invoke_impl<void, void (*&)(nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&), nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&>(std::__invoke_other, void (*&)(nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&), nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&) () from /home/ubuntu/.local/lib/python3.8/site-packages/mxnet/libmxnet.so
#23 0x0000ffff4bbd8dac in std::enable_if<std::__and_<std::is_void<void>, std::__is_invocable<void (*&)(nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&), nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&> >::value, void>::type std::__invoke_r<void, void (*&)(nnvm::NodeAttrs const&,
mxnet::OpContext const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&), nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&>(void (*&)(nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&), nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&) () from /home/ubuntu/.local/lib/python3.8/site-packages/mxnet/libmxnet.so
```

## To Reproduce
(If you developed your own code, please provide a short script that reproduces the error. For existing examples, please provide link.)

### Steps to reproduce
(Paste the commands you ran that produced the error.)

Working on a docker image for ease of reproduction

## What have you tried to solve it?
1. Tried building with different OneDNN and ACL versions, but it did not help

## Environment
***We recommend using our script for collecting the diagnostic information with the following command***
`curl --retry 10 -s https://raw.githubusercontent.com/apache/incubator-mxnet/master/tools/diagnose.py | python3`

<details>
<summary>Environment Information</summary>

```
----------Python Info----------
Version : 3.6.8
Compiler : GCC 4.8.5 20150623 (Red Hat 4.8.5-44)
Build : ('default', 'Nov 16 2020 16:33:14')
Arch : ('64bit', '')
------------Pip Info-----------
Version : 9.0.3
Directory : /usr/lib/python3.6/site-packages/pip
----------MXNet Info-----------
No MXNet installed.
----------System Info----------
Platform : Linux-5.11.0-1022-aws-aarch64-with-centos-7.9.2009-AltArch
system : Linux
node : c18cd793fd55
release : 5.11.0-1022-aws
version : #23~20.04.1-Ubuntu SMP Mon Nov 15 14:04:48 UTC 2021
----------Hardware Info----------
machine : aarch64
processor : aarch64
Architecture: aarch64
Byte Order: Little Endian
CPU(s): 64
On-line CPU(s) list: 0-63
Thread(s) per core: 1
Core(s) per socket: 64
Socket(s): 1
NUMA node(s): 1
Model: 1
BogoMIPS: 243.75
L1d cache: 64K
L1i cache: 64K
L2 cache: 1024K
L3 cache: 32768K
NUMA node0 CPU(s): 0-63
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp ssbs
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0014 sec, LOAD: 0.4363 sec.
Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.1806 sec, LOAD: 0.1674 sec.
Error open Gluon Tutorial(cn): https://zh.gluon.ai, <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:877)>, DNS finished in 0.2053844928741455 sec.
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0147 sec, LOAD: 0.0943 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0374 sec, LOAD: 5.3653 sec.
Error open Conda: https://repo.continuum.io/pkgs/free/, HTTP Error 403: Forbidden, DNS finished in 0.018440961837768555 sec.
----------Environment----------
```

</details>

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
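
Many frames in the backtrace above are unresolved (`??`), which makes the call chain into `MKLDNNDeconvBwd::Execute` hard to follow. As a reading aid only, here is a minimal Python sketch that lists the symbolized frames of a gdb backtrace. The names `FRAME_RE`, `named_frames` and `SAMPLE` are hypothetical and not part of MXNet, oneDNN or gdb, and the pattern assumes one frame per line and does not handle frames whose symbol is preceded by a return type (such as `void std::__invoke_impl<...>`).

```python
import re

# A gdb frame line starts with "#<n>", optionally followed by "0x<addr> in ",
# then the symbol name. Assumption: one frame per line, no leading return type.
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-f]+ in )?(\S+)", re.MULTILINE)


def named_frames(backtrace):
    """Return (frame_number, function_name) pairs, skipping '??' frames."""
    frames = []
    for number, symbol in FRAME_RE.findall(backtrace):
        if symbol == "??":
            continue  # unresolved frame, no symbol information available
        # Drop the argument list, keeping only the qualified function name.
        frames.append((int(number), symbol.split("(")[0]))
    return frames


# A few frames copied from the backtrace in this report.
SAMPLE = """\
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1 0x0000fffff7e7ed54 in __GI_abort () at abort.c:79
#4 0x0000ffff55459e10 in ?? () from /home/ubuntu/.local/lib/python3.8/site-packages/mxnet/libmxnet.so
#13 0x0000ffff4b8c05d4 in mxnet::MKLDNNStream::Submit(bool) () from /home/ubuntu/.local/lib/python3.8/site-packages/mxnet/libmxnet.so
"""

for number, symbol in named_frames(SAMPLE):
    print(number, symbol)
```

Running it on the full backtrace would need the frames split one per line first; the unresolved frames #4 to #11 sit between the failed assertion and `dnnl::primitive::execute`, so a build with debug symbols would likely be needed to see which oneDNN function called `resource_mapper_t::get`.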
