mseth10 opened a new issue #20766:
URL: https://github.com/apache/incubator-mxnet/issues/20766


   ## Description
   When building MXNet for AArch64 with MKL-DNN and ACL (Arm Compute Library) enabled, the build succeeds but the resulting binary fails some tests, e.g. `test_deconv` in `test_gluon.py`. Here's the pipeline running the build:
   
https://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/restricted-mxnet-cd%2Fsethman-test-cd-release-job/detail/sethman-test-cd-release-job/131/pipeline/82
   
   ### Error Message
   ```
   test_gluon.test_deconv ... python3: ../3rdparty/mkldnn/src/common/primitive.hpp:220: const T* dnnl::impl::resource_mapper_t::get(dnnl::impl::resource_mapper_t::key_t*) const [with T = dnnl::impl::cpu::aarch64::acl_indirect_gemm_resource_t; dnnl::impl::resource_mapper_t::key_t = const dnnl::impl::primitive_t]: Assertion `primitive_to_resource_.count(p)' failed.
   ```
   Here's the backtrace:
   ```
   #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
   #1  0x0000fffff7e7ed54 in __GI_abort () at abort.c:79
   #2  0x0000fffff7e8b61c in __assert_fail_base (fmt=0xfffff7f84c48 "%s%s%s:%u: 
%s%sAssertion `%s' failed.\n%n", 
       assertion=assertion@entry=0xffff55f88aa8 
"primitive_to_resource_.count(p)", file=file@entry=0xffff55f88a78 
"../3rdparty/mkldnn/src/common/primitive.hpp", 
       line=line@entry=220, 
       function=function@entry=0xffff55f88990 "const T* 
dnnl::impl::resource_mapper_t::get(dnnl::impl::resource_mapper_t::key_t*) const 
[with T = dnnl::impl::cpu::aarch64::acl_indirect_gemm_resource_t; 
dnnl::impl::resource_mapper_t::key_t = const "...) at assert.c:92
   #3  0x0000fffff7e8b684 in __GI___assert_fail (assertion=0xffff55f88aa8 
"primitive_to_resource_.count(p)", 
       file=0xffff55f88a78 "../3rdparty/mkldnn/src/common/primitive.hpp", 
line=220, 
       function=0xffff55f88990 "const T* 
dnnl::impl::resource_mapper_t::get(dnnl::impl::resource_mapper_t::key_t*) const 
[with T = dnnl::impl::cpu::aarch64::acl_indirect_gemm_resource_t; 
dnnl::impl::resource_mapper_t::key_t = const "...) at assert.c:101
   #4  0x0000ffff55459e10 in ?? () from 
/home/ubuntu/.local/lib/python3.8/site-packages/mxnet/libmxnet.so
   #5  0x0000ffff554599a4 in ?? () from 
/home/ubuntu/.local/lib/python3.8/site-packages/mxnet/libmxnet.so
   #6  0x0000ffff552f9850 in ?? () from 
/home/ubuntu/.local/lib/python3.8/site-packages/mxnet/libmxnet.so
   #7  0x0000ffff55423734 in ?? () from 
/home/ubuntu/.local/lib/python3.8/site-packages/mxnet/libmxnet.so
   #8  0x0000ffff5473a884 in ?? () from 
/home/ubuntu/.local/lib/python3.8/site-packages/mxnet/libmxnet.so
   #9  0x0000ffff5475e5f4 in ?? () from 
/home/ubuntu/.local/lib/python3.8/site-packages/mxnet/libmxnet.so
   #10 0x0000ffff54739ef0 in ?? () from 
/home/ubuntu/.local/lib/python3.8/site-packages/mxnet/libmxnet.so
   #11 0x0000ffff5473a128 in ?? () from 
/home/ubuntu/.local/lib/python3.8/site-packages/mxnet/libmxnet.so
   #12 0x0000ffff4b8c0414 in dnnl::primitive::execute(dnnl::stream const&, 
std::unordered_map<int, dnnl::memory, std::hash<int>, std::equal_to<int>, 
std::allocator<std::pair<int const, dnnl::memory> > > const&) const () from 
/home/ubuntu/.local/lib/python3.8/site-packages/mxnet/libmxnet.so
   #13 0x0000ffff4b8c05d4 in mxnet::MKLDNNStream::Submit(bool) () from 
/home/ubuntu/.local/lib/python3.8/site-packages/mxnet/libmxnet.so
   #14 0x0000ffff4ca53908 in mxnet::op::MKLDNNDeconvBwd::Execute(unsigned int, 
std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, 
mxnet::op::MKLDNNDeconvBwd::ReadTensors const&, 
mxnet::op::MKLDNNDeconvBwd::WriteTensors const&) const () from 
/home/ubuntu/.local/lib/python3.8/site-packages/mxnet/libmxnet.so
   #15 0x0000ffff4ca52fa0 in 
mxnet::op::MKLDNNDeconvolutionBackward(nnvm::NodeAttrs const&, mxnet::OpContext 
const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, 
std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, 
std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&) ()
      from /home/ubuntu/.local/lib/python3.8/site-packages/mxnet/libmxnet.so
   #16 0x0000ffff4bbdcfec in void std::__invoke_impl<void, void 
(*&)(nnvm::NodeAttrs const&, mxnet::OpContext const&, 
std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, 
std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, 
std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&), 
nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::NDArray, 
std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::OpReqType, 
std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, 
std::allocator<mxnet::NDArray> > const&>(std::__invoke_other, void 
(*&)(nnvm::NodeAttrs const&, mxnet::OpContext const&, 
std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, 
std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, 
std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&), 
nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::NDArray, 
std::allocator<mxnet::NDArray> > const&, 
std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, 
std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&) ()
     untu/.local/lib/python3.8/site-packages/mxnet/libmxnet.so
   #17 0x0000ffff4bbd8dac in std::enable_if<std::__and_<std::is_void<void>, 
std::__is_invocable<void (*&)(nnvm::NodeAttrs const&, mxnet::OpContext const&, 
std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, 
std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, 
std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&), 
nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::NDArray, 
std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::OpReqType, 
std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, 
std::allocator<mxnet::NDArray> > const&> >::value, void>::type 
std::__invoke_r<void, void (*&)(nnvm::NodeAttrs const&, mxnet::OpContext 
const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, 
std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, 
std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&), 
nnvm::NodeAttrs const&, mxnet::OpContext const&, 
std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, 
std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, 
std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&>(void 
(*&)(nnvm::NodeAttrs const&, mxnet::OpContext const&, 
std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, 
std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, 
std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&), 
nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::NDArray, 
std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::OpReqType, 
std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, 
std::allocator<mxnet::NDArray> > const&) () from 
/home/ubuntu/.local/lib/python3.8/site-packages/mxnet/libmxnet.so
   #18 0x0000ffff4bbd38c4 in std::_Function_handler<void (nnvm::NodeAttrs 
const&, mxnet::OpContext const&, std::vector<mxnet::NDArray, 
std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::OpReqType, 
std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, 
std::allocator<mxnet::NDArray> > const&), void (*)(nnvm::NodeAttrs const&, 
mxnet::OpContext const&, std::vector<mxnet::NDArray, 
std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::OpReqType, 
std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, 
std::allocator<mxnet::NDArray> > const&)>::_M_invoke(std::_Any_data const&, 
nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::NDArray, 
std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::OpReqType, 
std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, 
std::allocator<mxnet::NDArray> > const&) ()
      from /home/ubuntu/.local/lib/python3.8/site-packages/mxnet/libmxnet.so
   #19 0x0000ffff4b988848 in std::function<void (nnvm::NodeAttrs const&, 
mxnet::OpContext const&, std::vector<mxnet::NDArray, 
std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::OpReqType, 
std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, 
std::allocator<mxnet::NDArray> > const&)>::operator()(nnvm::NodeAttrs const&, 
mxnet::OpContext const&, std::vector<mxnet::NDArray, 
std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::OpReqType, 
std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, 
std::allocator<mxnet::NDArray> > const&) const () from 
/home/ubuntu/.local/lib/python3.8/site-packages/mxnet/libmxnet.so
   #20 0x0000ffff4ca2c784 in mxnet::MKLDNNRun(std::function<void 
(nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::NDArray, 
std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::OpReqType, 
std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, 
std::allocator<mxnet::NDArray> > const&)>, nnvm::NodeAttrs const&, 
mxnet::OpContext const&, std::vector<mxnet::NDArray, 
std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::OpReqType, 
std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, 
std::allocator<mxnet::NDArray> > const&) () from 
/home/ubuntu/.local/lib/python3.8/site-packages/mxnet/libmxnet.so
   #21 0x0000ffff4c6205ec in ?? () from 
/home/ubuntu/.local/lib/python3.8/site-packages/mxnet/libmxnet.so
   #22 0x0000ffff4bbdcfec in void std::__invoke_impl<void, void 
(*&)(nnvm::NodeAttrs const&, mxnet::OpContext const&, 
std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, 
std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, 
std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&), 
nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::NDArray, 
std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::OpReqType, 
std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, 
std::allocator<mxnet::NDArray> > const&>(std::__invoke_other, void 
(*&)(nnvm::NodeAttrs const&, mxnet::OpContext const&, 
std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, 
std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, 
std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&), 
nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::NDArray, 
std::allocator<mxnet::NDArray> > const&, 
std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, 
std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&) () from 
/home/ubuntu/.local/lib/python3.8/site-packages/mxnet/libmxnet.so
   #23 0x0000ffff4bbd8dac in std::enable_if<std::__and_<std::is_void<void>, 
std::__is_invocable<void (*&)(nnvm::NodeAttrs const&, mxnet::OpContext const&, 
std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, 
std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, 
std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&), 
nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::NDArray, 
std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::OpReqType, 
std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, 
std::allocator<mxnet::NDArray> > const&> >::value, void>::type 
std::__invoke_r<void, void (*&)(nnvm::NodeAttrs const&, mxnet::OpContext 
const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, 
std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, 
std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&), 
nnvm::NodeAttrs const&, mxnet::OpContext const&, 
std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, 
std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, 
std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&>(void 
(*&)(nnvm::NodeAttrs const&, mxnet::OpContext const&, 
std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, 
std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, 
std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&), 
nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::NDArray, 
std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::OpReqType, 
std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, 
std::allocator<mxnet::NDArray> > const&) () from 
/home/ubuntu/.local/lib/python3.8/site-packages/mxnet/libmxnet.so
   ```
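The assertion in the error message checks that a resource (here `acl_indirect_gemm_resource_t`) was registered for primitive `p` in oneDNN's `resource_mapper_t` before `get()` looks it up. The sketch below is a simplified Python model of that check, not oneDNN code: the class names (`ResourceMapper`, `AclConvPrimitive`, `WrapperPrimitive`) and the nested-primitive scenario are illustrative assumptions. It shows one hypothetical way the check can fail (a wrapping primitive that never registers its nested primitive's resource); this report does not establish the actual cause.

```python
# Simplified, hypothetical model of the failing check. NOT oneDNN source code;
# all names are illustrative. The real check is
#   assert(primitive_to_resource_.count(p))  in resource_mapper_t::get().


class ResourceMapper:
    """Maps a primitive (keyed by object identity) to its resource."""

    def __init__(self):
        self._primitive_to_resource = {}

    def add(self, primitive, resource):
        # Each primitive registers its resource at most once.
        if primitive in self._primitive_to_resource:
            raise AssertionError("resource already registered")
        self._primitive_to_resource[primitive] = resource

    def get(self, primitive):
        # Mirrors the C++ assert: the primitive must already be registered.
        if primitive not in self._primitive_to_resource:
            raise AssertionError("primitive_to_resource_.count(p)")
        return self._primitive_to_resource[primitive]


class AclConvPrimitive:
    """Stand-in for a primitive that needs an ACL resource when it executes."""

    def create_resource(self, mapper):
        mapper.add(self, {"kind": "acl_indirect_gemm_resource"})

    def execute(self, mapper):
        return mapper.get(self)


class WrapperPrimitive:
    """Hypothetical primitive that runs a nested primitive internally."""

    def __init__(self, forward_registration):
        self.nested = AclConvPrimitive()
        self.forward_registration = forward_registration

    def create_resource(self, mapper):
        # If registration is not forwarded to the nested primitive,
        # the nested lookup at execution time fails.
        if self.forward_registration:
            self.nested.create_resource(mapper)

    def execute(self, mapper):
        return self.nested.execute(mapper)


def run(primitive):
    """Register resources, then execute, as a stream would."""
    mapper = ResourceMapper()
    primitive.create_resource(mapper)
    return primitive.execute(mapper)


if __name__ == "__main__":
    print(run(WrapperPrimitive(forward_registration=True)))
    try:
        run(WrapperPrimitive(forward_registration=False))
    except AssertionError as err:
        print("assertion failed:", err)
```

In the C++ library the failed `assert` calls `abort()`, which matches frames `#0` and `#1` of the backtrace; the Python model raises `AssertionError` instead so the two paths can be compared.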
   
   ## To Reproduce
   
   ### Steps to reproduce
   A Docker image for easier reproduction is in progress.
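Until the Docker image is available, a script along these lines may exercise the same code path. It is a hypothetical sketch, not the failing test itself: it assumes the MXNet 1.x Gluon API (`gluon.nn.Conv2DTranspose`, `autograd.record`, `nd.waitall`) and that backpropagating through a transposed convolution on an MKL-DNN build reaches `MKLDNNDeconvolutionBackward`, the frame shown in the backtrace. The function name and layer sizes are arbitrary.

```python
# Hypothetical minimal reproduction (not from the original report).
# Assumes an MXNet 1.x build for AArch64 with MKL-DNN and ACL enabled.


def deconv_backward_repro(batch=2, in_channels=4, channels=8, size=16):
    # Imported lazily so this file can be loaded without MXNet installed.
    from mxnet import autograd, gluon, nd

    # Transposed convolution is backed by the Deconvolution operator.
    layer = gluon.nn.Conv2DTranspose(channels=channels, kernel_size=3,
                                     in_channels=in_channels)
    layer.initialize()

    x = nd.random.uniform(shape=(batch, in_channels, size, size))
    x.attach_grad()
    with autograd.record():
        y = layer(x)
    # The backward pass is where the backtrace shows the failure.
    y.backward()

    # MXNet executes asynchronously; block so any failure surfaces here.
    nd.waitall()
    return x.grad.shape, layer.weight.grad().shape


if __name__ == "__main__":
    print(deconv_backward_repro())
```

On an affected build, the process would be expected to abort with the assertion above rather than raise a Python exception, since the failing check is a C `assert`.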
   
   ## What have you tried to solve it?
   
   1. Tried building with different oneDNN and ACL versions, but the failure persisted.
   
   ## Environment
   
   ***We recommend using our script for collecting the diagnostic information with the following command***
   `curl --retry 10 -s https://raw.githubusercontent.com/apache/incubator-mxnet/master/tools/diagnose.py | python3`
   
   <details>
   <summary>Environment Information</summary>
   
   ```
   ----------Python Info----------
   Version      : 3.6.8
   Compiler     : GCC 4.8.5 20150623 (Red Hat 4.8.5-44)
   Build        : ('default', 'Nov 16 2020 16:33:14')
   Arch         : ('64bit', '')
   ------------Pip Info-----------
   Version      : 9.0.3
   Directory    : /usr/lib/python3.6/site-packages/pip
   ----------MXNet Info-----------
   No MXNet installed.
   ----------System Info----------
   Platform     : Linux-5.11.0-1022-aws-aarch64-with-centos-7.9.2009-AltArch
   system       : Linux
   node         : c18cd793fd55
   release      : 5.11.0-1022-aws
   version      : #23~20.04.1-Ubuntu SMP Mon Nov 15 14:04:48 UTC 2021
   ----------Hardware Info----------
   machine      : aarch64
   processor    : aarch64
   Architecture:          aarch64
   Byte Order:            Little Endian
   CPU(s):                64
   On-line CPU(s) list:   0-63
   Thread(s) per core:    1
   Core(s) per socket:    64
   Socket(s):             1
   NUMA node(s):          1
   Model:                 1
   BogoMIPS:              243.75
   L1d cache:             64K
   L1i cache:             64K
   L2 cache:              1024K
   L3 cache:              32768K
   NUMA node0 CPU(s):     0-63
   Flags:                 fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp ssbs
   ----------Network Test----------
   Setting timeout: 10
   Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0014 sec, LOAD: 0.4363 sec.
   Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.1806 sec, LOAD: 0.1674 sec.
   Error open Gluon Tutorial(cn): https://zh.gluon.ai, <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:877)>, DNS finished in 0.2053844928741455 sec.
   Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0147 sec, LOAD: 0.0943 sec.
   Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0374 sec, LOAD: 5.3653 sec.
   Error open Conda: https://repo.continuum.io/pkgs/free/, HTTP Error 403: Forbidden, DNS finished in 0.018440961837768555 sec.
   ----------Environment----------
   ```
   
   </details>
   

