safrooze opened a new issue #11909: MKLDNN not used for 3d tensors
URL: https://github.com/apache/incubator-mxnet/issues/11909

## Description
With an MKL build, tensors that are not 2d or 4d fall back to the default CPU implementation, which in some cases is extremely inefficient compared to MKLDNN (for example, 20x slower for the `concat` operator). Affected operators include convolution and concat.

## Environment info (Required)
```
----------Python Info----------
Version : 3.4.5
Compiler : GCC 4.4.7 20120313 (Red Hat 4.4.7-1)
Build : ('default', 'Jul 2 2016 17:47:47')
Arch : ('64bit', 'ELF')
------------Pip Info-----------
Version : 18.0
Directory : /home/ec2-user/anaconda3/envs/mxnet_p34/lib/python3.4/site-packages/pip
----------MXNet Info-----------
Version : 1.3.0
Directory : /home/ec2-user/anaconda3/envs/mxnet_p34/lib/python3.4/site-packages/mxnet
Commit Hash : f5b95b090815e879b57dca233604dcb3f1df967a
----------System Info----------
Platform : Linux-4.9.93-41.60.amzn1.x86_64-x86_64-with-glibc2.2.5
system : Linux
node : ip-172-31-73-235
release : 4.9.93-41.60.amzn1.x86_64
version : #1 SMP Fri Apr 13 21:58:27 UTC 2018
----------Hardware Info----------
machine : x86_64
processor : x86_64
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 2
Core(s) per socket: 4
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 79
Model name: Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
Stepping: 1
CPU MHz: 2698.120
BogoMIPS: 4600.11
Hypervisor vendor: Xen
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 46080K
NUMA node0 CPU(s): 0-7
----------Network Test----------
Setting timeout: 10
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0150 sec, LOAD: 0.3634 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0028 sec, LOAD: 0.0405 sec.
Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 0.0768 sec, LOAD: 0.5932 sec.
Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.0466 sec, LOAD: 0.3405 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0031 sec, LOAD: 0.1442 sec.
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0032 sec, LOAD: 0.4106 sec.
```

I'm using the Python package.

## Minimum reproducible example
```
from time import time

import mxnet as mx
from mxnet import nd


def test(make_4d):
    ctx = mx.cpu()
    num_iter = 1000
    start = time()
    for i in range(num_iter):
        # Prepend a dummy axis in the 4D case and shift the concat axis to match.
        extra_dim = (1, ) if make_4d else tuple()
        cdim = 3 if make_4d else 2
        a_shape = extra_dim + (1, 512, 120 * 120)
        b_shape = extra_dim + (1, 512, 1)
        a = nd.empty(a_shape, ctx=ctx)
        b = nd.empty(b_shape, ctx=ctx)
        c = nd.concat(a, b, dim=cdim)
        if make_4d:
            # Drop the dummy axis so both cases produce the same 3D result.
            c = c.reshape(c.shape[1:])
    # Wait for all asynchronous operations to finish before stopping the timer.
    nd.waitall()
    print('\telapsed: {:.2f}'.format(time() - start))


if __name__ == '__main__':
    print("4D Test")
    test(True)
    print("3D Test")
    test(False)
```

Output:
```
4D Test
	elapsed: 2.18
3D Test
	elapsed: 39.02
```

## What have you tried to solve it?
Looking at the implementation, the reason is that `SupportMKLDNNConcat()` returns false if the input tensor is not 2d or 4d.
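Until the operator itself handles 3d inputs, the reshape trick from the 4D case of the repro could be wrapped in a small helper. The following is only a sketch: `pad_to_4d` and `concat_3d_via_4d` are hypothetical names, not part of MXNet, and whether the 4d call actually takes the MKLDNN path depends on the build.

```python
def pad_to_4d(shape, dim):
    """Return the 4d shape and shifted concat axis for a 3d input.

    Hypothetical helper: a leading axis of size 1 is prepended, so every
    original axis index moves up by one.
    """
    shape = tuple(shape)
    if len(shape) != 3:
        raise ValueError("expected a 3d shape, got {}".format(shape))
    if dim < 0:
        dim += 3  # normalize a negative axis against the 3d shape
    if not 0 <= dim < 3:
        raise ValueError("concat axis out of range for a 3d tensor")
    return (1,) + shape, dim + 1


def concat_3d_via_4d(arrays, dim):
    """Concatenate 3d NDArrays through a temporary 4d view (assumed workaround)."""
    # Imported lazily so that pad_to_4d stays usable without MXNet installed.
    from mxnet import nd

    padded = []
    new_dim = dim
    for a in arrays:
        new_shape, new_dim = pad_to_4d(a.shape, dim)
        padded.append(a.reshape(new_shape))
    out = nd.concat(*padded, dim=new_dim)
    # Drop the dummy leading axis to get back a 3d result.
    return out.reshape(out.shape[1:])
```

With the tensors from the repro, `concat_3d_via_4d([a, b], dim=2)` would perform the same reshape, concat, and reshape sequence as the 4D test.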
