TaoLv commented on issue #17980: When compiled with MKL, fully_connected calls DNNL while dot and batch_dot call MKL
URL: https://github.com/apache/incubator-mxnet/issues/17980#issuecomment-609885325

Yes, `USE_MKLDNN` is ON by default on Linux. See https://github.com/apache/incubator-mxnet/blob/master/CMakeLists.txt#L44. If you want to disable DNNL optimizations, you need to set `USE_MKLDNN` to OFF explicitly on the cmake command line.

> How do I achieve the fastest combination of DNNL softmax and MKL matrix multiply for FullyConnected using only documented options?

I think the real question is how to improve the performance of FullyConnected when DNNL is used. In fact, the DNNL primitive kernel itself performs well compared with the timings printed by the script. Take the last shape as an example (on my machine, not EC2):

```
dnnl_verbose,exec,cpu,inner_product,gemm:jit,forward_inference,src_f32::blocked:ab:f0 wei_f32::blocked:ab:f0 bia_undef::undef::f0 dst_f32::blocked:ab:f0,,,mb4ic512oc512,0.0490723
0.0001380 seconds for fullyconnected (DNNL)
0.0000556 seconds for dot (MKL)
```

You can set `DNNL_VERBOSE=1` when running the script to get the verbose output.
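
To make the comparison above concrete, here is a minimal sketch that converts the verbose line into seconds and sets it against the two timings printed by the script. It assumes that the last comma-separated field of a `dnnl_verbose` exec line is the kernel execution time in milliseconds. The helper name `kernel_seconds` is hypothetical and only for illustration.

```python
# Sketch: compare the kernel time reported by DNNL_VERBOSE with the
# end-to-end timings printed by the benchmark script.
# Assumption: the last field of a dnnl_verbose exec line is in milliseconds.

VERBOSE_LINE = (
    "dnnl_verbose,exec,cpu,inner_product,gemm:jit,forward_inference,"
    "src_f32::blocked:ab:f0 wei_f32::blocked:ab:f0 bia_undef::undef::f0 "
    "dst_f32::blocked:ab:f0,,,mb4ic512oc512,0.0490723"
)


def kernel_seconds(verbose_line):
    """Return the execution time of one dnnl_verbose exec line in seconds."""
    fields = verbose_line.strip().split(",")
    if len(fields) < 3 or fields[0] != "dnnl_verbose" or fields[1] != "exec":
        raise ValueError("not a dnnl_verbose exec line")
    # Convert milliseconds (assumed unit) to seconds.
    return float(fields[-1]) / 1000.0


# Timings copied from the script output quoted above, in seconds.
fc_total = 0.0001380   # fullyconnected (DNNL)
dot_total = 0.0000556  # dot (MKL)

fc_kernel = kernel_seconds(VERBOSE_LINE)
fc_outside_kernel = fc_total - fc_kernel

print(f"DNNL inner_product kernel:         {fc_kernel:.7f} s")
print(f"FullyConnected outside the kernel: {fc_outside_kernel:.7f} s")
print(f"dot (MKL), end to end:             {dot_total:.7f} s")
```

If that unit assumption holds, the DNNL kernel takes about 49 microseconds, slightly less than the 55.6 microseconds measured for `dot`, so most of the 138 microseconds measured for FullyConnected is spent outside the DNNL primitive.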
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

With regards,
Apache Git Services
