TaoLv commented on issue #17980: When compiled with MKL, fully_connected calls 
DNNL while dot and batch_dot call MKL
URL: 
https://github.com/apache/incubator-mxnet/issues/17980#issuecomment-609885325
 
 
   Yes, `USE_MKLDNN` is ON by default on Linux. See 
https://github.com/apache/incubator-mxnet/blob/master/CMakeLists.txt#L44. To 
disable DNNL optimizations, you need to set `USE_MKLDNN` to OFF explicitly 
on the cmake command line.
   
   > How do I achieve the fastest combination of DNNL softmax and MKL matrix 
multiply for FullyConnected using only documented options?
   
   I think the real question is how to improve the performance of 
FullyConnected when DNNL is used. In fact, the DNNL primitive kernel itself 
performs well compared with the time the script reports. Take the last 
shape as an example (on my machine, not EC2):
   ```
   dnnl_verbose,exec,cpu,inner_product,gemm:jit,forward_inference,src_f32::blocked:ab:f0 wei_f32::blocked:ab:f0 bia_undef::undef::f0 dst_f32::blocked:ab:f0,,,mb4ic512oc512,0.0490723
   0.0001380 seconds for fullyconnected (DNNL)
   0.0000556 seconds for dot (MKL)
   ```
   
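   You can set `DNNL_VERBOSE=1` when running the script to get the verbose 
output. The last field of each `dnnl_verbose` line is the primitive's 
execution time in milliseconds, so the kernel above took about 0.049 ms, 
while the script measured about 0.138 ms for the whole FullyConnected call.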
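
To make the comparison concrete, here is a minimal Python sketch that pulls the execution time out of the verbose line quoted above and sets it against the time the script reported. The helper name `kernel_time_ms` is made up for illustration, and the sketch assumes the execution time is the last comma-separated field, in milliseconds, as in the sample line.

```python
def kernel_time_ms(verbose_line: str) -> float:
    """Return the execution time in milliseconds from a dnnl_verbose exec line.

    Assumption: the time is the last comma-separated field, as in the sample.
    """
    fields = verbose_line.strip().split(",")
    if len(fields) < 3 or fields[:2] != ["dnnl_verbose", "exec"]:
        raise ValueError("not a dnnl_verbose exec line")
    return float(fields[-1])


# The verbose line from the comment, joined back into a single line.
line = (
    "dnnl_verbose,exec,cpu,inner_product,gemm:jit,forward_inference,"
    "src_f32::blocked:ab:f0 wei_f32::blocked:ab:f0 bia_undef::undef::f0 "
    "dst_f32::blocked:ab:f0,,,mb4ic512oc512,0.0490723"
)

measured_s = 0.0001380                    # script's time for fullyconnected (DNNL)
kernel_s = kernel_time_ms(line) / 1000.0  # convert milliseconds to seconds
overhead_s = measured_s - kernel_s        # time spent outside the DNNL primitive

print(f"kernel:   {kernel_s:.7f} s")
print(f"overhead: {overhead_s:.7f} s")
print(f"share of measured time outside the kernel: {overhead_s / measured_s:.0%}")
```

The difference between the two numbers is time spent outside the DNNL primitive, which is where any FullyConnected improvement would have to come from.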
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services