cfRod commented on issue #20265:
URL: https://github.com/apache/incubator-mxnet/issues/20265#issuecomment-874751442


   Hi @Zha0q1,
   
   This is my WIP analysis so far (see the verbose logs below to support it).
   The convolution with grouping calls into the gemm reference kernel (gemm:ref), which is expected because grouped convolutions are currently unsupported by Compute Library.
   In the case of manually slicing the input and running the convolution on the two slices of input and weights, the operation calls into gemm:acl; the first half of the output matches the output from the reference kernel, but the second half does not. My initial analysis is that the primitive is created once and reused twice with different inputs and weights (see the logs below; a rough repro sketch follows them). However, Compute Library assumes the weights are immutable, because it preprocesses the weights internally once per primitive creation, so the second time around the new set of weights is not used by ACL.
   ```
   /home/ubuntu/mxnet/mxnet/src/executor/graph_executor.cc:1992: Subgraph backend MKLDNN is activated.
   [14:52:42] /home/ubuntu/mxnet/mxnet/src/executor/graph_executor.cc:1992: Subgraph backend MKLDNN is activated.
   dnnl_verbose,info,oneDNN v2.0.0 (commit 83ebc40d86bc54f0f23e947235e53570eeacf254)
   dnnl_verbose,info,cpu,runtime:OpenMP
   dnnl_verbose,info,cpu,isa:Generic
   dnnl_verbose,info,gpu,runtime:none

   # THIS IS GEMM REFERENCE
   dnnl_verbose,create:cache_miss,cpu,convolution,gemm:ref,forward_training,src_f32::blocked:abcd:f0 wei_f32::blocked:abcde:f0 bia_f32::blocked:a:f0 dst_f32::blocked:abcd:f0,,alg:convolution_direct,mb1_g2ic4oc4_ih9oh7kh3sh1dh0ph0_iw9ow7kw3sw1dw0pw0,0.130859
   dnnl_verbose,exec,cpu,convolution,gemm:ref,forward_training,src_f32::blocked:abcd:f0 wei_f32::blocked:abcde:f0 bia_f32::blocked:a:f0 dst_f32::blocked:abcd:f0,,alg:convolution_direct,mb1_g2ic4oc4_ih9oh7kh3sh1dh0ph0_iw9ow7kw3sw1dw0pw0,19.759

   # THIS IS GEMM ACL with sliced inputs
   dnnl_verbose,create:cache_miss,cpu,convolution,gemm:acl,forward_training,src_f32::blocked:abcd:f0 wei_f32::blocked:abcd:f0 bia_f32::blocked:a:f0 dst_f32::blocked:abcd:f0,,alg:convolution_direct,mb1_ic2oc2_ih9oh7kh3sh1dh0ph0_iw9ow7kw3sw1dw0pw0,0.0969238
   dnnl_verbose,exec,cpu,convolution,gemm:acl,forward_training,src_f32::blocked:abcd:f0 wei_f32::blocked:abcd:f0 bia_f32::blocked:a:f0 dst_f32::blocked:abcd:f0,,alg:convolution_direct,mb1_ic2oc2_ih9oh7kh3sh1dh0ph0_iw9ow7kw3sw1dw0pw0,0.415039
   dnnl_verbose,exec,cpu,convolution,gemm:acl,forward_training,src_f32::blocked:abcd:f0 wei_f32::blocked:abcd:f0 bia_f32::blocked:a:f0 dst_f32::blocked:abcd:f0,,alg:convolution_direct,mb1_ic2oc2_ih9oh7kh3sh1dh0ph0_iw9ow7kw3sw1dw0pw0,0.321045

   # THIS IS THE CONCAT STAGE
   dnnl_verbose,create:cache_miss,cpu,concat,simple:any,undef,src_f32::blocked:abcd:f0 src_f32::blocked:abcd:f0 dst_f32::blocked:abcd:f0,,axis:1,1x2x7x7:1x2x7x7 1x4x7x7,0.0700684
   dnnl_verbose,exec,cpu,concat,simple:any,undef,src_f32::blocked:abcd:f0 src_f32::blocked:abcd:f0 dst_f32::blocked:abcd:f0,,axis:1,1x2x7x7:1x2x7x7 1x4x7x7,0.468994
   ```
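   For reference, here is a rough, hypothetical repro sketch (not the actual test from the issue) of the slicing experiment described above, assuming the shapes from the verbose log (batch 1, 4 input/output channels, 2 groups, 9x9 input, 3x3 kernel):

   ```python
   import os
   # Enable the oneDNN subgraph backend and verbose primitive logging before importing MXNet.
   os.environ["MXNET_SUBGRAPH_BACKEND"] = "MKLDNN"
   os.environ["DNNL_VERBOSE"] = "1"

   import mxnet as mx
   import numpy as np

   data = mx.nd.random.uniform(shape=(1, 4, 9, 9))
   weight = mx.nd.random.uniform(shape=(4, 2, 3, 3))  # (oc, ic/groups, kh, kw)
   bias = mx.nd.random.uniform(shape=(4,))

   # Grouped convolution: dispatched to gemm:ref, since ACL does not support grouping.
   ref = mx.nd.Convolution(data=data, weight=weight, bias=bias,
                           kernel=(3, 3), num_filter=4, num_group=2)

   # Manual slicing: two plain convolutions (gemm:acl) followed by a concat.
   outs = []
   for g in range(2):
       d = mx.nd.slice_axis(data, axis=1, begin=2 * g, end=2 * (g + 1))
       w = mx.nd.slice_axis(weight, axis=0, begin=2 * g, end=2 * (g + 1))
       b = mx.nd.slice_axis(bias, axis=0, begin=2 * g, end=2 * (g + 1))
       outs.append(mx.nd.Convolution(data=d, weight=w, bias=b,
                                     kernel=(3, 3), num_filter=2))
   acl = mx.nd.concat(*outs, dim=1)

   # Reported behaviour: the first half (channels 0-1) matches the grouped reference,
   # the second half (channels 2-3) does not.
   print(np.allclose(acl.asnumpy()[:, 0:2], ref.asnumpy()[:, 0:2]))  # True
   print(np.allclose(acl.asnumpy()[:, 2:4], ref.asnumpy()[:, 2:4]))  # False per this report
   ```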
   We've seen a similar issue with primitive caching in TensorFlow, and I am looking to confirm this hypothesis, possibly by modifying the test; one possible check is sketched below.
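
   One way such a check could look (again hypothetical, reusing the names from the sketch above): compute with a plain NumPy reference what the second slice's output would be if it were convolved with the first slice's weights; if that matches the second half of the concat output, it would support the stale-weights hypothesis.

   ```python
   def conv2d_ref(x, w, b):
       # Naive valid-mode cross-correlation for NCHW data and OIHW weights,
       # matching MXNet's Convolution semantics (stride 1, no padding).
       n, ic, ih, iw = x.shape
       oc, _, kh, kw = w.shape
       oh, ow = ih - kh + 1, iw - kw + 1
       y = np.zeros((n, oc, oh, ow), dtype=x.dtype)
       for o in range(oc):
           for i in range(oh):
               for j in range(ow):
                   y[:, o, i, j] = (x[:, :, i:i + kh, j:j + kw] * w[o]).sum(axis=(1, 2, 3)) + b[o]
       return y

   x1 = data.asnumpy()[:, 2:4]   # second input slice
   w0 = weight.asnumpy()[0:2]    # weights of the FIRST slice
   b0 = bias.asnumpy()[0:2]
   stale_expectation = conv2d_ref(x1, w0, b0)
   second_half = acl.asnumpy()[:, 2:4]

   # True here would indicate the second exec reused the first primitive's weights.
   print(np.allclose(second_half, stale_expectation, rtol=1e-5, atol=1e-6))
   ```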


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@mxnet.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


